public inbox for linux-next@vger.kernel.org 
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd•com>
To: Bert Karwatzki <spasswolf@web•de>,
	"Mario Limonciello (AMD) (kernel.org)" <superm1@kernel•org>,
	linux-kernel@vger•kernel.org
Cc: linux-next@vger•kernel.org, regressions@lists•linux.dev,
	linux-pci@vger•kernel.org, linux-acpi@vger•kernel.org,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel•com>
Subject: Re: [REGRESSION 00/04] Crash during resume of pcie bridge
Date: Mon, 10 Nov 2025 14:33:07 +0100	[thread overview]
Message-ID: <0719d985-1c09-4039-84c1-8736a1ca5e2d@amd.com> (raw)
In-Reply-To: <ab51bd58919a31107caf8f8753804cb2dbfa791d.camel@web.de>

Hi Bert,

well sorry to say that but from your dumps it looks more and more like you just have faulty HW.

An SMU response of 0xFFFFFFFF means that the device has spontaneously fallen of the bus while trying to resume it.

My educated guess is that this is caused by a faulty power management, but basically it could be anything.

Regards,
Christian.

On 11/7/25 18:09, Bert Karwatzki wrote:
> Am Freitag, dem 07.11.2025 um 14:09 +0100 schrieb Bert Karwatzki:
>>
>> Testing:
>> v6.12			booted 13:00, 7.11.2025 no crash after 1h, 890 GPP0 events, 287 resumes
>>
>>
>> Bert Karwatzki
> 
> v6.12 crashed after 2h, 946 GPP0 events and 499 resumes. So there's no base
> for a bisection. 
> 
> But the crash from v6.14.11 gave this error in netconsole:
> 
> 2025-11-06T19:17:34.967439+01:00 T370;[drm] PCIE GART of 512M enabled (table at 0x00000081FEB00000).
> 2025-11-06T19:17:34.967439+01:00 T370;amdgpu 0000:03:00.0: amdgpu: PSP is resuming...#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:03:00.0
> 2025-11-06T19:17:34.967588+01:00 T12;pci_bus 0000:03: Allocating resources#012 SUBSYSTEM=pci_bus#012 DEVICE=+pci_bus:0000:03
> 2025-11-06T19:17:35.143353+01:00 T370;amdgpu 0000:03:00.0: amdgpu: reserve 0xa00000 from 0x81fd000000 for PSP TMR#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:03:00.0
> 2025-11-06T19:17:35.226021+01:00 T370;amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:03:00.0
> 2025-11-06T19:17:35.237386+01:00 T370;amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available#012 SUBSYSTEM=pci#012
> DEVICE=+pci:0000:03:00.0
> 2025-11-06T19:17:35.237386+01:00 T370;amdgpu 0000:03:00.0: amdgpu: SMU is resuming...#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:03:00.0
> 2025-11-06T19:17:35.237386+01:00 T370;amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0,
> version = 0x003b3100 (59.49.0)#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:03:00.0
> 2025-11-06T19:17:35.237386+01:00 T370;amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:03:00.0
> 2025-11-06T19:17:35.509600+01:00 T370;amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:6 param:0x00000000 message:EnableAllSmuFeatures?#012
> SUBSYSTEM=pci#012 DEVICE=+pci:0000:03:00.0
> 2025-11-06T19:17:35.509600+01:00 T370;amdgpu 0000:03:00.0: amdgpu: Failed to enable requested dpm features!#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:03:00.0
> 2025-11-06T19:17:35.509600+01:00 T370;amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:03:00.0
> 2025-11-06T19:17:35.509600+01:00 T370;amdgpu 0000:03:00.0: amdgpu: resume of IP block <smu> failed -121#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:03:00.0
> 2025-11-06T19:17:35.509600+01:00 T370;amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-121).#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:03:00.0
> 2025-11-06T19:17:36.114889+01:00 C8;INFO: NMI handler (perf_event_nmi_handler) took too long to run: 35.314 msecs
> 2025-11-06T19:17:36.114889+01:00 C8;perf: interrupt took too long (275880 > 2500), lowering kernel.perf_event_max_sample_rate to 1000
> 2025-11-06T19:17:37.930799+01:00 C4;INFO: NMI handler (perf_event_nmi_handler) took too long to run: 152.914 msecs
> 2025-11-06T19:17:37.930799+01:00 C4;perf: interrupt took too long (1194640 > 344850), lowering kernel.perf_event_max_sample_rate to 1000
> 2025-11-06T19:17:38.939845+01:00 C14;INFO: NMI handler (perf_event_nmi_handler) took too long to run: 197.312 msecs
> 2025-11-06T19:17:38.939845+01:00 C14;perf: interrupt took too long (1541521 > 1493300), lowering kernel.perf_event_max_sample_rate to 1000
> 
> These 4 lines have not been recorded previously, so perhaps I have to look
> for a NULL pointer dereference in an error path:
> 
> 2025-11-06T19:17:42.571252+01:00 T1896;ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240827/evregion-301)
> 2025-11-06T19:17:42.571252+01:00 T1896;ACPI Error: Timeout from EC hardware or EC device driver (20240827/evregion-311)
> 2025-11-06T19:17:42.571252+01:00 T1896;ACPI Error: Aborting method \x5c_SB.PCI0.SBRG.EC.BAT1.UPBS due to previous error (AE_TIME) (20240827/psparse-529)
> 2025-11-06T19:17:42.571252+01:00 T1896;ACPI Error: Aborting method \x5c_SB.PCI0.SBRG.EC.BAT1._BST due to previous error (AE_TIME) (20240827/psparse-529) 
> 
> 
> Bert Karwatzki


  reply	other threads:[~2025-11-10 13:33 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-06 12:09 [REGRESSION 00/04] Crash during resume of pcie bridge Bert Karwatzki
2025-10-06 12:09 ` [REGRESSION 01/04] " Bert Karwatzki
2025-10-06 12:09 ` [REGRESSION 02/04] " Bert Karwatzki
2025-10-06 12:09 ` [REGRESSION 03/04] " Bert Karwatzki
2025-10-06 12:09 ` [REGRESSION 04/04] " Bert Karwatzki
2025-10-06 12:39 ` [REGRESSION 00/04] " Christian König
2025-10-06 16:22   ` Bert Karwatzki
2025-10-07  6:50     ` Bert Karwatzki
2025-10-07 21:33 ` Mario Limonciello
2025-10-13 16:29   ` Bert Karwatzki
2025-10-13 18:51     ` Mario Limonciello
2025-10-14 10:50       ` Christian König
     [not found]         ` <1853e2af7f70cf726df278137b6d2d89d9d9dc82.camel@web.de>
2025-10-31 13:38           ` Bert Karwatzki
2025-10-31 13:47             ` Bert Karwatzki
2025-10-31 18:35               ` Bert Karwatzki
2025-11-05 11:44                 ` Bert Karwatzki
2025-11-05 21:31                   ` Mario Limonciello (AMD) (kernel.org)
2025-11-07 13:09                     ` Bert Karwatzki
2025-11-07 17:09                       ` Bert Karwatzki
2025-11-10 13:33                         ` Christian König [this message]
2025-11-16 21:08                           ` Crash during resume of pcie bridge due to infinite loop in ACPICA Bert Karwatzki
2025-11-17 16:40                             ` Rafael J. Wysocki
2025-11-24 22:34                               ` Bert Karwatzki
2025-11-25 19:46                                 ` Rafael J. Wysocki
2025-11-27  0:08                                   ` Bert Karwatzki
2025-11-27 13:02                                     ` Rafael J. Wysocki
2025-11-28 20:47                                       ` Bert Karwatzki
2025-12-02 18:59                                         ` Rafael J. Wysocki
2025-12-02 19:53                                           ` Bert Karwatzki
2025-12-02 20:01                                             ` Rafael J. Wysocki
2025-12-05 10:05                                               ` Crash during resume of pcie bridge due to incorrect error handling Bert Karwatzki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0719d985-1c09-4039-84c1-8736a1ca5e2d@amd.com \
    --to=christian.koenig@amd$(echo .)com \
    --cc=linux-acpi@vger$(echo .)kernel.org \
    --cc=linux-kernel@vger$(echo .)kernel.org \
    --cc=linux-next@vger$(echo .)kernel.org \
    --cc=linux-pci@vger$(echo .)kernel.org \
    --cc=rafael.j.wysocki@intel$(echo .)com \
    --cc=regressions@lists$(echo .)linux.dev \
    --cc=spasswolf@web$(echo .)de \
    --cc=superm1@kernel$(echo .)org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox