From: Thomas Gleixner <tglx@linutronix•de>
To: Bert Karwatzki <spasswolf@web•de>,
Mateusz Guzik <mjguzik@gmail•com>,
Christian Brauner <brauner@kernel•org>
Cc: linux-kernel@vger•kernel.org, linux-next@vger•kernel.org,
linux-rt-devel@lists•linux.dev, linux-fsdevel@vger•kernel.org,
adobriyan@gmail•com, jack@suse•cz, viro@zeniv•linux.org.uk,
Sebastian Andrzej Siewior <bigeasy@linutronix•de>,
spasswolf@web•de, Alex Deucher <alexander.deucher@amd•com>,
amd-gfx@lists•freedesktop.org
Subject: Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
Date: Thu, 21 May 2026 12:17:15 +0200
Message-ID: <878q9dvzh0.ffs@tglx>
In-Reply-To: <4f548d61b2dd12e01f401ce4b8c865f238f7b23c.camel@web.de>
On Thu, May 21 2026 at 11:20, Bert Karwatzki wrote:
> On Thursday, 21.05.2026 at 11:09 +0200, Mateusz Guzik wrote:
>
> with next-20260519 (no RT, no LOCKDEP) and got no crash so far (4 boots only though (next-20260619
> crashed in 2 out of 3 boots without RT)) but I get this warning on every boot:
>
> [ 2.793416] [ T331] ------------[ cut here ]------------
> [ 2.793433] [ T331] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> [ 2.793434] [ T331] WARNING: kernel/locking/mutex.c:625 at __mutex_lock+0x586/0x10c0, CPU#17: (udev-worker)/331
So either the mutex is corrupted or was never initialized.
> [ 2.793463] [ T331] Modules linked in: amdgpu(+) hid_generic usbhid drm_client_lib i2c_algo_bit drm_buddy hid drm_ttm_helper ttm drm_exec
> drm_suballoc_helper mfd_core drm_panel_backlight_quirks gpu_sched amdxcp drm_display_helper drm_kms_helper ahci libahci xhci_pci libata xhci_hcd drm nvme
> scsi_mod igc usbcore nvme_core scsi_common video nvme_keyring i2c_piix4 cec nvme_auth usb_common crc16 i2c_smbus wmi gpio_amdpt gpio_generic
> [ 2.793518] [ T331] CPU: 17 UID: 0 PID: 331 Comm: (udev-worker) Not tainted 7.1.0-rc4-next-20260519-rcunortlockdep-dirty #465 PREEMPT
> [ 2.793534] [ T331] Hardware name: ASUS System Product Name/ROG STRIX B850-F GAMING WIFI, BIOS 1627 02/05/2026
> [ 2.793547] [ T331] RIP: 0010:__mutex_lock+0x58d/0x10c0
> [ 2.793555] [ T331] Code: 4c 8b 4d 88 85 c0 0f 84 f8 fa ff ff 44 8b 15 ca 9b 81 00 45 85 d2 0f 85 e8 fa ff ff 48 8d 3d 1a 57 82 00 48 c7 c6 a6 51 9e 83
> <67> 48 0f b9 3a 4c 8b 4d 88 e9 cc fa ff ff 48 8b bd 78 ff ff ff e8
> [ 2.793579] [ T331] RSP: 0018:ffffa497016c3510 EFLAGS: 00010246
> [ 2.793588] [ T331] RAX: 0000000000000001 RBX: ffff88c33a4c2ad8 RCX: 0000000000000000
> [ 2.793598] [ T331] RDX: 0000000000000001 RSI: ffffffff839e51a6 RDI: ffffffff83de3c00
> [ 2.793609] [ T331] RBP: ffffa497016c35c0 R08: ffffffffc0a55d92 R09: 0000000000000000
> [ 2.793619] [ T331] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 2.793629] [ T331] R13: 0000000000000002 R14: ffffa497016c3550 R15: 0000000000268000
> [ 2.793641] [ T331] FS: 00007f1f32e5b9c0(0000) GS:ffff88d23b2ca000(0000) knlGS:0000000000000000
> [ 2.793653] [ T331] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2.793662] [ T331] CR2: 000055cdfa28f588 CR3: 0000000112e73000 CR4: 0000000000f50ef0
> [ 2.793673] [ T331] PKRU: 55555554
> [ 2.793678] [ T331] Call Trace:
> [ 2.793683] [ T331] <TASK>
> [ 2.793687] [ T331] ? lock_acquire+0xbe/0x2d0
> [ 2.793696] [ T331] ? init_mqd+0x122/0x190 [amdgpu]
> [ 2.793809] [ T331] ? lock_release+0xc6/0x2a0
> [ 2.793816] [ T331] ? init_mqd+0x122/0x190 [amdgpu]
> [ 2.793902] [ T331] init_mqd+0x122/0x190 [amdgpu]
> [ 2.793961] [ T331] init_mqd_hiq+0xd/0x20 [amdgpu]
> [ 2.794015] [ T331] kq_initialize.constprop.0+0x2b8/0x370 [amdgpu]
> [ 2.794071] [ T331] kernel_queue_init+0x3f/0x60 [amdgpu]
> [ 2.794125] [ T331] pm_init+0x6b/0x100 [amdgpu]
> [ 2.794178] [ T331] start_cpsch+0x1d6/0x270 [amdgpu]
> [ 2.794234] [ T331] kgd2kfd_device_init.cold+0x7b9/0xa1a [amdgpu]
> [ 2.794365] [ T331] amdgpu_amdkfd_device_init+0x190/0x260 [amdgpu]
amdgpu_amdkfd_device_init()
  kgd2kfd_device_init() {
     ....
     init_mqd()
       mutex_lock(... profiler_lock);   <- FAIL
     mutex_init(...profiler_lock);
  }
Seems the famous graphics CI failed to catch this...
Thanks,
tglx
---
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -744,6 +744,9 @@ bool kgd2kfd_device_init(struct kfd_dev
 			KGD_ENGINE_SDMA1);
 	kfd->shared_resources = *gpu_resources;
+	kfd->profiler_process = NULL;
+	mutex_init(&kfd->profiler_lock);
+
 	kfd->num_nodes = amdgpu_xcp_get_num_xcp(kfd->adev->xcp_mgr);
 	if (kfd->num_nodes == 0) {
@@ -936,9 +939,6 @@ bool kgd2kfd_device_init(struct kfd_dev
 	svm_range_set_max_pages(kfd->adev);
-	kfd->profiler_process = NULL;
-	mutex_init(&kfd->profiler_lock);
-
 	kfd->init_complete = true;
 	dev_info(kfd_device, "added device %x:%x\n", kfd->adev->pdev->vendor,
 		 kfd->adev->pdev->device);