* context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
@ 2026-05-20 22:52 Bert Karwatzki
2026-05-21 8:37 ` Thomas Gleixner
2026-05-21 8:53 ` Mateusz Guzik
0 siblings, 2 replies; 18+ messages in thread
From: Bert Karwatzki @ 2026-05-20 22:52 UTC (permalink / raw)
To: Christian Brauner
Cc: Bert Karwatzki, linux-kernel, linux-next, linux-rt-devel,
linux-fsdevel, mjguzik, adobriyan, jack, viro,
Sebastian Andrzej Siewior, Thomas Gleixner
Since version next-20260518 (with PREEMPT_RT) I noticed that my Debian stable/trixie system
would sometimes hang when booting, displaying the following error message. After about 1min
booting continues to a rescue shell where I could save the dmesg output (the output shown
here is not from next-20260519 but from a step in the bisection).
[ 2.900440] [ T709] ------------[ cut here ]------------
[ 2.900441] [ T709] Voluntary context switch within RCU read-side critical section!
[ 2.900441] [ T709] WARNING: kernel/rcu/tree_plugin.h:332 at rcu_note_context_switch+0x2ac/0x460, CPU#4: systemd-fstab-g/709
[ 2.900447] [ T709] Modules linked in: efivarfs autofs4 ext4 mbcache jbd2 hid_generic usbhid hid amdgpu drm_client_lib i2c_algo_bit drm_buddy drm_ttm_helper ttm drm_exec drm_suballoc_helper mfd_core drm_panel_backlight_quirks gpu_sched xhci_pci amdxcp drm_display_helper xhci_hcd drm_kms_helper ahci libahci drm libata usbcore nvme scsi_mod nvme_core igc video i2c_piix4 cec nvme_keyring i2c_smbus usb_common scsi_common crc16 nvme_auth wmi gpio_amdpt gpio_generic
[ 2.900456] [ T709] CPU: 4 UID: 0 PID: 709 Comm: systemd-fstab-g Not tainted 7.1.0-rc4-bisect-02057-g134bedf6b3e5 #452 PREEMPT_RT
[ 2.900457] [ T709] Hardware name: ASUS System Product Name/ROG STRIX B850-F GAMING WIFI, BIOS 1627 02/05/2026
[ 2.900458] [ T709] RIP: 0010:rcu_note_context_switch+0x2ac/0x460
[ 2.900459] [ T709] Code: ef e8 58 56 87 00 48 8b 55 28 b9 01 00 00 00 4c 89 ef c6 45 11 00 48 89 c6 e8 e0 99 ff ff e9 cd fd ff ff 48 8d 3d 84 8a de 00 <67> 48 0f b9 3a e9 89 fd ff ff a9 a0 20 00 00 0f 85 df 00 00 00 f6
[ 2.900460] [ T709] RSP: 0018:ffffb538c1e3fb98 EFLAGS: 00010002
[ 2.900461] [ T709] RAX: 0000000000000001 RBX: ffff9bd494a0db00 RCX: 0000000000000000
[ 2.900462] [ T709] RDX: 0000000000000000 RSI: ffffffffb27ad182 RDI: ffffffffb2d29400
[ 2.900462] [ T709] RBP: ffff9be33f326b00 R08: ffffeba64492bec0 R09: ffff9bd491ed1100
[ 2.900462] [ T709] R10: 0000000000000001 R11: ffffeba64492bec0 R12: 0000000000000000
[ 2.900462] [ T709] R13: 0000000000000000 R14: ffff9bd494a0db00 R15: ffffb538c1e3fcc0
[ 2.900463] [ T709] FS: 00007f0367aaf9c0(0000) GS:ffff9be38c475000(0000) knlGS:0000000000000000
[ 2.900464] [ T709] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.900464] [ T709] CR2: 00007f082baae1d4 CR3: 0000000111ff2000 CR4: 0000000000f50ef0
[ 2.900464] [ T709] PKRU: 55555554
[ 2.900465] [ T709] Call Trace:
[ 2.900466] [ T709] <TASK>
[ 2.900466] [ T709] ? __schedule+0x78/0xe50
[ 2.900469] [ T709] ? blk_finish_plug+0x23/0x40
[ 2.900472] [ T709] ? read_pages+0x17f/0x210
[ 2.900474] [ T709] ? schedule+0x22/0xd0
[ 2.900475] [ T709] ? io_schedule+0x41/0x60
[ 2.900476] [ T709] ? folio_wait_bit_common+0x10d/0x2f0
[ 2.900477] [ T709] ? filemap_invalidate_unlock_two+0x40/0x40
[ 2.900478] [ T709] ? filemap_fault+0x7a1/0xfc0
[ 2.900479] [ T709] ? __do_fault+0x30/0x90
[ 2.900480] [ T709] ? do_fault+0x3a9/0x5a0
[ 2.900481] [ T709] ? __handle_mm_fault+0x2c6/0x3a0
[ 2.900482] [ T709] ? handle_mm_fault+0xdc/0x2c0
[ 2.900483] [ T709] ? do_user_addr_fault+0x1e2/0x5f0
[ 2.900485] [ T709] ? exc_page_fault+0x49/0x70
[ 2.900486] [ T709] ? asm_exc_page_fault+0x26/0x30
[ 2.900487] [ T709] </TASK>
[ 2.900487] [ T709] ---[ end trace 0000000000000000 ]---
I bisected the error from v7.1-rc4 to next-20260519, declaring a
commit as GOOD when it survives 12 boots without displaying this error:
7.1.0-rc4-bisect-03214-ge373a2ca9f8f error on 5th boot BAD
7.1.0-rc4-bisect-01608-gbd4178634082 12 boots without error GOOD
7.1.0-rc4-bisect-02426-g6491cb7030d4 error on 2nd boot BAD
7.1.0-rc4-bisect-02025-gd97e5246790d 12 boots without error GOOD
7.1.0-rc1-bisect-00173-gd97d13c24d78 12 boots without error GOOD
7.1.0-rc4-bisect-02116-g44f7bcae0b3d error on 6th boot BAD
7.1.0-rc4-bisect-00045-g0a8e31d303d5 12 boots without error GOOD
7.0.0-bisect-10592-g2acc3b265f94 12 boots without error GOOD
7.1.0-rc4-bisect-02057-g134bedf6b3e5 error on 1st boot BAD
7.0.0-bisect-10570-g99bde1dfe878 12 boots without error GOOD
7.0.0-bisect-10616-g5d1aae9252b4 12 boots without error GOOD
7.1.0-rc4-bisect-00003-g24214ad405d1 12 boots without error GOOD
7.1.0-rc4-bisect-02055-g95185f3a36ec error on 4th boot BAD
This first bisect gives this bogus result:
commit 95185f3a36ec ("Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/hid/hid.git")
which shows there's a false negative in the bisection above.
So I retested some of the commits from above:
7.1.0-rc4-bisect-02025-gd97e5246790d error on 1st boot (13 boots in total) BAD
7.1.0-rc4-bisect-01608-gbd4178634082 12 boots without error (25 in total) GOOD
and started a bisection from bd4178634082 to d97e5246790d (with increased
number of boots required for a good commit)
7.1.0-rc4-bisect-00204-g43467cbc2260 18 boots without error GOOD
7.1.0-rc4-bisect-00333-geda8cb3fb0cb error on 3rd boot BAD
With the good and bad commits this close I took a look at
git log --oneline 43467cbc2260..eda8cb3fb0cb
and found exactly one RCU related commit:
dc651e25a6d2 ("fs: RCU-ify filesystems list")
So I reverted this commit in next-20260519 (to get a clean revert I needed to
revert commit
36b3306779ea ("fs: cache the string generated by reading /proc/filesystems") first).
$ git log --oneline
c7321982a5d0 (HEAD -> rcu_critical_readside_bug) Revert "fs: RCU-ify filesystems list"
16ff8d6e7c28 Revert "fs: cache the string generated by reading /proc/filesystems"
6a50ba100ace (tag: next-20260519, origin/master, origin/HEAD, master) Add linux-next specific files for 20260519
With these reverts next-20260519 boots 30 times in a row without error, so
it appears that commit dc651e25a6d2 ("fs: RCU-ify filesystems list") is causing the
error.
To see if this issue is PREEMPT_RT only I also tested next-20260519 *without* PREEMPT_RT
and got a different bug at my first boot (the second boot worked, the third failed again).
In the non-RT case there's no rescue shell so this error message is copied from a (bad) photo:
[ 2.823291][ T510] BUG: scheduling while atomic: sytemd-hiberna/510
[ 2.824837][ T504] /usr/lib/systemd/system-generators/systemd-hibernate-resume-generator terminated by signal SEGV
BUG: scheduling while atomic: sytemd-hiberna/510
Call Trace:
dump_stack_lvl
__schedule_bug.cold
[...]
asm_exc_page_fault
Code: unable to access opcode bytes at 0x7f243337e216
To see if the non-RT error is caused by the same commit as the RT error I tested
next-20260519 with the reverts and *without* PREEMPT_RT. With the reverts there was
no error in 20 boots. So the problems in the non-RT and RT cases seem to be caused by
the same commits.
Hardware used:
$ cat /proc/cpuinfo
processor : 31
vendor_id : AuthenticAMD
cpu family : 26
model : 68
model name : AMD Ryzen 9 9950X 16-Core Processor
stepping : 0
microcode : 0xb404035
cpu MHz : 624.194
cache size : 1024 KB
physical id : 0
siblings : 32
core id : 15
cpu cores : 16
apicid : 31
initial apicid : 31
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpuid_fault cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx_vnni avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid bus_lock_detect movdiri movdir64b overflow_recov succor smca fsrm avx512_vp2intersect flush_l1d amd_lbr_pmc_freeze
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso spectre_v2_user vmscape
bogomips : 8599.98
TLB size : 192 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
$ lspci -nn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Root Complex [1022:14d8]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge IOMMU [1022:14d9]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge [1022:14da]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge GPP Bridge [1022:14db]
00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge GPP Bridge [1022:14db]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge [1022:14da]
00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge GPP Bridge [1022:14db]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge [1022:14da]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge [1022:14da]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge [1022:14da]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Internal GPP Bridge to Bus [C:A] [1022:14dd]
00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Internal GPP Bridge to Bus [C:A] [1022:14dd]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 71)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 0 [1022:14e0]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 1 [1022:14e1]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 2 [1022:14e2]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 3 [1022:14e3]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 4 [1022:14e4]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 5 [1022:14e5]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 6 [1022:14e6]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 7 [1022:14e7]
01:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 25)
02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 25)
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 44 [RX 9060 XT] [1002:7590] (rev c0)
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 48 HDMI/DP Audio Controller [1002:ab40]
04:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD 9100 PRO [PM9E1] [144d:a810]
05:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Upstream Port [1022:43f4] (rev 01)
06:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port [1022:43f5] (rev 01)
06:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port [1022:43f5] (rev 01)
06:07.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port [1022:43f5] (rev 01)
06:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port [1022:43f5] (rev 01)
06:0c.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port [1022:43f5] (rev 01)
06:0d.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port [1022:43f5] (rev 01)
08:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I226-V [8086:125c] (rev 06)
09:00.0 Network controller [0280]: MEDIATEK Corp. MT7925 802.11be 160MHz 2x2 PCIe Wireless Network Adapter [Filogic 360] [14c3:7925]
0b:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 800 Series Chipset USB 3.x XHCI Controller [1022:43fc] (rev 01)
0c:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset SATA Controller [1022:43f6] (rev 01)
0d:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Granite Ridge [Radeon Graphics] [1002:13c0] (rev c1)
0d:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Radeon High Definition Audio Controller [Rembrandt/Strix] [1002:1640]
0d:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 19h PSP/CCP [1022:1649]
0d:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI [1022:15b6]
0d:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI [1022:15b7]
0e:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 2.0 xHCI [1022:15b8]
Bert Karwatzki
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-20 22:52 context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT Bert Karwatzki
@ 2026-05-21 8:37 ` Thomas Gleixner
2026-05-21 8:53 ` Mateusz Guzik
1 sibling, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2026-05-21 8:37 UTC (permalink / raw)
To: Bert Karwatzki, Christian Brauner
Cc: Bert Karwatzki, linux-kernel, linux-next, linux-rt-devel,
linux-fsdevel, mjguzik, adobriyan, jack, viro,
Sebastian Andrzej Siewior
Bert!
On Thu, May 21 2026 at 00:52, Bert Karwatzki wrote:
> Since version next-20260518 (with PREEMPT_RT) I noticed that my debian stable/trixie system
> would sometimes hang when booting displaying the following error message. After about ~1min
> booting continues to a rescue shell where I could save the dmesg output (The output shown
> here is not from next-20260519 but from a step in the bisection).
>
> [ 2.900440] [ T709] ------------[ cut here ]------------
> [ 2.900441] [ T709] Voluntary context switch within RCU read-side critical section!
> [ 2.900441] [ T709] WARNING: kernel/rcu/tree_plugin.h:332 at rcu_note_context_switch+0x2ac/0x460, CPU#4: systemd-fstab-g/709
> [ 2.900447] [ T709] Modules linked in: efivarfs autofs4 ext4 mbcache jbd2 hid_generic usbhid hid amdgpu drm_client_lib i2c_algo_bit drm_buddy drm_ttm_helper ttm drm_exec drm_suballoc_helper mfd_core drm_panel_backlight_quirks gpu_sched xhci_pci amdxcp drm_display_helper xhci_hcd drm_kms_helper ahci libahci drm libata usbcore nvme scsi_mod nvme_core igc video i2c_piix4 cec nvme_keyring i2c_smbus usb_common scsi_common crc16 nvme_auth wmi gpio_amdpt gpio_generic
> [ 2.900456] [ T709] CPU: 4 UID: 0 PID: 709 Comm: systemd-fstab-g Not tainted 7.1.0-rc4-bisect-02057-g134bedf6b3e5 #452 PREEMPT_RT
> [ 2.900457] [ T709] Hardware name: ASUS System Product Name/ROG STRIX B850-F GAMING WIFI, BIOS 1627 02/05/2026
> [ 2.900458] [ T709] RIP: 0010:rcu_note_context_switch+0x2ac/0x460
> [ 2.900459] [ T709] Code: ef e8 58 56 87 00 48 8b 55 28 b9 01 00 00 00 4c 89 ef c6 45 11 00 48 89 c6 e8 e0 99 ff ff e9 cd fd ff ff 48 8d 3d 84 8a de 00 <67> 48 0f b9 3a e9 89 fd ff ff a9 a0 20 00 00 0f 85 df 00 00 00 f6
> [ 2.900460] [ T709] RSP: 0018:ffffb538c1e3fb98 EFLAGS: 00010002
> [ 2.900461] [ T709] RAX: 0000000000000001 RBX: ffff9bd494a0db00 RCX: 0000000000000000
> [ 2.900462] [ T709] RDX: 0000000000000000 RSI: ffffffffb27ad182 RDI: ffffffffb2d29400
> [ 2.900462] [ T709] RBP: ffff9be33f326b00 R08: ffffeba64492bec0 R09: ffff9bd491ed1100
> [ 2.900462] [ T709] R10: 0000000000000001 R11: ffffeba64492bec0 R12: 0000000000000000
> [ 2.900462] [ T709] R13: 0000000000000000 R14: ffff9bd494a0db00 R15: ffffb538c1e3fcc0
> [ 2.900463] [ T709] FS: 00007f0367aaf9c0(0000) GS:ffff9be38c475000(0000) knlGS:0000000000000000
> [ 2.900464] [ T709] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2.900464] [ T709] CR2: 00007f082baae1d4 CR3: 0000000111ff2000 CR4: 0000000000f50ef0
> [ 2.900464] [ T709] PKRU: 55555554
> [ 2.900465] [ T709] Call Trace:
> [ 2.900466] [ T709] <TASK>
> [ 2.900466] [ T709] ? __schedule+0x78/0xe50
> [ 2.900469] [ T709] ? blk_finish_plug+0x23/0x40
> [ 2.900472] [ T709] ? read_pages+0x17f/0x210
> [ 2.900474] [ T709] ? schedule+0x22/0xd0
> [ 2.900475] [ T709] ? io_schedule+0x41/0x60
> [ 2.900476] [ T709] ? folio_wait_bit_common+0x10d/0x2f0
> [ 2.900477] [ T709] ? filemap_invalidate_unlock_two+0x40/0x40
> [ 2.900478] [ T709] ? filemap_fault+0x7a1/0xfc0
> [ 2.900479] [ T709] ? __do_fault+0x30/0x90
> [ 2.900480] [ T709] ? do_fault+0x3a9/0x5a0
> [ 2.900481] [ T709] ? __handle_mm_fault+0x2c6/0x3a0
> [ 2.900482] [ T709] ? handle_mm_fault+0xdc/0x2c0
> [ 2.900483] [ T709] ? do_user_addr_fault+0x1e2/0x5f0
> [ 2.900485] [ T709] ? exc_page_fault+0x49/0x70
> [ 2.900486] [ T709] ? asm_exc_page_fault+0x26/0x30
> [ 2.900487] [ T709] </TASK>
That's a user page fault, which means something (syscall or interrupt)
exited to user space with RCU read side held.
> With the good and bad commits this close I took a look at
> git log --oneline 43467cbc2260..eda8cb3fb0cb
> and found exactly one RCU related commit:
> dc651e25a6d2 ("fs: RCU-ify filesystems list")
>
> So I reverted the this commit in next-20260519 (to get a clean revert I needed to
> revert commit
> 36b3306779ea ("fs: cache the string generated by reading /proc/filesystems") first.
>
> $ git log --oneline
> c7321982a5d0 (HEAD -> rcu_critical_readside_bug) Revert "fs: RCU-ify filesystems list"
> 16ff8d6e7c28 Revert "fs: cache the string generated by reading /proc/filesystems"
> 6a50ba100ace (tag: next-20260519, origin/master, origin/HEAD, master) Add linux-next specific files for 20260519
>
> With these reverts next-20260519 boots 30 times in a row without error, so
> it appears that commit dc651e25a6d2 ("fs: RCU-ify filesystems list") causing the
> error.
>
> To see if this issue is PREEMPT_RT only I also tested next-20260519 *without* PREEMPT_RT
> and got a different bug at my first boot (the second boot worked, the third failed again)
>
> In the non-RT case there's no rescue shell so this error message is copied from a (bad) photo:
>
> [ 2.823291][ T510] BUG: scheduling while atomic: sytemd-hiberna/510
> [ 2.824837][ T504] /usr/lib/systemd/system-generators/systemd-hibernate-resume-generator terminated by signal SEGV
> BUG: scheduling while atomic: sytemd-hiberna/510
> Call Trace:
> dump_stack_lvl
> __schedule_bug.cold
> [...]
> asm_exc_page_fault
> Code: unable to access opcode bytes at 0x7f243337e216
>
> To see if the non-RT error is caused by the same commit as the RT error I tested
> next-20260519 with the reverts and *without* PREEMPT_RT. With the reverts there was
> no error in 20 boots. So the problem in the non-RT and RT case seem to be caused by
> the same commits.
Which is not surprising, though on a quick inspection of the commit in
question I can't see where it would leak the RCU read side.
Can you please enable lockdep? That should tell us what exits to user
space with RCU held and also where the RCU read side was acquired.
Btw, which compiler are you using?
Thanks,
tglx
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-20 22:52 context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT Bert Karwatzki
2026-05-21 8:37 ` Thomas Gleixner
@ 2026-05-21 8:53 ` Mateusz Guzik
2026-05-21 9:08 ` Sebastian Andrzej Siewior
` (2 more replies)
1 sibling, 3 replies; 18+ messages in thread
From: Mateusz Guzik @ 2026-05-21 8:53 UTC (permalink / raw)
To: Bert Karwatzki, Christian Brauner
Cc: linux-kernel, linux-next, linux-rt-devel, linux-fsdevel,
adobriyan, jack, viro, Sebastian Andrzej Siewior, Thomas Gleixner
On Thu, May 21, 2026 at 12:52:44AM +0200, Bert Karwatzki wrote:
> Since version next-20260518 (with PREEMPT_RT) I noticed that my debian stable/trixie system
> would sometimes hang when booting displaying the following error message. After about ~1min
> booting continues to a rescue shell where I could save the dmesg output (The output shown
> here is not from next-20260519 but from a step in the bisection).
>
[..]
> So I reverted the this commit in next-20260519 (to get a clean revert I needed to
> revert commit
> 36b3306779ea ("fs: cache the string generated by reading /proc/filesystems") first.
>
> $ git log --oneline
> c7321982a5d0 (HEAD -> rcu_critical_readside_bug) Revert "fs: RCU-ify filesystems list"
> 16ff8d6e7c28 Revert "fs: cache the string generated by reading /proc/filesystems"
> 6a50ba100ace (tag: next-20260519, origin/master, origin/HEAD, master) Add linux-next specific files for 20260519
>
> With these reverts next-20260519 boots 30 times in a row without error, so
> it appears that commit dc651e25a6d2 ("fs: RCU-ify filesystems list") causing the
> error.
>
I think the patch below will do the trick.
If someone wonders how come the missing unlocks: the original patch had
them in place, but when I was rebasing on top of the RCU-ifing commit I
figured I'm going to do guard/scoped_guard in there as well. Later it
started failing as the compiler did not like goto retry out of a scoped
guard area and the unlocks did not come back.
tl;dr there is definitely a bug of mine here and it is most likely *the* bug.
Christian, can you fold this in please?
diff --git a/fs/filesystems.c b/fs/filesystems.c
index 771fc31a69b8..8f17c0abbc95 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
* Did someone beat us to it?
*/
if (old && old->gen == file_systems_gen) {
+ spin_unlock(&file_systems_lock);
kfree(new);
return 0;
}
@@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
* Did the list change in the meantime?
*/
if (gen != file_systems_gen) {
+ spin_unlock(&file_systems_lock);
kfree(new);
goto retry;
}
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 8:53 ` Mateusz Guzik
@ 2026-05-21 9:08 ` Sebastian Andrzej Siewior
2026-05-21 9:17 ` Mateusz Guzik
2026-05-21 9:09 ` Mateusz Guzik
2026-05-21 10:05 ` Thomas Gleixner
2 siblings, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-05-21 9:08 UTC (permalink / raw)
To: Mateusz Guzik
Cc: Bert Karwatzki, Christian Brauner, linux-kernel, linux-next,
linux-rt-devel, linux-fsdevel, adobriyan, jack, viro,
Thomas Gleixner
On 2026-05-21 10:53:03 [+0200], Mateusz Guzik wrote:
> them in place, but when I was rebasing on top of the RCU-ifing commit I
> figured I'm going to do guard/scoped_guard in there as well. Later it
> started failing as the compiler did not like goto retry out of a scoped
> guard area and the unlocks did not come back.
futex_hash_allocate() has a scoped_guard and a goto to "again". In case it
helps.
Sebastian
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 8:53 ` Mateusz Guzik
2026-05-21 9:08 ` Sebastian Andrzej Siewior
@ 2026-05-21 9:09 ` Mateusz Guzik
2026-05-21 9:20 ` Bert Karwatzki
2026-05-21 10:05 ` Thomas Gleixner
2 siblings, 1 reply; 18+ messages in thread
From: Mateusz Guzik @ 2026-05-21 9:09 UTC (permalink / raw)
To: Bert Karwatzki, Christian Brauner
Cc: linux-kernel, linux-next, linux-rt-devel, linux-fsdevel,
adobriyan, jack, viro, Sebastian Andrzej Siewior, Thomas Gleixner
On Thu, May 21, 2026 at 10:53:03AM +0200, Mateusz Guzik wrote:
> Christian, can you fold this in please.
>
> diff --git a/fs/filesystems.c b/fs/filesystems.c
> index 771fc31a69b8..8f17c0abbc95 100644
> --- a/fs/filesystems.c
> +++ b/fs/filesystems.c
> @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> * Did someone beat us to it?
> */
> if (old && old->gen == file_systems_gen) {
> + spin_unlock(&file_systems_lock);
> kfree(new);
> return 0;
> }
> @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> * Did the list change in the meantime?
> */
> if (gen != file_systems_gen) {
> + spin_unlock(&file_systems_lock);
> kfree(new);
> goto retry;
> }
>
>
Even better, here is the above fixup plus some polish, listed below:
- removed an extra space in newlen calculation
- the WARN_ON_ONCE case needs to free 'new', not 'old'
- there is no READ_ONCE anymore in filesystems_proc_show()
All of this goes into the "fs: cache the string generated by reading /proc/filesystems"
commit.
diff --git a/fs/filesystems.c b/fs/filesystems.c
index 771fc31a69b8..712316a1e3e0 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -269,7 +269,7 @@ static __cold noinline int regen_filesystems_string(void)
hlist_for_each_entry_rcu(p, &file_systems, list) {
if (!(p->fs_flags & FS_REQUIRES_DEV))
newlen += strlen("nodev");
- newlen += strlen("\t") + strlen(p->name) + strlen("\n");
+ newlen += strlen("\t") + strlen(p->name) + strlen("\n");
}
spin_unlock(&file_systems_lock);
@@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
* Did someone beat us to it?
*/
if (old && old->gen == file_systems_gen) {
+ spin_unlock(&file_systems_lock);
kfree(new);
return 0;
}
@@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
* Did the list change in the meantime?
*/
if (gen != file_systems_gen) {
+ spin_unlock(&file_systems_lock);
kfree(new);
goto retry;
}
@@ -321,13 +323,12 @@ static __cold noinline int regen_filesystems_string(void)
* generation above and messes it up.
*/
spin_unlock(&file_systems_lock);
- if (old)
- kfree_rcu(old, rcu);
+ kfree(new);
return -EINVAL;
}
/*
- * Paired with consume fence in READ_ONCE() in filesystems_proc_show()
+ * Paired with consume fence in rcu_dereference() in filesystems_proc_show()
*/
smp_store_release(&file_systems_string, new);
spin_unlock(&file_systems_lock);
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 9:08 ` Sebastian Andrzej Siewior
@ 2026-05-21 9:17 ` Mateusz Guzik
0 siblings, 0 replies; 18+ messages in thread
From: Mateusz Guzik @ 2026-05-21 9:17 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Bert Karwatzki, Christian Brauner, linux-kernel, linux-next,
linux-rt-devel, linux-fsdevel, adobriyan, jack, viro,
Thomas Gleixner
On Thu, May 21, 2026 at 11:08 AM Sebastian Andrzej Siewior
<bigeasy@linutronix•de> wrote:
>
> On 2026-05-21 10:53:03 [+0200], Mateusz Guzik wrote:
> > them in place, but when I was rebasing on top of the RCU-ifing commit I
> > figured I'm going to do guard/scoped_guard in there as well. Later it
> > started failing as the compiler did not like goto retry out of a scoped
> > guard area and the unlocks did not come back.
>
> futex_hash_allocate() has a scoped_guard a goto to again. In case it
> helps.
>
Huh, now I slapped the following for testing purposes and it compiles:
diff --git a/fs/filesystems.c b/fs/filesystems.c
index 771fc31a69b8..d21d264672c5 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -282,7 +282,7 @@ static __cold noinline int regen_filesystems_string(void)
new->len = newlen;
new->string[newlen] = '\0';
- spin_lock(&file_systems_lock);
+ scoped_guard(spinlock, &file_systems_lock) {
old = file_systems_string;
/*
@@ -320,7 +320,6 @@ static __cold noinline int regen_filesystems_string(void)
* Should never happen of course, keep this in case someone changes string
* generation above and messes it up.
*/
- spin_unlock(&file_systems_lock);
if (old)
kfree_rcu(old, rcu);
return -EINVAL;
@@ -330,7 +329,7 @@ static __cold noinline int regen_filesystems_string(void)
* Paired with consume fence in READ_ONCE() in filesystems_proc_show()
*/
smp_store_release(&file_systems_string, new);
- spin_unlock(&file_systems_lock);
+ }
if (old)
kfree_rcu(old, rcu);
return 0;
Curious, but ultimately it does not matter. I think the current code is a
little better without the guard stuff in this particular place due to the
kfree calls.
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 9:09 ` Mateusz Guzik
@ 2026-05-21 9:20 ` Bert Karwatzki
2026-05-21 9:25 ` Mateusz Guzik
2026-05-21 10:17 ` Thomas Gleixner
0 siblings, 2 replies; 18+ messages in thread
From: Bert Karwatzki @ 2026-05-21 9:20 UTC (permalink / raw)
To: Mateusz Guzik, Christian Brauner
Cc: linux-kernel, linux-next, linux-rt-devel, linux-fsdevel,
adobriyan, jack, viro, Sebastian Andrzej Siewior, Thomas Gleixner,
spasswolf
Am Donnerstag, dem 21.05.2026 um 11:09 +0200 schrieb Mateusz Guzik:
> On Thu, May 21, 2026 at 10:53:03AM +0200, Mateusz Guzik wrote:
> > Christian, can you fold this in please.
> >
> > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > index 771fc31a69b8..8f17c0abbc95 100644
> > --- a/fs/filesystems.c
> > +++ b/fs/filesystems.c
> > @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> > * Did someone beat us to it?
> > */
> > if (old && old->gen == file_systems_gen) {
> > + spin_unlock(&file_systems_lock);
> > kfree(new);
> > return 0;
> > }
> > @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> > * Did the list change in the meantime?
> > */
> > if (gen != file_systems_gen) {
> > + spin_unlock(&file_systems_lock);
> > kfree(new);
> > goto retry;
> > }
> >
> >
>
> Even better, I got the above fixup + some polish listed below:
> - removed an extra space in newlen calculation
> - the WARN_ON_ONCE case needs to free 'new', not 'old'
> - there is no READ_ONCE anymore in filesystems_proc_show()
>
> goes into the "fs: cache the string generated by reading /proc/filesystems"
> commit.
>
> diff --git a/fs/filesystems.c b/fs/filesystems.c
> index 771fc31a69b8..712316a1e3e0 100644
> --- a/fs/filesystems.c
> +++ b/fs/filesystems.c
> @@ -269,7 +269,7 @@ static __cold noinline int regen_filesystems_string(void)
> hlist_for_each_entry_rcu(p, &file_systems, list) {
> if (!(p->fs_flags & FS_REQUIRES_DEV))
> newlen += strlen("nodev");
> - newlen += strlen("\t") + strlen(p->name) + strlen("\n");
> + newlen += strlen("\t") + strlen(p->name) + strlen("\n");
> }
> spin_unlock(&file_systems_lock);
>
> @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> * Did someone beat us to it?
> */
> if (old && old->gen == file_systems_gen) {
> + spin_unlock(&file_systems_lock);
> kfree(new);
> return 0;
> }
> @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> * Did the list change in the meantime?
> */
> if (gen != file_systems_gen) {
> + spin_unlock(&file_systems_lock);
> kfree(new);
> goto retry;
> }
> @@ -321,13 +323,12 @@ static __cold noinline int regen_filesystems_string(void)
> * generation above and messes it up.
> */
> spin_unlock(&file_systems_lock);
> - if (old)
> - kfree_rcu(old, rcu);
> + kfree(new);
> return -EINVAL;
> }
>
> /*
> - * Paired with consume fence in READ_ONCE() in filesystems_proc_show()
> + * Paired with consume fence in rcu_dereference() in filesystems_proc_show()
> */
> smp_store_release(&file_systems_string, new);
> spin_unlock(&file_systems_lock);
>
So it was commit 36b3306779ea
("fs: cache the string generated by reading /proc/filesystems")
which caused the problem. If I had finished the bisection properly instead
of cutting it short, I probably would have noticed this...
So I tested
diff --git a/fs/filesystems.c b/fs/filesystems.c
index 771fc31a69b8..8f17c0abbc95 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
* Did someone beat us to it?
*/
if (old && old->gen == file_systems_gen) {
+ spin_unlock(&file_systems_lock);
kfree(new);
return 0;
}
@@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
* Did the list change in the meantime?
*/
if (gen != file_systems_gen) {
+ spin_unlock(&file_systems_lock);
kfree(new);
goto retry;
}
with next-20260519 (no RT, no LOCKDEP) and have had no crash so far, though only over 4 boots
(next-20260619 crashed in 2 out of 3 boots without RT). However, I get this warning on every boot:
[ 2.793416] [ T331] ------------[ cut here ]------------
[ 2.793433] [ T331] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ 2.793434] [ T331] WARNING: kernel/locking/mutex.c:625 at __mutex_lock+0x586/0x10c0, CPU#17: (udev-worker)/331
[ 2.793463] [ T331] Modules linked in: amdgpu(+) hid_generic usbhid drm_client_lib i2c_algo_bit drm_buddy hid drm_ttm_helper ttm drm_exec drm_suballoc_helper mfd_core drm_panel_backlight_quirks gpu_sched amdxcp drm_display_helper drm_kms_helper ahci libahci xhci_pci libata xhci_hcd drm nvme scsi_mod igc usbcore nvme_core scsi_common video nvme_keyring i2c_piix4 cec nvme_auth usb_common crc16 i2c_smbus wmi gpio_amdpt gpio_generic
[ 2.793518] [ T331] CPU: 17 UID: 0 PID: 331 Comm: (udev-worker) Not tainted 7.1.0-rc4-next-20260519-rcunortlockdep-dirty #465 PREEMPT
[ 2.793534] [ T331] Hardware name: ASUS System Product Name/ROG STRIX B850-F GAMING WIFI, BIOS 1627 02/05/2026
[ 2.793547] [ T331] RIP: 0010:__mutex_lock+0x58d/0x10c0
[ 2.793555] [ T331] Code: 4c 8b 4d 88 85 c0 0f 84 f8 fa ff ff 44 8b 15 ca 9b 81 00 45 85 d2 0f 85 e8 fa ff ff 48 8d 3d 1a 57 82 00 48 c7 c6 a6 51 9e 83 <67> 48 0f b9 3a 4c 8b 4d 88 e9 cc fa ff ff 48 8b bd 78 ff ff ff e8
[ 2.793579] [ T331] RSP: 0018:ffffa497016c3510 EFLAGS: 00010246
[ 2.793588] [ T331] RAX: 0000000000000001 RBX: ffff88c33a4c2ad8 RCX: 0000000000000000
[ 2.793598] [ T331] RDX: 0000000000000001 RSI: ffffffff839e51a6 RDI: ffffffff83de3c00
[ 2.793609] [ T331] RBP: ffffa497016c35c0 R08: ffffffffc0a55d92 R09: 0000000000000000
[ 2.793619] [ T331] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 2.793629] [ T331] R13: 0000000000000002 R14: ffffa497016c3550 R15: 0000000000268000
[ 2.793641] [ T331] FS: 00007f1f32e5b9c0(0000) GS:ffff88d23b2ca000(0000) knlGS:0000000000000000
[ 2.793653] [ T331] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.793662] [ T331] CR2: 000055cdfa28f588 CR3: 0000000112e73000 CR4: 0000000000f50ef0
[ 2.793673] [ T331] PKRU: 55555554
[ 2.793678] [ T331] Call Trace:
[ 2.793683] [ T331] <TASK>
[ 2.793687] [ T331] ? lock_acquire+0xbe/0x2d0
[ 2.793696] [ T331] ? init_mqd+0x122/0x190 [amdgpu]
[ 2.793809] [ T331] ? lock_release+0xc6/0x2a0
[ 2.793816] [ T331] ? init_mqd+0x122/0x190 [amdgpu]
[ 2.793902] [ T331] init_mqd+0x122/0x190 [amdgpu]
[ 2.793961] [ T331] init_mqd_hiq+0xd/0x20 [amdgpu]
[ 2.794015] [ T331] kq_initialize.constprop.0+0x2b8/0x370 [amdgpu]
[ 2.794071] [ T331] kernel_queue_init+0x3f/0x60 [amdgpu]
[ 2.794125] [ T331] pm_init+0x6b/0x100 [amdgpu]
[ 2.794178] [ T331] start_cpsch+0x1d6/0x270 [amdgpu]
[ 2.794234] [ T331] kgd2kfd_device_init.cold+0x7b9/0xa1a [amdgpu]
[ 2.794365] [ T331] amdgpu_amdkfd_device_init+0x190/0x260 [amdgpu]
[ 2.794444] [ T331] amdgpu_device_init.cold+0x1952/0x1c79 [amdgpu]
[ 2.794556] [ T331] amdgpu_driver_load_kms+0x14/0x80 [amdgpu]
[ 2.794622] [ T331] amdgpu_pci_probe+0x1cd/0x440 [amdgpu]
[ 2.794684] [ T331] pci_device_probe+0xc2/0x1a0
[ 2.794693] [ T331] really_probe+0xd9/0x370
[ 2.794701] [ T331] ? __device_attach_driver+0x130/0x130
[ 2.794710] [ T331] __driver_probe_device+0x80/0x150
[ 2.794718] [ T331] driver_probe_device+0x1a/0x80
[ 2.794726] [ T331] __driver_attach+0xb9/0x1f0
[ 2.794734] [ T331] bus_for_each_dev+0x7b/0xd0
[ 2.794742] [ T331] bus_add_driver+0x11d/0x200
[ 2.794749] [ T331] driver_register+0x6d/0xc0
[ 2.794756] [ T331] ? ledtrig_usb_exit+0x880/0x880 [usb_common]
[ 2.794767] [ T331] do_one_initcall+0x57/0x3a0
[ 2.794774] [ T331] ? __kmalloc_cache_noprof+0x323/0x3f0
[ 2.794785] [ T331] do_init_module+0x5b/0x210
[ 2.794793] [ T331] init_module_from_file+0xd4/0x130
[ 2.794802] [ T331] idempotent_init_module+0x100/0x300
[ 2.794811] [ T331] __x64_sys_finit_module+0x6c/0xe0
[ 2.794819] [ T331] ? kmem_cache_free+0x1e9/0x420
[ 2.794827] [ T331] do_syscall_64+0xf8/0x6b0
[ 2.794834] [ T331] ? lock_release+0xc6/0x2a0
[ 2.794842] [ T331] ? kmem_cache_free+0x279/0x420
[ 2.794849] [ T331] ? do_sys_openat2+0x80/0xc0
[ 2.794857] [ T331] ? __x64_sys_openat+0x4f/0xa0
[ 2.794866] [ T331] ? do_syscall_64+0x1ef/0x6b0
[ 2.794873] [ T331] ? do_syscall_64+0x1ef/0x6b0
[ 2.794880] [ T331] ? do_syscall_64+0x1ef/0x6b0
[ 2.794888] [ T331] ? do_syscall_64+0x1ef/0x6b0
[ 2.794895] [ T331] ? do_syscall_64+0x1ef/0x6b0
[ 2.794903] [ T331] ? lockdep_hardirqs_on_prepare+0xd7/0x180
[ 2.794912] [ T331] ? do_syscall_64+0x38/0x6b0
[ 2.794919] [ T331] ? do_syscall_64+0xad/0x6b0
[ 2.794926] [ T331] entry_SYSCALL_64_after_hwframe+0x55/0x5d
[ 2.795281] [ T331] RIP: 0033:0x7f1f339b97b9
[ 2.795620] [ T331] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 66 0d 00 f7 d8 64 89 01 48
[ 2.795936] [ T331] RSP: 002b:00007ffe5b4ce6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 2.796219] [ T331] RAX: ffffffffffffffda RBX: 000055cdfa2b8a20 RCX: 00007f1f339b97b9
[ 2.796490] [ T331] RDX: 0000000000000004 RSI: 00007f1f320f644d RDI: 000000000000003b
[ 2.796741] [ T331] RBP: 0000000000000004 R08: 0000000000000000 R09: 000055cdfa282d70
[ 2.796986] [ T331] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f1f320f644d
[ 2.797227] [ T331] R13: 0000000000020000 R14: 000055cdfa286d60 R15: 0000000000000000
[ 2.797461] [ T331] </TASK>
[ 2.797680] [ T331] irq event stamp: 160663
[ 2.797897] [ T331] hardirqs last enabled at (160663): [<ffffffff835c6cbf>] _raw_spin_unlock_irqrestore+0x3f/0x50
[ 2.798125] [ T331] hardirqs last disabled at (160662): [<ffffffff835c6a7f>] _raw_spin_lock_irqsave+0x4f/0x60
[ 2.798350] [ T331] softirqs last enabled at (160282): [<ffffffff82ac9888>] __irq_exit_rcu+0xc8/0x130
[ 2.798581] [ T331] softirqs last disabled at (160277): [<ffffffff82ac9888>] __irq_exit_rcu+0xc8/0x130
[ 2.798806] [ T331] ---[ end trace 0000000000000000 ]---
Bert Karwatzki
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 9:20 ` Bert Karwatzki
@ 2026-05-21 9:25 ` Mateusz Guzik
2026-05-21 9:57 ` Bert Karwatzki
2026-05-21 10:17 ` Thomas Gleixner
1 sibling, 1 reply; 18+ messages in thread
From: Mateusz Guzik @ 2026-05-21 9:25 UTC (permalink / raw)
To: Bert Karwatzki
Cc: Christian Brauner, linux-kernel, linux-next, linux-rt-devel,
linux-fsdevel, adobriyan, jack, viro, Sebastian Andrzej Siewior,
Thomas Gleixner
On Thu, May 21, 2026 at 11:21 AM Bert Karwatzki <spasswolf@web•de> wrote:
>
> On Thursday, 21.05.2026 at 11:09 +0200, Mateusz Guzik wrote:
> > On Thu, May 21, 2026 at 10:53:03AM +0200, Mateusz Guzik wrote:
> > > Christian, can you fold this in please.
> > >
> > > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > > index 771fc31a69b8..8f17c0abbc95 100644
> > > --- a/fs/filesystems.c
> > > +++ b/fs/filesystems.c
> > > @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > * Did someone beat us to it?
> > > */
> > > if (old && old->gen == file_systems_gen) {
> > > + spin_unlock(&file_systems_lock);
> > > kfree(new);
> > > return 0;
> > > }
> > > @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > * Did the list change in the meantime?
> > > */
> > > if (gen != file_systems_gen) {
> > > + spin_unlock(&file_systems_lock);
> > > kfree(new);
> > > goto retry;
> > > }
> > >
> > >
> >
> > Even better, I got the above fixup + some polish listed below:
> > - removed an extra space in newlen calculation
> > - the WARN_ON_ONCE case needs to free 'new', not 'old'
> > - there is no READ_ONCE anymore in filesystems_proc_show()
> >
> > goes into the "fs: cache the string generated by reading /proc/filesystems"
> > commit.
> >
> > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > index 771fc31a69b8..712316a1e3e0 100644
> > --- a/fs/filesystems.c
> > +++ b/fs/filesystems.c
> > @@ -269,7 +269,7 @@ static __cold noinline int regen_filesystems_string(void)
> > hlist_for_each_entry_rcu(p, &file_systems, list) {
> > if (!(p->fs_flags & FS_REQUIRES_DEV))
> > newlen += strlen("nodev");
> > - newlen += strlen("\t") + strlen(p->name) + strlen("\n");
> > + newlen += strlen("\t") + strlen(p->name) + strlen("\n");
> > }
> > spin_unlock(&file_systems_lock);
> >
> > @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> > * Did someone beat us to it?
> > */
> > if (old && old->gen == file_systems_gen) {
> > + spin_unlock(&file_systems_lock);
> > kfree(new);
> > return 0;
> > }
> > @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> > * Did the list change in the meantime?
> > */
> > if (gen != file_systems_gen) {
> > + spin_unlock(&file_systems_lock);
> > kfree(new);
> > goto retry;
> > }
> > @@ -321,13 +323,12 @@ static __cold noinline int regen_filesystems_string(void)
> > * generation above and messes it up.
> > */
> > spin_unlock(&file_systems_lock);
> > - if (old)
> > - kfree_rcu(old, rcu);
> > + kfree(new);
> > return -EINVAL;
> > }
> >
> > /*
> > - * Paired with consume fence in READ_ONCE() in filesystems_proc_show()
> > + * Paired with consume fence in rcu_dereference() in filesystems_proc_show()
> > */
> > smp_store_release(&file_systems_string, new);
> > spin_unlock(&file_systems_lock);
> >
>
> So it was commit 36b3306779ea
> ("fs: cache the string generated by reading /proc/filesystems")
> which caused the problem. If I had finished the bisection properly instead
> of cutting I probably would have noticed this...
>
> So I tested
>
> diff --git a/fs/filesystems.c b/fs/filesystems.c
> index 771fc31a69b8..8f17c0abbc95 100644
> --- a/fs/filesystems.c
> +++ b/fs/filesystems.c
> @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> * Did someone beat us to it?
> */
> if (old && old->gen == file_systems_gen) {
> + spin_unlock(&file_systems_lock);
> kfree(new);
> return 0;
> }
> @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> * Did the list change in the meantime?
> */
> if (gen != file_systems_gen) {
> + spin_unlock(&file_systems_lock);
> kfree(new);
> goto retry;
> }
>
> with next-20260519 (no RT, no LOCKDEP) and got no crash so far (4 boots only though (next-20260619
> crashed in 2 out of 3 boots without RT)) but I get this warning on every boot:
>
> [ 2.793416] [ T331] ------------[ cut here ]------------
> [ 2.793433] [ T331] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> [ 2.793434] [ T331] WARNING: kernel/locking/mutex.c:625 at __mutex_lock+0x586/0x10c0, CPU#17: (udev-worker)/331
Just so that we are on the same page: if you take the -next tag as is
and revert the "fs: cache the string generated by reading
/proc/filesystems" commit alone things work fine?
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 9:25 ` Mateusz Guzik
@ 2026-05-21 9:57 ` Bert Karwatzki
0 siblings, 0 replies; 18+ messages in thread
From: Bert Karwatzki @ 2026-05-21 9:57 UTC (permalink / raw)
To: Mateusz Guzik
Cc: Christian Brauner, linux-kernel, linux-next, linux-rt-devel,
linux-fsdevel, adobriyan, jack, viro, Sebastian Andrzej Siewior,
spasswolf, Thomas Gleixner
On Thursday, 21.05.2026 at 11:25 +0200, Mateusz Guzik wrote:
> On Thu, May 21, 2026 at 11:21 AM Bert Karwatzki <spasswolf@web•de> wrote:
> >
> > On Thursday, 21.05.2026 at 11:09 +0200, Mateusz Guzik wrote:
> > > On Thu, May 21, 2026 at 10:53:03AM +0200, Mateusz Guzik wrote:
> > > > Christian, can you fold this in please.
> > > >
> > > > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > > > index 771fc31a69b8..8f17c0abbc95 100644
> > > > --- a/fs/filesystems.c
> > > > +++ b/fs/filesystems.c
> > > > @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > > * Did someone beat us to it?
> > > > */
> > > > if (old && old->gen == file_systems_gen) {
> > > > + spin_unlock(&file_systems_lock);
> > > > kfree(new);
> > > > return 0;
> > > > }
> > > > @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > > * Did the list change in the meantime?
> > > > */
> > > > if (gen != file_systems_gen) {
> > > > + spin_unlock(&file_systems_lock);
> > > > kfree(new);
> > > > goto retry;
> > > > }
> > > >
> > > >
> > >
> > > Even better, I got the above fixup + some polish listed below:
> > > - removed an extra space in newlen calculation
> > > - the WARN_ON_ONCE case needs to free 'new', not 'old'
> > > - there is no READ_ONCE anymore in filesystems_proc_show()
> > >
> > > goes into the "fs: cache the string generated by reading /proc/filesystems"
> > > commit.
> > >
> > > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > > index 771fc31a69b8..712316a1e3e0 100644
> > > --- a/fs/filesystems.c
> > > +++ b/fs/filesystems.c
> > > @@ -269,7 +269,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > hlist_for_each_entry_rcu(p, &file_systems, list) {
> > > if (!(p->fs_flags & FS_REQUIRES_DEV))
> > > newlen += strlen("nodev");
> > > - newlen += strlen("\t") + strlen(p->name) + strlen("\n");
> > > + newlen += strlen("\t") + strlen(p->name) + strlen("\n");
> > > }
> > > spin_unlock(&file_systems_lock);
> > >
> > > @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > * Did someone beat us to it?
> > > */
> > > if (old && old->gen == file_systems_gen) {
> > > + spin_unlock(&file_systems_lock);
> > > kfree(new);
> > > return 0;
> > > }
> > > @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> > > * Did the list change in the meantime?
> > > */
> > > if (gen != file_systems_gen) {
> > > + spin_unlock(&file_systems_lock);
> > > kfree(new);
> > > goto retry;
> > > }
> > > @@ -321,13 +323,12 @@ static __cold noinline int regen_filesystems_string(void)
> > > * generation above and messes it up.
> > > */
> > > spin_unlock(&file_systems_lock);
> > > - if (old)
> > > - kfree_rcu(old, rcu);
> > > + kfree(new);
> > > return -EINVAL;
> > > }
> > >
> > > /*
> > > - * Paired with consume fence in READ_ONCE() in filesystems_proc_show()
> > > + * Paired with consume fence in rcu_dereference() in filesystems_proc_show()
> > > */
> > > smp_store_release(&file_systems_string, new);
> > > spin_unlock(&file_systems_lock);
> > >
> >
> > So it was commit 36b3306779ea
> > ("fs: cache the string generated by reading /proc/filesystems")
> > which caused the problem. If I had finished the bisection properly instead
> > of cutting I probably would have noticed this...
> >
> > So I tested
> >
> > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > index 771fc31a69b8..8f17c0abbc95 100644
> > --- a/fs/filesystems.c
> > +++ b/fs/filesystems.c
> > @@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
> > * Did someone beat us to it?
> > */
> > if (old && old->gen == file_systems_gen) {
> > + spin_unlock(&file_systems_lock);
> > kfree(new);
> > return 0;
> > }
> > @@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
> > * Did the list change in the meantime?
> > */
> > if (gen != file_systems_gen) {
> > + spin_unlock(&file_systems_lock);
> > kfree(new);
> > goto retry;
> > }
> >
> > with next-20260519 (no RT, no LOCKDEP) and got no crash so far (4 boots only though (next-20260619
> > crashed in 2 out of 3 boots without RT)) but I get this warning on every boot:
> >
> > [ 2.793416] [ T331] ------------[ cut here ]------------
> > [ 2.793433] [ T331] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> > [ 2.793434] [ T331] WARNING: kernel/locking/mutex.c:625 at __mutex_lock+0x586/0x10c0, CPU#17: (udev-worker)/331
>
> Just so that we are on the same page: if you take the -next tag as is
> and revert the "fs: cache the string generated by reading
> /proc/filesystems" commit alone things work fine?
Yes, this works fine:
16ff8d6e7c28 (HEAD) Revert "fs: cache the string generated by reading /proc/filesystems"
6a50ba100ace (tag: next-20260519, origin/master, origin/HEAD, master) Add linux-next specific files for 20260519
(tested only without RT, as the bug seems to be easier to trigger there)
Bert Karwatzki
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 8:53 ` Mateusz Guzik
2026-05-21 9:08 ` Sebastian Andrzej Siewior
2026-05-21 9:09 ` Mateusz Guzik
@ 2026-05-21 10:05 ` Thomas Gleixner
2026-05-21 10:13 ` Bert Karwatzki
2 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2026-05-21 10:05 UTC (permalink / raw)
To: Mateusz Guzik, Bert Karwatzki, Christian Brauner
Cc: linux-kernel, linux-next, linux-rt-devel, linux-fsdevel,
adobriyan, jack, viro, Sebastian Andrzej Siewior
On Thu, May 21 2026 at 10:53, Mateusz Guzik wrote:
> On Thu, May 21, 2026 at 12:52:44AM +0200, Bert Karwatzki wrote:
>> With these reverts next-20260519 boots 30 times in a row without error, so
>> it appears that commit dc651e25a6d2 ("fs: RCU-ify filesystems list") causing the
>> error.
>>
>
> I think the patch below will do the trick.
>
> If someone wonders how come the missing unlocks: the original patch had
> them in place, but when I was rebasing on top of the RCU-ifing commit I
> figured I'm going to do guard/scoped_guard in there as well. Later it
> started failing as the compiler did not like goto retry out of a scoped
> guard area and the unlocks did not come back.
>
> tl;dr there is definitely my bug here and it is most likely *the* bug
I was staring at the wrong commit then :)
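
For readers following the thread, the missing-unlock pattern described in the quoted text can be sketched in user space. This is only an illustration under stated assumptions: it uses a pthread mutex in place of the kernel spinlock, and the names regen_cache, list_lock, list_gen, cached and cached_gen are hypothetical stand-ins, not the code in fs/filesystems.c. The point it shows is that every path leaving the locked region (early return, retry, normal exit) has to drop the lock itself.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the lock, generation counter and cached string. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long list_gen = 1;	/* bumped whenever the list changes */
static unsigned long cached_gen;	/* generation the cache was built for */
static char *cached;

/* Rebuild the cache; retry if the list changed while we were unlocked. */
static int regen_cache(void)
{
	unsigned long gen;
	char *new;

retry:
	/* Sample the generation under the lock, then drop it to allocate. */
	pthread_mutex_lock(&list_lock);
	gen = list_gen;
	pthread_mutex_unlock(&list_lock);

	new = malloc(16);
	if (!new)
		return -1;

	pthread_mutex_lock(&list_lock);
	if (cached && cached_gen == list_gen) {
		/* Someone beat us to it: unlock before returning. */
		pthread_mutex_unlock(&list_lock);
		free(new);
		return 0;
	}
	if (gen != list_gen) {
		/* The list changed in the meantime: unlock before retrying. */
		pthread_mutex_unlock(&list_lock);
		free(new);
		goto retry;
	}
	free(cached);
	cached = new;
	cached_gen = gen;
	pthread_mutex_unlock(&list_lock);
	return 0;
}
```

If either early-exit branch skipped its unlock, the caller would leave the function still holding list_lock, and in this sketch the retry path would try to take a lock it already holds.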
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 10:05 ` Thomas Gleixner
@ 2026-05-21 10:13 ` Bert Karwatzki
0 siblings, 0 replies; 18+ messages in thread
From: Bert Karwatzki @ 2026-05-21 10:13 UTC (permalink / raw)
To: Thomas Gleixner, Mateusz Guzik, Christian Brauner
Cc: linux-kernel, linux-next, linux-rt-devel, linux-fsdevel,
adobriyan, jack, viro, Sebastian Andrzej Siewior, spasswolf
On Thursday, 21.05.2026 at 12:05 +0200, Thomas Gleixner wrote:
> On Thu, May 21 2026 at 10:53, Mateusz Guzik wrote:
> > On Thu, May 21, 2026 at 12:52:44AM +0200, Bert Karwatzki wrote:
> > > With these reverts next-20260519 boots 30 times in a row without error, so
> > > it appears that commit dc651e25a6d2 ("fs: RCU-ify filesystems list") causing the
> > > error.
> > >
> >
> > I think the patch below will do the trick.
> >
> > If someone wonders how come the missing unlocks: the original patch had
> > them in place, but when I was rebasing on top of the RCU-ifing commit I
> > figured I'm going to do guard/scoped_guard in there as well. Later it
> > started failing as the compiler did not like goto retry out of a scoped
> > guard area and the unlocks did not come back.
> >
> > tl;dr there is definitely my bug here and it is most likely *the* bug
>
> I was staring at the wrong commit then :)
Sorry, as the bug was rather hard to trigger (with PREEMPT_RT at least) I
wanted to save myself from 100-150 reboot cycles during the bisection.
Bert Karwatzki
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 9:20 ` Bert Karwatzki
2026-05-21 9:25 ` Mateusz Guzik
@ 2026-05-21 10:17 ` Thomas Gleixner
2026-05-21 10:21 ` Bert Karwatzki
1 sibling, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2026-05-21 10:17 UTC (permalink / raw)
To: Bert Karwatzki, Mateusz Guzik, Christian Brauner
Cc: linux-kernel, linux-next, linux-rt-devel, linux-fsdevel,
adobriyan, jack, viro, Sebastian Andrzej Siewior, spasswolf,
Alex Deucher, amd-gfx
On Thu, May 21 2026 at 11:20, Bert Karwatzki wrote:
> On Thursday, 21.05.2026 at 11:09 +0200, Mateusz Guzik wrote:
>
> with next-20260519 (no RT, no LOCKDEP) and got no crash so far (4 boots only though (next-20260619
> crashed in 2 out of 3 boots without RT)) but I get this warning on every boot:
>
> [ 2.793416] [ T331] ------------[ cut here ]------------
> [ 2.793433] [ T331] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> [ 2.793434] [ T331] WARNING: kernel/locking/mutex.c:625 at __mutex_lock+0x586/0x10c0, CPU#17: (udev-worker)/331
So either the mutex is corrupted or was never initialized.
> [ 2.793463] [ T331] Modules linked in: amdgpu(+) hid_generic usbhid drm_client_lib i2c_algo_bit drm_buddy hid drm_ttm_helper ttm drm_exec
> drm_suballoc_helper mfd_core drm_panel_backlight_quirks gpu_sched amdxcp drm_display_helper drm_kms_helper ahci libahci xhci_pci libata xhci_hcd drm nvme
> scsi_mod igc usbcore nvme_core scsi_common video nvme_keyring i2c_piix4 cec nvme_auth usb_common crc16 i2c_smbus wmi gpio_amdpt gpio_generic
> [ 2.793518] [ T331] CPU: 17 UID: 0 PID: 331 Comm: (udev-worker) Not tainted 7.1.0-rc4-next-20260519-rcunortlockdep-dirty #465 PREEMPT
> [ 2.793534] [ T331] Hardware name: ASUS System Product Name/ROG STRIX B850-F GAMING WIFI, BIOS 1627 02/05/2026
> [ 2.793547] [ T331] RIP: 0010:__mutex_lock+0x58d/0x10c0
> [ 2.793555] [ T331] Code: 4c 8b 4d 88 85 c0 0f 84 f8 fa ff ff 44 8b 15 ca 9b 81 00 45 85 d2 0f 85 e8 fa ff ff 48 8d 3d 1a 57 82 00 48 c7 c6 a6 51 9e 83
> <67> 48 0f b9 3a 4c 8b 4d 88 e9 cc fa ff ff 48 8b bd 78 ff ff ff e8
> [ 2.793579] [ T331] RSP: 0018:ffffa497016c3510 EFLAGS: 00010246
> [ 2.793588] [ T331] RAX: 0000000000000001 RBX: ffff88c33a4c2ad8 RCX: 0000000000000000
> [ 2.793598] [ T331] RDX: 0000000000000001 RSI: ffffffff839e51a6 RDI: ffffffff83de3c00
> [ 2.793609] [ T331] RBP: ffffa497016c35c0 R08: ffffffffc0a55d92 R09: 0000000000000000
> [ 2.793619] [ T331] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 2.793629] [ T331] R13: 0000000000000002 R14: ffffa497016c3550 R15: 0000000000268000
> [ 2.793641] [ T331] FS: 00007f1f32e5b9c0(0000) GS:ffff88d23b2ca000(0000) knlGS:0000000000000000
> [ 2.793653] [ T331] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2.793662] [ T331] CR2: 000055cdfa28f588 CR3: 0000000112e73000 CR4: 0000000000f50ef0
> [ 2.793673] [ T331] PKRU: 55555554
> [ 2.793678] [ T331] Call Trace:
> [ 2.793683] [ T331] <TASK>
> [ 2.793687] [ T331] ? lock_acquire+0xbe/0x2d0
> [ 2.793696] [ T331] ? init_mqd+0x122/0x190 [amdgpu]
> [ 2.793809] [ T331] ? lock_release+0xc6/0x2a0
> [ 2.793816] [ T331] ? init_mqd+0x122/0x190 [amdgpu]
> [ 2.793902] [ T331] init_mqd+0x122/0x190 [amdgpu]
> [ 2.793961] [ T331] init_mqd_hiq+0xd/0x20 [amdgpu]
> [ 2.794015] [ T331] kq_initialize.constprop.0+0x2b8/0x370 [amdgpu]
> [ 2.794071] [ T331] kernel_queue_init+0x3f/0x60 [amdgpu]
> [ 2.794125] [ T331] pm_init+0x6b/0x100 [amdgpu]
> [ 2.794178] [ T331] start_cpsch+0x1d6/0x270 [amdgpu]
> [ 2.794234] [ T331] kgd2kfd_device_init.cold+0x7b9/0xa1a [amdgpu]
> [ 2.794365] [ T331] amdgpu_amdkfd_device_init+0x190/0x260 [amdgpu]
amdgpu_amdkfd_device_init()
kgd2kfd_device_init() {
....
init_mqd()
mutex_lock(... profiler_lock); <- FAIL
mutex_init(...profiler_lock);
}
Seems the famous graphics CI failed to catch this...
Thanks,
tglx
---
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -744,6 +744,9 @@ bool kgd2kfd_device_init(struct kfd_dev
KGD_ENGINE_SDMA1);
kfd->shared_resources = *gpu_resources;
+ kfd->profiler_process = NULL;
+ mutex_init(&kfd->profiler_lock);
+
kfd->num_nodes = amdgpu_xcp_get_num_xcp(kfd->adev->xcp_mgr);
if (kfd->num_nodes == 0) {
@@ -936,9 +939,6 @@ bool kgd2kfd_device_init(struct kfd_dev
svm_range_set_max_pages(kfd->adev);
- kfd->profiler_process = NULL;
- mutex_init(&kfd->profiler_lock);
-
kfd->init_complete = true;
dev_info(kfd_device, "added device %x:%x\n", kfd->adev->pdev->vendor,
kfd->adev->pdev->device);
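
A minimal user-space sketch of why the ordering matters. It assumes a debug mutex that records its own address in a magic field when it is initialized; the kernel's mutex debugging does something similar, which is what the DEBUG_LOCKS_WARN_ON(lock->magic != lock) line in the log checks. struct dbg_mutex, struct dev and all helper names below are hypothetical and are not the amdkfd code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical debug mutex: magic points at the lock itself once initialized. */
struct dbg_mutex {
	pthread_mutex_t m;
	struct dbg_mutex *magic;
};

static void dbg_mutex_init(struct dbg_mutex *l)
{
	pthread_mutex_init(&l->m, NULL);
	l->magic = l;
}

/* Returns false without locking if the lock was never initialized. */
static bool dbg_mutex_lock(struct dbg_mutex *l)
{
	if (l->magic != l)
		return false;
	pthread_mutex_lock(&l->m);
	return true;
}

static void dbg_mutex_unlock(struct dbg_mutex *l)
{
	pthread_mutex_unlock(&l->m);
}

/* Hypothetical device with a lock that an init-time callee takes. */
struct dev {
	struct dbg_mutex profiler_lock;
};

/* Stand-in for a callee that takes the lock during device init. */
static bool use_profiler_lock(struct dev *d)
{
	if (!dbg_mutex_lock(&d->profiler_lock))
		return false;
	dbg_mutex_unlock(&d->profiler_lock);
	return true;
}

/* Broken order: the callee runs before the lock is initialized. */
static bool dev_init_broken(struct dev *d)
{
	bool ok;

	memset(d, 0, sizeof(*d));
	ok = use_profiler_lock(d);
	dbg_mutex_init(&d->profiler_lock);
	return ok;
}

/* Fixed order: initialize the lock before anything can take it. */
static bool dev_init_fixed(struct dev *d)
{
	memset(d, 0, sizeof(*d));
	dbg_mutex_init(&d->profiler_lock);
	return use_profiler_lock(d);
}
```

In this sketch dev_init_broken() reports the failed check while dev_init_fixed() succeeds, which mirrors the effect of moving the initialization earlier in the init function.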
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 10:17 ` Thomas Gleixner
@ 2026-05-21 10:21 ` Bert Karwatzki
2026-05-21 10:33 ` Mateusz Guzik
0 siblings, 1 reply; 18+ messages in thread
From: Bert Karwatzki @ 2026-05-21 10:21 UTC (permalink / raw)
To: Thomas Gleixner, Mateusz Guzik, Christian Brauner
Cc: linux-kernel, linux-next, linux-rt-devel, linux-fsdevel,
adobriyan, jack, viro, Sebastian Andrzej Siewior, spasswolf,
Alex Deucher, amd-gfx
On Thursday, 21.05.2026 at 12:17 +0200, Thomas Gleixner wrote:
> On Thu, May 21 2026 at 11:20, Bert Karwatzki wrote:
> > On Thursday, 21.05.2026 at 11:09 +0200, Mateusz Guzik wrote:
> >
> > with next-20260519 (no RT, no LOCKDEP) and got no crash so far (4 boots only though (next-20260619
> > crashed in 2 out of 3 boots without RT)) but I get this warning on every boot:
> >
> > [ 2.793416] [ T331] ------------[ cut here ]------------
> > [ 2.793433] [ T331] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> > [ 2.793434] [ T331] WARNING: kernel/locking/mutex.c:625 at __mutex_lock+0x586/0x10c0, CPU#17: (udev-worker)/331
>
> So either the mutex is corrupted or was never initialized.
>
> > [ 2.793463] [ T331] Modules linked in: amdgpu(+) hid_generic usbhid drm_client_lib i2c_algo_bit drm_buddy hid drm_ttm_helper ttm drm_exec
> > drm_suballoc_helper mfd_core drm_panel_backlight_quirks gpu_sched amdxcp drm_display_helper drm_kms_helper ahci libahci xhci_pci libata xhci_hcd drm nvme
> > scsi_mod igc usbcore nvme_core scsi_common video nvme_keyring i2c_piix4 cec nvme_auth usb_common crc16 i2c_smbus wmi gpio_amdpt gpio_generic
> > [ 2.793518] [ T331] CPU: 17 UID: 0 PID: 331 Comm: (udev-worker) Not tainted 7.1.0-rc4-next-20260519-rcunortlockdep-dirty #465 PREEMPT
> > [ 2.793534] [ T331] Hardware name: ASUS System Product Name/ROG STRIX B850-F GAMING WIFI, BIOS 1627 02/05/2026
> > [ 2.793547] [ T331] RIP: 0010:__mutex_lock+0x58d/0x10c0
> > [ 2.793555] [ T331] Code: 4c 8b 4d 88 85 c0 0f 84 f8 fa ff ff 44 8b 15 ca 9b 81 00 45 85 d2 0f 85 e8 fa ff ff 48 8d 3d 1a 57 82 00 48 c7 c6 a6 51 9e 83
> > <67> 48 0f b9 3a 4c 8b 4d 88 e9 cc fa ff ff 48 8b bd 78 ff ff ff e8
> > [ 2.793579] [ T331] RSP: 0018:ffffa497016c3510 EFLAGS: 00010246
> > [ 2.793588] [ T331] RAX: 0000000000000001 RBX: ffff88c33a4c2ad8 RCX: 0000000000000000
> > [ 2.793598] [ T331] RDX: 0000000000000001 RSI: ffffffff839e51a6 RDI: ffffffff83de3c00
> > [ 2.793609] [ T331] RBP: ffffa497016c35c0 R08: ffffffffc0a55d92 R09: 0000000000000000
> > [ 2.793619] [ T331] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > [ 2.793629] [ T331] R13: 0000000000000002 R14: ffffa497016c3550 R15: 0000000000268000
> > [ 2.793641] [ T331] FS: 00007f1f32e5b9c0(0000) GS:ffff88d23b2ca000(0000) knlGS:0000000000000000
> > [ 2.793653] [ T331] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 2.793662] [ T331] CR2: 000055cdfa28f588 CR3: 0000000112e73000 CR4: 0000000000f50ef0
> > [ 2.793673] [ T331] PKRU: 55555554
> > [ 2.793678] [ T331] Call Trace:
> > [ 2.793683] [ T331] <TASK>
> > [ 2.793687] [ T331] ? lock_acquire+0xbe/0x2d0
> > [ 2.793696] [ T331] ? init_mqd+0x122/0x190 [amdgpu]
> > [ 2.793809] [ T331] ? lock_release+0xc6/0x2a0
> > [ 2.793816] [ T331] ? init_mqd+0x122/0x190 [amdgpu]
> > [ 2.793902] [ T331] init_mqd+0x122/0x190 [amdgpu]
> > [ 2.793961] [ T331] init_mqd_hiq+0xd/0x20 [amdgpu]
> > [ 2.794015] [ T331] kq_initialize.constprop.0+0x2b8/0x370 [amdgpu]
> > [ 2.794071] [ T331] kernel_queue_init+0x3f/0x60 [amdgpu]
> > [ 2.794125] [ T331] pm_init+0x6b/0x100 [amdgpu]
> > [ 2.794178] [ T331] start_cpsch+0x1d6/0x270 [amdgpu]
> > [ 2.794234] [ T331] kgd2kfd_device_init.cold+0x7b9/0xa1a [amdgpu]
> > [ 2.794365] [ T331] amdgpu_amdkfd_device_init+0x190/0x260 [amdgpu]
>
> amdgpu_amdkfd_device_init()
> kgd2kfd_device_init() {
> ....
> init_mqd()
> mutex_lock(... profiler_lock); <- FAIL
>
> mutex_init(...profiler_lock);
> }
>
> Seems the famous graphics CI failed to catch this...
>
> Thanks,
>
> tglx
> ---
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -744,6 +744,9 @@ bool kgd2kfd_device_init(struct kfd_dev
> KGD_ENGINE_SDMA1);
> kfd->shared_resources = *gpu_resources;
>
> + kfd->profiler_process = NULL;
> + mutex_init(&kfd->profiler_lock);
> +
> kfd->num_nodes = amdgpu_xcp_get_num_xcp(kfd->adev->xcp_mgr);
>
> if (kfd->num_nodes == 0) {
> @@ -936,9 +939,6 @@ bool kgd2kfd_device_init(struct kfd_dev
>
> svm_range_set_max_pages(kfd->adev);
>
> - kfd->profiler_process = NULL;
> - mutex_init(&kfd->profiler_lock);
> -
> kfd->init_complete = true;
> dev_info(kfd_device, "added device %x:%x\n", kfd->adev->pdev->vendor,
> kfd->adev->pdev->device);
Actually, when I test next-20260519 with the improved fix, I do not see
the warning from amdgpu.
diff --git a/fs/filesystems.c b/fs/filesystems.c
index 771fc31a69b8..712316a1e3e0 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -269,7 +269,7 @@ static __cold noinline int regen_filesystems_string(void)
hlist_for_each_entry_rcu(p, &file_systems, list) {
if (!(p->fs_flags & FS_REQUIRES_DEV))
newlen += strlen("nodev");
- newlen += strlen("\t") + strlen(p->name) + strlen("\n");
+ newlen += strlen("\t") + strlen(p->name) + strlen("\n");
}
spin_unlock(&file_systems_lock);
@@ -289,6 +289,7 @@ static __cold noinline int regen_filesystems_string(void)
* Did someone beat us to it?
*/
if (old && old->gen == file_systems_gen) {
+ spin_unlock(&file_systems_lock);
kfree(new);
return 0;
}
@@ -297,6 +298,7 @@ static __cold noinline int regen_filesystems_string(void)
* Did the list change in the meantime?
*/
if (gen != file_systems_gen) {
+ spin_unlock(&file_systems_lock);
kfree(new);
goto retry;
}
@@ -321,13 +323,12 @@ static __cold noinline int regen_filesystems_string(void)
* generation above and messes it up.
*/
spin_unlock(&file_systems_lock);
- if (old)
- kfree_rcu(old, rcu);
+ kfree(new);
return -EINVAL;
}
/*
- * Paired with consume fence in READ_ONCE() in filesystems_proc_show()
+ * Paired with consume fence in rcu_dereference() in filesystems_proc_show()
*/
smp_store_release(&file_systems_string, new);
spin_unlock(&file_systems_lock);
Bert Karwatzki
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 10:21 ` Bert Karwatzki
@ 2026-05-21 10:33 ` Mateusz Guzik
2026-05-21 11:50 ` Bert Karwatzki
0 siblings, 1 reply; 18+ messages in thread
From: Mateusz Guzik @ 2026-05-21 10:33 UTC (permalink / raw)
To: Bert Karwatzki
Cc: Thomas Gleixner, Christian Brauner, linux-kernel, linux-next,
linux-rt-devel, linux-fsdevel, adobriyan, jack, viro,
Sebastian Andrzej Siewior, Alex Deucher, amd-gfx
On Thu, May 21, 2026 at 12:22 PM Bert Karwatzki <spasswolf@web•de> wrote:
>
> On Thursday, 21.05.2026 at 12:17 +0200, Thomas Gleixner wrote:
> > On Thu, May 21 2026 at 11:20, Bert Karwatzki wrote:
> > > On Thursday, 21.05.2026 at 11:09 +0200, Mateusz Guzik wrote:
> > >
> > > with next-20260519 (no RT, no LOCKDEP) and got no crash so far (4 boots only though (next-20260619
> > > crashed in 2 out of 3 boots without RT)) but I get this warning on every boot:
> > >
> > > [ 2.793416] [ T331] ------------[ cut here ]------------
> > > [ 2.793433] [ T331] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> > > [ 2.793434] [ T331] WARNING: kernel/locking/mutex.c:625 at __mutex_lock+0x586/0x10c0, CPU#17: (udev-worker)/331
> >
> > So either the mutex is corrupted or was never initialized.
> >
> > > [ 2.793463] [ T331] Modules linked in: amdgpu(+) hid_generic usbhid drm_client_lib i2c_algo_bit drm_buddy hid drm_ttm_helper ttm drm_exec
> > > drm_suballoc_helper mfd_core drm_panel_backlight_quirks gpu_sched amdxcp drm_display_helper drm_kms_helper ahci libahci xhci_pci libata xhci_hcd drm nvme
> > > scsi_mod igc usbcore nvme_core scsi_common video nvme_keyring i2c_piix4 cec nvme_auth usb_common crc16 i2c_smbus wmi gpio_amdpt gpio_generic
> > > [ 2.793518] [ T331] CPU: 17 UID: 0 PID: 331 Comm: (udev-worker) Not tainted 7.1.0-rc4-next-20260519-rcunortlockdep-dirty #465 PREEMPT
> > > [ 2.793534] [ T331] Hardware name: ASUS System Product Name/ROG STRIX B850-F GAMING WIFI, BIOS 1627 02/05/2026
> > > [ 2.793547] [ T331] RIP: 0010:__mutex_lock+0x58d/0x10c0
> > > [ 2.793555] [ T331] Code: 4c 8b 4d 88 85 c0 0f 84 f8 fa ff ff 44 8b 15 ca 9b 81 00 45 85 d2 0f 85 e8 fa ff ff 48 8d 3d 1a 57 82 00 48 c7 c6 a6 51 9e 83
> > > <67> 48 0f b9 3a 4c 8b 4d 88 e9 cc fa ff ff 48 8b bd 78 ff ff ff e8
> > > [ 2.793579] [ T331] RSP: 0018:ffffa497016c3510 EFLAGS: 00010246
> > > [ 2.793588] [ T331] RAX: 0000000000000001 RBX: ffff88c33a4c2ad8 RCX: 0000000000000000
> > > [ 2.793598] [ T331] RDX: 0000000000000001 RSI: ffffffff839e51a6 RDI: ffffffff83de3c00
> > > [ 2.793609] [ T331] RBP: ffffa497016c35c0 R08: ffffffffc0a55d92 R09: 0000000000000000
> > > [ 2.793619] [ T331] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > [ 2.793629] [ T331] R13: 0000000000000002 R14: ffffa497016c3550 R15: 0000000000268000
> > > [ 2.793641] [ T331] FS: 00007f1f32e5b9c0(0000) GS:ffff88d23b2ca000(0000) knlGS:0000000000000000
> > > [ 2.793653] [ T331] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 2.793662] [ T331] CR2: 000055cdfa28f588 CR3: 0000000112e73000 CR4: 0000000000f50ef0
> > > [ 2.793673] [ T331] PKRU: 55555554
> > > [ 2.793678] [ T331] Call Trace:
> > > [ 2.793683] [ T331] <TASK>
> > > [ 2.793687] [ T331] ? lock_acquire+0xbe/0x2d0
> > > [ 2.793696] [ T331] ? init_mqd+0x122/0x190 [amdgpu]
> > > [ 2.793809] [ T331] ? lock_release+0xc6/0x2a0
> > > [ 2.793816] [ T331] ? init_mqd+0x122/0x190 [amdgpu]
> > > [ 2.793902] [ T331] init_mqd+0x122/0x190 [amdgpu]
> > > [ 2.793961] [ T331] init_mqd_hiq+0xd/0x20 [amdgpu]
> > > [ 2.794015] [ T331] kq_initialize.constprop.0+0x2b8/0x370 [amdgpu]
> > > [ 2.794071] [ T331] kernel_queue_init+0x3f/0x60 [amdgpu]
> > > [ 2.794125] [ T331] pm_init+0x6b/0x100 [amdgpu]
> > > [ 2.794178] [ T331] start_cpsch+0x1d6/0x270 [amdgpu]
> > > [ 2.794234] [ T331] kgd2kfd_device_init.cold+0x7b9/0xa1a [amdgpu]
> > > [ 2.794365] [ T331] amdgpu_amdkfd_device_init+0x190/0x260 [amdgpu]
> >
> > amdgpu_amdkfd_device_init()
> > kgd2kfd_device_init() {
> > ....
> > init_mqd()
> > mutex_lock(... profiler_lock); <- FAIL
> >
> > mutex_init(...profiler_lock);
> > }
> >
> > Seems the famous graphics CI failed to catch this...
> >
> > Thanks,
> >
> > tglx
> > ---
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > @@ -744,6 +744,9 @@ bool kgd2kfd_device_init(struct kfd_dev
> > KGD_ENGINE_SDMA1);
> > kfd->shared_resources = *gpu_resources;
> >
> > + kfd->profiler_process = NULL;
> > + mutex_init(&kfd->profiler_lock);
> > +
> > kfd->num_nodes = amdgpu_xcp_get_num_xcp(kfd->adev->xcp_mgr);
> >
> > if (kfd->num_nodes == 0) {
> > @@ -936,9 +939,6 @@ bool kgd2kfd_device_init(struct kfd_dev
> >
> > svm_range_set_max_pages(kfd->adev);
> >
> > - kfd->profiler_process = NULL;
> > - mutex_init(&kfd->profiler_lock);
> > -
> > kfd->init_complete = true;
> > dev_info(kfd_device, "added device %x:%x\n", kfd->adev->pdev->vendor,
> > kfd->adev->pdev->device);
>
> Actually, when I test next-20260519 with the improved fix, I do not see
> the warning from amdgpu.
>
Can you please do the following:
1. go back to the known crashing-tag, add my fix, verify you still get
the amd splat and then try out the fix provided by Thomas
2. regardless if the above helps, can you boot a kernel built with
CONFIG_KASAN=y
fwiw I verified my patch works fine with KASAN, including by
intentionally miscalculating the size of the target buffer and seeing
a nice splat from it so I'm confident I'm not corrupting anything.
However, as there are new mallocs + free flying around at early boot,
it is *plausible* amd was getting zeroed memory without asking for it
and it worked by accident.
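
To illustrate the "worked by accident" scenario: a lock that is used before it is initialized can appear to work if the memory happens to be zeroed, while a debug check like the quoted `lock->magic != lock` still flags it. The following is a userspace sketch only. struct toy_mutex and its helpers are hypothetical and are not the kernel's mutex implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy mutex with a self-pointing "magic" field set by init. */
struct toy_mutex {
	int owner;               /* 0 means unlocked */
	struct toy_mutex *magic; /* equals the mutex's own address once initialized */
};

static void toy_mutex_init(struct toy_mutex *m)
{
	m->owner = 0;
	m->magic = m;
}

/* Debug-style check: true only if toy_mutex_init() ran on this object. */
static bool toy_mutex_debug_ok(const struct toy_mutex *m)
{
	return m->magic == m;
}

/* Non-debug fast path: looks only at the owner field. */
static bool toy_mutex_trylock(struct toy_mutex *m, int task)
{
	if (m->owner != 0)
		return false;
	m->owner = task;
	return true;
}
```

With zeroed memory (for example from calloc()) toy_mutex_trylock() succeeds even though toy_mutex_init() never ran, whereas toy_mutex_debug_ok() reports the missing initialization. With non-zero garbage in the object both fail.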
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 10:33 ` Mateusz Guzik
@ 2026-05-21 11:50 ` Bert Karwatzki
2026-05-21 12:01 ` Mateusz Guzik
0 siblings, 1 reply; 18+ messages in thread
From: Bert Karwatzki @ 2026-05-21 11:50 UTC (permalink / raw)
To: Mateusz Guzik
Cc: Thomas Gleixner, spasswolf, Christian Brauner, spasswolf,
linux-kernel, linux-next, linux-rt-devel, linux-fsdevel,
adobriyan, jack, viro, Sebastian Andrzej Siewior, Alex Deucher,
amd-gfx
>
> Can you please do the following:
> 1. go back to the known crashing-tag, add my fix, verify you still get
> the amd splat and then try out the fix provided by Thomas
> 2. regardless if the above helps, can you boot a kernel built with
> CONFIG_KASAN=y
>
> fwiw I verified my patch works fine with KASAN, including by
> intentionally miscalculating the size of the target buffer and seeing
> a nice splat from it so I'm confident I'm not corrupting anything.
> However, as there are new mallocs + free flying around at early boot,
> it is *plausible* amd was getting zeroed memory without asking for it
> and it worked by accident.
I think the warning from amdgpu is only displayed with CONFIG_LOCKDEP=y, so
your "improved fix" does not silence the warning from amdgpu.
The additional fix from Thomas fixes the amdgpu warning.
I also built the kernel with CONFIG_KASAN and get no error messages.
Bert Karwatzki
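
For reference, the KASAN test discussed above depends on the sizing pass and the fill pass agreeing on the length. Below is a userspace sketch of that two-pass scheme, using the same per-entry length formula as the sizing loop in fs/filesystems.c quoted earlier in the thread. struct fs_entry, fs_string_len() and fs_string_build() are hypothetical names, and the assertions stand in for what KASAN would catch in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a registered filesystem type. */
struct fs_entry {
	const char *name;
	bool requires_dev;
};

/* Pass 1: compute the exact length, excluding the terminating NUL. */
static size_t fs_string_len(const struct fs_entry *fs, size_t n)
{
	size_t len = 0;

	for (size_t i = 0; i < n; i++) {
		if (!fs[i].requires_dev)
			len += strlen("nodev");
		len += strlen("\t") + strlen(fs[i].name) + strlen("\n");
	}
	return len;
}

/* Pass 2: fill a buffer of exactly that size and check nothing was truncated. */
static char *fs_string_build(const struct fs_entry *fs, size_t n)
{
	size_t len = fs_string_len(fs, n);
	size_t pos = 0;
	char *buf = malloc(len + 1);

	if (!buf)
		return NULL;
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int w = snprintf(buf + pos, len + 1 - pos, "%s\t%s\n",
				 fs[i].requires_dev ? "" : "nodev", fs[i].name);

		/* A sizing bug would show up here as a truncated write. */
		assert(w >= 0 && (size_t)w <= len - pos);
		pos += (size_t)w;
	}
	assert(pos == len);
	return buf;
}
```

Calling fs_string_build() on an array of entries returns a heap string that the caller frees. If fs_string_len() undercounts, the assertions fire instead of the write running past the buffer.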
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 11:50 ` Bert Karwatzki
@ 2026-05-21 12:01 ` Mateusz Guzik
2026-05-28 17:59 ` Bert Karwatzki
0 siblings, 1 reply; 18+ messages in thread
From: Mateusz Guzik @ 2026-05-21 12:01 UTC (permalink / raw)
To: Bert Karwatzki
Cc: Thomas Gleixner, Christian Brauner, linux-kernel, linux-next,
linux-rt-devel, linux-fsdevel, adobriyan, jack, viro,
Sebastian Andrzej Siewior, Alex Deucher, amd-gfx
On Thu, May 21, 2026 at 1:51 PM Bert Karwatzki <spasswolf@web•de> wrote:
>
>
> >
> > Can you please do the following:
> > 1. go back to the known crashing-tag, add my fix, verify you still get
> > the amd splat and then try out the fix provided by Thomas
> > 2. regardless if the above helps, can you boot a kernel built with
> > CONFIG_KASAN=y
> >
> > fwiw I verified my patch works fine with KASAN, including by
> > intentionally miscalculating the size of the target buffer and seeing
> > a nice splat from it so I'm confident I'm not corrupting anything.
> > However, as there are new mallocs + free flying around at early boot,
> > it is *plausible* amd was getting zeroed memory without asking for it
> > and it worked by accident.
>
> I think the warning from amdgpu is only displayed with CONFIG_LOCKDEP=y, so
> your "improved fix" does not silence the warning from amdgpu.
>
> The additional fix from Thomas fixes the amdgpu warning.
>
I just wanted to confirm my patch does not *cause* issues, at worst
uncovers them.
> I also built the kernel with CONFIG_KASAN and get no error messages.
>
nice
I presume Thomas will handle getting the amdgpu patch to the right
people, I think it will be fine to drop all the mailing lists and the
cc's. :-)
So overall I think we are done here.
Thank you for testing and sorry for the breakage.
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-21 12:01 ` Mateusz Guzik
@ 2026-05-28 17:59 ` Bert Karwatzki
2026-05-29 17:20 ` Mateusz Guzik
0 siblings, 1 reply; 18+ messages in thread
From: Bert Karwatzki @ 2026-05-28 17:59 UTC (permalink / raw)
To: Mateusz Guzik
Cc: Thomas Gleixner, spasswolf, Christian Brauner, linux-kernel,
linux-next, linux-rt-devel, linux-fsdevel, adobriyan, jack, viro,
Sebastian Andrzej Siewior, Alex Deucher, amd-gfx
On Thursday, 21.05.2026 at 14:01 +0200, Mateusz Guzik wrote:
>
> So overall I think we are done here.
>
> Thank you for testing and sorry for the breakage.
Just as a reminder, this has not been fixed in linux-next yet,
as of next-20260528.
Bert Karwatzki
* Re: context switch within RCU read-side critical section in next-20260518+ with PREEMPT_RT
2026-05-28 17:59 ` Bert Karwatzki
@ 2026-05-29 17:20 ` Mateusz Guzik
0 siblings, 0 replies; 18+ messages in thread
From: Mateusz Guzik @ 2026-05-29 17:20 UTC (permalink / raw)
To: Bert Karwatzki
Cc: Thomas Gleixner, Christian Brauner, linux-kernel, linux-next,
linux-rt-devel, linux-fsdevel, adobriyan, jack, viro,
Sebastian Andrzej Siewior, Alex Deucher, amd-gfx
On Thu, May 28, 2026 at 7:59 PM Bert Karwatzki <spasswolf@web•de> wrote:
>
> On Thursday, 21.05.2026 at 14:01 +0200, Mateusz Guzik wrote:
> >
> > So overall I think we are done here.
> >
> > Thank you for testing and sorry for the breakage.
>
> Just as a reminder, this has not been fixed in linux-next yet,
> as of next-20260528.
>
I sent a v4 of the patchset with some extra touch ups:
https://lore.kernel.org/linux-fsdevel/20260529171840.2576445-1-mjguzik@gmail.com/T/#t