public inbox for linux-next@vger.kernel.org 
From: Qu Wenruo <wqu@suse•com>
To: dsterba@suse•cz, Venkat Rao Bagalkote <venkat88@linux•ibm.com>
Cc: riteshh@linux•ibm.com, linux-btrfs@vger•kernel.org,
	Qu Wenruo <quwenruo.btrfs@gmx•com>,
	David Sterba <dsterba@suse•com>,
	LKML <linux-kernel@vger•kernel.org>,
	Madhavan Srinivasan <maddy@linux•ibm.com>,
	Linux Next Mailing List <linux-next@vger•kernel.org>,
	Stephen Rothwell <sfr@canb•auug.org.au>
Subject: Re: [linux-next20251112]Kernel OOPs while running btrfs/023 test case
Date: Fri, 14 Nov 2025 08:03:05 +1030
Message-ID: <d84d8a70-bd78-4e49-965f-150a3c231d2a@suse.com>
In-Reply-To: <20251113155107.GQ13846@twin.jikos.cz>



On 2025/11/14 02:21, David Sterba wrote:
> On Thu, Nov 13, 2025 at 06:47:43PM +0530, Venkat Rao Bagalkote wrote:
>> On 13/11/25 6:21 pm, Venkat Rao Bagalkote wrote:
>>> Greetings!!!
>>>
>>> IBM CI has reported a kernel crash while running btrfs/023 test from
>>> xfstest suite on IBM Power11 system.
>>>
>>>
>>> Traces:
>>> [  184.714500] BTRFS: device fsid b8c762d5-3f1a-4020-bca9-2e7e107e5363
>>> devid 1 transid 8 /dev/loop1 (7:1) scanned by mkfs.btrfs (2697)
>>> [  184.714612] BTRFS: device fsid b8c762d5-3f1a-4020-bca9-2e7e107e5363
>>> devid 2 transid 8 /dev/loop2 (7:2) scanned by mkfs.btrfs (2697)
>>> [  184.714731] BTRFS: device fsid b8c762d5-3f1a-4020-bca9-2e7e107e5363
>>> devid 3 transid 8 /dev/loop3 (7:3) scanned by mkfs.btrfs (2697)
>>> [  184.714825] BTRFS: device fsid b8c762d5-3f1a-4020-bca9-2e7e107e5363
>>> devid 4 transid 8 /dev/loop4 (7:4) scanned by mkfs.btrfs (2697)
>>> [  184.714918] BTRFS: device fsid b8c762d5-3f1a-4020-bca9-2e7e107e5363
>>> devid 5 transid 8 /dev/loop5 (7:5) scanned by mkfs.btrfs (2697)
>>> [  184.720659] BTRFS info (device loop1): first mount of filesystem
>>> b8c762d5-3f1a-4020-bca9-2e7e107e5363
>>> [  184.720694] BTRFS info (device loop1): using crc32c (crc32c-lib)
>>> checksum algorithm
>>> [  184.720708] BTRFS info (device loop1): forcing free space tree for
>>> sector size 4096 with page size 65536
>>> [  184.725011] BTRFS info (device loop1): checking UUID tree
>>> [  184.725060] BTRFS info (device loop1): enabling ssd optimizations
>>> [  184.725068] BTRFS info (device loop1): turning on async discard
>>> [  184.725075] BTRFS info (device loop1): enabling free space tree
>>> [  184.735050] BUG: Unable to handle kernel data access at
>>> 0x6696fffdda1ea4c2
>>> [  184.735072] Faulting instruction address: 0xc0000000007bd030
>>> [  184.735087] Oops: Kernel access of bad area, sig: 11 [#1]
>>> [  184.735101] LE PAGE_SIZE=64K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
>>> [  184.735118] Modules linked in: loop nft_fib_inet nft_fib_ipv4
>>> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
>>> nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
>>> nf_defrag_ipv4 bonding tls ip_set rfkill nf_tables sunrpc nfnetlink
>>> pseries_rng vmx_crypto fuse ext4 crc16 mbcache jbd2 sd_mod sg ibmvscsi
>>> ibmveth scsi_transport_srp pseries_wdt
>>> [  184.735316] CPU: 22 UID: 0 PID: 1948 Comm: systemd-udevd Kdump:
>>> loaded Tainted: G    B               6.18.0-rc5-next-20251112 #1
>>> VOLUNTARY
>>> [  184.735342] Tainted: [B]=BAD_PAGE
>>> [  184.735352] Hardware name: IBM,9080-HEX Power11 (architected)
>>> 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
>>> [  184.735369] NIP:  c0000000007bd030 LR: c0000000007bcef4 CTR:
>>> c000000000902824
>>> [  184.735386] REGS: c00000006fdb7910 TRAP: 0380   Tainted: G B
>>>        (6.18.0-rc5-next-20251112)
>>> [  184.735404] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR:
>>> 28004402  XER: 20040000
>>> [  184.735460] CFAR: c0000000007bcf98 IRQMASK: 0
>>> [  184.735460] GPR00: c0000000007bcef4 c00000006fdb7bb0
>>> c0000000026aa100 0000000000000000
>>> [  184.735460] GPR04: 0000000000000cc0 000000013470ff60
>>> 00000000000006f0 c0000009906ff4f0
>>> [  184.735460] GPR08: 669164fddb1e9c02 0000000000000800
>>> 000000098d420000 0000000000000000
>>> [  184.735460] GPR12: c000000000902824 c000000991e0e700
>>> 0000000000000000 0000000000000000
>>> [  184.735460] GPR16: 0000000000000000 0000000000000000
>>> 0000000000000000 0000000000000000
>>> [  184.735460] GPR20: 0000000000000000 0000000000000000
>>> 0000000000000000 0000000000000000
>>> [  184.735460] GPR24: 00000000000006ef 0000000000001000
>>> ffffffffffffffff c00c000000402680
>>> [  184.735460] GPR28: c0000000008f312c 0000000000000cc0
>>> 6696fffdda1e9cc2 c00000000701e880
>>> [  184.735688] NIP [c0000000007bd030] kmem_cache_alloc_noprof+0x4ac/0x708
>>> [  184.735711] LR [c0000000007bcef4] kmem_cache_alloc_noprof+0x370/0x708
>>> [  184.735729] Call Trace:
>>> [  184.735738] [c00000006fdb7bb0] [c0000000007bcef4]
>>> kmem_cache_alloc_noprof+0x370/0x708 (unreliable)
>>> [  184.735766] [c00000006fdb7c30] [c0000000008f312c]
>>> getname_flags.part.0+0x54/0x30c
>>> [  184.735793] [c00000006fdb7c80] [c0000000009028a0]
>>> sys_unlinkat+0x7c/0xe4
>>> [  184.735814] [c00000006fdb7cc0] [c000000000039d50]
>>> system_call_exception+0x1e0/0x450
>>> [  184.735839] [c00000006fdb7e50] [c00000000000d05c]
>>> system_call_vectored_common+0x15c/0x2ec
>>> [  184.735866] ---- interrupt: 3000 at 0x7fff9df366bc
>>> [  184.735881] NIP:  00007fff9df366bc LR: 00007fff9df366bc CTR:
>>> 0000000000000000
>>> [  184.735897] REGS: c00000006fdb7e80 TRAP: 3000   Tainted: G B
>>>        (6.18.0-rc5-next-20251112)
>>> [  184.735913] MSR:  800000000280f033
>>> <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 48004402  XER: 00000000
>>> [  184.735989] IRQMASK: 0
>>> [  184.735989] GPR00: 0000000000000124 00007fffe0b3a3a0
>>> 00007fff9e037d00 0000000000000006
>>> [  184.735989] GPR04: 000000013470ff60 0000000000000000
>>> 0000000000001000 00007fff9e0314b8
>>> [  184.735989] GPR08: 0000000000000271 0000000000000000
>>> 0000000000000000 0000000000000000
>>> [  184.735989] GPR12: 0000000000000000 00007fff9e8c4ca0
>>> 00000001161e5a78 00007fffe0b3ab10
>>> [  184.735989] GPR16: 0000000000000003 0000000000000000
>>> 00000001161aaed0 00000001161e9750
>>> [  184.735989] GPR20: 00007fffe0b3a780 00000001161eb260
>>> 00000001161eb320 0000000000000008
>>> [  184.735989] GPR24: 00000001347061c0 0000000000000000
>>> 0000000000000009 00000001347061c0
>>> [  184.735989] GPR28: 0000000000000006 00007fffe0b3a53c
>>> 0000000134715740 0000000000100000
>>> [  184.736216] NIP [00007fff9df366bc] 0x7fff9df366bc
>>> [  184.736231] LR [00007fff9df366bc] 0x7fff9df366bc
>>> [  184.736251] ---- interrupt: 3000
>>> [  184.736262] Code: f8610030 4082fccc 4bfffc28 2c3e0000 4182ff98
>>> 2c3b0000 4182ff90 60000000 3b40ffff 813f0030 e91f00c0 38d80001
>>> <7f7e482a> 7d3e4a14 79270022 552ac03e
>>> [  184.736362] ---[ end trace 0000000000000000 ]---
>>>
> 
> Thanks for the report.
> 
>> Most likely the issue was introduced by one of the three commits below,
>> as the issue is not seen after reverting all three.
>>
>>
>> 9299051573d9 e8ea54f86241 cd93c0aad7e3
> 
> 9299051573d9 btrfs: enable encoded read/write/send for bs > ps cases
> e8ea54f86241 btrfs: make read verification handle bs > ps cases without large folios
> cd93c0aad7e3 btrfs: make btrfs_repair_io_failure() handle bs > ps cases without large folios
> 

I tracked the problem down to the patch "btrfs: raid56: remove sector_ptr
structure", for which I have a local fix that was never submitted to the
mailing list.

During the recent push to the for-next branch, I again used the mailing
list version instead of the locally fixed one, which results in
btrfs_raid_bio::stripe_paddrs[*] being assigned way beyond its boundary.

This makes us randomly corrupt memory, leading to weird results.

And the fix is pretty straightforward:

Bad:

+		rbio->stripe_paddrs[i] = page_to_phys(rbio->stripe_pages[page_index] +
+						      offset_in_page(offset));

Good:

+		rbio->stripe_paddrs[i] = page_to_phys(rbio->stripe_pages[page_index]) +
+						      offset_in_page(offset);

Since offset_in_page() is involved, it only affects subpage systems.
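
To illustrate why the misplaced parenthesis matters, here is a minimal userspace sketch, not kernel code. Every `model_*` name and the two `paddr_*` helpers are hypothetical stand-ins invented for this example, and the model assumes a flat page array in which the physical address is simply the page index shifted by the page size. In the bad form the in-page offset is added to the page pointer, so each byte of offset advances by a whole page. In the good form it is added to the resulting physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: simplified stand-ins, not the kernel definitions. */
#define MODEL_PAGE_SHIFT 16                          /* 64K pages, as in the report */
#define MODEL_PAGE_SIZE  ((uint64_t)1 << MODEL_PAGE_SHIFT)
#define MODEL_NR_PAGES   ((size_t)1 << 17)           /* large enough for the demo */

struct model_page { unsigned long flags; };          /* stand-in for struct page */

static struct model_page model_mem_map[MODEL_NR_PAGES]; /* stand-in page array */

/* Stand-in for page_to_phys(): page index times page size (assumed flat model). */
static uint64_t model_page_to_phys(const struct model_page *page)
{
	assert(page >= model_mem_map && page < model_mem_map + MODEL_NR_PAGES);
	return (uint64_t)(page - model_mem_map) << MODEL_PAGE_SHIFT;
}

/* Stand-in for offset_in_page(): keep only the bits below the page size. */
static uint64_t model_offset_in_page(uint64_t offset)
{
	return offset & (MODEL_PAGE_SIZE - 1);
}

/* "Bad" form: pointer arithmetic on the page, so the offset counts pages. */
static uint64_t paddr_bad(const struct model_page *page, uint64_t offset)
{
	return model_page_to_phys(page + model_offset_in_page(offset));
}

/* "Good" form: the offset is added to the physical address, in bytes. */
static uint64_t paddr_good(const struct model_page *page, uint64_t offset)
{
	return model_page_to_phys(page) + model_offset_in_page(offset);
}
```

In this model, with page index 2 and a byte offset of 4096, `paddr_good()` yields 2 * 65536 + 4096, while `paddr_bad()` yields (2 + 4096) * 65536, an address thousands of pages away. When the offset is a multiple of the page size, `model_offset_in_page()` returns 0 and both forms agree, which matches the observation that only subpage setups are affected.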

I'll fold the fix into the offending patch.

Thanks for the report, and sorry for the bug.

Thanks,
Qu

Thread overview: 5+ messages
2025-11-13 12:51 [linux-next20251112]Kernel OOPs while running btrfs/023 test case Venkat Rao Bagalkote
2025-11-13 13:17 ` Venkat Rao Bagalkote
2025-11-13 15:51   ` David Sterba
2025-11-13 20:14     ` Qu Wenruo
2025-11-13 21:33     ` Qu Wenruo [this message]