From: Ding Hui <dinghui1111@163•com>
To: j.raczynski@samsung•com
Cc: alexandre.torgue@foss•st.com, andrew+netdev@lunn•ch,
andrew@lunn•ch, davem@davemloft•net, dinghui1111@163•com,
dinghui@lixiang•com, edumazet@google•com, kuba@kernel•org,
linux-arm-kernel@lists•infradead.org,
linux-kernel@vger•kernel.org,
linux-stm32@st-md-mailman•stormreply.com, liuxuanjun@lixiang•com,
maxime.chevallier@bootlin•com, mcoquelin.stm32@gmail•com,
netdev@vger•kernel.org, pabeni@redhat•com,
rmk+kernel@armlinux•org.uk, xiasanbo@lixiang•com,
yangchen11@lixiang•com
Subject: Re:Re: [PATCH v2] net: stmmac: fix fatal bus error on resume by reinitializing RX buffers
Date: Fri, 29 May 2026 15:42:49 +0800
Message-ID: <20260529074249.2640274-1-dinghui1111@163.com>
In-Reply-To: <ahhX4lIpHwhVekMc@AMDC4622.eu.corp.samsungelectronics.net>
At 2026-05-28 22:57:38, "Jakub Raczynski" <j.raczynski@samsung•com> wrote:
>On Tue, May 26, 2026 at 10:26:17AM +0800, Ding Hui wrote:
>> From: Ding Hui <dinghui@lixiang•com>
>> + } else {
>> + /* Theoretically unreachable: napi_disable() in
>> + * stmmac_suspend() ensures all initialized slots
>> + * have a valid page before we get here.
>> + * Defensive check only.
>> + */
>> + if (!buf->page)
>> + continue;
>> +
>> + stmmac_set_desc_addr(priv, p, buf->addr);
>> + stmmac_set_desc_sec_addr(priv, p, buf->sec_addr,
>> + priv->sph_active &&
>> + buf->sec_page);
>
>Is this generally sufficient? Or, in fact, isn't that overkill?
>stmmac_rx_refill() generally does a bit more preparation of descriptors.
You are right that stmmac_rx_refill() does more work — it allocates new
pages and maps them. The key difference here is that in v2 we intentionally
keep all RX buffers alive across suspend/resume, so no allocation is needed.
The only thing that needs to be restored is the buffer address fields in the
descriptors, which were overwritten by hardware write-back.
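To make the idea concrete, here is a minimal standalone sketch of the
restore step. It is not the patch itself: the sketch_* structures, the
des0/des1 split of the address, and the function name are simplified
assumptions for illustration, standing in for the driver's real descriptor
layouts and for stmmac_set_desc_addr(). It only rewrites the address fields
of slots that still hold a page and leaves every other field alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified RX descriptor (not a real stmmac layout). */
struct sketch_rx_desc {
	uint32_t des0;	/* assumed: buffer address, low 32 bits */
	uint32_t des1;	/* assumed: buffer address, high 32 bits */
	uint32_t des2;
	uint32_t des3;
};

/* Hypothetical per-slot bookkeeping kept by the driver. */
struct sketch_rx_buf {
	bool has_page;	/* models buf->page != NULL */
	uint64_t addr;	/* models buf->addr, the DMA address of the page */
};

/*
 * Rewrite the address fields of every slot that still owns a page.
 * Slots without a page are skipped (the defensive check in the patch).
 * Returns the number of descriptors whose address was restored.
 */
static size_t sketch_restore_rx_ring(struct sketch_rx_desc *ring,
				     const struct sketch_rx_buf *bufs,
				     size_t ring_size)
{
	size_t restored = 0;

	assert(ring && bufs);

	for (size_t i = 0; i < ring_size; i++) {
		if (!bufs[i].has_page)
			continue;

		/* Undo the hardware write-back of the address fields only. */
		ring[i].des0 = (uint32_t)bufs[i].addr;
		ring[i].des1 = (uint32_t)(bufs[i].addr >> 32);
		restored++;
	}

	return restored;
}
```

No page is allocated or mapped in this loop, which is the difference from
the refill path discussed above.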
>The issue seems to be that during suspend there is mismatch,
>caused by writeback format, between rx_dirty and rx_cur pointers and
>there is bad handling of this case, since there is no verification
>of leftover stuff and there will be leftover bad address crashing platform.
>So stmmac needs to refill/reinit descriptors that were consumed but not
>refilled. So isn't going through whole dma_rx_size overkill?
>Wouldn't it be better to iterate over buffer from cur_rx as long as descriptors
>are 0 and only apply refill to those corrupted?
Actually, the hardware may have consumed additional descriptors in the window
between stmmac_disable_all_queues() and stmmac_stop_all_dma(), so cur_rx can lag
behind the hardware's actual position. That means the descriptors between the
rx_dirty and rx_cur pointers may not be the only ones that need to be refilled.
You are right that, ideally, we would refill only the consumed descriptors.
However, checking the OWN bit would require a new lightweight get_rx_owner()
helper for every descriptor variant (dwmac4, dwxgmac2, norm_desc, enh_desc),
which adds complexity for marginal gain.
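For reference, the OWN-bit alternative could look roughly like the
standalone sketch below. Everything named sketch_* is hypothetical, as is
the assumption that OWN is bit 31 of des3; each descriptor variant would
need its own version of the helper. The scan starts at a caller-supplied
index and stops at the first descriptor still owned by the DMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed position of the OWN bit for this illustration only. */
#define SKETCH_RDES3_OWN (1u << 31)

/* Hypothetical, simplified RX descriptor. */
struct sketch_rx_desc {
	uint32_t des0, des1, des2, des3;
};

/* Hypothetical helper: true while the descriptor belongs to the DMA. */
static bool sketch_get_rx_owner(const struct sketch_rx_desc *p)
{
	return (p->des3 & SKETCH_RDES3_OWN) != 0;
}

/*
 * Walk the ring at most once, beginning at 'start', and count the
 * descriptors the hardware has written back (OWN cleared). The walk
 * stops at the first descriptor that is still owned by the DMA.
 */
static size_t sketch_count_consumed(const struct sketch_rx_desc *ring,
				    size_t ring_size, size_t start)
{
	size_t n = 0;

	assert(ring && start < ring_size);

	while (n < ring_size) {
		const struct sketch_rx_desc *p = &ring[(start + n) % ring_size];

		if (sketch_get_rx_owner(p))
			break;
		n++;
	}

	return n;
}
```

Only the descriptors counted this way would then need their addresses
rewritten, at the cost of one such helper per descriptor format.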
>Could you paste panic that occurs during this issue?
>You mention "fatal bus error" which I would assume is system panic?
Apologies for the misleading wording: this does not cause a kernel panic.
The issue manifests as a Fatal Bus Error interrupt on the DMA controller.
Taking XGMAC as an example, dwxgmac2_dma_interrupt() detects XGMAC_FBE,
increments fatal_bus_error_irq, and returns tx_hard_error, which triggers
stmmac_tx_err() to stop and reset the TX DMA channel. That has no effect
on the RX DMA engine (maybe we should reset RX DMA here as well). In practice,
the RX DMA engine halts after dereferencing the invalid buffer address,
and the network interface is non-functional after resume: no packets can be
received until the driver is reloaded or the device is re-probed.
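The flow described above can be modelled with the small standalone sketch
below. It is a simplified model, not the driver code: the sketch_* names,
the bit position used for the FBE flag, and the counters tx_resets and
rx_resets are assumptions made up for illustration. The point it shows is
that the error path only ever touches the TX side.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit position only; not taken from the driver headers. */
#define SKETCH_DMA_STATUS_FBE (1u << 12)

/* Hypothetical result flags returned by the interrupt decoder. */
enum sketch_irq_result {
	SKETCH_HANDLE_NONE   = 0,
	SKETCH_TX_HARD_ERROR = 1 << 0,
};

/* Hypothetical counters; fatal_bus_error_irq mirrors the ethtool stat. */
struct sketch_stats {
	unsigned long fatal_bus_error_irq;
	unsigned long tx_resets;
	unsigned long rx_resets;
};

/* Models the decode step: FBE bumps the counter and reports a TX error. */
static int sketch_dma_interrupt(uint32_t status, struct sketch_stats *st)
{
	int ret = SKETCH_HANDLE_NONE;

	assert(st);

	if (status & SKETCH_DMA_STATUS_FBE) {
		st->fatal_bus_error_irq++;
		ret |= SKETCH_TX_HARD_ERROR;
	}

	return ret;
}

/* Models the caller: a TX hard error resets TX; nothing restarts RX. */
static void sketch_handle_channel(uint32_t status, struct sketch_stats *st)
{
	int ret = sketch_dma_interrupt(status, st);

	if (ret & SKETCH_TX_HARD_ERROR)
		st->tx_resets++;	/* stands in for stmmac_tx_err() */
}
```

In this model rx_resets never changes, which matches the observed symptom:
the counter increments, TX is recovered, and RX stays halted.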
To reproduce the issue on my platform:

1. Connect the DUT and a PC, configure IP addresses so they can ping
   each other (e.g. DUT: 192.168.1.1, PC: 192.168.1.100).

2. On the PC, start an iperf3 server:

       iperf3 -s

3. On the DUT, start a high-rate reverse UDP stream to keep the RX DMA
   busy during suspend:

       iperf3 -c 192.168.1.100 -u -b 900M -R -t 0

4. While iperf3 is running, trigger a suspend/resume cycle on the DUT.

5. After resume, check the fatal_bus_error_irq counter:

       ethtool -S <iface> | grep fatal_bus_error_irq

Without this fix the counter increments and the interface stops
receiving packets. With this fix the counter stays at zero and
normal operation resumes.
I will update the commit message to clarify "fatal bus error causing RX
DMA to stop".
Thanks for the review.
Ding Hui