From: Ding Hui <dinghui1111@163•com>
To: pabeni@redhat•com
Cc: alexandre.torgue@foss•st.com, andrew+netdev@lunn•ch,
andrew@lunn•ch, davem@davemloft•net, dinghui1111@163•com,
dinghui@lixiang•com, edumazet@google•com, kuba@kernel•org,
linux-arm-kernel@lists•infradead.org,
linux-kernel@vger•kernel.org,
linux-stm32@st-md-mailman•stormreply.com, liuxuanjun@lixiang•com,
maxime.chevallier@bootlin•com, mcoquelin.stm32@gmail•com,
netdev@vger•kernel.org, rmk+kernel@armlinux•org.uk,
xiasanbo@lixiang•com, yangchen11@lixiang•com
Subject: Re:Re: [PATCH v2] net: stmmac: fix fatal bus error on resume by reinitializing RX buffers
Date: Tue, 2 Jun 2026 17:28:17 +0800 [thread overview]
Message-ID: <20260602092817.929979-1-dinghui1111@163.com> (raw)
In-Reply-To: <20260528120203.245762-1-pabeni@redhat.com>
At 2026-05-28 20:02:02, "Paolo Abeni" <pabeni@redhat•com> wrote:
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index 3591755ea30b..0dc27d8c66a0 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -1642,6 +1642,79 @@ static void stmmac_clear_descriptors(struct stmmac_priv *priv,
>> stmmac_clear_tx_descriptors(priv, dma_conf, queue);
>> }
>>
>> +/**
>> + * stmmac_reinit_rx_descriptors - re-program RX descriptors from existing
>> + * buffers (allocation-free)
>
>[ ... ]
>
>> +static void stmmac_reinit_rx_descriptors(struct stmmac_priv *priv,
>> + struct stmmac_dma_conf *dma_conf,
>> + u32 queue)
>> +{
>> + struct stmmac_rx_queue *rx_q = &dma_conf->rx_queue[queue];
>> + int i;
>> +
>> + for (i = 0; i < dma_conf->dma_rx_size; i++) {
>> + struct stmmac_rx_buffer *buf = &rx_q->buf_pool[i];
>> + struct dma_desc *p = stmmac_get_rx_desc(priv, rx_q, i);
>> +
>> + if (rx_q->xsk_pool) {
>> + dma_addr_t dma_addr;
>> +
>> + /* The XSK pool may not be fully populated (e.g.
>> + * xdpsock TX-only); skip empty slots.
>> + */
>> + if (!buf->xdp)
>> + continue;
>
>[High]
>Does the continue here regress to the same fatal bus error this patch
>sets out to fix?
>
>When the slot is skipped, its descriptor address fields keep whatever
>HW writeback left behind. Then stmmac_clear_descriptors() runs right
>after stmmac_reinit_rx_descriptors() in stmmac_resume() and unconditionally
>re-arms OWN on every slot. For dwmac4 that is dwmac4_set_rx_owner():
>
> p->des3 |= cpu_to_le32(RDES3_OWN | RDES3_BUFFER1_VALID_ADDR |
> RDES3_INT_ON_COMPLETION_EN);
>
>and for enh_desc / ndesc the equivalent OR of RDES0_OWN / RDES1 fields.
>None of these clear the address fields.
>
>When stmmac_hw_setup() restarts the DMA, the controller sees an owned
>descriptor with VALID_ADDR set and a stale address — the exact failure
>mode the commit message describes. Could the skipped slots be cleared
>(stmmac_clear_desc() or zeroing des0/des1/des2) instead of left intact,
>or arranged so clear_descriptors() does not arm OWN on them?
It's right that the skipped slots must be cleared instead of left intact.
This change will be reflected in version v3.
>
>> +
>> + dma_addr = xsk_buff_xdp_get_dma(buf->xdp);
>> + stmmac_set_desc_addr(priv, p, dma_addr);
>> + stmmac_set_desc_sec_addr(priv, p, 0, false);
>> + } else {
>> + /* Theoretically unreachable: napi_disable() in
>> + * stmmac_suspend() ensures all initialized slots
>> + * have a valid page before we get here.
>> + * Defensive check only.
>> + */
>> + if (!buf->page)
>> + continue;
>
>[High]
>Is the comment "Theoretically unreachable: napi_disable() in
>stmmac_suspend() ensures all initialized slots have a valid page"
>accurate?
>
>stmmac_rx() assigns buf->page = NULL at multiple points when handing the
>page on to the stack or to XDP, for example:
>
> buf->page = NULL;
>
>in stmmac_rx() after the page has been consumed.
>
>stmmac_rx_refill() allocates with GFP_ATOMIC | __GFP_NOWARN and breaks
>on allocation failure without advancing dirty_rx, leaving the failing
>slot (and any subsequent slots) with buf->page == NULL:
>
> if (!buf->page) {
> buf->page = page_pool_alloc_pages(rx_q->page_pool, gfp);
> if (!buf->page)
> break;
> ...
> }
>
>stmmac_disable_all_queues() only does synchronize_rcu() and
>napi_disable() — it does not retry refills. Under the memory-pressure
>scenario this patch is explicitly designed to make safe, a prior
>GFP_ATOMIC failure can leave the ring with NULL-page slots at the
>moment suspend happens.
>
>If such a slot is then skipped here, the same OWN-with-stale-address
>condition described above applies on resume. Should the page_pool path
>also clear the descriptor for skipped slots, or allocate a replacement
>buffer rather than leave a stale-address descriptor for
>stmmac_clear_descriptors() to arm?
>
>> +
>> + stmmac_set_desc_addr(priv, p, buf->addr);
>> + stmmac_set_desc_sec_addr(priv, p, buf->sec_addr,
>> + priv->sph_active &&
>> + buf->sec_page);
>> +
>> + if (dma_conf->dma_buf_sz == BUF_SIZE_16KiB)
>> + stmmac_init_desc3(priv, p);
>> + }
>> + }
>
>[ ... ]
>
>> +}
>> +
Same as above, the skipped slots must be cleared instead of left intact.
This change will be reflected in version v3.
Thanks for the review. I will upload the v3 version soon.
Ding Hui
next prev parent reply other threads:[~2026-06-02 9:29 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20260528145742eucas1p25c5d23dd93d0946689a8867fd95c5db7@eucas1p2.samsung.com>
2026-05-26 2:26 ` [PATCH v2] net: stmmac: fix fatal bus error on resume by reinitializing RX buffers Ding Hui
2026-05-28 12:02 ` Paolo Abeni
2026-06-02 9:28 ` Ding Hui [this message]
2026-05-28 14:57 ` Jakub Raczynski
2026-05-29 7:42 ` Ding Hui
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260602092817.929979-1-dinghui1111@163.com \
--to=dinghui1111@163$(echo .)com \
--cc=alexandre.torgue@foss$(echo .)st.com \
--cc=andrew+netdev@lunn$(echo .)ch \
--cc=andrew@lunn$(echo .)ch \
--cc=davem@davemloft$(echo .)net \
--cc=dinghui@lixiang$(echo .)com \
--cc=edumazet@google$(echo .)com \
--cc=kuba@kernel$(echo .)org \
--cc=linux-arm-kernel@lists$(echo .)infradead.org \
--cc=linux-kernel@vger$(echo .)kernel.org \
--cc=linux-stm32@st-md-mailman$(echo .)stormreply.com \
--cc=liuxuanjun@lixiang$(echo .)com \
--cc=maxime.chevallier@bootlin$(echo .)com \
--cc=mcoquelin.stm32@gmail$(echo .)com \
--cc=netdev@vger$(echo .)kernel.org \
--cc=pabeni@redhat$(echo .)com \
--cc=rmk+kernel@armlinux$(echo .)org.uk \
--cc=xiasanbo@lixiang$(echo .)com \
--cc=yangchen11@lixiang$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox