public inbox for linux-arm-kernel@lists.infradead.org 
From: Ding Hui <dinghui1111@163•com>
To: pabeni@redhat•com
Cc: alexandre.torgue@foss•st.com, andrew+netdev@lunn•ch,
	andrew@lunn•ch, davem@davemloft•net, dinghui1111@163•com,
	dinghui@lixiang•com, edumazet@google•com, kuba@kernel•org,
	linux-arm-kernel@lists•infradead.org,
	linux-kernel@vger•kernel.org,
	linux-stm32@st-md-mailman•stormreply.com, liuxuanjun@lixiang•com,
	maxime.chevallier@bootlin•com, mcoquelin.stm32@gmail•com,
	netdev@vger•kernel.org, rmk+kernel@armlinux•org.uk,
	xiasanbo@lixiang•com, yangchen11@lixiang•com
Subject: Re: [PATCH v2] net: stmmac: fix fatal bus error on resume by reinitializing RX buffers
Date: Tue,  2 Jun 2026 17:28:17 +0800
Message-ID: <20260602092817.929979-1-dinghui1111@163.com>
In-Reply-To: <20260528120203.245762-1-pabeni@redhat.com>

At 2026-05-28 20:02:02, "Paolo Abeni" <pabeni@redhat•com> wrote:

>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index 3591755ea30b..0dc27d8c66a0 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -1642,6 +1642,79 @@ static void stmmac_clear_descriptors(struct stmmac_priv *priv,
>>  		stmmac_clear_tx_descriptors(priv, dma_conf, queue);
>>  }
>>  
>> +/**
>> + * stmmac_reinit_rx_descriptors - re-program RX descriptors from existing
>> + *				   buffers (allocation-free)
>
>[ ... ]
>
>> +static void stmmac_reinit_rx_descriptors(struct stmmac_priv *priv,
>> +					 struct stmmac_dma_conf *dma_conf,
>> +					 u32 queue)
>> +{
>> +	struct stmmac_rx_queue *rx_q = &dma_conf->rx_queue[queue];
>> +	int i;
>> +
>> +	for (i = 0; i < dma_conf->dma_rx_size; i++) {
>> +		struct stmmac_rx_buffer *buf = &rx_q->buf_pool[i];
>> +		struct dma_desc *p = stmmac_get_rx_desc(priv, rx_q, i);
>> +
>> +		if (rx_q->xsk_pool) {
>> +			dma_addr_t dma_addr;
>> +
>> +			/* The XSK pool may not be fully populated (e.g.
>> +			 * xdpsock TX-only); skip empty slots.
>> +			 */
>> +			if (!buf->xdp)
>> +				continue;
>
>[High]
>Does the continue here regress to the same fatal bus error this patch
>sets out to fix?
>
>When the slot is skipped, its descriptor address fields keep whatever
>HW writeback left behind. Then stmmac_clear_descriptors() runs right
>after stmmac_reinit_rx_descriptors() in stmmac_resume() and unconditionally
>re-arms OWN on every slot. For dwmac4 that is dwmac4_set_rx_owner():
>
>	p->des3 |= cpu_to_le32(RDES3_OWN | RDES3_BUFFER1_VALID_ADDR |
>			       RDES3_INT_ON_COMPLETION_EN);
>
>and for enh_desc / ndesc the equivalent OR of RDES0_OWN / RDES1 fields.
>None of these clear the address fields.
>
>When stmmac_hw_setup() restarts the DMA, the controller sees an owned
>descriptor with VALID_ADDR set and a stale address — the exact failure
>mode the commit message describes. Could the skipped slots be cleared
>(stmmac_clear_desc() or zeroing des0/des1/des2) instead of left intact,
>or arranged so clear_descriptors() does not arm OWN on them?

You are right: the skipped slots must be cleared rather than left intact.
I will make this change in v3.
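
For illustration only, here is a small userspace model of the problem
described above. It is not driver code and not the v3 patch: every
model_* name, the 4-entry ring, the bit positions and the 0xdead0000
stale value are assumptions made up for this sketch. It only shows that
an OR-only re-arm keeps whatever address a skipped slot already held,
and that clearing the skipped descriptor removes that stale address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for RDES3_OWN / RDES3_BUFFER1_VALID_ADDR. */
#define MODEL_OWN   (1u << 31)
#define MODEL_BUF1V (1u << 24)
#define MODEL_RING  4
#define MODEL_STALE 0xdead0000u	/* pretend HW writeback residue */

struct model_desc { uint32_t des0, des1, des2, des3; };
struct model_buf  { uint32_t addr; int populated; };

/* Fill every descriptor with "stale" writeback data. */
static void model_fill_stale(struct model_desc *ring)
{
	for (int i = 0; i < MODEL_RING; i++) {
		ring[i].des0 = MODEL_STALE;
		ring[i].des1 = 0;
		ring[i].des2 = 0;
		ring[i].des3 = 0;
	}
}

/* Models the OR-only re-arm: the address words are never touched. */
static void model_set_rx_owner(struct model_desc *p)
{
	p->des3 |= MODEL_OWN | MODEL_BUF1V;
}

/* Re-program addresses from existing buffers. With clear_skipped == 0
 * an empty slot is left intact (the v2 behaviour under review); with
 * clear_skipped != 0 the descriptor is zeroed before skipping.
 */
static void model_reinit(struct model_desc *ring,
			 const struct model_buf *bufs, int clear_skipped)
{
	for (int i = 0; i < MODEL_RING; i++) {
		if (!bufs[i].populated) {
			if (clear_skipped)
				memset(&ring[i], 0, sizeof(ring[i]));
			continue;
		}
		ring[i].des0 = bufs[i].addr;
		ring[i].des1 = 0;
	}
}

/* Models the resume ordering from the review: reinit, then an
 * unconditional re-arm of every slot.
 */
static void model_resume(struct model_desc *ring,
			 const struct model_buf *bufs, int clear_skipped)
{
	model_reinit(ring, bufs, clear_skipped);
	for (int i = 0; i < MODEL_RING; i++)
		model_set_rx_owner(&ring[i]);
}
```

Note that the model still arms OWN on the cleared slot, so it only
demonstrates that the stale address is gone. Whether such a slot should
be armed at all is a separate question that this sketch does not answer.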

>
>> +
>> +			dma_addr = xsk_buff_xdp_get_dma(buf->xdp);
>> +			stmmac_set_desc_addr(priv, p, dma_addr);
>> +			stmmac_set_desc_sec_addr(priv, p, 0, false);
>> +		} else {
>> +			/* Theoretically unreachable: napi_disable() in
>> +			 * stmmac_suspend() ensures all initialized slots
>> +			 * have a valid page before we get here.
>> +			 * Defensive check only.
>> +			 */
>> +			if (!buf->page)
>> +				continue;
>
>[High]
>Is the comment "Theoretically unreachable: napi_disable() in
>stmmac_suspend() ensures all initialized slots have a valid page"
>accurate?
>
>stmmac_rx() assigns buf->page = NULL at multiple points when handing the
>page on to the stack or to XDP, for example:
>
>	buf->page = NULL;
>
>in stmmac_rx() after the page has been consumed.
>
>stmmac_rx_refill() allocates with GFP_ATOMIC | __GFP_NOWARN and breaks
>on allocation failure without advancing dirty_rx, leaving the failing
>slot (and any subsequent slots) with buf->page == NULL:
>
>	if (!buf->page) {
>		buf->page = page_pool_alloc_pages(rx_q->page_pool, gfp);
>		if (!buf->page)
>			break;
>		...
>	}
>
>stmmac_disable_all_queues() only does synchronize_rcu() and
>napi_disable() — it does not retry refills. Under the memory-pressure
>scenario this patch is explicitly designed to make safe, a prior
>GFP_ATOMIC failure can leave the ring with NULL-page slots at the
>moment suspend happens.
>
>If such a slot is then skipped here, the same OWN-with-stale-address
>condition described above applies on resume. Should the page_pool path
>also clear the descriptor for skipped slots, or allocate a replacement
>buffer rather than leave a stale-address descriptor for
>stmmac_clear_descriptors() to arm?
>
>> +
>> +			stmmac_set_desc_addr(priv, p, buf->addr);
>> +			stmmac_set_desc_sec_addr(priv, p, buf->sec_addr,
>> +						 priv->sph_active &&
>> +						 buf->sec_page);
>> +
>> +			if (dma_conf->dma_buf_sz == BUF_SIZE_16KiB)
>> +				stmmac_init_desc3(priv, p);
>> +		}
>> +	}
>
>[ ... ]
>
>> +}
>> +

Same as above: the skipped slots must be cleared rather than left intact.
I will make this change in v3 as well.
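
As a rough userspace illustration of the refill behaviour quoted in the
review (again not driver code; all model_* names, the 4-entry ring and
the allocs_left counter used to force an allocation failure are
assumptions for this sketch), a refill pass that breaks on the first
failed allocation leaves that slot and every later one without a page:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_RING 4

struct model_rx_buf { void *page; };

struct model_rx_queue {
	struct model_rx_buf buf_pool[MODEL_RING];
	unsigned int dirty_rx;
	int allocs_left;	/* hypothetical: allocations that still succeed */
};

/* Backing storage so that successful "allocations" return non-NULL. */
static char model_pages[MODEL_RING];

/* Stand-in for an atomic allocation that can fail under pressure. */
static void *model_alloc_page(struct model_rx_queue *q, unsigned int entry)
{
	if (q->allocs_left <= 0)
		return NULL;
	q->allocs_left--;
	return &model_pages[entry];
}

/* Mirrors the quoted loop shape: stop at the first failed allocation
 * and leave dirty_rx pointing at the slot that could not be refilled.
 */
static void model_rx_refill(struct model_rx_queue *q, int dirty)
{
	unsigned int entry = q->dirty_rx;

	while (dirty-- > 0) {
		struct model_rx_buf *buf = &q->buf_pool[entry];

		if (!buf->page) {
			buf->page = model_alloc_page(q, entry);
			if (!buf->page)
				break;
		}
		entry = (entry + 1) % MODEL_RING;
	}
	q->dirty_rx = entry;
}

/* Count the slots a later reinit pass would find without a page. */
static int model_count_null_pages(const struct model_rx_queue *q)
{
	int n = 0;

	for (int i = 0; i < MODEL_RING; i++)
		if (!q->buf_pool[i].page)
			n++;
	return n;
}
```

If suspend happens in that state, a reinit pass would find those slots
with a NULL page, so in this model the "defensive check only" branch is
reachable.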

Thanks for the review. I will send v3 soon.

Ding Hui



