From: Andrea della Porta <andrea.porta@suse•com>
To: Lukasz Raczylo <lukasz@raczylo•com>
Cc: netdev@vger•kernel.org, Theo Lebrun <theo.lebrun@bootlin•com>,
Andrea della Porta <andrea.porta@suse•com>,
Nicolas Ferre <nicolas.ferre@microchip•com>,
Claudiu Beznea <claudiu.beznea@tuxon•dev>,
Andrew Lunn <andrew+netdev@lunn•ch>,
"David S . Miller" <davem@davemloft•net>,
Eric Dumazet <edumazet@google•com>,
Jakub Kicinski <kuba@kernel•org>, Paolo Abeni <pabeni@redhat•com>,
linux-kernel@vger•kernel.org,
linux-arm-kernel@lists•infradead.org,
linux-rpi-kernel@lists•infradead.org
Subject: Re: [PATCH net-next v2 3/3] net: macb: add TX stall watchdog to recover from lost TCOMP interrupts
Date: Thu, 28 May 2026 18:08:59 +0200 [thread overview]
Message-ID: <ahhom_0PJtqnXsv5@apocalypse> (raw)
In-Reply-To: <20260514215459.36109-4-lukasz@raczylo.com>
Hi Lukasz,
On 22:54 Thu 14 May , Lukasz Raczylo wrote:
> On PCIe-attached macb instances (BCM2712 + RP1 PCIe south bridge
> on Raspberry Pi 5 is the case I have in front of me), a TCOMP
> interrupt can be lost: the TSTART doorbell can be lost in the
> posted-write fabric (addressed by an earlier patch), or the
> descriptor TX_USED DMA write can be observed late by the driver
> (also addressed earlier). When that happens the TX ring stalls
> silently until something else kicks TSTART.
>
> Add a per-queue delayed_work that runs once per second. It
> detects forward progress on the TX completion path via a per-queue
> bool tx_stall_tail_moved that macb_tx_complete() sets when tx_tail
> advances and the watchdog clears on each tick. If the ring is
> non-empty (queue->tx_head != queue->tx_tail) and the flag is
> unset when the tick runs, the watchdog calls the existing
> macb_tx_restart() to re-assert TSTART.
>
> The bool form (rather than a tx_tail snapshot) sidesteps any
> concern about ring-index aliasing between ticks and is the form
> suggested by Phil Elwell when reviewing the same series anchored
> against rpi-6.18.y at raspberrypi/linux#7340 (merged 2026-05-08).
>
> No new recovery logic is introduced. macb_tx_restart() already
> exists in this file, is correctly locked (tx_ptr_lock, bp->lock),
> and verifies that the hardware's TBQP is behind the driver's head
> index before re-asserting TSTART. On a healthy ring it is a
> no-op at the hardware level.
>
> Cost on a healthy queue: one spin_lock_irqsave / spin_unlock and
> two field assignments per tick. The delayed_work is only
> scheduled between macb_open() and macb_close() and is cancelled
> synchronously on close.
>
> A netif_carrier_ok() gate at the top of the tick skips the stall
> check when there is no carrier (no completion is possible without
> a link), eliminating a boot-time false positive where queue->tx_head
> can advance from kernel-queued packets between macb_open() and link
> autoneg completion, while tx_tail stays unchanged because no TCOMPs
> have arrived yet.
>
> netdev_warn_ratelimited() is used rather than netdev_warn_once() so
> operators can count occurrences across the lifetime of the netdev.
>
> Link: https://github.com/cilium/cilium/issues/43198
> Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
> Link: https://github.com/raspberrypi/linux/pull/7340
> Signed-off-by: Lukasz Raczylo <lukasz@raczylo•com>
> ---
> drivers/net/ethernet/cadence/macb.h | 10 ++++
> drivers/net/ethernet/cadence/macb_main.c | 72 ++++++++++++++++++++++++
> 2 files changed, 82 insertions(+)
>
> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index ce9037f9e..75df0f75b 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -1282,6 +1282,16 @@ struct macb_queue {
> dma_addr_t tx_ring_dma;
> struct work_struct tx_error_task;
> bool txubr_pending;
> +
> + /* TX stall watchdog -- see macb_tx_stall_watchdog() in macb_main.c.
> + * tx_stall_tail_moved is set by macb_tx_complete() when tx_tail
> + * advances and cleared by the watchdog tick on each pass (both
> + * under tx_ptr_lock). Using a bool sidesteps any ring-index
> + * aliasing concern between ticks.
> + */
> + struct delayed_work tx_stall_watchdog_work;
> + bool tx_stall_tail_moved;
> +
> struct napi_struct napi_tx;
>
> dma_addr_t rx_ring_dma;
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index f7fa9e7ad..8245c69e1 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -1473,6 +1473,8 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
> packets, bytes);
>
> queue->tx_tail = tail;
> + if (packets)
> + queue->tx_stall_tail_moved = true;
> if (__netif_subqueue_stopped(bp->dev, queue_index) &&
> CIRC_CNT(queue->tx_head, queue->tx_tail,
> bp->tx_ring_size) <= MACB_TX_WAKEUP_THRESH(bp))
> @@ -2003,6 +2005,70 @@ static int macb_tx_poll(struct napi_struct *napi, int budget)
> return work_done;
> }
>
> +#define MACB_TX_STALL_INTERVAL_MS 1000
> +
> +/*
> + * TX stall watchdog.
> + *
> + * Recovers from lost TCOMP interrupts on PCIe-attached macb
> + * instances. macb already has a recovery chain
> + * (txubr_pending -> macb_tx_restart()) that fires on TCOMP; if
> + * TCOMP itself is lost the TX ring stalls silently until something
> + * else kicks TSTART. This watchdog runs once per second per queue
> + * and calls macb_tx_restart() if the ring is non-empty and
> + * tx_tail has not advanced since the previous tick.
> + *
> + * Movement is tracked via the tx_stall_tail_moved boolean rather
> + * than a tx_tail snapshot, sidestepping any ring-index aliasing
> + * concern. The bool is set by macb_tx_complete() when tx_tail
> + * advances and cleared here on each tick; both writes are under
> + * tx_ptr_lock so no atomic is required.
> + *
> + * macb_tx_restart() already checks the hardware's TBQP against
> + * the driver's head index before re-asserting TSTART, so on a
> + * healthy ring this is a no-op at the hardware level. The
> + * watchdog only supplies the missing trigger.
> + */
> +static void macb_tx_stall_watchdog(struct work_struct *work)
> +{
> + struct macb_queue *queue = container_of(to_delayed_work(work),
> + struct macb_queue,
> + tx_stall_watchdog_work);
> + struct macb *bp = queue->bp;
> + unsigned int cur_tail, cur_head;
> + bool stalled = false;
> + unsigned long flags;
> +
> + if (!netif_running(bp->dev))
> + return;
> +
> + /* No carrier => no completion is possible. Skip the check
> + * but keep the watchdog ticking for when carrier comes up.
> + */
> + if (!netif_carrier_ok(bp->dev))
> + goto reschedule;
> +
> + spin_lock_irqsave(&queue->tx_ptr_lock, flags);
> + cur_tail = queue->tx_tail;
> + cur_head = queue->tx_head;
> + if (cur_head != cur_tail && !queue->tx_stall_tail_moved)
> + stalled = true;
> + queue->tx_stall_tail_moved = false;
> + spin_unlock_irqrestore(&queue->tx_ptr_lock, flags);
> +
> + if (stalled) {
> + netdev_warn_ratelimited(bp->dev,
> + "TX stall detected on queue %u (tail=%u head=%u); re-kicking TSTART\n",
> + (unsigned int)(queue - bp->queues),
> + cur_tail, cur_head);
> + macb_tx_restart(queue);
> + }
> +
> +reschedule:
> + schedule_delayed_work(&queue->tx_stall_watchdog_work,
> + msecs_to_jiffies(MACB_TX_STALL_INTERVAL_MS));
> +}
> +
> static void macb_hresp_error_task(struct work_struct *work)
> {
> struct macb *bp = from_work(bp, work, hresp_err_bh_work);
> @@ -3192,6 +3258,9 @@ static int macb_open(struct net_device *dev)
> for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
> napi_enable(&queue->napi_rx);
> napi_enable(&queue->napi_tx);
> + queue->tx_stall_tail_moved = true;
> + schedule_delayed_work(&queue->tx_stall_watchdog_work,
> + msecs_to_jiffies(MACB_TX_STALL_INTERVAL_MS));
> }
>
> macb_init_hw(bp);
> @@ -3242,6 +3311,7 @@ static int macb_close(struct net_device *dev)
> for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
> napi_disable(&queue->napi_rx);
> napi_disable(&queue->napi_tx);
> + cancel_delayed_work_sync(&queue->tx_stall_watchdog_work);
> netdev_tx_reset_queue(netdev_get_tx_queue(dev, q));
> }
>
> @@ -4804,6 +4874,8 @@ static int macb_init_dflt(struct platform_device *pdev)
> }
>
> INIT_WORK(&queue->tx_error_task, macb_tx_error_task);
> + INIT_DELAYED_WORK(&queue->tx_stall_watchdog_work,
> + macb_tx_stall_watchdog);
> q++;
> }
>
> --
> 2.54.0
>
Has this patch seen a V3 version? I guess the only thing it's needed
is something like this:
static void macb_tx_timeout(struct net_device *dev, unsigned int txqueue)
{
struct macb *bp = netdev_priv(dev);
macb_tx_restart(&bp->queues[txqueue]);
}
and then setting .ndo_tx_timeout = macb_tx_timeout in macb_netdev_ops.
Many thanks,
Andrea
next prev parent reply other threads:[~2026-05-28 16:05 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-24 22:38 [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after TSTART doorbell Lukasz Raczylo
2026-05-05 13:17 ` Andrea della Porta
2026-04-24 22:38 ` [RFC PATCH net-next 2/3] net: macb: re-check ISR after IER re-enable in macb_tx_poll Lukasz Raczylo
2026-04-24 22:38 ` [RFC PATCH net-next 3/3] net: macb: add TX stall watchdog as defence-in-depth safety net Lukasz Raczylo
2026-05-05 13:30 ` Andrea della Porta
2026-04-25 21:48 ` [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1 Lukasz Raczylo
2026-05-14 10:31 ` Théo Lebrun
2026-05-14 21:51 ` Lukasz Raczylo
2026-05-16 9:36 ` Théo Lebrun
2026-05-14 21:54 ` [PATCH net-next v2 " Lukasz Raczylo
2026-05-14 21:54 ` [PATCH net-next v2 1/3] net: macb: flush PCIe posted write after TSTART doorbell (PCIe-only) Lukasz Raczylo
2026-05-14 21:54 ` [PATCH net-next v2 2/3] net: macb: insert PCIe read barrier before TX completion descriptor check Lukasz Raczylo
2026-05-14 21:54 ` [PATCH net-next v2 3/3] net: macb: add TX stall watchdog to recover from lost TCOMP interrupts Lukasz Raczylo
2026-05-15 9:53 ` [PATCH net-next] net: macb: fix build of TX stall watchdog by replacing undefined netdev_warn_ratelimited Lukasz Raczylo
2026-05-15 12:47 ` Andrew Lunn
2026-05-15 13:08 ` [PATCH net-next v2 3/3] net: macb: add TX stall watchdog to recover from lost TCOMP interrupts Lukasz Raczylo
2026-05-28 16:08 ` Andrea della Porta [this message]
2026-05-29 7:46 ` Théo Lebrun
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ahhom_0PJtqnXsv5@apocalypse \
--to=andrea.porta@suse$(echo .)com \
--cc=andrew+netdev@lunn$(echo .)ch \
--cc=claudiu.beznea@tuxon$(echo .)dev \
--cc=davem@davemloft$(echo .)net \
--cc=edumazet@google$(echo .)com \
--cc=kuba@kernel$(echo .)org \
--cc=linux-arm-kernel@lists$(echo .)infradead.org \
--cc=linux-kernel@vger$(echo .)kernel.org \
--cc=linux-rpi-kernel@lists$(echo .)infradead.org \
--cc=lukasz@raczylo$(echo .)com \
--cc=netdev@vger$(echo .)kernel.org \
--cc=nicolas.ferre@microchip$(echo .)com \
--cc=pabeni@redhat$(echo .)com \
--cc=theo.lebrun@bootlin$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox