From: Jakub Kicinski <kuba@kernel•org>
To: lukasz.majewski@mailbox•org
Cc: Jakub Kicinski <kuba@kernel•org>,
andrew+netdev@lunn•ch, davem@davemloft•net, edumazet@google•com,
pabeni@redhat•com, richardcochran@gmail•com, lgirdwood@gmail•com,
nathan@kernel•org, linux-kernel@vger•kernel.org,
netdev@vger•kernel.org, krzk+dt@kernel•org, imx@lists•linux.dev,
linux-arm-kernel@lists•infradead.org, robh@kernel•org
Subject: Re: [net-next v24 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
Date: Thu, 4 Jun 2026 19:18:57 -0700 [thread overview]
Message-ID: <20260605021857.3594621-1-kuba@kernel.org> (raw)
In-Reply-To: <20260601112437.2216043-6-lukasz.majewski@mailbox.org>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
This patch fills in the previously stubbed mtip_switch_tx() and
mtip_switch_rx() functions for the MTIP L2 switch driver, adds a small
helper mtip_atable_get_entry_port_number() that looks a MAC up in the
hardware address table, and a mtip_update_cbd() helper used to refill RX
buffer descriptors from the page_pool.
> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> index 7dfeda230e3f..212207861e13 100644
> --- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
[ ... ]
> @@ -836,11 +873,233 @@ static irqreturn_t mtip_interrupt(int irq, void *ptr_fep)
>
> static void mtip_switch_tx(struct switch_enet_private *fep)
> {
> + struct net_device *dev;
> + unsigned short status;
> + struct sk_buff *skb;
> + struct cbd_t *bdp;
> + int index;
> +
> + spin_lock_bh(&fep->hw_lock);
> + bdp = fep->dirty_tx;
> +
> + while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
> + if (bdp == fep->cur_tx &&
> + !mtip_netif_queues_stopped(fep))
> + break;
> +
> + index = bdp - fep->tx_bd_base;
> + dma_unmap_single(&fep->pdev->dev, bdp->cbd_bufaddr,
> + MTIP_SWITCH_TX_FRSIZE, DMA_TO_DEVICE);
> + bdp->cbd_bufaddr = 0;
> + skb = fep->tx_skbuff[index];
> + dev = skb->dev;
[High]
Can mtip_switch_tx() race with mtip_adjust_link() and dereference a NULL
skb here?
mtip_adjust_link() runs in process context from phylib's state machine
and calls:
mtip_netif_stop_queues(fep);
mtip_switch_restart(...);
if (mtip_netif_queues_stopped(fep))
mtip_netif_wake_queues(fep);
mtip_switch_restart() does not appear to take fep->hw_lock, does not
disable NAPI, clears tx_skbuff[i] = NULL and resets cur_tx = dirty_tx
= base, and at the end re-enables RX/TX interrupts via ESW_IMR.
While this is in flight, both per-port queues are XOFF so
mtip_netif_queues_stopped(fep) returns true. A concurrent RX interrupt
on another CPU can schedule NAPI, which calls mtip_switch_tx() with
this state:
bdp = fep->dirty_tx; /* == base after restart */
while ((bdp->cbd_sc & BD_ENET_TX_READY) == 0) {
if (bdp == fep->cur_tx && !mtip_netif_queues_stopped(fep))
break; /* not taken: queues stopped */
...
skb = fep->tx_skbuff[index]; /* NULL */
dev = skb->dev; /* NULL deref */
mtip_switch_restart() does not reset cbd_sc, so cbd_sc & BD_ENET_TX_READY
can still be 0 from prior TX activity, which lets the loop body run.
dma_unmap_single() would also be called with the stale cbd_bufaddr.
The mtip_timeout_work() path wraps the same restart in napi_disable() /
netif_tx_lock_bh(); should the adjust_link path do the same?
> + /* Check for errors */
> + if (status & (BD_ENET_TX_HB | BD_ENET_TX_LC |
> + BD_ENET_TX_RL | BD_ENET_TX_UN |
> + BD_ENET_TX_CSL)) {
> + dev->stats.tx_errors++;
> + if (status & BD_ENET_TX_HB) /* No heartbeat */
> + dev->stats.tx_heartbeat_errors++;
> + if (status & BD_ENET_TX_LC) /* Late collision */
> + dev->stats.tx_window_errors++;
> + if (status & BD_ENET_TX_RL) /* Retrans limit */
> + dev->stats.tx_aborted_errors++;
> + if (status & BD_ENET_TX_UN) /* Underrun */
> + dev->stats.tx_fifo_errors++;
> + if (status & BD_ENET_TX_CSL) /* Carrier lost */
> + dev->stats.tx_carrier_errors++;
> + } else {
> + dev->stats.tx_packets++;
> + }
> +
> + if (status & BD_ENET_TX_READY)
> + dev_err_ratelimited(&fep->pdev->dev,
> + "xmit interrupt and TX_READY.\n");
[Low]
Can this dev_err_ratelimited() ever fire?
The enclosing loop is:
while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
status is only assigned in the loop header and is not re-read inside the
body, so by construction status & BD_ENET_TX_READY is always 0 when this
diagnostic is reached. Was the intent to re-read bdp->cbd_sc, or is this
leftover scaffolding that should just be removed?
> +
> + /* Deferred means some collisions occurred during transmit,
> + * but we eventually sent the packet OK.
> + */
> + if (status & BD_ENET_TX_DEF)
> + dev->stats.collisions++;
> +
> + /* Free the sk buffer associated with this last transmit */
> + dev_consume_skb_any(skb);
> + fep->tx_skbuff[index] = NULL;
[ ... ]
> + }
> + fep->dirty_tx = bdp;
> + spin_unlock_bh(&fep->hw_lock);
> }
>
> +static int mtip_update_cbd(struct switch_enet_private *fep, struct cbd_t *bdp,
> + int index)
> +{
[ ... ]
> +}
> +
[ ... ]
> static int mtip_switch_rx(struct net_device *dev, int budget)
> {
> - return -ENOMEM;
> + struct mtip_ndev_priv *priv = netdev_priv(dev);
[Low]
After this patch mtip_switch_rx() only ever returns pkt_received, which
is non-negative; the page_pool allocation failure path does
pndev->stats.rx_dropped++ and goto rx_processing_done rather than
returning -ENOMEM.
The earlier patch in the series left mtip_rx_napi() with:
pkts = mtip_switch_rx(napi->dev, budget);
if (pkts == -ENOMEM) {
napi_complete(napi);
writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF, fep->hwp + ESW_IMR);
return 0;
}
Is that branch now unreachable, and should it be removed so the contract
between mtip_switch_rx() and its caller stays consistent?
[ ... ]
> +
> + /* Get correct RX page */
> + index = bdp - fep->rx_bd_base;
> + page = fep->page[index];
> + /* Process the incoming frame */
> + pkt_len = bdp->cbd_datlen;
> +
> + dma_sync_single_for_cpu(&fep->pdev->dev, bdp->cbd_bufaddr,
> + pkt_len, DMA_FROM_DEVICE);
> + net_prefetch(page_address(page));
> + data = page_address(page);
> +
> + if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> + swap_buffer(data, pkt_len);
> +
> + eth_hdr = (struct ethhdr *)data;
> + mtip_atable_get_entry_port_number(fep, eth_hdr->h_source,
> + &rx_port);
[ ... ]
> +
> + if (mtip_update_cbd(fep, bdp, index)) {
> + pndev->stats.rx_dropped++;
> + goto rx_processing_done;
> + }
> +
> + /* The packet length includes FCS, but we don't want to
> + * include that when passing upstream as it messes up
> + * bridging applications.
> + */
> + skb = build_skb(page_address(page), PAGE_SIZE);
> + if (unlikely(!skb)) {
> + page_pool_recycle_direct(fep->page_pool, page);
> + pndev->stats.rx_dropped++;
> +
> + netdev_err_once(pndev, "build_skb failed!\n");
> + goto rx_processing_done;
> + }
> +
> + skb_put(skb, pkt_len); /* Make room */
> + skb_mark_for_recycle(skb);
> + skb->protocol = eth_type_trans(skb, pndev);
> + skb->offload_fwd_mark = fep->br_offload;
> + napi_gro_receive(&fep->napi, skb);
> +
> + pndev->stats.rx_packets++;
> + pndev->stats.rx_bytes += pkt_len;
[High]
Is the FCS actually being stripped here?
The comment above build_skb() states:
/* The packet length includes FCS, but we don't want to
* include that when passing upstream as it messes up
* bridging applications.
*/
but skb_put() and the rx_bytes accounting both use pkt_len unmodified:
skb_put(skb, pkt_len); /* Make room */
...
pndev->stats.rx_bytes += pkt_len;
The equivalent fec_enet_rx_queue_napi() in fec_main.c does
pkt_len - sub_len (with sub_len = 4 + fep->rx_shift). Combined with
mtip_enet_init() programming MCF_FEC_RCR_CRC_FWD in the RCR register so
the CRC is forwarded into the RX buffer, every skb handed to
napi_gro_receive() ends up with 4 trailing FCS bytes and rx_bytes is
overcounted by 4 per frame. Should pkt_len be reduced by 4 (or by
4 + fep->rx_shift) before skb_put() and the stats update?
[ ... ]
parent reply other threads:[~2026-06-05 2:19 UTC|newest]
Thread overview: expand[flat|nested] mbox.gz Atom feed
[parent not found: <20260601112437.2216043-6-lukasz.majewski@mailbox.org>]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260605021857.3594621-1-kuba@kernel.org \
--to=kuba@kernel$(echo .)org \
--cc=andrew+netdev@lunn$(echo .)ch \
--cc=davem@davemloft$(echo .)net \
--cc=edumazet@google$(echo .)com \
--cc=imx@lists$(echo .)linux.dev \
--cc=krzk+dt@kernel$(echo .)org \
--cc=lgirdwood@gmail$(echo .)com \
--cc=linux-arm-kernel@lists$(echo .)infradead.org \
--cc=linux-kernel@vger$(echo .)kernel.org \
--cc=lukasz.majewski@mailbox$(echo .)org \
--cc=nathan@kernel$(echo .)org \
--cc=netdev@vger$(echo .)kernel.org \
--cc=pabeni@redhat$(echo .)com \
--cc=richardcochran@gmail$(echo .)com \
--cc=robh@kernel$(echo .)org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox