Re: [net-next v24 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver

public inbox for linux-arm-kernel@lists.infradead.org 
 help / color / mirror / Atom feed

From: Jakub Kicinski <kuba@kernel•org>
To: lukasz.majewski@mailbox•org
Cc: Jakub Kicinski <kuba@kernel•org>,
	andrew+netdev@lunn•ch, davem@davemloft•net, edumazet@google•com,
	pabeni@redhat•com, richardcochran@gmail•com, lgirdwood@gmail•com,
	nathan@kernel•org, linux-kernel@vger•kernel.org,
	netdev@vger•kernel.org, krzk+dt@kernel•org, imx@lists•linux.dev,
	linux-arm-kernel@lists•infradead.org, robh@kernel•org
Subject: Re: [net-next v24 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
Date: Thu,  4 Jun 2026 19:18:57 -0700	[thread overview]
Message-ID: <20260605021857.3594621-1-kuba@kernel.org> (raw)
In-Reply-To: <20260601112437.2216043-6-lukasz.majewski@mailbox.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver

This patch fills in the previously stubbed mtip_switch_tx() and
mtip_switch_rx() functions for the MTIP L2 switch driver, adds a small
helper mtip_atable_get_entry_port_number() that looks a MAC up in the
hardware address table, and a mtip_update_cbd() helper used to refill RX
buffer descriptors from the page_pool.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> index 7dfeda230e3f..212207861e13 100644
> --- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c

[ ... ]

> @@ -836,11 +873,233 @@ static irqreturn_t mtip_interrupt(int irq, void *ptr_fep)
>  
>  static void mtip_switch_tx(struct switch_enet_private *fep)
>  {
> +	struct net_device *dev;
> +	unsigned short status;
> +	struct sk_buff *skb;
> +	struct cbd_t *bdp;
> +	int index;
> +
> +	spin_lock_bh(&fep->hw_lock);
> +	bdp = fep->dirty_tx;
> +
> +	while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
> +		if (bdp == fep->cur_tx &&
> +		    !mtip_netif_queues_stopped(fep))
> +			break;
> +
> +		index = bdp - fep->tx_bd_base;
> +		dma_unmap_single(&fep->pdev->dev, bdp->cbd_bufaddr,
> +				 MTIP_SWITCH_TX_FRSIZE, DMA_TO_DEVICE);
> +		bdp->cbd_bufaddr = 0;
> +		skb = fep->tx_skbuff[index];
> +		dev = skb->dev;

[High]
Can mtip_switch_tx() race with mtip_adjust_link() and dereference a NULL
skb here?

mtip_adjust_link() runs in process context from phylib's state machine
and calls:

    mtip_netif_stop_queues(fep);
    mtip_switch_restart(...);
    if (mtip_netif_queues_stopped(fep))
            mtip_netif_wake_queues(fep);

mtip_switch_restart() does not appear to take fep->hw_lock, does not
disable NAPI, clears tx_skbuff[i] = NULL and resets cur_tx = dirty_tx
= base, and at the end re-enables RX/TX interrupts via ESW_IMR.

While this is in flight, both per-port queues are XOFF so
mtip_netif_queues_stopped(fep) returns true. A concurrent RX interrupt
on another CPU can schedule NAPI, which calls mtip_switch_tx() with
this state:

    bdp = fep->dirty_tx;             /* == base after restart */
    while ((bdp->cbd_sc & BD_ENET_TX_READY) == 0) {
            if (bdp == fep->cur_tx && !mtip_netif_queues_stopped(fep))
                    break;           /* not taken: queues stopped */
            ...
            skb = fep->tx_skbuff[index];   /* NULL */
            dev = skb->dev;                /* NULL deref */

mtip_switch_restart() does not reset cbd_sc, so cbd_sc & BD_ENET_TX_READY
can still be 0 from prior TX activity, which lets the loop body run.
dma_unmap_single() would also be called with the stale cbd_bufaddr.

The mtip_timeout_work() path wraps the same restart in napi_disable() /
netif_tx_lock_bh(); should the adjust_link path do the same?

> +		/* Check for errors */
> +		if (status & (BD_ENET_TX_HB | BD_ENET_TX_LC |
> +				   BD_ENET_TX_RL | BD_ENET_TX_UN |
> +				   BD_ENET_TX_CSL)) {
> +			dev->stats.tx_errors++;
> +			if (status & BD_ENET_TX_HB)  /* No heartbeat */
> +				dev->stats.tx_heartbeat_errors++;
> +			if (status & BD_ENET_TX_LC)  /* Late collision */
> +				dev->stats.tx_window_errors++;
> +			if (status & BD_ENET_TX_RL)  /* Retrans limit */
> +				dev->stats.tx_aborted_errors++;
> +			if (status & BD_ENET_TX_UN)  /* Underrun */
> +				dev->stats.tx_fifo_errors++;
> +			if (status & BD_ENET_TX_CSL) /* Carrier lost */
> +				dev->stats.tx_carrier_errors++;
> +		} else {
> +			dev->stats.tx_packets++;
> +		}
> +
> +		if (status & BD_ENET_TX_READY)
> +			dev_err_ratelimited(&fep->pdev->dev,
> +					    "xmit interrupt and TX_READY.\n");

[Low]
Can this dev_err_ratelimited() ever fire?

The enclosing loop is:

    while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {

status is only assigned in the loop header and is not re-read inside the
body, so by construction status & BD_ENET_TX_READY is always 0 when this
diagnostic is reached. Was the intent to re-read bdp->cbd_sc, or is this
leftover scaffolding that should just be removed?

> +
> +		/* Deferred means some collisions occurred during transmit,
> +		 * but we eventually sent the packet OK.
> +		 */
> +		if (status & BD_ENET_TX_DEF)
> +			dev->stats.collisions++;
> +
> +		/* Free the sk buffer associated with this last transmit */
> +		dev_consume_skb_any(skb);
> +		fep->tx_skbuff[index] = NULL;

[ ... ]

> +	}
> +	fep->dirty_tx = bdp;
> +	spin_unlock_bh(&fep->hw_lock);
>  }
>  
> +static int mtip_update_cbd(struct switch_enet_private *fep, struct cbd_t *bdp,
> +			   int index)
> +{

[ ... ]

> +}
> +

[ ... ]

>  static int mtip_switch_rx(struct net_device *dev, int budget)
>  {
> -	return -ENOMEM;
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);

[Low]
After this patch mtip_switch_rx() only ever returns pkt_received, which
is non-negative; the page_pool allocation failure path does
pndev->stats.rx_dropped++ and goto rx_processing_done rather than
returning -ENOMEM.

The earlier patch in the series left mtip_rx_napi() with:

    pkts = mtip_switch_rx(napi->dev, budget);
    if (pkts == -ENOMEM) {
            napi_complete(napi);
            writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF, fep->hwp + ESW_IMR);
            return 0;
    }

Is that branch now unreachable, and should it be removed so the contract
between mtip_switch_rx() and its caller stays consistent?

[ ... ]

> +
> +		/* Get correct RX page */
> +		index = bdp - fep->rx_bd_base;
> +		page = fep->page[index];
> +		/* Process the incoming frame */
> +		pkt_len = bdp->cbd_datlen;
> +
> +		dma_sync_single_for_cpu(&fep->pdev->dev, bdp->cbd_bufaddr,
> +					pkt_len, DMA_FROM_DEVICE);
> +		net_prefetch(page_address(page));
> +		data = page_address(page);
> +
> +		if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> +			swap_buffer(data, pkt_len);
> +
> +		eth_hdr = (struct ethhdr *)data;
> +		mtip_atable_get_entry_port_number(fep, eth_hdr->h_source,
> +						  &rx_port);

[ ... ]

> +
> +		if (mtip_update_cbd(fep, bdp, index)) {
> +			pndev->stats.rx_dropped++;
> +			goto rx_processing_done;
> +		}
> +
> +		/* The packet length includes FCS, but we don't want to
> +		 * include that when passing upstream as it messes up
> +		 * bridging applications.
> +		 */
> +		skb = build_skb(page_address(page), PAGE_SIZE);
> +		if (unlikely(!skb)) {
> +			page_pool_recycle_direct(fep->page_pool, page);
> +			pndev->stats.rx_dropped++;
> +
> +			netdev_err_once(pndev, "build_skb failed!\n");
> +			goto rx_processing_done;
> +		}
> +
> +		skb_put(skb, pkt_len);      /* Make room */
> +		skb_mark_for_recycle(skb);
> +		skb->protocol = eth_type_trans(skb, pndev);
> +		skb->offload_fwd_mark = fep->br_offload;
> +		napi_gro_receive(&fep->napi, skb);
> +
> +		pndev->stats.rx_packets++;
> +		pndev->stats.rx_bytes += pkt_len;

[High]
Is the FCS actually being stripped here?

The comment above build_skb() states:

    /* The packet length includes FCS, but we don't want to
     * include that when passing upstream as it messes up
     * bridging applications.
     */

but skb_put() and the rx_bytes accounting both use pkt_len unmodified:

    skb_put(skb, pkt_len);      /* Make room */
    ...
    pndev->stats.rx_bytes += pkt_len;

The equivalent fec_enet_rx_queue_napi() in fec_main.c does
pkt_len - sub_len (with sub_len = 4 + fep->rx_shift). Combined with
mtip_enet_init() programming MCF_FEC_RCR_CRC_FWD in the RCR register so
the CRC is forwarded into the RX buffer, every skb handed to
napi_gro_receive() ends up with 4 trailing FCS bytes and rx_bytes is
overcounted by 4 per frame. Should pkt_len be reduced by 4 (or by
4 + fep->rx_shift) before skb_put() and the stats update?

[ ... ]

          parent reply	other threads:[~2026-06-05  2:19 UTC|newest]

Thread overview: expand[flat|nested]  mbox.gz  Atom feed
 [parent not found: <20260601112437.2216043-6-lukasz.majewski@mailbox.org>]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260605021857.3594621-1-kuba@kernel.org \
    --to=kuba@kernel$(echo .)org \
    --cc=andrew+netdev@lunn$(echo .)ch \
    --cc=davem@davemloft$(echo .)net \
    --cc=edumazet@google$(echo .)com \
    --cc=imx@lists$(echo .)linux.dev \
    --cc=krzk+dt@kernel$(echo .)org \
    --cc=lgirdwood@gmail$(echo .)com \
    --cc=linux-arm-kernel@lists$(echo .)infradead.org \
    --cc=linux-kernel@vger$(echo .)kernel.org \
    --cc=lukasz.majewski@mailbox$(echo .)org \
    --cc=nathan@kernel$(echo .)org \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=pabeni@redhat$(echo .)com \
    --cc=richardcochran@gmail$(echo .)com \
    --cc=robh@kernel$(echo .)org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox