* Re: [net-next v24 2/7] net: mtip: The L2 switch driver for imx287
[not found] <20260601112437.2216043-3-lukasz.majewski@mailbox.org>
@ 2026-06-05 2:18 ` Jakub Kicinski
0 siblings, 0 replies; only message in thread
From: Jakub Kicinski @ 2026-06-05 2:18 UTC (permalink / raw)
To: lukasz.majewski
Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni,
richardcochran, lgirdwood, nathan, linux-kernel, netdev, krzk+dt,
imx, linux-arm-kernel, robh
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mtip: The L2 switch driver for imx287
This patch adds a new driver for the MoreThanIP L2 switch embedded in
the i.MX287 SoC. The two-port switch sits between uDMA[01] and the
MAC-NET[01] blocks and can be used as an alternative to the existing
FEC driver.
A few questions inline below.
> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> new file mode 100644
> index 000000000000..1d1005a42306
> --- /dev/null
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
[ ... ]
> +static void mtip_switch_restart(struct net_device *dev, int duplex0,
> + int duplex1)
> +{
> + struct mtip_ndev_priv *priv = netdev_priv(dev);
> + struct switch_enet_private *fep = priv->fep;
> + int i;
> +
> + /* Perform a reset. We should wait for this. */
> + writel(MCF_ESW_MODE_SW_RST, fep->hwp + ESW_MODE);
[High]
Should this routine hold a lock? fep->hw_lock is declared and
initialised in mtip_sw_probe() with the comment "/* Lock for HW
configuration */" but neither mtip_switch_restart() nor mtip_adjust_link()
acquire it.
The two ports each get their own phy_device via of_phy_connect(), and
phylib serialises adjust_link callbacks per phy_device, not across them.
If both ports' state machines fire mtip_adjust_link() at the same time,
both CPUs end up inside mtip_switch_restart() in parallel, racing on
ESW_MODE_SW_RST, ESW_BMPC, ESW_ISR, ESW_RDSR/ESW_TDSR, fep->cur_tx /
cur_rx / dirty_tx, and fep->full_duplex[].
> + /* Reset SKB transmit buffers */
> + for (i = 0; i <= TX_RING_MOD_MASK; i++) {
> + if (fep->tx_skbuff[i]) {
> + dev_kfree_skb_any(fep->tx_skbuff[i]);
> + fep->tx_skbuff[i] = NULL;
> + }
> + }
[High]
Can two concurrent invocations of this loop double-free a tx_skbuff[i]
entry? Both CPUs read the same non-NULL pointer, both call
dev_kfree_skb_any() on it, and only then race on the NULL store.
> +
> + fep->full_duplex[0] = duplex0;
> + fep->full_duplex[1] = duplex1;
> +
> + mtip_configure_enet_mii(fep, 1);
> + mtip_configure_enet_mii(fep, 2);
> +
> + /* And last, enable the transmit and receive processing */
> + writel(MCF_ESW_RDAR_R_DES_ACTIVE, fep->hwp + ESW_RDAR);
> +
> + /* Enable interrupts we wish to service */
> + writel(0xFFFFFFFF, fep->hwp + ESW_ISR);
> + writel(MCF_ESW_IMR_TXF | MCF_ESW_IMR_RXF,
> + fep->hwp + ESW_IMR);
> +
> + mtip_config_switch(fep);
> +}
> +
> +static irqreturn_t mtip_interrupt(int irq, void *ptr_fep)
> +{
> + struct switch_enet_private *fep = ptr_fep;
> + irqreturn_t ret = IRQ_NONE;
> + u32 int_events, int_imask;
> +
> + /* Get the interrupt events that caused us to be here */
> + int_events = readl(fep->hwp + ESW_ISR);
> + writel(int_events, fep->hwp + ESW_ISR);
> +
> + if (int_events & (MCF_ESW_ISR_RXF | MCF_ESW_ISR_TXF)) {
> + ret = IRQ_HANDLED;
> + /* Disable the RX interrupt */
> + if (napi_schedule_prep(&fep->napi)) {
> + int_imask = readl(fep->hwp + ESW_IMR);
> + int_imask &= ~MCF_ESW_IMR_RXF;
> + writel(int_imask, fep->hwp + ESW_IMR);
> + __napi_schedule(&fep->napi);
> + }
> + }
> +
> + return ret;
> +}
[Medium]
Should this also clear MCF_ESW_IMR_TXF here? The condition above
schedules NAPI on either RXF or TXF, and the NAPI poll re-enables both
bits together:
writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF,
fep->hwp + ESW_IMR);
With only RXF masked here, every TX completion while NAPI is running
re-asserts the IRQ line; mtip_interrupt() runs again, napi_schedule_prep()
returns false because NAPI is already scheduled, and the handler returns
IRQ_HANDLED having done nothing useful. Once mtip_switch_tx() is filled
in by the later patch in the series, would this turn into a sustained
IRQ storm on TX-busy workloads?
[ ... ]
> +static int mtip_mii_init(struct switch_enet_private *fep,
> + struct platform_device *pdev)
> +{
> + struct device_node *node;
> + int err = -ENXIO;
> +
> + /* Clear MMFR to avoid to generate MII event by writing MSCR.
> + * MII event generation condition:
> + * - writing MSCR:
> + * - mmfr[31:0]_not_zero & mscr[7:0]_is_zero &
> + * mscr_reg_data_in[7:0] != 0
> + * - writing MMFR:
> + * - mscr[7:0]_not_zero
> + */
> + writel(0, fep->hwp + MCF_FEC_MII_DATA);
> + /* Clear any pending transaction complete indication */
> + writel(MCF_ENET_MII, fep->enet_addr + MCF_FEC_EIR);
[Medium]
Is the base address for the MMFR clear correct here?
fep->hwp is the switch base (fep->enet_addr + ENET_SWI_PHYS_ADDR_OFFSET,
i.e. enet_addr + 0x8000), while MCF_FEC_MII_DATA (0x040) is an offset
within the MAC register space. At offset 0x040 from fep->hwp the header
defines:
#define ESW_MCR (0x040)
So this writel() goes to the switch's Mirror Configuration Register
rather than to MMFR. The very next line correctly uses fep->enet_addr +
MCF_FEC_EIR for the EIR clear, and mtip_mdio_read()/mtip_mdio_write()
also use fep->enet_addr + MCF_FEC_MII_DATA for the actual MDIO accesses.
Should this writel() also use fep->enet_addr?
[ ... ]
> +static int mtip_open(struct net_device *dev)
> +{
> + struct mtip_ndev_priv *priv = netdev_priv(dev);
> + struct switch_enet_private *fep = priv->fep;
> + int ret, port_idx = priv->portnum - 1;
> +
> + if (fep->usage_count == 0) {
> + ret = clk_enable(fep->clk_ipg);
[ ... ]
> + ret = mtip_mii_probe(dev);
> + if (ret)
> + goto mtip_mii_probe_err;
> +
> + phy_start(fep->phy_dev[port_idx]);
> +
> + if (fep->usage_count == 0) {
> + napi_enable(&fep->napi);
> + mtip_switch_restart(dev, 1, 1);
> +
> + netif_start_queue(dev);
> + }
[High]
Can phy_start() race with the mtip_switch_restart() call below it?
Once phy_start() returns, the phylib state machine workqueue may invoke
mtip_adjust_link() on another CPU, which calls mtip_switch_restart() on
link-up. That can run concurrently with the mtip_switch_restart(dev,
1, 1) here, and both paths walk fep->tx_skbuff[] freeing entries and
reprogram switch HW registers without synchronisation.
Would it be safer to start the PHY only after the switch HW has been
configured (and to take a lock around mtip_switch_restart())?
[ ... ]
> +static int mtip_parse_of(struct switch_enet_private *fep,
> + struct device_node *np)
> +{
> + struct device_node *p;
> + unsigned int port_num;
> + int ret = 0;
> +
> + p = of_get_child_by_name(np, "ethernet-ports");
> +
> + for_each_available_child_of_node_scoped(p, port) {
> + if (of_property_read_u32(port, "reg", &port_num))
> + continue;
[Medium]
If of_get_child_by_name() returns NULL (no "ethernet-ports" node), the
for_each_available_child_of_node_scoped() loop is skipped entirely, ret
stays 0, and mtip_parse_of() returns success with no ports parsed.
Similarly, if every child fails the "reg" read above, the continue keeps
ret at 0.
After such a "success" return, mtip_ndev_init() calls
alloc_netdev(sizeof(struct mtip_ndev_priv), fep->ndev_name[i],
NET_NAME_USER, ether_setup) with fep->ndev_name[i] still NULL.
alloc_netdev_mqs() then calls strscpy_pad() on a NULL name and oopses.
Should missing "ethernet-ports", missing "reg", or having no parsed
ports be treated as errors here?
[ ... ]
> + fep->phy_np[port_num - 1] = of_parse_phandle(port,
> + "phy-handle", 0);
> + if (!fep->phy_np[port_num - 1]) {
> + dev_err(&fep->pdev->dev,
> + "%s: PHY handle not defined!\n", __func__);
> + ret = -ENODEV;
> + goto of_get_err;
> + }
> + }
> +
> + of_get_err:
> + mtip_parse_of_cleanup(fep);
> + of_node_put(p);
> +
> + return ret;
> +}
[High]
Does the success path fall through into of_get_err: unconditionally?
There is no return ret; (or success label) before the of_get_err label,
so on a successful parse mtip_parse_of_cleanup(fep) runs and calls
of_node_put() on every fep->phy_np[i] just acquired by
of_parse_phandle(). mtip_parse_of_cleanup() also does not NULL the
slots:
static void mtip_parse_of_cleanup(struct switch_enet_private *fep)
{
int i;
for (i = 0; i < SWITCH_EPORT_NUMBER; i++)
if (fep->phy_np[i])
of_node_put(fep->phy_np[i]);
}
After probe, mtip_open()->mtip_mii_probe() then passes the same
fep->phy_np[i] pointers to of_phy_connect() — does this dereference an
OF node the driver no longer holds a reference on?
The error and remove paths (the of_free_references label, the explicit
cleanup after devm_request_irq() failure, and mtip_sw_remove() ->
mtip_parse_of_cleanup()) call mtip_parse_of_cleanup() again on the same
non-NULL pointers, which would underflow the OF refcount on each.
Should there be a return ret; before of_get_err: and a NULL-out of the
slot in mtip_parse_of_cleanup() after each of_node_put()?
[ ... ]
> +static int mtip_sw_probe(struct platform_device *pdev)
> +{
[ ... ]
> + ret = devm_request_irq(&pdev->dev, fep->irq, mtip_interrupt, 0,
> + dev_name(&pdev->dev), fep);
> + if (ret) {
> + mtip_parse_of_cleanup(fep);
> + return dev_err_probe(&pdev->dev, ret, "Could not alloc IRQ\n");
> + }
[Medium]
Can a stale IRQ here dereference an uninitialised napi struct?
netif_napi_add(dev, &fep->napi, mtip_rx_napi) is only called from
mtip_open(), so between this devm_request_irq() and the first ndo_open
fep->napi is zero-initialised (no poll, no dev, no state). Probe also
doesn't write 0 to ESW_IMR before requesting the IRQ — it relies on the
register's reset value.
If the bootloader or a warm reset left the switch with IMR set and
events pending in ESW_ISR, mtip_interrupt() can fire immediately on
enable, see RXF/TXF, call napi_schedule_prep(&fep->napi) (which sets
NAPIF_STATE_SCHED unconditionally and returns true), and queue the
zero-initialised napi struct. net_rx_action() would then call
n->poll == NULL.
Would it help to either mask all interrupts (writel(0, fep->hwp +
ESW_IMR)) before request_irq, or defer request_irq until after
netif_napi_add()?
--
pw-bot: cr
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2026-06-05 2:19 UTC | newest]
Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260601112437.2216043-3-lukasz.majewski@mailbox.org>
2026-06-05 2:18 ` [net-next v24 2/7] net: mtip: The L2 switch driver for imx287 Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox