public inbox for netdev@vger.kernel.org 
* [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
@ 2026-05-17 19:28 Rosen Penev
  2026-05-17 20:24 ` Andrew Lunn
  2026-05-20 23:57 ` Jakub Kicinski
  0 siblings, 2 replies; 12+ messages in thread
From: Rosen Penev @ 2026-05-17 19:28 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER,
	open list

Collect received skbs on a local list during RX polling and pass the
completed batch to netif_receive_skb_list(). This lets the networking
stack process packets from a poll cycle in bulk instead of handing each
skb up individually.

Speedup tested with bidirectional iperf3.

Before:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-10.00  sec   490 MBytes   411 Mbits/sec                  sender
[  5][TX-C]   0.00-10.01  sec   488 MBytes   409 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec   176 MBytes   147 Mbits/sec  167            sender
[  7][RX-C]   0.00-10.01  sec   175 MBytes   146 Mbits/sec                  receiver

After:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-10.00  sec   502 MBytes   421 Mbits/sec                  sender
[  5][TX-C]   0.00-10.01  sec   501 MBytes   420 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec   212 MBytes   178 Mbits/sec  148            sender
[  7][RX-C]   0.00-10.01  sec   211 MBytes   177 Mbits/sec                  receiver

Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail•com>
---
 drivers/net/ethernet/freescale/ucc_geth.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index 7af4b5e3f38e..bce1079fc06a 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -2894,6 +2894,7 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit
 	u32 bd_status;
 	u8 *bdBuffer;
 	struct net_device *dev;
+	LIST_HEAD(rx_list);
 
 	ugeth_vdbg("%s: IN", __func__);
 
@@ -2934,7 +2935,7 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit
 
 			dev->stats.rx_bytes += length;
 			/* Send the packet up the stack */
-			netif_receive_skb(skb);
+			list_add_tail(&skb->list, &rx_list);
 		}
 
 		skb = get_new_skb(ugeth, bd);
@@ -2960,6 +2961,8 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit
 		bd_status = in_be32((u32 __iomem *)bd);
 	}
 
+	netif_receive_skb_list(&rx_list);
+
 	ugeth->rxBd[rxQ] = bd;
 	return howmany;
 }
-- 
2.54.0
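
The change in shape made by the patch can be sketched in userspace C. This is
a toy model under stated assumptions, not driver code: struct pkt,
struct pkt_list, struct stack_stats and every function below are hypothetical
stand-ins for struct sk_buff, the kernel list helpers and the networking
stack. It only shows that the batched loop enters the "stack" once per poll
instead of once per packet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: a list link and a length. */
struct pkt {
	struct pkt *next;
	unsigned int len;
};

/* Hypothetical singly linked list with O(1) tail append. */
struct pkt_list {
	struct pkt *head;
	struct pkt **tail;
};

/* Counters kept by the toy "stack". */
struct stack_stats {
	unsigned int entries;	/* how many times the stack was entered */
	unsigned int packets;	/* how many packets it processed */
	unsigned int bytes;	/* how many bytes it processed */
};

static void pkt_list_init(struct pkt_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

static void pkt_list_add_tail(struct pkt_list *l, struct pkt *p)
{
	assert(p != NULL);
	p->next = NULL;
	*l->tail = p;
	l->tail = &p->next;
}

/* Per-packet handoff: one stack entry per packet. */
static void stack_receive_one(struct stack_stats *st, struct pkt *p)
{
	st->entries++;
	st->packets++;
	st->bytes += p->len;
}

/* List handoff: one stack entry per batch; an empty list is a no-op. */
static void stack_receive_list(struct stack_stats *st, struct pkt_list *l)
{
	struct pkt *p;

	if (!l->head)
		return;
	st->entries++;
	for (p = l->head; p; p = p->next) {
		st->packets++;
		st->bytes += p->len;
	}
	pkt_list_init(l);
}

/* Old loop shape: hand each packet up as soon as it is found. */
static int poll_per_packet(struct stack_stats *st, struct pkt *ring,
			   int avail, int budget)
{
	int howmany = 0;

	while (howmany < avail && howmany < budget)
		stack_receive_one(st, &ring[howmany++]);
	return howmany;
}

/* New loop shape: queue on a local list, hand the batch up after the loop. */
static int poll_batched(struct stack_stats *st, struct pkt *ring,
			int avail, int budget)
{
	struct pkt_list rx_list;
	int howmany = 0;

	pkt_list_init(&rx_list);
	while (howmany < avail && howmany < budget)
		pkt_list_add_tail(&rx_list, &ring[howmany++]);
	stack_receive_list(st, &rx_list);
	return howmany;
}
```

With eight packets available and a budget of four, both functions return 4
and account for the same packets and bytes, but poll_per_packet() bumps
entries four times while poll_batched() bumps it once. Any saving in the real
driver depends on how much per-call work the stack can amortise over the
list, which this model does not attempt to measure.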



* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
  2026-05-17 19:28 [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff Rosen Penev
@ 2026-05-17 20:24 ` Andrew Lunn
  2026-05-17 20:44   ` Rosen Penev
  2026-05-20 23:57 ` Jakub Kicinski
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Lunn @ 2026-05-17 20:24 UTC (permalink / raw)
  To: Rosen Penev
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list

On Sun, May 17, 2026 at 12:28:56PM -0700, Rosen Penev wrote:
> Collect received skbs on a local list during RX polling and pass the
> completed batch to netif_receive_skb_list(). This lets the networking
> stack process packets from a poll cycle in bulk instead of handing each
> skb up individually.

So my first thought was, why is the core not doing this? The core NAPI
poll code can initialise the list. netif_receive_skb() within the
driver poll would see there is a list and append to it. And when the
poll finished the NAPI core would pass the list up the stack? Maybe
this already exists and this driver is just using the wrong API?

	Andrew


* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
  2026-05-17 20:24 ` Andrew Lunn
@ 2026-05-17 20:44   ` Rosen Penev
  2026-05-17 21:01     ` Andrew Lunn
  0 siblings, 1 reply; 12+ messages in thread
From: Rosen Penev @ 2026-05-17 20:44 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list

On Sun, May 17, 2026 at 1:24 PM Andrew Lunn <andrew@lunn•ch> wrote:
>
> On Sun, May 17, 2026 at 12:28:56PM -0700, Rosen Penev wrote:
> > Collect received skbs on a local list during RX polling and pass the
> > completed batch to netif_receive_skb_list(). This lets the networking
> > stack process packets from a poll cycle in bulk instead of handing each
> > skb up individually.
>
> So my first thought was, why is the core not doing this? The core NAPI
> poll code can initialise the list. netif_receive_skb() within the
> driver poll would see there is a list and append to it. And when the
> poll finished the NAPI core would pass the list up the stack? Maybe
> this already exists and this driver is just using the wrong API?
I do not know. I know several drivers are already using
netif_receive_skb_list, some of which even support hardware checksumming.
See 0a25d92c6f4facaf2852f1aac4cebfe01dd57a91

The core seems to use netif_receive_skb_list_internal. I do not know
the details.

Anyway, the performance difference is real.
>
>         Andrew


* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
  2026-05-17 20:44   ` Rosen Penev
@ 2026-05-17 21:01     ` Andrew Lunn
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Lunn @ 2026-05-17 21:01 UTC (permalink / raw)
  To: Rosen Penev
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list

On Sun, May 17, 2026 at 01:44:40PM -0700, Rosen Penev wrote:
> On Sun, May 17, 2026 at 1:24 PM Andrew Lunn <andrew@lunn•ch> wrote:
> >
> > On Sun, May 17, 2026 at 12:28:56PM -0700, Rosen Penev wrote:
> > > Collect received skbs on a local list during RX polling and pass the
> > > completed batch to netif_receive_skb_list(). This lets the networking
> > > stack process packets from a poll cycle in bulk instead of handing each
> > > skb up individually.
> >
> > So my first thought was, why is the core not doing this? The core NAPI
> > poll code can initialise the list. netif_receive_skb() within the
> > driver poll would see there is a list and append to it. And when the
> > poll finished the NAPI core would pass the list up the stack? Maybe
> > this already exists and this driver is just using the wrong API?
> I do not know. I know several drivers are already using
> netif_receive_skb_list, some of which even support hardware checksumming.
> See 0a25d92c6f4facaf2852f1aac4cebfe01dd57a91
> 
> The core seems to use netif_receive_skb_list_internal. I do not know
> the details.
> 
> Anyway, the performance difference is real.

I'm not disagreeing with that. But can a similar performance
difference be made for all drivers by doing this in the core?

That is the interesting question.

     Andrew


* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
  2026-05-17 19:28 [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff Rosen Penev
  2026-05-17 20:24 ` Andrew Lunn
@ 2026-05-20 23:57 ` Jakub Kicinski
  2026-05-21  0:39   ` Rosen Penev
  1 sibling, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2026-05-20 23:57 UTC (permalink / raw)
  To: Rosen Penev
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list

On Sun, 17 May 2026 12:28:56 -0700 Rosen Penev wrote:
> Collect received skbs on a local list during RX polling and pass the
> completed batch to netif_receive_skb_list(). This lets the networking
> stack process packets from a poll cycle in bulk instead of handing each
> skb up individually.

GRO should be even better.

> Speedup tested with bidirectional iperf3.

Please mention the platform / board as well.
-- 
pw-bot: cr


* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
  2026-05-20 23:57 ` Jakub Kicinski
@ 2026-05-21  0:39   ` Rosen Penev
  2026-05-21  0:45     ` Jakub Kicinski
  2026-05-21 13:41     ` Eric Dumazet
  0 siblings, 2 replies; 12+ messages in thread
From: Rosen Penev @ 2026-05-21  0:39 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list

On Wed, May 20, 2026 at 4:57 PM Jakub Kicinski <kuba@kernel•org> wrote:
>
> On Sun, 17 May 2026 12:28:56 -0700 Rosen Penev wrote:
> > Collect received skbs on a local list during RX polling and pass the
> > completed batch to netif_receive_skb_list(). This lets the networking
> > stack process packets from a poll cycle in bulk instead of handing each
> > skb up individually.
>
> GRO should be even better.
GRO will result in slower routing performance because there is no
hardware checksum.
>
> > Speedup tested with bidirectional iperf3.
>
> Please mention the platform / board as well.
Will do.
> --
> pw-bot: cr


* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
  2026-05-21  0:39   ` Rosen Penev
@ 2026-05-21  0:45     ` Jakub Kicinski
  2026-05-21  0:54       ` Rosen Penev
  2026-05-21 12:55       ` LEROY Christophe
  2026-05-21 13:41     ` Eric Dumazet
  1 sibling, 2 replies; 12+ messages in thread
From: Jakub Kicinski @ 2026-05-21  0:45 UTC (permalink / raw)
  To: Rosen Penev
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list

On Wed, 20 May 2026 17:39:41 -0700 Rosen Penev wrote:
> > On Sun, 17 May 2026 12:28:56 -0700 Rosen Penev wrote:  
> > > Collect received skbs on a local list during RX polling and pass the
> > > completed batch to netif_receive_skb_list(). This lets the networking
> > > stack process packets from a poll cycle in bulk instead of handing each
> > > skb up individually.  
> >
> > GRO should be even better.  
> GRO will result in slower routing performance because there is no
> hardware checksum.

Mention this in the commit message too.
Network adapters without checksum offload are pretty rare these days.
Speaking of being old, do you know if this driver is used in practice?
Maybe we can delete it.


* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
  2026-05-21  0:45     ` Jakub Kicinski
@ 2026-05-21  0:54       ` Rosen Penev
  2026-05-21 12:55       ` LEROY Christophe
  1 sibling, 0 replies; 12+ messages in thread
From: Rosen Penev @ 2026-05-21  0:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list

On Wed, May 20, 2026 at 5:45 PM Jakub Kicinski <kuba@kernel•org> wrote:
>
> On Wed, 20 May 2026 17:39:41 -0700 Rosen Penev wrote:
> > > On Sun, 17 May 2026 12:28:56 -0700 Rosen Penev wrote:
> > > > Collect received skbs on a local list during RX polling and pass the
> > > > completed batch to netif_receive_skb_list(). This lets the networking
> > > > stack process packets from a poll cycle in bulk instead of handing each
> > > > skb up individually.
> > >
> > > GRO should be even better.
> > GRO will result in slower routing performance because there is no
> > hardware checksum.
>
> Mention this in the commit message too.
Will do.
> Network adapters without checksum offload are pretty rare these days.
Qualcomm continues to make adapters like these.
> Speaking of being old, do you know if this driver is used in practice?
Yes. In OpenWrt with kernel 6.18.
> Maybe we can delete it.
Way too early. I've tried previously to clean it up some but got rejected.


* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
  2026-05-21  0:45     ` Jakub Kicinski
  2026-05-21  0:54       ` Rosen Penev
@ 2026-05-21 12:55       ` LEROY Christophe
  1 sibling, 0 replies; 12+ messages in thread
From: LEROY Christophe @ 2026-05-21 12:55 UTC (permalink / raw)
  To: Jakub Kicinski, Rosen Penev
  Cc: netdev@vger•kernel.org, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list

Hi Jakub,

Le 21/05/2026 à 02:45, Jakub Kicinski a écrit :
> On Wed, 20 May 2026 17:39:41 -0700 Rosen Penev wrote:
>>> On Sun, 17 May 2026 12:28:56 -0700 Rosen Penev wrote:
>>>> Collect received skbs on a local list during RX polling and pass the
>>>> completed batch to netif_receive_skb_list(). This lets the networking
>>>> stack process packets from a poll cycle in bulk instead of handing each
>>>> skb up individually.
>>>
>>> GRO should be even better.
>> GRO will result in slower routing performance because there is no
>> hardware checksum.
> 
> Mention this in the commit message too.
> Network adapters without checksum offload are pretty rare these days.
> Speaking of being old, do you know if this driver is used in practice?
> Maybe we can delete it.
> 

That's way too early to remove that driver.

UCC is what provides Ethernet connectivity in the powerpc MPC83xx CPU
family. This family was declared End Of Life in January this year,
with a Last Time Buy date of 30 Jul 2026 and a Last Time Delivery date
of 30 Apr 2027. We have that CPU on hundreds of boards spread all over
Europe and have to maintain those systems for the next 10 to 15 years,
if not longer.

So we can maybe reconsider removing that driver by 2040 but unlikely before.

Christophe


* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
  2026-05-21  0:39   ` Rosen Penev
  2026-05-21  0:45     ` Jakub Kicinski
@ 2026-05-21 13:41     ` Eric Dumazet
  2026-05-21 23:28       ` Rosen Penev
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2026-05-21 13:41 UTC (permalink / raw)
  To: Rosen Penev
  Cc: Jakub Kicinski, netdev, Andrew Lunn, David S. Miller, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list

On Wed, May 20, 2026 at 5:39 PM Rosen Penev <rosenp@gmail•com> wrote:
>
> On Wed, May 20, 2026 at 4:57 PM Jakub Kicinski <kuba@kernel•org> wrote:
> >
> > On Sun, 17 May 2026 12:28:56 -0700 Rosen Penev wrote:
> > > Collect received skbs on a local list during RX polling and pass the
> > > completed batch to netif_receive_skb_list(). This lets the networking
> > > stack process packets from a poll cycle in bulk instead of handing each
> > > skb up individually.
> >
> > GRO should be even better.
> GRO will result in slower routing performance because there is no
> hardware checksum.

Then provide a knob or something, instead of trying to avoid GRO.

For end hosts (forwarding not enabled), checksum will need to be
computed anyway.
GRO should be faster for them.

Note that GRO also uses netif_receive_skb_list_internal()


* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
  2026-05-21 13:41     ` Eric Dumazet
@ 2026-05-21 23:28       ` Rosen Penev
  2026-05-22  4:44         ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Rosen Penev @ 2026-05-21 23:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jakub Kicinski, netdev, Andrew Lunn, David S. Miller, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list

On Thu, May 21, 2026 at 6:41 AM Eric Dumazet <edumazet@google•com> wrote:
>
> On Wed, May 20, 2026 at 5:39 PM Rosen Penev <rosenp@gmail•com> wrote:
> >
> > On Wed, May 20, 2026 at 4:57 PM Jakub Kicinski <kuba@kernel•org> wrote:
> > >
> > > On Sun, 17 May 2026 12:28:56 -0700 Rosen Penev wrote:
> > > > Collect received skbs on a local list during RX polling and pass the
> > > > completed batch to netif_receive_skb_list(). This lets the networking
> > > > stack process packets from a poll cycle in bulk instead of handing each
> > > > skb up individually.
> > >
> > > GRO should be even better.
> > GRO will result in slower routing performance because there is no
> > hardware checksum.
>
> Then provide a knob or something, instead of trying to avoid GRO.
>
> For end hosts (forwarding not enabled), checksum will need to be
> computed anyway.
> GRO should be faster for them.
>
> Note that GRO also uses netif_receive_skb_list_internal()
So you recommend switching to napi_gro_receive even though there's no
RX hardware checksum?


* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
  2026-05-21 23:28       ` Rosen Penev
@ 2026-05-22  4:44         ` Eric Dumazet
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2026-05-22  4:44 UTC (permalink / raw)
  To: Rosen Penev
  Cc: Jakub Kicinski, netdev, Andrew Lunn, David S. Miller, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list

On Thu, May 21, 2026 at 4:29 PM Rosen Penev <rosenp@gmail•com> wrote:
>
> On Thu, May 21, 2026 at 6:41 AM Eric Dumazet <edumazet@google•com> wrote:
> >
> > On Wed, May 20, 2026 at 5:39 PM Rosen Penev <rosenp@gmail•com> wrote:
> > >
> > > On Wed, May 20, 2026 at 4:57 PM Jakub Kicinski <kuba@kernel•org> wrote:
> > > >
> > > > On Sun, 17 May 2026 12:28:56 -0700 Rosen Penev wrote:
> > > > > Collect received skbs on a local list during RX polling and pass the
> > > > > completed batch to netif_receive_skb_list(). This lets the networking
> > > > > stack process packets from a poll cycle in bulk instead of handing each
> > > > > skb up individually.
> > > >
> > > > GRO should be even better.
> > > GRO will result in slower routing performance because there is no
> > > hardware checksum.
> >
> > Then provide a knob or something, instead of trying to avoid GRO.
> >
> > For end hosts (forwarding not enabled), checksum will need to be
> > computed anyway.
> > GRO should be faster for them.
> >
> > Note that GRO also uses netif_receive_skb_list_internal()
> So you recommend switching to napi_gro_receive even though there's no
> RX hardware checksum?

Certainly.

There is a reason we added support for sw checksum in GRO years ago.

Most Linux hosts on this planet do not forward packets.
And if they do, there is a big chance the egress device supports TSO
or tx checksum offload.
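
The end-host argument rests on GRO handing the stack fewer, larger packets.
A minimal sketch of that effect, assuming a heavily simplified model: struct
seg and coalesce() are hypothetical names, and only the segment immediately
before the current one is considered for merging. The kernel's GRO tracks
several flows at once and also validates headers and checksums, none of which
is modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment descriptor: flow id, start sequence, payload length. */
struct seg {
	unsigned int flow;
	unsigned int seq;
	unsigned int len;
};

/*
 * Merge a segment into the previous output entry when it belongs to the
 * same flow and starts exactly where that entry ends. Returns the number
 * of entries written to out[], i.e. how many packets the stack would see.
 * out[] must have room for n entries.
 */
static size_t coalesce(const struct seg *in, size_t n, struct seg *out)
{
	size_t i, m = 0;

	for (i = 0; i < n; i++) {
		if (m > 0 && out[m - 1].flow == in[i].flow &&
		    out[m - 1].seq + out[m - 1].len == in[i].seq) {
			/* In-order continuation: grow the previous entry. */
			out[m - 1].len += in[i].len;
			continue;
		}
		/* New flow or out-of-order segment: start a new entry. */
		out[m++] = in[i];
	}
	return m;
}
```

For two back-to-back 1448-byte segments of flow 1, one 100-byte segment of
flow 2, then a third 1448-byte segment of flow 1, coalesce() returns 3: the
first two segments become one 2896-byte entry, and the interleaved flow 2
segment prevents the last one from merging in this simplified model.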
