public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: Maciej Fijalkowski <maciej.fijalkowski@intel•com>
To: Jesper Dangaard Brouer <hawk@kernel•org>
Cc: <netdev@vger•kernel.org>, <bpf@vger•kernel.org>,
	<magnus.karlsson@intel•com>, <stfomichev@gmail•com>,
	<kuba@kernel•org>, <pabeni@redhat•com>, <horms@kernel•org>,
	<bjorn@kernel•org>, <lorenzo@kernel•org>, <toke@redhat•com>,
	Liang Chen <liangchen.linux@gmail•com>,
	Yunsheng Lin <linyunsheng@huawei•com>,
	huangjie.albert <huangjie.albert@bytedance•com>
Subject: Re: [PATCH RFC net-next 0/4] xdp: reuse generic skb XDP handling for veth
Date: Thu, 28 May 2026 15:24:06 +0200	[thread overview]
Message-ID: <ahhB9ufMk2tIlaoc@boxer> (raw)
In-Reply-To: <agdjjtlXxlzy6Mx5@boxer>

On Fri, May 15, 2026 at 08:18:54PM +0200, Maciej Fijalkowski wrote:
> On Thu, May 14, 2026 at 07:13:07AM +0200, Jesper Dangaard Brouer wrote:
> > 
> > 
> > On 09/05/2026 10.48, Maciej Fijalkowski wrote:
> > > Hi,
> > > 
> > > this series is an attempt to make skb-backed XDP handling in veth use
> > > the generic skb XDP machinery instead of converting skb-backed packets
> > > into xdp_frames for XDP_TX and XDP_REDIRECT.
> > 
> > I support this idea.  Thanks for working on this Maciej.
> 
> Hi Jesper, good to read!
> 
> > 
> > I basically proposed the same back in Aug 2023 [1].  My patchset[2] was
> > motivated by a performance improvement 23.5% see [3], that comes from
> > avoiding to reallocate most veth packets when XDP is loaded.

Hi again,

see below.

> > 
> > Please read veth_benchmark04.org[4] as it documents a number of
> > pitfalls, and highlight that the main trick to avoid reallocation is
> > changing net_device needed_headroom (when XDP is loaded).
> 
> Thanks, will do
> 
> > 
> > [1] https://lore.kernel.org/all/169272715407.1975370.3989385869434330916.stgit@firesoul/
> > [2] https://lore.kernel.org/all/169272709850.1975370.16698220879817216294.stgit@firesoul/
> > [3] https://github.com/xdp-project/xdp-project/blob/main/areas/core/veth_benchmark03.org
> > [4] https://github.com/xdp-project/xdp-project/blob/main/areas/core/veth_benchmark04.org
> > 
> > 
> > > This was brought up by
> > > Jakub during review of previous work, which was focused on addressing
> > > splats from AF_XDP selftests shrinking multi-buffer packet:
> > > 
> > >    https://lore.kernel.org/bpf/20251029221315.2694841-1-maciej.fijalkowski@intel.com/
> > > 
> > > veth currently has two different XDP input paths:
> > >    - xdp_frame input, coming through ndo_xdp_xmit(), for example from
> > >      devmap redirects; and
> > >    - skb input, coming through ndo_start_xmit(), for example from the
> > >      regular networking stack, pktgen, or other skb producers.
> > > 
> > > The xdp_frame path is naturally frame-based and this series keeps it on
> > > the existing veth xdp_frame handling path. The skb path, however, is
> > > still fundamentally skb-backed, but today veth converts it into an
> > > xdp_frame for XDP_TX/XDP_REDIRECT. That conversion is awkward and has
> > > been pointed out as undesirable before, because veth has to pin skb data,
> > > construct an xdp_frame view of it, and then restore/massage XDP memory
> > > metadata around that conversion.
> > > 
> > 
> > Yes, I really dislike the veth approach of stealing the SKB "head"/data
> > page by bumping the page refcnt directly.  If is awkward as you say.
> > I would be happy to see that code improved.
> > 
> > > This series takes a different approach: skb-backed veth XDP now reuses
> > > the generic skb XDP execution path. veth provides its own xdp_buff,
> > > xdp_rxq_info and page_pool to the generic XDP helper, so the generic code
> > > can still perform the required skb COW using veth's page_pool when the
> > > skb head/frags cannot be used directly. XDP_TX then uses generic_xdp_tx()
> > > and XDP_REDIRECT uses xdp_do_generic_redirect(), while the xdp_frame
> > > input path keeps using the existing veth xdp_frame TX/redirect handling.
> > 
> > I also used this approach in my patchset, so I like it.
> > 
> > https://lore.kernel.org/all/169272715407.1975370.3989385869434330916.stgit@firesoul/
> > 
> > > The problem this series tries to address more generally is that
> > > skb-backed generic XDP can end up with memory whose provenance is not
> > > described correctly by a single queue-level MEM_TYPE_PAGE_SHARED value.
> > > When skb is COWed the underlying memory is page_pool backed but current
> > > logic does not respect it.
> > > 
> > > For that reason the series introduces MEM_TYPE_PAGE_POOL_OR_SHARED. This
> > > type is not bound to a single registered page_pool allocator. Instead,
> > > the return path inspects the individual netmem and dispatches either to
> > > the page_pool return path or to page_frag_free(). This lets generic
> > > skb-backed XDP handle mixed page_pool/page-shared memory without
> > > mutating rxq->mem.type per packet.
> > 
> > I'm unsure about the introduction of MEM_TYPE_PAGE_POOL_OR_SHARED an
> > ambiguous memory type.  In [5] I considered adding two new memory types
> > MEM_TYPE_KMALLOC_SKB and MEM_TYPE_SKB_SMALL_HEAD_CACHE that __xdp_return
> > would handle, but I labeled is as an "uncertain approach" myself.

Further thoughts on your concerns I have a feeling that your mem types
would not be needed once we start to treat skb properly from veth's POV.
__do_xdp_generic() will kfree/consume skb. we will no longer return the
underlying skb via xdp_return_frame so __xdp_return() does not have
to be taught about mem source of skbs that now live in xdp_frame.

Plus, Jakub's point still stands, in XDP generic hook __xdp_return() is
only about releasing frags as in case of skb being non-linear we will hit
skb_pp_cow_data() and get back with page_pool-backed skb.

> 
> I assume your hacks were done on top of veth without page_pool being used
> for reallocations? Otherwise I have to understand your reasoning. I had a
> bit shattered week so expect response on monday.
> 
> Just as a reminder, I need this to be fixed as currently AF_XDP test suite
> splats heavily when we encounter skb_pp_cow_data() calls and
> __xdp_return() still sees MEM_TYPE_PAGE_SHARED when packet is shrunk via
> bpf_xdp_adjust_tail().
> 
> Also seems all parties agree (Jakub's response) we should come up with
> common conditions for taking the conversion path (and bump veth hroom).
> 
> To be fully transparent I assume we have to include Toke to discussion as
> you guys had discussion back on your RFC.
> 
> Thanks,
> Maciej
> > 
> > [5] https://github.com/xdp-project/xdp-project/blob/main/areas/core/veth_benchmark04.org#uncertain-approach
> > 
> > 
> > > The veth part also removes the old rq->xdp_mem juggling. For incoming
> > > xdp_frames, veth now uses a local rxq view whose mem.type is taken from
> > > frame->mem_type. This preserves the frame's original memory type for
> > > XDP_TX/XDP_REDIRECT without overwriting the persistent rq->xdp_rxq memory
> > > model used by skb-backed generic XDP.
> > > 
> > > One visible datapath change is that skb-backed veth XDP_TX no longer
> > > uses veth's xdp_frame bulk queue. It now follows the generic skb XDP_TX
> > > path. The xdp_frame-originated path is unchanged and still uses the
> > > existing veth bulk path. The old skb-backed path had batching, but it
> > > also paid the cost of converting skb-backed packets into xdp_frames. The
> > > new path removes that conversion and keeps skb-backed packets on the
> > > generic skb XDP path. From my local tests that consisted of
> > > pktgen + xdp_bench I did not notice any major performance regressions,
> > > however Lorenzo and Jesper might disagree here, hence the RFC status. I
> > > am fed up of internal wars with Sashiko so I would be pleased to get
> > > some human feedback.
> > > 
> > 
> > My benchmarking in above documents, showed that needed_headroom change was a
> > bigger performance boost than loosing the batching.
> > 
> > For a long time, I have considered adding batching to the generic-XDP code
> > path, simply via SKB-list trick.  I did an experiment doing this in the past
> > and my benchmarking showed 30% TX performance boost, because xmit_more takes
> > effect.  Given people are not suppose to use generic-XDP if they care about
> > performance, I never followed up.  If veth start to use this generic
> > redirect code path, then it makes sense to do this. In production we have
> > code that does XDP redirect of SKBs into veth (I have recommended to
> > redirect native xdp_frame's instead, but because traffic first need to
> > travel through some DDoS filters in iptables, it cannot be done without
> > first moving those filters into XDP).

If I provide sufficient headroom on veth side and lift
skb_head_is_locked() check at netif_receive_generic_xdp() (that is now
called from veth via __do_xdp_generic()) I do see a similiar improvement
on my side.

The counter-intuitive news is that skb batching via list does not yield
any better numbers on my side, I believe that you referred to a case where
you used real hw with xdp generic hook which had implemented skb-list
batching. I'll look a bit more onto this but feel free to share your
changes, plus I think that could be implemented as a follow-up if we agree
to move further with these changes. Just wanted to clear out the new
mem-type approach.

Thanks,
Maciej

> > 
> > 
> > > In particular, let's discuss:
> > > 
> > >    - whether MEM_TYPE_PAGE_POOL_OR_SHARED is an acceptable way to describe
> > >      skb-backed generic XDP memory that may be page_pool-backed after COW
> > >      or ordinary page-shared memory otherwise;
> > > 
> > >    - whether passing caller-provided xdp_buff/rxq/page_pool context into
> > >      the generic skb XDP helper is the right API shape;
> > > 
> > >    - whether letting veth provide its own page_pool to generic XDP is
> > >      acceptable for avoiding the old skb->xdp_frame conversion; could
> > >      veth just piggy-back on system's page_pool? even though it could,
> > >      we still need xdp_buff being passed (metadata) and other refactors
> > >      that allow veth to bump stats and do the redirect flush;
> > > 
> > >    - whether skb-backed veth XDP_TX using generic_xdp_tx(), while keeping
> > >      xdp_frame-originated traffic on the existing veth bulk path, is an
> > >      acceptable split;
> > > 
> > >    - the INDIRECT_CALL for taking the COW path; I wanted to preserve
> > >      existing behavior, but is it really needed or maybe it would be
> > >      possible to come up with conditions that would cover both generic
> > >      XDP path and veth?
> > > 
> > > FWIW I do like the end result on veth side. If I missed CCing someone,
> > > mea culpa.
> > 
> > Added Cc
> > - Liang Chen <liangchen.linux@gmail•com>
> > - Yunsheng Lin <linyunsheng@huawei•com>
> > - huangjie.albert@bytedance•com
> > 
> > > 
> > > Thanks,
> > > Maciej
> > > 
> > > Maciej Fijalkowski (4):
> > >    xdp: add mixed page_pool/page_shared memory type
> > >    xdp: return status from generic_xdp_tx()
> > >    xdp: split generic XDP skb handling
> > >    veth: use generic skb XDP handling
> > > 
> > >   drivers/net/veth.c        | 179 ++++++++------------------------------
> > >   include/linux/netdevice.h |  33 ++++++-
> > >   include/net/xdp.h         |   1 +
> > >   kernel/bpf/devmap.c       |   2 +-
> > >   net/core/dev.c            | 124 ++++++++++++++++++++------
> > >   net/core/filter.c         |   2 +-
> > >   net/core/xdp.c            |  54 ++++++++++--
> > >   7 files changed, 216 insertions(+), 179 deletions(-)
> > > 
> > 

  reply	other threads:[~2026-05-28 13:24 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-09  8:48 [PATCH RFC net-next 0/4] xdp: reuse generic skb XDP handling for veth Maciej Fijalkowski
2026-05-09  8:48 ` [PATCH RFC net-next 1/4] xdp: add mixed page_pool/page_shared memory type Maciej Fijalkowski
2026-05-09  8:48 ` [PATCH RFC net-next 2/4] xdp: return status from generic_xdp_tx() Maciej Fijalkowski
2026-05-12 12:57   ` Björn Töpel
2026-05-12 17:13     ` Maciej Fijalkowski
2026-05-09  8:48 ` [PATCH RFC net-next 3/4] xdp: split generic XDP skb handling Maciej Fijalkowski
2026-05-09  8:48 ` [PATCH RFC net-next 4/4] veth: use generic skb XDP handling Maciej Fijalkowski
2026-05-12 14:32   ` Björn Töpel
2026-05-12 17:06     ` Maciej Fijalkowski
2026-05-13 11:31       ` Björn Töpel
2026-05-12 12:55 ` [PATCH RFC net-next 0/4] xdp: reuse generic skb XDP handling for veth Björn Töpel
2026-05-12 17:12   ` Maciej Fijalkowski
2026-05-14  5:13 ` Jesper Dangaard Brouer
2026-05-15 18:18   ` Maciej Fijalkowski
2026-05-28 13:24     ` Maciej Fijalkowski [this message]
2026-05-15  0:54 ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahhB9ufMk2tIlaoc@boxer \
    --to=maciej.fijalkowski@intel$(echo .)com \
    --cc=bjorn@kernel$(echo .)org \
    --cc=bpf@vger$(echo .)kernel.org \
    --cc=hawk@kernel$(echo .)org \
    --cc=horms@kernel$(echo .)org \
    --cc=huangjie.albert@bytedance$(echo .)com \
    --cc=kuba@kernel$(echo .)org \
    --cc=liangchen.linux@gmail$(echo .)com \
    --cc=linyunsheng@huawei$(echo .)com \
    --cc=lorenzo@kernel$(echo .)org \
    --cc=magnus.karlsson@intel$(echo .)com \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=pabeni@redhat$(echo .)com \
    --cc=stfomichev@gmail$(echo .)com \
    --cc=toke@redhat$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox