From: Jesper Dangaard Brouer <brouer@redhat•com>
To: John Fastabend <john.fastabend@gmail•com>
Cc: Andy Gospodarek <andy@greyhouse•net>,
	Saeed Mahameed <saeedm@mellanox•com>,
	"netdev@vger•kernel.org" <netdev@vger•kernel.org>,
	David Miller <davem@davemloft•net>,
	Daniel Borkmann <daniel@iogearbox•net>,
	Alexei Starovoitov <ast@fb•com>,
	brouer@redhat•com
Subject: Re: [RFC PATCH 03/12] xdp: add bpf_redirect helper function
Date: Tue, 11 Jul 2017 21:38:11 +0200
Message-ID: <20170711213811.3f83405c@redhat.com>
In-Reply-To: <59651B29.5040403@gmail.com>

On Tue, 11 Jul 2017 11:38:33 -0700
John Fastabend <john.fastabend@gmail•com> wrote:

> On 07/11/2017 07:09 AM, Andy Gospodarek wrote:
> > On Mon, Jul 10, 2017 at 1:23 PM, John Fastabend
> > <john.fastabend@gmail•com> wrote:  
> >> On 07/09/2017 06:37 AM, Saeed Mahameed wrote:  
> >>>
> >>>
> >>> On 7/7/2017 8:35 PM, John Fastabend wrote:  
> >>>> This adds support for a bpf_redirect helper function to the XDP
> >>>> infrastructure. For now this only supports redirecting to the egress
> >>>> path of a port.
> >>>>
> >>>> In order to support drivers handling an xdp_buff natively, this patch
> >>>> uses a new ndo operation, ndo_xdp_xmit(), that pushes an xdp_buff
> >>>> to the specified device.
> >>>>
> >>>> If the program specifies either (a) an unknown device or (b) a device
> >>>> that does not support the operation a BPF warning is thrown and the
> >>>> XDP_ABORTED error code is returned.
> >>>>
> >>>> Signed-off-by: John Fastabend <john.fastabend@gmail•com>
> >>>> Acked-by: Daniel Borkmann <daniel@iogearbox•net>
> >>>> ---  
> >>
> >> [...]
> >>  
> >>>>
> >>>> +static int __bpf_tx_xdp(struct net_device *dev, struct xdp_buff *xdp)
> >>>> +{
> >>>> +    if (dev->netdev_ops->ndo_xdp_xmit) {
> >>>> +        dev->netdev_ops->ndo_xdp_xmit(dev, xdp);  
> >>>
> >>> Hi John,
> >>>
> >>> I have some concern here regarding synchronizing between the
> >>> redirecting device and the target device:
> >>>
> >>> If the target device's NAPI is also doing XDP_TX on the same XDP TX
> >>> ring that this NDO is redirecting XDP packets into, there would be a
> >>> race accessing this ring's resources (buffers and descriptors). Maybe
> >>> you addressed this issue in the device driver implementation of this
> >>> ndo or with some NAPI tricks/assumptions. I guess we have the same
> >>> issue if you run the same program to redirect traffic from multiple
> >>> netdevices into one netdevice. How do you synchronize access to this
> >>> TX ring?  
> >>
> >> The implementation uses a per cpu TX ring to resolve these races. And
> >> the pair of driver interface API calls, xdp_do_redirect() and xdp_do_flush_map()
> >> must be completed in a single poll() handler.
> >>
> >> This comment was included in the header file to document this,
> >>
> >> /* The pair of xdp_do_redirect and xdp_do_flush_map MUST be called in the
> >>  * same cpu context. Further for best results no more than a single map
> >>  * for the do_redirect/do_flush pair should be used. This limitation is
> >>  * because we only track one map and force a flush when the map changes.
> >>  * This does not appear to be a real limitation for existing software.
> >>  */
> >>
> >> In general some documentation about implementing XDP would probably be
> >> useful to add in Documentation/networking but this IMO goes beyond just
> >> this patch series.
> >>  
> >>>
> >>> Maybe we need some clear guidelines in this ndo documentation stating
> >>> how to implement this ndo and what are the assumptions on those XDP
> >>> TX redirect rings or from which context this ndo can run.
> >>>
> >>> can you please elaborate.  
> >>
> >> I think the best implementation is to use a per cpu TX ring as I did in
> >> this series. If your device is limited by the number of queues for some
> >> reason some other scheme would need to be devised. Unfortunately, the only
> >> thing I've come up with for this case (using only this series) would both impact
> >> performance and make the code complex.
> >>
> >> A nice solution might be to constrain networking "tasks" to only a subset
> >> of cores. For 64+ core systems this might be a good idea. It would allow
> >> avoiding locking using per_cpu logic but also avoid networking consuming
> >> slices of every core in the system. As core count goes up I think we will
> >> eventually need to address this. I believe Eric was thinking along these
> >> lines with his netconf talk iirc. Obviously this work is way outside the
> >> scope of this series though.  
> > 
> > I agree that it is outside the scope of this series, but I think it is
> > important to consider the impact of the output queue selection in both
> > a heterogeneous and homogeneous driver setup and how tx could be
> > optimized or even considered to be more reliable and I think that was
> > part of Saeed's point.
> > 
> > I got base redirect support for bnxt_en working yesterday, but for it
> > and other drivers that do not necessarily create a ring/queue per core
> > like ixgbe there is probably a bit more work in each driver to
> > properly track output tx rings/queues than what you have done with
> > ixgbe.
> >   
> 
> The problem, in my mind at least, is if you do not have a ring per core
> how does the locking work? I don't see any good way to do this outside
> of locking which I was trying to avoid.

My solution would be to queue the XDP packets in the devmap, and then
bulk xdp_xmit them to the device on flush.  Taking a lock per bulk
amortizes the cost to basically nothing.  The other advantage of this is
improved instruction-cache (re)usage.
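
To make the amortization argument concrete, here is a minimal userspace
sketch of the idea.  It is not kernel code and not what the devmap in
this series does: struct pkt stands in for an xdp_buff, struct tx_ring
for a shared HW TX ring, and BULK_SIZE, bq_enqueue() and bq_flush() are
hypothetical names chosen only for illustration.  Each caller stages
packets in its own lockless queue and takes the ring lock once per
bulk rather than once per packet.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define BULK_SIZE 16	/* hypothetical bulk size, picked for illustration */

struct pkt { int id; };	/* stand-in for struct xdp_buff */

/* Stand-in for a HW TX ring shared between several CPUs. */
struct tx_ring {
	pthread_mutex_t lock;
	unsigned long lock_acquisitions;	/* how often the lock was taken */
	unsigned long pkts_sent;		/* how many packets were posted */
};

/* Staging queue owned by one CPU (one caller here), so it needs no lock. */
struct bulk_queue {
	struct pkt *q[BULK_SIZE];
	size_t count;
};

static void tx_ring_init(struct tx_ring *ring)
{
	pthread_mutex_init(&ring->lock, NULL);
	ring->lock_acquisitions = 0;
	ring->pkts_sent = 0;
}

/* Send everything staged so far under a single lock acquisition. */
static void bq_flush(struct bulk_queue *bq, struct tx_ring *ring)
{
	if (bq->count == 0)
		return;

	pthread_mutex_lock(&ring->lock);
	ring->lock_acquisitions++;
	for (size_t i = 0; i < bq->count; i++)
		ring->pkts_sent++;	/* a driver would post a descriptor here */
	pthread_mutex_unlock(&ring->lock);

	bq->count = 0;
}

/* Stage one packet; flush first if the staging queue is already full. */
static void bq_enqueue(struct bulk_queue *bq, struct tx_ring *ring,
		       struct pkt *p)
{
	if (bq->count == BULK_SIZE)
		bq_flush(bq, ring);

	assert(bq->count < BULK_SIZE);
	bq->q[bq->count++] = p;
}
```

With this sketch, enqueueing 40 packets followed by one final
bq_flush() takes the lock 3 times instead of 40.  The final flush
corresponds to the flush call at the end of the driver's poll().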

One thing I don't like about this patchset is the implicit requirement
it puts on drivers: that they must have a HW TX-ring queue per CPU in the
system.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
