From: Daniel Borkmann <daniel@iogearbox•net>
To: Jakub Kicinski <kubakici@wp•pl>
Cc: Martin KaFai Lau <kafai@fb•com>,
	netdev@vger•kernel.org, Alexei Starovoitov <ast@fb•com>,
	Brenden Blanco <bblanco@plumgrid•com>,
	David Miller <davem@davemloft•net>,
	Jesper Dangaard Brouer <brouer@redhat•com>,
	John Fastabend <john.fastabend@gmail•com>,
	Saeed Mahameed <saeedm@mellanox•com>,
	Tariq Toukan <tariqt@mellanox•com>,
	Kernel Team <kernel-team@fb•com>
Subject: Re: [PATCH v3 net-next 1/4] bpf: xdp: Allow head adjustment in XDP prog
Date: Wed, 07 Dec 2016 14:34:55 +0100
Message-ID: <58480FFF.9010302@iogearbox.net>
In-Reply-To: <20161207114112.6ad86da3@jkicinski-Precision-T1700>

On 12/07/2016 12:41 PM, Jakub Kicinski wrote:
> On Wed, 07 Dec 2016 10:32:19 +0100, Daniel Borkmann wrote:
>> On 12/07/2016 06:31 AM, Martin KaFai Lau wrote:
>> [...]
>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>>> index 49a81f1fc1d6..6261157f444e 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>>> @@ -2794,6 +2794,9 @@ static int mlx4_xdp(struct net_device *dev, struct netdev_xdp *xdp)
>>>    	case XDP_QUERY_PROG:
>>>    		xdp->prog_attached = mlx4_xdp_attached(dev);
>>>    		return 0;
>>> +	case XDP_QUERY_FEATURES:
>>> +		xdp->features = 0;
>>> +		return 0;
>>>    	default:
>>>    		return -EINVAL;
>>>    	}
>> [...]
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 1ff5ea6e1221..786ad7c67215 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -30,6 +30,7 @@
>>>    #include <linux/delay.h>
>>>    #include <linux/atomic.h>
>>>    #include <linux/prefetch.h>
>>> +#include <linux/bitops.h>
>>>    #include <asm/cache.h>
>>>    #include <asm/byteorder.h>
>>>
>>> @@ -805,6 +806,13 @@ struct tc_to_netdev {
>>>    	bool egress_dev;
>>>    };
>>>
>>> +/* Driver must allow a XDP prog to extend header by
>>> + * up to XDP_PACKET_HEADROOM.  It must also fill out
>>> + * the data_hard_start value in struct xdp_buff
>>> + * before calling out the xdp_prog.
>>> + */
>>> +#define XDP_F_ADJUST_HEAD	BIT(0)
>>> +
>>>    /* These structures hold the attributes of xdp state that are being passed
>>>     * to the netdevice through the xdp op.
>>>     */
>>> @@ -821,6 +829,8 @@ enum xdp_netdev_command {
>>>    	 * return true if a program is currently attached and running.
>>>    	 */
>>>    	XDP_QUERY_PROG,
>>> +	/* Check what XDP features are supported by a device */
>>> +	XDP_QUERY_FEATURES,
>>>    };
>>>
>>>    struct netdev_xdp {
>>> @@ -830,6 +840,8 @@ struct netdev_xdp {
>>>    		struct bpf_prog *prog;
>>>    		/* XDP_QUERY_PROG */
>>>    		bool prog_attached;
>>> +		/* XDP_QUERY_FEATURES */
>>> +		u32 features;
>>>    	};
>>>    };
>>>
>> [...]
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index bffb5253e778..90696f7e6b59 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -6722,6 +6722,15 @@ int dev_change_xdp_fd(struct net_device *dev, int fd, u32 flags)
>>>    		prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_XDP);
>>>    		if (IS_ERR(prog))
>>>    			return PTR_ERR(prog);

Oh, by the way, here you fetch the prog, which grabs a reference.

>>> +
>>> +		xdp.command = XDP_QUERY_FEATURES;
>>> +		err = ops->ndo_xdp(dev, &xdp);
>>> +		if (err)

Therefore you need a bpf_prog_put() here ...

>>> +			return err;
>>> +
>>> +		if (prog->xdp_adjust_head &&
>>> +		    !(xdp.features & XDP_F_ADJUST_HEAD))

... and the same here, otherwise we leak the reference!

>>> +			return -ENOTSUPP;
>>>    	}
>>>
>>>    	memset(&xdp, 0, sizeof(xdp));
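The leak pointed out above comes from two early returns after bpf_prog_get_type() has taken a reference. A common kernel idiom is to route every error exit through one label that drops the reference. The sketch below is a userspace model of that shape. struct bpf_prog, prog_get(), prog_put(), mock_query_features() and attach_prog() are hypothetical stand-ins, not the kernel objects; only the error-path structure is meant to match.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define XDP_F_ADJUST_HEAD (1u << 0)
#ifndef ENOTSUPP
#define ENOTSUPP 524 /* kernel-internal errno, not exported to userspace */
#endif

/* Hypothetical stand-ins; only refcounting and error paths are modelled. */
struct bpf_prog {
	int refcnt;
	bool xdp_adjust_head;
};

struct netdev_xdp {
	unsigned int features;
};

static struct bpf_prog *prog_get(struct bpf_prog *p)	/* like bpf_prog_get_type() */
{
	p->refcnt++;
	return p;
}

static void prog_put(struct bpf_prog *p)		/* like bpf_prog_put() */
{
	assert(p->refcnt > 0);
	p->refcnt--;
}

/* Mock XDP_QUERY_FEATURES call; query_err simulates a driver failure. */
static int mock_query_features(struct netdev_xdp *xdp,
			       unsigned int driver_features, int query_err)
{
	if (query_err)
		return query_err;
	xdp->features = driver_features;
	return 0;
}

static int attach_prog(struct bpf_prog *p, unsigned int driver_features,
		       int query_err)
{
	struct netdev_xdp xdp = { 0 };
	struct bpf_prog *prog = prog_get(p);
	int err;

	err = mock_query_features(&xdp, driver_features, query_err);
	if (err)
		goto put_prog;		/* drop the reference here ... */

	if (prog->xdp_adjust_head && !(xdp.features & XDP_F_ADJUST_HEAD)) {
		err = -ENOTSUPP;
		goto put_prog;		/* ... and here, otherwise it leaks */
	}

	/* Success: the reference is kept, standing in for handing the prog
	 * to the driver via XDP_SETUP_PROG. */
	return 0;

put_prog:
	prog_put(prog);
	return err;
}
```

Both failure paths leave the refcount where it started, and only a successful attach keeps the extra reference.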
>>
>> I think this interface wrt feature flags is rather odd. Why can't this be
>> done the usual/expected way we already have today for drivers with NETIF_F_*
>> flags?
>>
>> We have include/linux/netdev_features.h; there, we add all NETIF_F_XDP_*
>> feature flags that the device would then select during init. Perhaps some of
>> them might in the future depend on certain setups, etc. Calculating them in a
>> separate ndo_xdp() also seems odd in the sense that in-kernel users always
>> need to call ops->ndo_xdp() with XDP_QUERY_FEATURES instead of simply
>> testing dev->features & NETIF_F_XDP_* directly. This is global to the
>> device anyway and doesn't need to be stored somewhere in a private data
>> area.
>
> If I may offer one potential disadvantage of just using netdev
> features :)
> - if we ever want to report something more than flags (say the length
> of headroom) we will need another interface.  People who care about

Okay, but do we want XDP_QUERY_FEATURES to be a 'super-interface' returning
everything? Depending on what comes up in the future, I'd rather imagine
that this gets partitioned a bit further, so that e.g. queries where the
driver would need to take some state lock are only made if the caller of
ndo_xdp() is really interested in that answer. Some of the features might
simply be bit flags, though; others, if the flag is set, might need a query
down to the driver.

> memory savings may also get upset if we extend struct net_device given
> there is no way to compile XDP out, that would be an argument for
> keeping the ndo invocation.

If this is also a specific concern for the dev feature flags, then fair
enough. I just found it odd to have an extra ndo_xdp() call for this when the
flags could be stored directly in the dev instead. I don't know whether we
will ever need to pass the dev pointer via struct xdp_buff to a helper
function and query anything from there, but in the worst case this would
then need to be changed a bit.

>> I see nothing wrong if this is exposed/made visible in the usual way through
>> ethtool -k as well. I guess at least that would be the expected way to query
>> for such driver capabilities.
>
> +1 on exposing this to user space.  Whether via ethtool -k or a
> separate XDP-specific netlink message is mostly a question of whether
> we expect the need to expose more complex capabilities than bits.
>
> Thanks!
>


Thread overview: 13+ messages
2016-12-07  5:31 [PATCH v3 net-next 0/4]: Allow head adjustment in XDP prog Martin KaFai Lau
2016-12-07  5:31 ` [PATCH v3 net-next 1/4] bpf: xdp: " Martin KaFai Lau
2016-12-07  9:32   ` Daniel Borkmann
2016-12-07 11:41     ` Jakub Kicinski
2016-12-07 13:34       ` Daniel Borkmann [this message]
2016-12-07 16:37       ` Alexei Starovoitov
2016-12-07 17:04         ` David Miller
2016-12-07 17:14           ` Daniel Borkmann
2016-12-07 17:26         ` Martin KaFai Lau
2016-12-07  5:31 ` [PATCH v3 net-next 2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs Martin KaFai Lau
2016-12-07  5:31 ` [PATCH v3 net-next 3/4] mlx4: xdp: Reserve headroom for receiving packet when XDP prog is active Martin KaFai Lau
2016-12-07  5:31 ` [PATCH v3 net-next 4/4] bpf: xdp: Add XDP example for head adjustment Martin KaFai Lau
2016-12-07 10:34   ` Jesper Dangaard Brouer
