public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: Daniel Borkmann <daniel@iogearbox•net>
To: Saeed Mahameed <saeedm@dev•mellanox.co.il>
Cc: "David S. Miller" <davem@davemloft•net>,
	Alexei Starovoitov <alexei.starovoitov@gmail•com>,
	Brenden Blanco <bblanco@plumgrid•com>,
	Zhiyi Sun <zhiyisun@gmail•com>, Rana Shahout <ranas@mellanox•com>,
	Saeed Mahameed <saeedm@mellanox•com>,
	Linux Netdev List <netdev@vger•kernel.org>
Subject: Re: [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
Date: Thu, 17 Nov 2016 21:03:50 +0100	[thread overview]
Message-ID: <582E0D26.3000907@iogearbox.net> (raw)
In-Reply-To: <CALzJLG-ZQxF0t1N5ry2OUqcBjrs+7XzuWb_dRDOjEnRf1975Hg@mail.gmail.com>

On 11/17/2016 10:48 AM, Saeed Mahameed wrote:
> On Wed, Nov 16, 2016 at 5:26 PM, Daniel Borkmann <daniel@iogearbox•net> wrote:
>> On 11/16/2016 03:30 PM, Saeed Mahameed wrote:
>>> On Wed, Nov 16, 2016 at 3:54 PM, Daniel Borkmann <daniel@iogearbox•net>
>>> wrote:
>>>> On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
>>>>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox•net>
>>>>> wrote:
>>>>>>
>>>>>> There are multiple issues in mlx5e_xdp_set():
>>>>>>
>>>>>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>>>>>>       doing so, it should be done at an earlier point in time to makes
>>>>>> sure
>>>>>>       that we cannot fail anymore at the time we want to set the program
>>>>>> for
>>>>>>       each channel. This only means that we have to undo the
>>>>>> bpf_prog_add()
>>>>>>       in case we return due to required reset.
>>>>>>
>>>>>> 2) When swapping the priv->xdp_prog, then no extra reference count must
>>>>>> be
>>>>>>       taken since we got that from call path via dev_change_xdp_fd()
>>>>>> already.
>>>>>>       Otherwise, we'd never be able to free the program. Also,
>>>>>> bpf_prog_add()
>>>>>>       without checking the return code could fail.
>>>>>>
>>>>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs
>>>>>> support")
>>>>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox•net>
>>>>>> ---
>>>>>>     drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25
>>>>>> ++++++++++++++++++-----
>>>>>>     1 file changed, 20 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> index ab0c336..cf26672 100644
>>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device
>>>>>> *netdev, struct bpf_prog *prog)
>>>>>>                    goto unlock;
>>>>>>            }
>>>>>>
>>>>>> +       if (prog) {
>>>>>> +               /* num_channels is invariant here, so we can take the
>>>>>> +                * batched reference right upfront.
>>>>>> +                */
>>>>>> +               prog = bpf_prog_add(prog, priv->params.num_channels);
>>>>>> +               if (IS_ERR(prog)) {
>>>>>> +                       err = PTR_ERR(prog);
>>>>>> +                       goto unlock;
>>>>>> +               }
>>>>>> +       }
>>>>>> +
>>>>>
>>>>> With this approach you will end up taking a ref count twice per RQ! on
>>>>> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
>>>>> One ref will be taken per RQ/Channel from the above code, and since
>>>>> reset will be TRUE mlx5e_open_locked will be called and another ref
>>>>> count will be taken on mlx5e_create_rq.
>>>>>
>>>>> The issue here is that we have two places for ref count accounting,
>>>>> xdp_set and  mlx5e_create_rq. Having ref-count updates in
>>>>> mlx5e_create_rq is essential for num_channels configuration changes
>>>>> (mlx5e_set_ringparam).
>>>>>
>>>>> Our previous approach made sure that only one path will do the ref
>>>>> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
>>>>> reset == false).
>>>>
>>>> That is correct, for a short time bpf_prog_add() was charged also when
>>>> we reset. When we reset, we will then jump to unlock_put and safely undo
>>>> this since we took ref from mlx5e_create_rq() already in that case, and
>>>> return successfully. That was intentional, so that eventually we end up
>>>> just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
>>>> below ...
>>>>
>>>> [...]
>>>>>
>>>>> 2. Keep the current approach and make sure to not update the ref count
>>>>> twice, you can batch update only if (!reset && was_open) otherwise you
>>>>> can rely on mlx5e_open_locked to take the num_channels ref count for
>>>>> you.
>>>>>
>>>>> Personally I prefer option 2, since we will keep the current logic
>>>>> which will allow configuration updates such as (mlx5e_set_ringparam)
>>>>> to not worry about ref counting since it will be done in the reset
>>>>> flow.
>>>>
>>>> ... agree on keeping current approach. I actually like the idea, so we'd
>>>> end up with this simpler version for the batched ref then.
>>>
>>> Great :).
>>>
>>> So let's do the batched update only if we are not going to reset (we
>>> already know that in advance), instead of the current patch where you
>>> batch update unconditionally and then
>>>    unlock_put in case reset was performed (which is just redundant and
>>> confusing).
>>>
>>> Please note that if open_locked fails you need to goto unlock_put.
>>
>> Sorry, I'm not quite sure I follow you here; are you /now/ commenting on
>> the original patch or on the already updated diff I did from my very last
>> email, that is, http://patchwork.ozlabs.org/patch/695564/ ?
>
> I am commenting on this patch  "V2 2/4".
> this patch will take batched ref count unconditionally and release the
> extra refs "unlock_put" if reset was performed.
> I am asking to take the batched ref only if we are not going to reset
> in advance to save the extra "unlock_put"

Okay, sure, it's clear and we discussed about that already earlier, and
my response back then was the diff in the /end/ of this email here:

   https://www.spinics.net/lists/netdev/msg404709.html

Hence my confusion why we were back on the original patch after that. ;)
Anyway, I'll get the updated set with the suggestion out tomorrow, thanks
for the review!

>>>> Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not
>>>> needed
>>>> since this we already got that one through dev_change_xdp_fd() as
>>>> mentioned.
>>>
>>> If it is not needed then why we need bpf_prog_put on mlx5e_nic_cleanup
>>> in your next patch? this doesn't look symmetric (right) !
>>> you shouldn't release a reference you didn't take.
>>> Overall with this series the driver can take num_channels refs and
>>> will release num_channels refs on mlx5e_close. we shouldn't release
>>> one extra ref on NIC cleanup.
>>
>> I already explained this in the commit description; when dev_change_xdp_fd()
>> is called and sees a valid fd, it does bpf_prog_get_type(), which calls the
>> __bpf_prog_get() taking a ref on the program (bpf_prog_inc()). That is then
>> passed down to ops->ndo_xdp(). Only in case of error from the ->ndo_xdp()
>> callback, we bpf_prog_put() this reference from dev_change_xdp_fd() side.
>>
>> For drivers that implement against this ndo, it means that you need N-1 refs
>> in addition. Have a look at the other two in-tree users, which do it
>> correctly,
>> that is mlx4_xdp_set() and nfp_net_xdp_setup().
>>
>> It's documented here (include/linux/netdevice.h) with ndo_xdp referring to
>> it:
>>
>> /* These structures hold the attributes of xdp state that are being passed
>>   * to the netdevice through the xdp op.
>>   */
>> enum xdp_netdev_command {
>>          /* Set or clear a bpf program used in the earliest stages of packet
>>           * rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The
>> callee
>>           * is responsible for calling bpf_prog_put on any old progs that are
>>           * stored. In case of error, the callee need not release the new
>> prog
>>           * reference, but on success it takes ownership and must
>> bpf_prog_put
>>           * when it is no longer used.
>>           */
>>          XDP_SETUP_PROG,
>> [...]
>> };
>>
>> I think reason why this is rather confusing would be that initially, it was
>> just a single prog for all queues, but it was requested along the way to
>> move
>> prog pointer down to queues instead and let them have their own ref, so that
>> some time in future individual progs from a subset of the queues can be
>> exchanged.
>>
>> Since the way xdp in mlx5 is implemented, you currently have the
>> priv->xdp_prog
>> as the control prog. That's okay right now, but requires to drop the last
>> ref
>> on priv->xdp_prog via bpf_prog_put() when netdev is dismantled and
>> priv->xdp_prog
>> still present.
>
> Got it, but i would rather make the stack cleanup those refs once it
> receives an unregester_netdev event instead of counting on netdev to
> do it, as i said before, like any other resource added to the netdev
> by the stack (vlan/mac/etc..).
>
> but this is a general discussion of xdp API, it won't block this series :).
>
> Saeed.

  reply	other threads:[~2016-11-17 20:03 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-16  0:04 [PATCH net-next v2 0/4] Couple of BPF refcount fixes for mlx5 Daniel Borkmann
2016-11-16  0:04 ` [PATCH net-next v2 1/4] bpf, mlx5: fix mlx5e_create_rq taking reference on prog Daniel Borkmann
2016-11-16  0:04 ` [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set Daniel Borkmann
2016-11-16 12:25   ` Saeed Mahameed
2016-11-16 13:54     ` Daniel Borkmann
2016-11-16 14:30       ` Saeed Mahameed
2016-11-16 15:26         ` Daniel Borkmann
2016-11-17  9:48           ` Saeed Mahameed
2016-11-17 20:03             ` Daniel Borkmann [this message]
2016-11-16  0:04 ` [PATCH net-next v2 3/4] bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup Daniel Borkmann
2016-11-16 12:51   ` Saeed Mahameed
2016-11-16 15:45     ` Daniel Borkmann
2016-11-16 15:55       ` Daniel Borkmann
2016-11-17  9:34         ` Saeed Mahameed
2016-11-16  0:04 ` [PATCH net-next v2 4/4] bpf: add __must_check attributes to refcount manipulating helpers Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=582E0D26.3000907@iogearbox.net \
    --to=daniel@iogearbox$(echo .)net \
    --cc=alexei.starovoitov@gmail$(echo .)com \
    --cc=bblanco@plumgrid$(echo .)com \
    --cc=davem@davemloft$(echo .)net \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=ranas@mellanox$(echo .)com \
    --cc=saeedm@dev$(echo .)mellanox.co.il \
    --cc=saeedm@mellanox$(echo .)com \
    --cc=zhiyisun@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox