public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat•com>
To: Daniel Borkmann <daniel@iogearbox•net>,
	Alexei Starovoitov <alexei.starovoitov@gmail•com>
Cc: bpf <bpf@vger•kernel.org>,
	Network Development <netdev@vger•kernel.org>,
	Martin KaFai Lau <kafai@fb•com>,
	Hangbin Liu <liuhangbin@gmail•com>,
	Jesper Dangaard Brouer <brouer@redhat•com>,
	Magnus Karlsson <magnus.karlsson@gmail•com>,
	"Paul E . McKenney" <paulmck@kernel•org>
Subject: Re: [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context
Date: Fri, 11 Jun 2021 00:27:52 +0200	[thread overview]
Message-ID: <874ke5we1j.fsf@toke.dk> (raw)
In-Reply-To: <c5192ab3-1c05-8679-79f2-59d98299095b@iogearbox.net>

Daniel Borkmann <daniel@iogearbox•net> writes:

> Hi Paul,
>
> On 6/10/21 8:38 PM, Alexei Starovoitov wrote:
>> On Wed, Jun 9, 2021 at 7:24 AM Toke Høiland-Jørgensen <toke@redhat•com> wrote:
>>>
>>> XDP programs are called from a NAPI poll context, which means the RCU
>>> reference liveness is ensured by local_bh_disable(). Add
>>> rcu_read_lock_bh_held() as a condition to the RCU checks for map lookups so
>>> lockdep understands that the dereferences are safe from inside *either* an
>>> rcu_read_lock() section *or* a local_bh_disable() section. This is done in
>>> preparation for removing the redundant rcu_read_lock()s from the drivers.
>>>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat•com>
>>> ---
>>>   kernel/bpf/hashtab.c  | 21 ++++++++++++++-------
>>>   kernel/bpf/helpers.c  |  6 +++---
>>>   kernel/bpf/lpm_trie.c |  6 ++++--
>>>   3 files changed, 21 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>> index 6f6681b07364..72c58cc516a3 100644
>>> --- a/kernel/bpf/hashtab.c
>>> +++ b/kernel/bpf/hashtab.c
>>> @@ -596,7 +596,8 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
>>>          struct htab_elem *l;
>>>          u32 hash, key_size;
>>>
>>> -       WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
>>> +       WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
>>> +                    !rcu_read_lock_bh_held());
>> 
>> It's not clear to me whether rcu_read_lock_held() is still needed.
>> All comments sound like rcu_read_lock_bh_held() is a superset of rcu
>> that includes bh.
>> But reading rcu source code it looks like RCU_BH is its own rcu flavor...
>> which is confusing.
>
> The series is a bit confusing to me as well. I recall we had a discussion with
> Paul, but it was back in 2016 aka very early days of XDP to get some clarifications
> about RCU vs RCU-bh flavour on this. Paul, given the series in here, I assume the
> below is not true anymore, and in this case (since we're removing rcu_read_lock()
> from drivers), the RCU-bh acts as a real superset?
>
> Back then from your clarifications this was not the case:
>
>    On Mon, Jul 25, 2016 at 11:26:02AM -0700, Alexei Starovoitov wrote:
>    > On Mon, Jul 25, 2016 at 11:03 AM, Paul E. McKenney
>    > <paulmck@linux•vnet.ibm.com> wrote:
>    [...]
>    >>> The crux of the question is whether a particular driver rx handler, when
>    >>> called from __do_softirq, needs to add an additional rcu_read_lock or
>    >>> whether it can rely on the mechanics of softirq.
>    >>
>    >> If it was rcu_read_lock_bh(), you could.
>    >>
>    >> But you didn't say rcu_read_lock_bh(), you instead said rcu_read_lock(),
>    >> which means that you absolutely cannot rely on softirq semantics.
>    >>
>    >> In particular, in CONFIG_PREEMPT=y kernels, rcu_preempt_check_callbacks()
>    >> will notice that there is no rcu_read_lock() in effect and report
>    >> a quiescent state for that CPU.  Because rcu_preempt_check_callbacks()
>    >> is invoked from the scheduling-clock interrupt, it absolutely can
>    >> execute during do_softirq(), and therefore being in softirq context
>    >> in no way provides rcu_read_lock()-style protection.
>    >>
>    >> Now, Alexei's question was for CONFIG_PREEMPT=n kernels.  However, in
>    >> that case, rcu_read_lock() and rcu_read_unlock() generate no code
>    >> in recent production kernels, so there is no performance penalty for
>    >> using them.  (In older kernels, they implied a barrier().)
>    >>
>    >> So either way, with or without CONFIG_PREEMPT, you should use
>    >> rcu_read_lock() to get RCU protection.
>    >>
>    >> One alternative might be to switch to rcu_read_lock_bh(), but that
>    >> will add local_disable_bh() overhead to your read paths.
>    >>
>    >> Does that help, or am I missing the point of the question?
>    >
>    > thanks a lot for explanation.
>
>    Glad you liked it!
>
>    > I mistakenly assumed that _bh variants are 'stronger' and
>    > act as inclusive, but sounds like they're completely orthogonal
>    > especially with preempt_rcu=y.
>
>    Yes, they are pretty much orthogonal.
>
>    > With preempt_rcu=n and preempt=y, it would be the case, since
>    > bh disables preemption and rcu_read_lock does the same as well,
>    > right? Of course, the code shouldn't be relying on that, so we
>    > have to fix our stuff.
>
>    Indeed, especially given that the kernel currently won't allow you
>    to configure CONFIG_PREEMPT_RCU=n and CONFIG_PREEMPT=y.  If it does,
>    please let me know, as that would be a bug that needs to be fixed.
>    (For one thing, I do not test that combination.)
>
> 							Thanx, Paul
>
> And now, fast-forward again to 2021 ... :)

We covered this in the thread I linked from the cover letter.
Specifically, this seems to have been a change from v4.20, see Paul's
reply here:
https://lore.kernel.org/bpf/20210417002301.GO4212@paulmck-ThinkPad-P17-Gen-1/

and the follow-up covering -rt here:
https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/

-Toke


  reply	other threads:[~2021-06-10 22:27 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 01/17] rcu: Create an unrcu_pointer() to remove __rcu from a pointer Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context Toke Høiland-Jørgensen
2021-06-10 18:38   ` Alexei Starovoitov
2021-06-10 21:24     ` Daniel Borkmann
2021-06-10 22:27       ` Toke Høiland-Jørgensen [this message]
2021-06-10 19:33   ` Martin KaFai Lau
2021-06-09 10:33 ` [PATCH bpf-next 03/17] dev: add rcu_read_lock_bh_held() as a valid check when getting a RCU dev ref Toke Høiland-Jørgensen
2021-06-10 19:37   ` Martin KaFai Lau
2021-06-10 23:05     ` Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 04/17] xdp: add proper __rcu annotations to redirect map entries Toke Høiland-Jørgensen
2021-06-10 21:09   ` Martin KaFai Lau
2021-06-10 23:19     ` Toke Høiland-Jørgensen
2021-06-10 23:32       ` Martin KaFai Lau
2021-06-10 23:41         ` Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 05/17] ena: remove rcu_read_lock() around XDP program invocation Toke Høiland-Jørgensen
2021-06-09 13:57   ` Paul E. McKenney
2021-06-09 10:33 ` [PATCH bpf-next 06/17] bnxt: " Toke Høiland-Jørgensen
2021-06-09 13:58   ` Paul E. McKenney
2021-06-10  8:47     ` Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 07/17] thunderx: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 08/17] freescale: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 09/17] net: intel: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 10/17] marvell: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 11/17] mlx4: " Toke Høiland-Jørgensen
2021-06-10  7:10   ` Tariq Toukan
2021-06-09 10:33 ` [PATCH bpf-next 12/17] nfp: " Toke Høiland-Jørgensen
2021-06-11 16:30   ` Simon Horman
2021-06-09 10:33 ` [PATCH bpf-next 13/17] qede: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 14/17] sfc: " Toke Høiland-Jørgensen
2021-06-09 12:15   ` Edward Cree
2021-06-09 10:33 ` [PATCH bpf-next 15/17] netsec: " Toke Høiland-Jørgensen
2021-06-10  5:30   ` Ilias Apalodimas
2021-06-09 10:33 ` [PATCH bpf-next 16/17] stmmac: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 17/17] net: ti: " Toke Høiland-Jørgensen
2021-06-09 17:04   ` Grygorii Strashko
2021-06-10  0:18 ` [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874ke5we1j.fsf@toke.dk \
    --to=toke@redhat$(echo .)com \
    --cc=alexei.starovoitov@gmail$(echo .)com \
    --cc=bpf@vger$(echo .)kernel.org \
    --cc=brouer@redhat$(echo .)com \
    --cc=daniel@iogearbox$(echo .)net \
    --cc=kafai@fb$(echo .)com \
    --cc=liuhangbin@gmail$(echo .)com \
    --cc=magnus.karlsson@gmail$(echo .)com \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=paulmck@kernel$(echo .)org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox