From: "Toke Høiland-Jørgensen" <toke@redhat•com>
To: Kumar Kartikeya Dwivedi <memxor@gmail•com>,
	Joanne Koong <joannelkoong@gmail•com>
Cc: bpf <bpf@vger•kernel.org>, Alexei Starovoitov <ast@kernel•org>,
	Daniel Borkmann <daniel@iogearbox•net>,
	Andrii Nakryiko <andrii@kernel•org>,
	Jamal Hadi Salim <jhs@mojatatu•com>,
	Vlad Buslov <vladbu@nvidia•com>,
	Cong Wang <xiyou.wangcong@gmail•com>,
	Jesper Dangaard Brouer <brouer@redhat•com>,
	netdev <netdev@vger•kernel.org>
Subject: Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
Date: Fri, 10 Jun 2022 22:16:18 +0200
Message-ID: <87h74s2s19.fsf@toke.dk>
In-Reply-To: <20220610193418.4kqpu7crwfb5efzy@apollo.legion>

Kumar Kartikeya Dwivedi <memxor@gmail•com> writes:

> On Sat, Jun 11, 2022 at 12:37:50AM IST, Joanne Koong wrote:
>> On Fri, Jun 10, 2022 at 10:23 AM Joanne Koong <joannelkoong@gmail•com> wrote:
>> >
>> > On Fri, Jun 10, 2022 at 5:58 AM Kumar Kartikeya Dwivedi
>> > <memxor@gmail•com> wrote:
>> > >
>> > > On Fri, Jun 10, 2022 at 05:54:27AM IST, Joanne Koong wrote:
>> > > > On Thu, Jun 3, 2021 at 11:31 PM Kumar Kartikeya Dwivedi
>> > > > <memxor@gmail•com> wrote:
>> > > > >
>> > > > > This is the second (non-RFC) version.
>> > > > >
>> > > > > This adds a bpf_link path to create TC filters tied to cls_bpf classifier, and
>> > > > > introduces fd based ownership for such TC filters. Netlink cannot delete or
>> > > > > replace such filters, but the bpf_link is severed on indirect destruction of the
>> > > > > filter (backing qdisc being deleted, or chain being flushed, etc.). To ensure
>> > > > > that filters remain attached beyond process lifetime, the usual bpf_link fd
>> > > > > pinning approach can be used.
>> > > > >
>> > > > > The individual patches contain more details and comments, but the overall kernel
>> > > > > API and libbpf helper mirrors the semantics of the netlink based TC-BPF API
>> > > > > merged recently. This means that we start by always setting direct action mode,
>> > > > > protocol to ETH_P_ALL, chain_index to 0, etc. If more options are needed
>> > > > > later, they can easily be exposed through the bpf_link API.
>> > > > >
>> > > > > Patch 1 refactors cls_bpf change function to extract two helpers that will be
>> > > > > reused in bpf_link creation.
>> > > > >
>> > > > > Patch 2 exports some bpf_link management functions to modules. This is needed
>> > > > > because our bpf_link object is tied to the cls_bpf_prog object. Tying it to
>> > > > > tcf_proto would be weird, because the update path has to replace offloaded bpf
>> > > > > prog, which happens using internal cls_bpf helpers, and would in general be more
>> > > > > code to abstract over an operation that is unlikely to be implemented for other
>> > > > > filter types.
>> > > > >
>> > > > > Patch 3 adds the main bpf_link API. A function in cls_api takes care of
>> > > > > obtaining block reference, creating the filter object, and then calls the
>> > > > > bpf_link_change tcf_proto op (only supported by cls_bpf) that returns a fd after
>> > > > > setting up the internal structures. An optimization is made to not keep around
>> > > > > resources for extended actions, which is explained in a code comment as it wasn't
>> > > > > immediately obvious.
>> > > > >
>> > > > > Patch 4 adds an update path for bpf_link. Since bpf_link_update only supports
>> > > > > replacing the bpf_prog, we can skip tc filter's change path by reusing the
>> > > > > filter object but swapping its bpf_prog. This takes care of replacing the
>> > > > > offloaded prog as well (if that fails, update is aborted). So far however,
>> > > > > tcf_classify could do normal load (possibly torn) as the cls_bpf_prog->filter
>> > > > > would never be modified concurrently. This is no longer true, and to not
>> > > > > penalize the classify hot path, we also cannot impose serialization around
>> > > > > its load. Hence the load is changed to READ_ONCE, so that the pointer value is
>> > > > > always consistent. Due to invocation in a RCU critical section, the lifetime of
>> > > > > the prog is guaranteed for the duration of the call.
>> > > > >
>> > > > > Patches 5 and 6 take care of updating the userspace bits and add a
>> > > > > bpf_link-returning function to libbpf.
>> > > > >
>> > > > > Patch 7 adds a selftest that exercises all possible problematic interactions
>> > > > > that I could think of.
>> > > > >
>> > > > > Design:
>> > > > >
>> > > > > This is where in the object hierarchy our bpf_link object is attached.
>> > > > >
>> > > > >                                                                             ┌─────┐
>> > > > >                                                                             │     │
>> > > > >                                                                             │ BPF │
>> > > > >                                                                             program
>> > > > >                                                                             │     │
>> > > > >                                                                             └──▲──┘
>> > > > >                                                       ┌───────┐                │
>> > > > >                                                       │       │         ┌──────┴───────┐
>> > > > >                                                       │  mod  ├─────────► cls_bpf_prog │
>> > > > > ┌────────────────┐                                    │cls_bpf│         └────┬───▲─────┘
>> > > > > │    tcf_block   │                                    │       │              │   │
>> > > > > └────────┬───────┘                                    └───▲───┘              │   │
>> > > > >          │          ┌─────────────┐                       │                ┌─▼───┴──┐
>> > > > >          └──────────►  tcf_chain  │                       │                │bpf_link│
>> > > > >                     └───────┬─────┘                       │                └────────┘
>> > > > >                             │          ┌─────────────┐    │
>> > > > >                             └──────────►  tcf_proto  ├────┘
>> > > > >                                        └─────────────┘
>> > > > >
>> > > > > The bpf_link is detached on destruction of the cls_bpf_prog.  Doing it this way
>> > > > > allows us to implement update in a lightweight manner without having to recreate
>> > > > > a new filter, where we can just replace the BPF prog attached to cls_bpf_prog.
>> > > > >
>> > > > > The other way to do it would be to link the bpf_link to tcf_proto, but
>> > > > > there are numerous downsides to this:
>> > > > >
>> > > > > 1. All filters have to embed the pointer even though they won't be using it when
>> > > > > cls_bpf is compiled in.
>> > > > > 2. This probably won't make sense to be extended to other filter types anyway.
>> > > > > 3. We aren't able to optimize the update case without adding another bpf_link
>> > > > > specific update operation to tcf_proto ops.
>> > > > >
>> > > > > The downside of tying this to the module is having to export bpf_link
>> > > > > management functions and introduce a tcf_proto op. Hopefully the cost of
>> > > > > another operation func pointer is not too big (as there is only one ops
>> > > > > struct per module).
>> > > > >
>> > > > Hi Kumar,
>> > > >
>> > > > Do you have any plans / bandwidth to land this feature upstream? If
>> > > > so, do you have a tentative estimation for when you'll be able to work
>> > > > on this? And if not, are you okay with someone else working on this to
>> > > > get it merged in?
>> > > >
>> > >
>> > > I can have a look at resurrecting it later this month, if you're ok with waiting
>> > > until then, otherwise if someone else wants to pick this up before that it's
>> > > fine by me, just let me know so we avoid duplicated effort. Note that the
>> > > approach in v2 is dead/unlikely to get accepted by the TC maintainers, so we'd
>> > > have to implement the way Daniel mentioned in [0].
>> >
>> > Sounds great! We'll wait and check back in with you later this month.
>> >
>> After reading the linked thread (which I should have done before
>> submitting my previous reply :)), if I'm understanding it correctly,
>> it seems that the work needed for tc bpf_link will be in a new
>> direction that's not based on the code in this v2 patchset. I'm
>> interested in learning more about bpf link and tc - I can pick this up
>> to work on. But if this was something you wanted to work on,
>> please don't hesitate to let me know; I can find some other bpf link
>> thing to work on instead if that's the case.
>>
>
> Feel free to take it. And yes, it's going to be much simpler than this. I think
> you can just add two bpf_prog pointers in struct net_device, use rtnl_lock to
> protect the updates, and invoke using bpf_prog_run in sch_handle_ingress and
> sch_handle_egress.

Except we'd want to also support multiple programs on different
priorities? I don't think requiring a libxdp-like dispatcher to achieve
this is a good idea if we can just have it be part of the API from the
get-go...
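
To make the contrast concrete, here is a minimal userspace sketch (not
kernel code) of a per-hook list of programs kept sorted by priority, as
opposed to a single program pointer per direction. Every name in it
(struct hook, hook_attach, hook_detach, hook_run, the verdict values) is
hypothetical, and the rule that the first non-pass verdict stops the
chain is an assumption made for the example, not something decided in
this thread.

```c
#include <stddef.h>

/* Hypothetical userspace model of one hook (e.g. ingress). Not kernel code. */
#define MAX_PROGS 8

enum verdict { VERDICT_PASS = 0, VERDICT_DROP = 1 };

typedef enum verdict (*prog_fn)(const void *pkt);

struct prog_entry {
	int prio;	/* lower value runs first */
	prog_fn run;
};

struct hook {
	struct prog_entry progs[MAX_PROGS];	/* sorted by ascending prio */
	size_t count;
};

/* Attach a program; returns 0 on success, -1 if full or prio already taken. */
int hook_attach(struct hook *h, int prio, prog_fn run)
{
	size_t i, pos = 0;

	if (h->count == MAX_PROGS)
		return -1;
	for (i = 0; i < h->count; i++) {
		if (h->progs[i].prio == prio)
			return -1;
		if (h->progs[i].prio < prio)
			pos = i + 1;
	}
	/* Shift later entries up by one slot to keep the array sorted. */
	for (i = h->count; i > pos; i--)
		h->progs[i] = h->progs[i - 1];
	h->progs[pos].prio = prio;
	h->progs[pos].run = run;
	h->count++;
	return 0;
}

/* Detach the program at the given priority; returns 0, or -1 if not found. */
int hook_detach(struct hook *h, int prio)
{
	size_t i, j;

	for (i = 0; i < h->count; i++) {
		if (h->progs[i].prio != prio)
			continue;
		for (j = i + 1; j < h->count; j++)
			h->progs[j - 1] = h->progs[j];
		h->count--;
		return 0;
	}
	return -1;
}

/* Run programs in priority order; the first non-pass verdict ends the chain. */
enum verdict hook_run(const struct hook *h, const void *pkt)
{
	size_t i;

	for (i = 0; i < h->count; i++) {
		enum verdict v = h->progs[i].run(pkt);

		if (v != VERDICT_PASS)
			return v;
	}
	return VERDICT_PASS;
}

/* Two trivial example programs. */
enum verdict pass_all(const void *pkt) { (void)pkt; return VERDICT_PASS; }
enum verdict drop_all(const void *pkt) { (void)pkt; return VERDICT_DROP; }
```

With something of this shape, two independent users can each attach at
their own priority without having to coordinate through a shared
dispatcher program.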

-Toke

