Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling

public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed

From: Daniel Borkmann <dborkman@redhat•com>
To: Alexei Starovoitov <ast@plumgrid•com>
Cc: "David S. Miller" <davem@davemloft•net>,
	Network Development <netdev@vger•kernel.org>
Subject: Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling
Date: Thu, 24 Apr 2014 00:21:31 +0200	[thread overview]
Message-ID: <53583CEB.3040203@redhat.com> (raw)
In-Reply-To: <CAMEtUux3YZW9G80Da2B3pBB2Sq7C7eC62ntnrVifN8_RJugzHw@mail.gmail.com>

On 04/23/2014 11:59 PM, Alexei Starovoitov wrote:
> On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@redhat•com> wrote:
>> Currently, at initial setup in __sk_run_filter() we initialize the
>> BPF stack's frame-pointer and CTX register. However, instead of the
>> CTX register, we initialize context to ARG1, and during user filter
>> migration we emit *always* an instruction that copies the content
>> from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to setup
>> ctx, A, X for call emission. However, we nevertheless copy CTX over
>> to ARG1 in these cases. So all in all, we always emit one instruction
>> at BPF program beginning that should have actually been avoided to
>> spare this overhead.
>>
>> Signed-off-by: Daniel Borkmann <dborkman@redhat•com>
>> Cc: Alexei Starovoitov <ast@plumgrid•com>
>
> First 4 patches look great, but this one I have to disagree.
> See below.
>
>> ---
>>   net/core/filter.c | 10 +---------
>>   1 file changed, 1 insertion(+), 9 deletions(-)
>>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index eada3d5..6fed231 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -62,7 +62,6 @@
>>   #define A      regs[insn->a_reg]
>>   #define X      regs[insn->x_reg]
>>   #define FP     regs[BPF_REG_FP]
>> -#define ARG1   regs[BPF_REG_ARG1]
>>   #define CTX    regs[BPF_REG_CTX]
>>   #define K      insn->imm
>>
>> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
>>   #define CONT_JMP ({ insn++; goto select_insn; })
>>
>>          FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
>> -       ARG1 = (u64) (unsigned long) ctx;
>> +       CTX = (u64) (unsigned long) ctx;
>
> R1 (ARG1) is the register that used to pass first argument to the function.

Yes that's clear. Which is why f.e. in convert_bpf_extensions() we currently
copy ctx over to arg1 for calls, i.e.:

   /* arg1 = ctx */
   insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
   insn->a_reg = ARG1_REG;
   insn->x_reg = CTX_REG;
   insn++;

> For seamless kernel->bpf->kernel transition we have to follow calling
> convention, so 'void *ctx' has to go into R1 by design.
> Storing it into R6 (CTX) will only work for classic filters converted
> to extended.
> all native ebpf filters will be broken.

My objection was that currently, we do _not_ have _any_ users or even kernel
API for _native_ filters, at least not in mainline tree. The _main_ users we
have are currently _all_ being converted, hence this patch. Given that these
calls have likely just a minority of use cases triggered by tcpdump et al,
the majority of users still need to do this overhead/additional work.

> In documentation we say:
>      * R1 - R5   - arguments from BPF program to in-kernel function
> so llvm/gcc are following this ebpf ABI.
> Calling convention is the same whether to call kernel function from ebpf
> or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.

Yes, that's clear and convert_bpf_extensions() is doing that. So far we're
not using llvm/gcc backend here and have the internal instruction set not
exposed to user space, but even there you would need to prepare R1 - R5 to
hand-over arguments for the BPF_CALL insns, so why can't we load CTX into R1
at that time just as we do with convert_bpf_extensions()?

> By convention ld_abs/ld_ind insns are using implicit input register 'ctx' (R6),
> that's why we do a copy from R1 into R6 as a first insn of the
> _converted_ filter.

Yep, and R6 stays as is here. So ld_abs/ld_ind insns are correctly using 'ctx'.

Best,

Daniel

next prev parent reply	other threads:[~2014-04-23 22:21 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-23 20:56 [PATCH net-next 0/5] BPF updates Daniel Borkmann
2014-04-23 20:56 ` [PATCH net-next 1/5] net: filter: simplify label names from jump-table Daniel Borkmann
2014-04-23 20:56 ` [PATCH net-next 2/5] net: filter: misc/various cleanups Daniel Borkmann
2014-04-23 20:56 ` [PATCH net-next 3/5] net: filter: get rid of sock_fprog_kern Daniel Borkmann
2014-04-23 20:57 ` [PATCH net-next 4/5] net: filter: make register namings more comprehensible Daniel Borkmann
2014-04-23 20:57 ` [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling Daniel Borkmann
2014-04-23 21:59   ` Alexei Starovoitov
2014-04-23 22:21     ` Daniel Borkmann [this message]
2014-04-23 22:37       ` Alexei Starovoitov
2014-04-23 22:57         ` Daniel Borkmann
2014-04-23 23:14           ` Alexei Starovoitov
2014-04-24  3:05             ` Alexei Starovoitov
2014-04-24  8:28               ` David Laight
2014-04-24 15:55                 ` David Miller
2014-04-24  5:56             ` Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53583CEB.3040203@redhat.com \
    --to=dborkman@redhat$(echo .)com \
    --cc=ast@plumgrid$(echo .)com \
    --cc=davem@davemloft$(echo .)net \
    --cc=netdev@vger$(echo .)kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox