public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: Daniel Borkmann <daniel@iogearbox•net>
To: Tycho Andersen <tycho.andersen@canonical•com>,
	Alexei Starovoitov <alexei.starovoitov@gmail•com>
Cc: Alexei Starovoitov <ast@kernel•org>,
	"David S. Miller" <davem@davemloft•net>,
	netdev@vger•kernel.org
Subject: Re: [PATCH] ebpf: emit correct src_reg for conditional jumps
Date: Fri, 11 Sep 2015 18:53:30 +0200	[thread overview]
Message-ID: <55F3070A.2090405@iogearbox.net> (raw)
In-Reply-To: <20150911155034.GO27574@smitten>

On 09/11/2015 05:50 PM, Tycho Andersen wrote:
> On Fri, Sep 11, 2015 at 08:40:43AM -0700, Alexei Starovoitov wrote:
>> On Fri, Sep 11, 2015 at 11:28:24AM +0200, Daniel Borkmann wrote:
>>> [off topic for this patch]
>>>
>>> ... this requirement also breaks down for cases where you have a single
>>> classic BPF instruction that maps into 2 or more eBPF instructions, hitting
>>> BPF_MAXINSNS early at the time when you try to call into bpf(2) again with
>>> the dumped result. If I recall correctly, Chrome seems to use up quite a lot
>>> of insns space on (classic) seccomp-BPF, so this could potentially be a real
>>> issue, next to artificially crafted examples that would fail.
>>>
>>> With regards to the latter, also classic programs that could have holes of
>>> dead code where you jump over them (see lib/test_bpf.c for examples) are
>>> unfortunately allowed on the ABI side, so while the classic -> eBPF converter
>>> may translate this dead region 1:1, it will be rejected by the verifier when
>>> you dump and try to reload that. ;) Anyway, it's perhaps a silly example, but
>>> it shows that this use-case can only work on a 'best-effort' basis, and not
>>> cover everything of the classic BPF ABI, as opposed to having an interface
>>> that can dump/restore classic BPF instructions directly.
>>>
>>> Do you need this for checkpoint/restore? Wouldn't this therefore be better
>>> done as dump/restore classic<->classic and eBPF<->eBPF directly? In socket
>>> filters we do this by keeping orig_prog around, I guess it's better to bite
>>> the bullet of additional memory overhead in case of classic seccomp-BPF, too.
>>
>> I don't think so.
>> When I played with libseccomp and chrome I saw that browser installed
>> several bpf programs for every new tab. The longest program was 275
>> classic insns which translated to slightly more eBPF insns
>> (because in eBPF we don't have < and <= instructions, so converter
>> needs to emit extra jump insns)

Yes, these cases, and also exit code translates into two and you have a
preamble moving ctx to arg1 for every classic program. So it should be
slightly more overall, depending on the program structure.

>> Also they don't produce unreachable code.
>> So getting over 4k limit and unreachable are rather hypothetical
>> problems. I wouldn't want to have two interfaces to criu seccomp.

Thinking out loud, is there such a use-case where you checkpoint your
application on kernel X (that allows, say, to dump /and/ inject classic
seccomp insns) and restore it elsewhere on kernel Y, where Y is older
than X (f.e. it can easily inject classic insns on seccomp as it's
present for some time, but /not/ dump). I guess that could be ignored
as you couldn't move it away from there w/o dumping insns then? Also,
if the *whole* environment is even to a little degree non-homogeneous,
then your own seccomp rules can already kill you. ;)

>> eBPF is going to be used by seccomp as well, so having two is extra
>> burden for user space criu.

I don't know how much burden it actually is for criu, it for sure already
would need to do exactly this for eBPF socket filters. If I could choose
between adding extra complexity on user space or kernel space, I would
choose user space when possible.

> I think the burden is not so huge once we have eBPF c/r in place (we
> could just check the classic flag first, then dump the eBPF version or
> something). However, it doesn't seem ideal to have to support two
> interfaces.
>
>> If we start hitting 4k limit for eBPF, we can easily bump it
>> and/or add < and <= insns to eBPF (which was on my todo list anyway).

I think bumping BPF_MAXINSNS seems fragile wrt user space breakage, it's
at least unclear to me whether applications interacting with the kernel
depend on this in some [even weird] way. I'd probably go for the <, <=.

> I was hoping you might say this (assuming you mean add < BPF_MAXINSNS
> to the converter).
>
> The dead code is certainly a problem, but perhaps we can wait on this
> until there become some critical application that has this issue. I
> was hoping we could get away without extra memory usage on the kernel
> side (indeed, this patchset is mostly to try and avoid that).
>
> Tycho
>

  reply	other threads:[~2015-09-11 16:53 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-11  0:25 [PATCH] ebpf: emit correct src_reg for conditional jumps Tycho Andersen
2015-09-11  8:45 ` Daniel Borkmann
2015-09-11  9:28   ` Daniel Borkmann
2015-09-11 15:40     ` Alexei Starovoitov
2015-09-11 15:50       ` Tycho Andersen
2015-09-11 16:53         ` Daniel Borkmann [this message]
2015-09-11 21:53 ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55F3070A.2090405@iogearbox.net \
    --to=daniel@iogearbox$(echo .)net \
    --cc=alexei.starovoitov@gmail$(echo .)com \
    --cc=ast@kernel$(echo .)org \
    --cc=davem@davemloft$(echo .)net \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=tycho.andersen@canonical$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox