From: "Toke Høiland-Jørgensen" <toke@redhat•com>
To: "Eric W. Biederman" <ebiederm@xmission•com>
Cc: David Ahern <dsahern@gmail•com>,
Stephen Hemminger <stephen@networkplumber•org>,
netdev@vger•kernel.org,
Nicolas Dichtel <nicolas.dichtel@6wind•com>,
Christian Brauner <brauner@kernel•org>,
David Laight <David.Laight@ACULAB•COM>
Subject: Re: [RFC PATCH iproute2-next 0/5] Persisting of mount namespaces along with network namespaces
Date: Tue, 10 Oct 2023 00:03:24 +0200
Message-ID: <87jzrvzc5v.fsf@toke.dk>
In-Reply-To: <877cnvtu37.fsf@email.froward.int.ebiederm.org>
"Eric W. Biederman" <ebiederm@xmission•com> writes:
> Toke Høiland-Jørgensen <toke@redhat•com> writes:
>
>> The 'ip netns' command is used for setting up network namespaces with persistent
>> named references, and is integrated into various other commands of iproute2 via
>> the -n switch.
>>
>> This is useful both for testing setups and for simple script-based namespacing
>> but has one drawback: the lack of persistent mounts inside the spawned
>> namespace. This is particularly apparent when working with BPF programs that use
>> pinning to bpffs: by default no bpffs is available inside a namespace, and
>> even if mounting one, that fs disappears as soon as the calling
>> command exits.
>
> It would be entirely reasonable to copy mounts like /sys/fs/bpf from the
> original mount namespace into the temporary mount namespace used by
> "ip netns".
>
> I would call it a bug that "ip netns" doesn't do that already.
>
> I suspect that "ip netns" not copying the mounts from the old sysfs onto
> the new sysfs is your entire problem.
How would it do that? Walk mtab and remount everything identically after
remounting /sys? Or is there a smarter way to go about this?
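To make the "walk mtab" option concrete, here is a rough sketch (Python,
purely illustrative, not something iproute2 does today) that lists the
mounts below /sys from mountinfo-formatted text. Those would be the
candidates to re-create after remounting /sys. The function name and the
sample data are made up for the example:

```python
from typing import List, NamedTuple


class Mount(NamedTuple):
    mount_point: str
    fs_type: str
    source: str
    options: str


def submounts(mountinfo_text: str, parent: str = "/sys") -> List[Mount]:
    """Return the mounts strictly below `parent`, in file order."""
    prefix = parent.rstrip("/") + "/"
    found = []
    for line in mountinfo_text.splitlines():
        # Each mountinfo line has a " - " separator between the per-mount
        # fields and the filesystem type / source / super options.
        if " - " not in line:
            continue
        left, right = line.split(" - ", 1)
        pre, post = left.split(), right.split()
        if len(pre) < 6 or len(post) < 2:
            continue
        mount_point = pre[4]  # fifth field is the mount point
        if mount_point.startswith(prefix):
            found.append(Mount(mount_point, post[0], post[1], pre[5]))
    return found


# Hypothetical sample in /proc/self/mountinfo format.
SAMPLE = """\
22 28 0:21 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
32 22 0:27 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:9 - cgroup2 cgroup2 rw
35 22 0:30 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:11 - bpf bpf rw,mode=700
28 1 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw
"""

if __name__ == "__main__":
    for m in submounts(SAMPLE):
        print(m.mount_point, m.fs_type)
```

This only finds the mounts; actually re-creating them identically is the
part I am unsure about.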
> Or is there a reason that bpffs should be per network namespace?
Well, I first ran into this issue because of a bug report to
xdp-tools/libxdp about things not working correctly in network
namespaces:
https://github.com/xdp-project/xdp-tools/issues/364
And libxdp does assume that there's a separate bpffs per network
namespace: it persists things into the bpffs that is tied to the network
devices in the current namespace. So if the bpffs is shared, an
application running inside the network namespace could access XDP
programs loaded in the root namespace. I don't know, but suspect, that
such assumptions would be relatively common in networking BPF programs
that use pinning (the pinning support in libbpf and iproute2 itself at
least have the same leaking problem if the bpffs is shared).
>> The underlying cause for this is that iproute2 will create a new mount namespace
>> every time it switches into a network namespace. This is needed to be able to
>> mount a /sys filesystem that shows the correct network device information, but
>> has the unfortunate side effect of making mounts entirely transient for any 'ip
>> netns' invocation.
>
> Mount propagation can be made to work if necessary, that would solve the
> transient problem.
Is mount propagation the same as the remount thing you mentioned
above, or is this something different?
(Sorry for being hopelessly naive about this; as you probably guessed
from my previous email asking about this, I'm only now learning about
all the intricacies of fs mounts.)
>> This series is an attempt to fix this situation, by persisting a mount namespace
>> alongside the persistent network namespace (in a separate directory,
>> /run/netns-mnt). Doing this allows us to still have a consistent /sys inside
>> the namespace, but with persistence so any mounts survive.
>
> I really don't like that direction.
>
> "ip netns" was designed and really should continue to be a command that
> makes the world look like it has a single network namespace, for
> compatibility with old code. Part of that old code "ip netns" supports
> is "ip" itself.
Well, my idea with this change was to keep the functionality as close as
possible to what 'ip' currently does, but just have mounts persist across
invocations.
> I think you are making bpffs unnecessarily per network namespace.
See above.
>> This mode does come with some caveats. I'm sending this as RFC to get feedback
>> on whether this is the right thing to do, especially considering backwards
>> compatibility. On balance, I think that the approach taken here of
>> unconditionally persisting the mount namespace, and using that persistent
>> reference whenever it exists, is better than the current behaviour, and that
>> while it does represent a change in behaviour it is backwards compatible in a
>> way that won't cause issues. But please do comment on this; see the patch
>> description of patch 4 for details.
>
> As I understand it this will cause a problem for any application that
> is network namespace aware and does not use "ip netns" to wrap itself.
>
> I am fairly certain that pinning the mount namespace will result in
> never seeing an update of /etc/resolv.conf. At least if you
> are on a system that has /etc/netns/NAME/resolv.conf.
I was actually wondering about that /etc bind mounting support while I
was looking at this code. Could you please elaborate a bit on what that
is used for, exactly? :)
Also, if staleness of the /etc bind mounts is an issue, those could be
redone on every entry, couldn't they?
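As far as I can tell, the existing /etc handling amounts to bind-mounting
every entry of /etc/netns/NAME over the same name in /etc. Redoing that
on each entry would start with recomputing that list. A rough sketch of
just that step (Python, illustration only; plan_etc_binds and the
etc_root parameter are made up, and the actual bind mounts are left out):

```python
import os
import tempfile
from typing import List, Tuple


def plan_etc_binds(netns_name: str, etc_root: str = "/etc") -> List[Tuple[str, str]]:
    """Return (source, target) pairs for the per-netns /etc overrides."""
    src_dir = os.path.join(etc_root, "netns", netns_name)
    try:
        names = sorted(os.listdir(src_dir))
    except FileNotFoundError:
        # No overrides configured for this namespace.
        return []
    return [(os.path.join(src_dir, n), os.path.join(etc_root, n)) for n in names]


if __name__ == "__main__":
    # Use a throwaway directory instead of the real /etc.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "netns", "blue"))
        open(os.path.join(root, "netns", "blue", "resolv.conf"), "w").close()
        for src, dst in plan_etc_binds("blue", etc_root=root):
            print("bind", src, "->", dst)
```

Since the list is rebuilt from the directory contents each time, a file
added to /etc/netns/NAME after the namespace was created would be picked
up on the next entry.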
-Toke