From: "Toke Høiland-Jørgensen" <toke@redhat•com>
To: "Eric W. Biederman" <ebiederm@xmission•com>
Cc: David Ahern <dsahern@gmail•com>,
	Stephen Hemminger <stephen@networkplumber•org>,
	netdev@vger•kernel.org,
	Nicolas Dichtel <nicolas.dichtel@6wind•com>,
	Christian Brauner <brauner@kernel•org>,
	David Laight <David.Laight@ACULAB•COM>
Subject: Re: [RFC PATCH iproute2-next 0/5] Persisting of mount namespaces along with network namespaces
Date: Wed, 11 Oct 2023 17:03:54 +0200
Message-ID: <87lec9xkth.fsf@toke.dk>
In-Reply-To: <871qe1i4z7.fsf@email.froward.int.ebiederm.org>

"Eric W. Biederman" <ebiederm@xmission•com> writes:

> Toke Høiland-Jørgensen <toke@redhat•com> writes:
>
>> "Eric W. Biederman" <ebiederm@xmission•com> writes:
>>
>>> Toke Høiland-Jørgensen <toke@redhat•com> writes:
>>>
>>>> "Eric W. Biederman" <ebiederm@xmission•com> writes:
>>>>
>>>>> Toke Høiland-Jørgensen <toke@redhat•com> writes:
>>>>>
>>>>>> "Eric W. Biederman" <ebiederm@xmission•com> writes:
>>>>>>
>>>>>>> Toke Høiland-Jørgensen <toke@redhat•com> writes:
>
>>> My proposal:
>>>
>>> On "ip netns add NAME"
>>> - create the network namespace and mount it at /run/netns/NAME
>>> - mount the appropriate sysfs at /run/netns-mounts/NAME/sys
>>> - mount the appropriate bpffs at /run/netns-mounts/NAME/sys/fs/bpf
>>>
>>> On "ip netns delete NAME"
>>> - umount --recursive /run/netns-mounts/NAME
>>> - unlink /run/netns-mounts/NAME
>>> - cleanup /run/netns/NAME as we do today.
>>>
>>> On "ip netns exec NAME"
>>> - Walk through /run/netns-mounts/NAME like we do /etc/netns/NAME/
>>>   and perform bind mounts.
>>
>> If we set up the full /sys hierarchy in /run/netns-mounts/NAME this
>> basically becomes a single recursive bind mount, doesn't it?
>
> Yes.
>
>> What about if we also include bind mounts from the host namespace into
>> that separate /sys instance? Will those be included into a recursive
>> bind into /sys inside the mount-ns, or will we have to walk the tree and
>> do separate bind mounts for each directory?
>
> If /run/netns-mounts/NAME/sys has everything you want, then
>
> mount --rbind /run/netns-mounts/NAME/sys /sys
>
> will result in a /sys that has everything you want.
>
>> Anyway, this scheme sounds like it'll solve the issue I was trying to
>> address so I don't mind doing it this way. I'll try it out and respin
>> the patch series.
>
> Thanks, that sounds like a way forward.
>
>
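
To make the scheme quoted above concrete, here is a rough dry-run sketch
of the three operations. It only prints the commands and runs nothing.
The function names and the NETNS_MOUNTS variable are made up for
illustration, the use of nsenter(1) to mount sysfs from inside the
target netns is an assumption (it is not what iproute2 does today), and
rmdir stands in for the "unlink" step:

```shell
#!/bin/sh
# Dry-run sketch: every function prints the commands for one step and
# runs nothing. NETNS_MOUNTS and the function names are hypothetical.
NETNS_MOUNTS=/run/netns-mounts

# "ip netns add NAME": create the netns, then mount sysfs and bpffs.
# Assumption: nsenter(1) is used so that the sysfs instance is mounted
# by a process inside the new netns but in the host mount namespace.
netns_add_cmds() {
    d="$NETNS_MOUNTS/$1"
    echo "ip netns add $1"
    echo "mkdir -p $d/sys"
    echo "nsenter --net=/run/netns/$1 mount -t sysfs sysfs $d/sys"
    echo "mount -t bpf bpf $d/sys/fs/bpf"
}

# "ip netns delete NAME": tear down the mounts, then the netns itself.
netns_delete_cmds() {
    d="$NETNS_MOUNTS/$1"
    echo "umount --recursive $d/sys"
    echo "rmdir $d/sys $d"
    echo "ip netns delete $1"
}

# "ip netns exec NAME": meant to run inside the private mount namespace
# that "ip netns exec" already creates, before the command is exec'ed.
netns_exec_cmds() {
    echo "mount --rbind $NETNS_MOUNTS/$1/sys /sys"
}

netns_add_cmds demo
netns_exec_cmds demo
netns_delete_cmds demo
```

Printing the commands keeps the sketch safe to run unprivileged; the
real steps would need root and would live in the ip netns code itself.
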
>>>>> Mount propagation is a way to configure a mount namespace (before
>>>>> creating a new one) that will cause mounts created in the first mount
>>>>> namespace to be created in its children, and cause mounts created in
>>>>> the children to be created in the parent (depending on how things are
>>>>> configured).
>>>>>
>>>>> It is not my favorite feature (it makes locking of mount namespaces
>>>>> terrible) and it is probably too clever by half.  Unfortunately,
>>>>> systemd started enabling mount propagation by default, so we are
>>>>> stuck with it.
>>>>
>>>> Right. AFAICT the current iproute2 code explicitly tries to avoid that
>>>> when creating a mountns (it does a 'mount --make-rslave /'); so you're
>>>> saying we should change that?
>>>
>>> If it makes sense.
>>>
>>> I believe I added the 'mount --make-rslave /' because otherwise all
>>> mount activity was propagating back, and making a mess.  Especially when
>>> I was unmounting /sys.
>>>
>>> I am not a huge fan of mount propagation; it has lots of surprising
>>> little details that need to be set just right to not cause problems.
>>
>> Ah, you were talking about propagation from inside the mountns to
>> outside? Didn't catch that at first...
>>
>>> With my proposal above I think we could in some carefully chosen
>>> places enable mount propagation without problem.
>>
>> One thing that comes to mind would be that if we create persistent /sys
>> instances in /run/netns-mounts per the above, it would make sense for
>> any modifications done inside the netns to be propagated back to the
>> mount in /run; is this possible with a bind mount? Not sure I quite
>> understand how propagation would work in this case (since it would be a
>> separate (bind) mount point inside the namespace).
>
> Basically yes, but the challenge is in the details.
>
> If the initial propagation is set up properly it will work.  The
> weirdness is how propagation works.  There is a weird detail that
> it needs to be set up on the parent and not on the mount point.
>
> I think the formula is something like:
>
> mount --bind /run/netns-mounts/NAME/sys/ /run/netns-mounts/NAME/sys/
> mount --make-rshared /run/netns-mounts/NAME/sys/
> mount -t sysfs /run/netns-mounts/NAME/sys
>
> My memory is that systemd by default does
>
> mount --make-rshared /
>
> So the challenge may be to simply limit what is propagated to a
> controlled subset.

Alright, I'll play around with it and bug you some more if I can't get
it to work properly; thanks for the pointers! :)
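
As a starting point for experimenting, the sequence quoted above could
be written out as a dry-run helper along these lines. It prints the
commands and runs nothing. The helper name is hypothetical, and the
explicit "sysfs" source argument and the findmnt(8) check are
additions that are not part of the quoted formula:

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, runs nothing.
# propagation_cmds is a hypothetical helper name.
propagation_cmds() {
    sys="/run/netns-mounts/$1/sys"
    # Bind the directory onto itself so that there is a parent mount
    # whose propagation type can be changed.
    echo "mount --bind $sys $sys"
    echo "mount --make-rshared $sys"
    # Assumption: an explicit "sysfs" source is passed to mount(8).
    echo "mount -t sysfs sysfs $sys"
    # Addition: list the propagation type of the mounts at that path.
    echo "findmnt -o TARGET,PROPAGATION $sys"
}

propagation_cmds demo
```

Whether this limits propagation to the intended subset would still need
to be checked on a system where / is already shared.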

-Toke

