public inbox for netdev@vger.kernel.org 
From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public•gmane.org (Eric W. Biederman)
To: "Mahesh Bandewar (महेश बंडेवार)"
	<maheshb-hpIqsD4AKlfQT0dZR+AlfA@public•gmane.org>
Cc: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public•gmane.org>,
	Christian Brauner
	<christian.brauner-Z7WLFzj8eWMS+FvcfC7Uqw@public•gmane.org>,
	Boris Lukashev
	<blukashev-JNja4Z15B3SvB/ACxS1yDA@public•gmane.org>,
	Daniel Micay
	<danielmicay-Re5JQEeQqe8AvxtiuMwx3w@public•gmane.org>,
	Mahesh Bandewar <mahesh-bmGAjcP2qsnk1uMJSBkQmQ@public•gmane.org>,
	LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public•gmane.org>,
	Netdev <netdev-u79uwXL29TY76Z2rM5mHXA@public•gmane.org>,
	Kernel-hardening
	<kernel-hardening-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8@public•gmane.org>,
	Linux API <linux-api-u79uwXL29TY76Z2rM5mHXA@public•gmane.org>,
	Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public•gmane.org>,
	Eric Dumazet <edumazet-hpIqsD4AKlfQT0dZR+AlfA@public•gmane.org>,
	David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public•gmane.org>
Subject: Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
Date: Thu, 09 Nov 2017 15:58:43 -0600
Message-ID: <871sl7dsh8.fsf@xmission.com>
In-Reply-To: <CAF2d9jgs5MYn1dMT2mbhF=6UB2Hoo5kwmJhXuE6memBfWzkWXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> (Mahesh Bandewar's message of "Thu, 9 Nov 2017 16:18:08 +0900")

"Mahesh Bandewar (महेश बंडेवार)" <maheshb-hpIqsD4AKlfQT0dZR+AlfA@public•gmane.org> writes:

> [resend response as earlier one failed because of formatting issues]
>
> On Thu, Nov 9, 2017 at 12:21 PM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public•gmane.org> wrote:
>>
>> On Thu, Nov 09, 2017 at 09:55:41AM +0900, Mahesh Bandewar (महेश बंडेवार) wrote:
>> > On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner
>> > <christian.brauner-Z7WLFzj8eWMS+FvcfC7Uqw@public•gmane.org> wrote:
>> > > On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश बंडेवार) wrote:
>> > >> Sorry folks, I was traveling and it seems like a lot happened on this thread. :p
>> > >>
>> > >> I will try to respond to a few of these comments selectively -
>> > >>
>> > >> > The thing that makes me hesitate with this set is that it is a
>> > >> > permanent new feature to address what (I hope) is a temporary
>> > >> > problem.
>> > >> I agree this is a permanent new feature but it's not solving a temporary
>> > >> problem. It's impossible to assess what new vulnerability could show up,
>> > >> and when. I think Daniel summed it up appropriately in his
>> > >> response
>> > >>
>> > >> > Seems like there are two naive ways to do it, the first being to just
>> > >> > look at all code under ns_capable() plus code called from there.  It
>> > >> > seems like looking at the result of that could be fruitful.
>> > >> This is really hard. The main issue is that there were features designed
>> > >> and developed before user-ns days with an assumption that unprivileged
>> > >> users will never get certain capabilities which only the root user gets.
>> > >> That is not true anymore, since any process can create a user-ns
>> > >> with root mapped. At the same time, blocking user-ns creation for
>> > >> everyone is a big hammer which is not needed either. So it's not that easy
>> > >> to just perform a code walk-through and correct those decisions now.
>> > >>
>> > >> > It seems to me that the existing control in
>> > >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
>> > >> > in that case.
>> > >> This solution essentially blocks unprivileged users from using
>> > >> user-namespaces entirely. This is not really a solution that can
>> > >> work. The solution that this patch-set adds allows unprivileged users
>> > >> to create user-namespaces. Actually the proposed solution is a more
>> > >> fine-grained approach than the unprivileged_userns_clone solution
>> > >> since you can selectively block capabilities rather than completely
>> > >> blocking the functionality.
>> > >
>> > > I've been talking to Stéphane today about this and we should also keep in mind
>> > > that we have:
>> > >
>> > > chb@conventiont|~
>> > >> ls -al /proc/sys/user/
>> > > total 0
>> > > dr-xr-xr-x 1 root root 0 Nov  6 23:32 .
>> > > dr-xr-xr-x 1 root root 0 Nov  2 22:13 ..
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_cgroup_namespaces
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_instances
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_watches
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_ipc_namespaces
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_mnt_namespaces
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_net_namespaces
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_pid_namespaces
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_user_namespaces
>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_uts_namespaces
>> > >
>> > > These files allow you to limit the number of namespaces that can be created
>> > > *per namespace* type. So let's say your system runs a bunch of user namespaces
>> > > you can do:
>> > >
>> > > chb@conventiont|~
>> > >> echo 0 > /proc/sys/user/max_user_namespaces
>> > >
>> > > So that the next time you try to create a user namespaces you'd see:
>> > >
>> > > chb@conventiont|~
>> > >> unshare -U
>> > > unshare: unshare failed: No space left on device
>> > >
>> > > So there's not even a need to upstream a new sysctl since we have ways of
>> > > blocking this.
>> > >
>> > I'm not sure how it's solving the problem that my patch-set is addressing?
>> > I agree though that the need for unprivileged_userns_clone sysctl goes
>> > away as this is equivalent to setting that sysctl to 0 as you have
>> > described above.
>>
>> oh right that was the reasoning iirc for not needing the other sysctl.
>>
>> > However as I mentioned earlier, blocking processes from creating
>> > user-namespaces is not the solution. Processes should be able to
>> > create namespaces as they are designed but at the same time we need to
>> > have controls to 'contain' them if the need arises. Setting max_no to 0
>> > is not the solution that I'm looking for since it doesn't solve the
>> > problem.
>>
>> well yesterday we were told that was explicitly not the goal, but that was
>> not by you ... i just mention it to explain why we seem to be walking in
>> circles a bit.
>>
>> anyway the bounding set doesn't actually make sense so forget that.   the
>> question then is just whether it makes sense to allow things to continue
>> at all in this situation.  would you mind indulging me by giving one or two
>> concrete examples in the previous known cves of what capabilities you would
>> have dropped to allow the rest to continue to be safely used?
>>
> Of course. Let's take an example of the CVE that I have mentioned in
> my cover-letter -
> CVE-2017-7308(https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-7308).
> It's well documented and even has an
> exploit(https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308)
> C program that demonstrates how it can be used against a non-patched
> kernel. There is a very nice blog
> post(https://googleprojectzero.blogspot.kr/2017/05/exploiting-linux-kernel-via-packet.html)
> about this vulnerability by Andrey Konovalov.
>
> This is about the AF_PACKET socket interface that is protected behind
> NET_RAW capability. This capability is not available to unprivileged
> user. However, any unprivileged user can get NET_RAW capability (as
> demonstrated in the cover-letter code that I have attached in this
> patch series) so this NET_RAW capability is available to any
> unprivileged user on the host if the kernel has user-namespaces
> available.
>
> With this patch-set applied, all that is needed is to flip a bit with
> the sysctl (kernel.controlled_userns_caps_whitelist) as demonstrated
> below -
>
> root@lphh6:~# uname -a
> Linux lphh6 4.14.0-smp-DEV #97 SMP @1510203579 x86_64 GNU/Linux
> root@lphh6:~# sysctl -q kernel.controlled_userns_caps_whitelist
> kernel.controlled_userns_caps_whitelist = 1f,ffffffff
>
> Now when I run the program (demo from the cover-letter) as a normal
> unprivileged user I can't create a RAW socket in init-ns but I can in
> the child-ns.
>
> dumbo@lphh6:~$ /tmp/acquire_raw
> Attempting to open RAW socket before unshare()...
> socket() SOCK_RAW failed: : Operation not permitted
> Attempting to open RAW socket after unshare()...
> Successfully opened RAW-Sock after unshare().
> dumbo@lphh6:~$
>
> Now as a root user. Take off CAP_NET_RAW
>
> root@lphh6:~# sysctl -w kernel.controlled_userns_caps_whitelist=1f,ffffdfff
> kernel.controlled_userns_caps_whitelist = 1f,ffffdfff
> root@lphh6:~#
>
> Now run the same program as an unprivileged user -
>
> dumbo@lphh6:~$ /tmp/acquire_raw
> Attempting to open RAW socket before unshare()...
> socket() SOCK_RAW failed: : Operation not permitted
> Attempting to open RAW socket after unshare()...
> socket() SOCK_RAW failed: : Operation not permitted
> dumbo@lphh6:~$
>
> Notice that it has failed to create a raw socket in both the init and the
> child namespace. It's not blocking creation of user-namespaces but allowing
> the admin to turn individual capability bits on and off.
>
> This is a very simplistic example that just demonstrates how turning
> capability bits on and off works. So let's assume a sandboxed environment
> where we don't know what the binary we are about to run will do, in an
> environment that is identified as susceptible. By turning off the NET_RAW
> bit, the admin gets an assurance that the system is safe. If the binary
> fails because it's not getting this capability, that is an unfortunate
> consequence (but one that does not compromise the host integrity). If it
> doesn't use the NET_RAW capability but any other combination of the
> remaining 36 capabilities, it gets whatever is necessary. This
> means we can safely allow processes to create user-namespaces by
> taking off the capabilities in question for a temporary/extended
> period until a proper fix is applied, without compromising system
> integrity. The impact will vary based on which capability is taken off,
> and the admin would / should be aware of it for the environment that
> he/she is dealing with.
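
For readers following the whitelist change quoted above (1f,ffffffff to
1f,ffffdfff), the arithmetic can be sketched as below. This is only an
illustration, not code from the patch-set. It assumes the sysctl value is a
list of comma-separated 32-bit hex words with the most significant word
first, which is inferred from the transcript; the helper names parse_mask,
format_mask and clear_cap are hypothetical. CAP_NET_RAW is capability
number 13 in <linux/capability.h>.

```python
CAP_NET_RAW = 13  # capability number from <linux/capability.h>


def parse_mask(text):
    """Parse comma-separated 32-bit hex words (most significant first)."""
    value = 0
    for word in text.strip().split(","):
        value = (value << 32) | int(word, 16)
    return value


def format_mask(value):
    """Format an integer back into comma-separated 32-bit hex words.

    Assumption: leading all-zero words are dropped here; the kernel may
    print a fixed number of words instead.
    """
    words = []
    while True:
        words.append(value & 0xFFFFFFFF)
        value >>= 32
        if value == 0:
            break
    words.reverse()
    head = format(words[0], "x")
    tail = [format(w, "08x") for w in words[1:]]
    return ",".join([head] + tail)


def clear_cap(text, cap):
    """Return the mask string with the bit for capability `cap` cleared."""
    return format_mask(parse_mask(text) & ~(1 << cap))


if __name__ == "__main__":
    print(clear_cap("1f,ffffffff", CAP_NET_RAW))
```

Clearing bit 13 only changes the low word, which is why the high word (1f)
is the same before and after in the transcript.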

My challenge with this reasoning is that I don't know that it meaningfully
generalizes to any other capability.

I can in the sandbox today create a user namespace and then set
max_net_namespaces to 0, and drop CAP_NET_RAW and that blocks
the attack.  (Possibly with a little spice to prevent a suid root
program from reacquiring CAP_NET_RAW).
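
One way to check that such a drop took effect inside the sandbox is to look
at the capability sets the kernel reports in /proc/<pid>/status (the CapInh,
CapPrm, CapEff, CapBnd and CapAmb lines, each a hex mask). The sketch below
only parses that text; it does not perform the drop itself. The helper names
and the sample status text in the usage note are hypothetical.

```python
CAP_NET_RAW = 13  # capability number from <linux/capability.h>

CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def cap_sets(status_text):
    """Extract the capability masks from /proc/<pid>/status text."""
    sets = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in CAP_FIELDS:
            sets[name] = int(value.strip(), 16)
    return sets


def has_cap(mask, cap):
    """True if bit `cap` is set in the capability mask."""
    return bool((mask >> cap) & 1)


def net_raw_report(status_text):
    """Map each capability set name to whether it still holds CAP_NET_RAW."""
    return {
        name: has_cap(mask, CAP_NET_RAW)
        for name, mask in cap_sets(status_text).items()
    }


def current_report(path="/proc/self/status"):
    """Report for the calling process (Linux only)."""
    with open(path) as handle:
        return net_raw_report(handle.read())
```

For example, a (made-up) status snippet with "CapBnd: 0000003fffffdfff"
reports False for CapBnd, since bit 13 is clear. If CAP_NET_RAW is absent
from the bounding set, a suid root program exec'd by the process cannot gain
it back through its permitted set.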

So while your solution doesn't look horrible, especially if it can be
done at a user namespace level so the restrictions can be limited to a
single sandbox, I am not at all certain that capabilities are the
proper place to limit code reachability.

I would very much like to see which capabilities available through
ns_capable() are more meaningful to limit this way than by just dropping
the capability during sandbox creation and denying the creation of the
corresponding namespace.

CAP_NET_RAW is one.  Are there any other capabilities that are
meaningful to limit?

Eric
