public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: "Mickaël Salaün" <mic@digikod•net>
To: Mikhail Ivanov <ivanov.mikhail1@huawei-partners•com>
Cc: "Günther Noack" <gnoack3000@gmail•com>,
	"Günther Noack" <gnoack@google•com>,
	willemdebruijn.kernel@gmail•com,
	linux-security-module@vger•kernel.org, netdev@vger•kernel.org,
	netfilter-devel@vger•kernel.org, yusongping@huawei•com,
	artem.kuzin@huawei•com, konstantin.meskhidze@huawei•com,
	"Matthieu Buffet" <matthieu@buffet•re>
Subject: Re: [RFC PATCH v3 01/19] landlock: Support socket access-control
Date: Fri, 24 Jan 2025 15:02:47 +0100	[thread overview]
Message-ID: <20250124.sei0Aur6aegu@digikod.net> (raw)
In-Reply-To: <aac17a25-eb9e-342e-953a-094ae0e86b54@huawei-partners.com>

On Fri, Jan 24, 2025 at 03:28:02PM +0300, Mikhail Ivanov wrote:
> On 1/14/2025 9:31 PM, Mickaël Salaün wrote:
> > Happy new year!
> > 
> > On Fri, Jan 10, 2025 at 07:55:10PM +0300, Mikhail Ivanov wrote:
> > > On 1/10/2025 7:27 PM, Günther Noack wrote:
> > > > On Fri, Jan 10, 2025 at 04:02:42PM +0300, Mikhail Ivanov wrote:
> > > > > Correct, but we also agreed to use bitmasks for "family" field as well:
> > > > > 
> > > > > https://lore.kernel.org/all/af72be74-50c7-d251-5df3-a2c63c73296a@huawei-partners.com/
> > > > 
> > > > OK
> > > > 
> > > > 
> > > > > On 1/10/2025 2:12 PM, Günther Noack wrote:
> > > > > > I do not understand why this convenience feature in the UAPI layer
> > > > > > requires a change to the data structures that Landlock uses
> > > > > > internally?  As far as I can tell, struct landlock_socket_attr is only
> > > > > > used in syscalls.c and converted to other data structures already.  I
> > > > > > would have imagined that we'd "unroll" the specified bitmasks into the
> > > > > > possible combinations in the add_rule_socket() function and then call
> > > > > > landlock_append_socket_rule() multiple times with each of these?
> > 
> > I agree that UAPI should not necessarily dictate the kernel
> > implementation.
> > 
> > > > > 
> > > > > I thought about unrolling bitmask into multiple entries in rbtree, and
> > > > > came up with following possible issue:
> > > > > 
> > > > > Imagine that a user creates a rule that allows to create sockets of all
> > > > > possible families and types (with protocol=0 for example):
> > > > > {
> > > > > 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> > > > > 	.families = INT64_MAX, /* 64 set bits */
> > > > > 	.types = INT16_MAX, /* 16 set bits */
> > > > > 	.protocol = 0,
> > > > > },
> > > > > 
> > > > > This will add 64 x 16 = 1024 entries to the rbtree. Currently, the
> > > > > struct landlock_rule, which is used to store rules, weighs 40B. So, it
> > > > > will be 40kB by a single rule. Even if we allow rules with only
> > > > > "existing" families and types, it will be 46 x 7 = 322 entries and ~12kB
> > > > > by a single rule.
> > > > > 
> > > > > I understand that this may be degenerate case and most common rule will
> > > > > result in less then 8 (or 4) entries in rbtree, but I think API should
> > > > > be as intuitive as possible. User can expect to see the same
> > > > > memory usage regardless of the content of the rule.
> > > > > 
> > > > > I'm not really sure if this case is really an issue, so I'd be glad
> > > > > to hear your opinion on it.
> > > > 
> > > > I think there are two separate questions here:
> > > > 
> > > > (a) I think it is OK that it is *possible* to allocate 40kB of kernel
> > > >       memory.  At least, this is already possible today by calling
> > > >       landlock_add_rule() repeatedly.
> > > > 
> > > >       I assume that the GFP_KERNEL_ACCOUNT flag is limiting the
> > > >       potential damage to the caller here?  That flag was added in the
> > > >       Landlock v19 patch set [1] ("Account objects to kmemcg.").
> > > > 
> > > > (b) I agree it might be counterintuitive when a single
> > > >       landlock_add_rule() call allocates more space than expected.
> > > > 
> > > > Mickaël, since it is already possible today (but harder), I assume
> > > > that you have thought about this problem before -- is it a problem
> > > > when users allocate more kernel memory through Landlock than they
> > > > expected?
> > 
> > The imbalance between the user request and the required kernel memory is
> > interesting.  It would not be a security issue thanks to the
> > GFP_KERNEL_ACCOUNT, but it can be surprising for users that for some
> > requests they can receive -ENOMEM but not for quite-similar ones (e.g.
> > with only some bits different).  We should keep the principle of least
> > astonishment in mind, but also the fact that the kernel implementation
> > and the related required memory may change between two kernel versions
> > for the exact same user request (because of Landlock implementation
> > differences or other parts of the kernel).  In summary, we should be
> > careful to prevent users from being able to use a large amount of memory
> > with only one syscall.  This which would be an usability issue.
> > 
> > > > 
> > > > 
> > > > Naive proposal:
> > > > 
> > > > If this is an issue, how about we set a low limit to the number of
> > > > families and the number of types that can be used in a single
> > > > landlock_add_rule() invocation?  (It tends to be easier to start with
> > > > a restrictive API and open it up later than the other way around.)
> > > 
> > > Looks like a good approach! Better to return with an error (which almost
> > > always won't happen) and let the user to refactor the code than to
> > > allow ruleset to eat an unexpected amount of memory.
> > > 
> > > > 
> > > > For instance,
> > > > 
> > > > * In the families field, at most 2 bits may be set.
> > > > * In the types field, at most 2 bits may be set.
> > > 
> > > Or we can say that rule can contain not more than 4 combinations of
> > > family and type.
> > > 
> > > BTW, 4 seems to be sufficient, at least for IP protocols.
> > > 
> > > > 
> > > > In my understanding, the main use case of the bit vectors is that
> > > > there is a short way to say "all IPv4+v6 stream+dgram sockets", but we
> > > > do not know scenarios where much more than that is needed?  With that,
> > > > we would still keep people from accidentally allocating larger amounts
> > > > of memory, while permitting the main use case.
> > > 
> > > Agreed
> > > 
> > > > 
> > > > Having independent limits for the family and type fields is a bit
> > > > easier to understand and document than imposing a limit on the
> > > > multiplication result.
> > 
> > This is a safer approach that can indeed be documented, but it looks
> > unintuitive and an arbitrary limitation.  Here is another proposal:
> > 
> > Let's ignore my previous suggestion to use bitfields for families and
> > protocols.  To keep it simple, we can get back to the initial struct but
> > specifically handle the (i64)-1 value (which cannot be a family,
> > protocol, nor a type):
> > {
> > 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> > 	.family = AF_INET,
> > 	.type = SOCK_STREAM,
> > 	.protocol = -1,
> > },
> > 
> > This would read: deny socket creation except for AF_INET with
> > SOCK_STREAM (and any protocol).
> > 
> > Users might need to add several rules (e.g. one for AF_INET and another
> > for AF_INET6) but that's OK.  Unfortunately we cannot identify a TCP
> > socket with only protocol = IPPROTO_TCP because protocol definitions
> > are relative to network families.  Specifying the protocol without the
> > family should then return an error.
> > 
> > Before rule could be loaded, users define how much they want to match a
> > socket: at least the family, optionally the type, and if the type is
> > also set then the protocol can also be set.  These dependencies are
> > required to transform this triplet to a key number, see below.
> > 
> > A landlock_ruleset_attr.handled_socket_layers field would define how
> > much we want to match a socket:
> > - 1: family only
> > - 2: family and type
> > - 3: family, type, and protocol
> > 
> > According to this ruleset's property, users will be allowed to fill the
> > family, type, or protocol fields in landlock_socket_attr rules.  If a
> > socket layer is not handled, it should contain (i64)-1 for the kernel to
> > detect misuse of the API.
> > 
> > This enables us to get a key from this triplet:
> > 
> > family_bits = 6; // 45 values for now
> > type_bits = 3; // 7 values for now
> > protocol_bits = 5; // 28 values for now
> > 
> > // attr.* are the sanitized UAPI values, including -1 replaced with 0.
> > // In this example, landlock_ruleset_attr.handled_socket_layers is 3, so
> > // the key is composed of all the 3 properties.
> > landlock_key.data = attr.family << (type_bits + protocol_bits) |
> >                      attr.type << protocol_bits | attr.protocol;
> > 
> > For each layer of restriction in a domain, we know how precise they
> > define a socket (i.e. with how many "socket layers").  We can then look
> > for at most 3 entries in the red-black tree: one with only the family,
> > another with the family and the type, and potentially a third also
> > including the protocol.  Each key would have the same significant bits
> > but with the lower bits masked according to each
> > landlock_ruleset_attr.handled_socket_layers .  Composing the related
> > access masks according to the defined socket layers, we can create an
> > array of struct access_masks for the request and then check if such
> > request is allowed by the current domain.  As for the currently stored
> > data, we can also identify the domain layer that blocked the request
> > (required for audit).
> 
> I do not quite understand why we need socket_layers. Without it,
> user can set (i64)(-1) to type or protocol whenever he wants. While
> transforming triplet to a key we can replace (i64)(-1) with some
> constant (e.g. (1 << type_bits - 1) for the type if type_bits = 8 and
> (1 << protocol_bits - 1) for the protocol if protocol_bits = 16).

We can indead add a virtual any/wildcard type and protocol, translated
from user space's (i64)-1 .  The downside is that for the same domain
layer we would need to check 4 different keys for the same triplet (i.e.
famility is always set, but type and protocol are optionals).  With the
socket_layers number, we only have to check for one key per domain
layer.  However, in the worse case with at least 3 domain layers having
different socket_layers value, we would still need to check for 3
different keys.  So, I think it's reasonable to get rid of the
socket_layers as you suggested (while requiring the family to always be
set).

> 
> > 
> > With this design, each sandbox can define a socket as much as it wants.
> > 
> > The downside is that we lost the bitfields and we need several calls to
> > filter more complex sockets (e.g. 4 for UDP and TCP with IPv4 and IPv6),
> > which looks OK compared to the required calls for filesystem access
> > control.
> > 
> > > > 
> > > > > > That being said, I am not a big fan of red-black trees for such simple
> > > > > > integer lookups either, and I also think there should be something
> > > > > > better if we make more use of the properties of the input ranges. The
> > > > > > question is though whether you want to couple that to this socket type
> > > > > > patch set, or rather do it in a follow up?  (So far we have been doing
> > > > > > fine with the red black trees, and we are already contemplating the
> > > > > > possibility of changing these internal structures in [2].  We have
> > > > > > also used RB trees for the "port" rules with a similar reasoning,
> > > > > > IIRC.)
> > > > > 
> > > > > I think it'll be better to have a separate series for [2] if the socket
> > > > > restriction can be implemented without rbtree refactoring.
> > > > 
> > > > Sounds good to me. 👍
> > > > 
> > > > [1] https://lore.kernel.org/all/20200707180955.53024-2-mic@digikod.net/
> > 
> > red-black trees are a good generic data structure for the current main
> > use case (for dynamic rulesets and static domains), but we'll need to
> > use more appropriate data structures.  I think this should not be a
> > blocker for this patch series.  It will be required to match (port)
> > ranges though (even if the use case seems limited), and in general for
> > better performances.
> 

  reply	other threads:[~2025-01-24 14:12 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-04 10:48 [RFC PATCH v3 00/19] Support socket access-control Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 01/19] landlock: " Mikhail Ivanov
2024-09-06 13:09   ` Günther Noack
2024-09-09  7:23     ` Mikhail Ivanov
2024-11-11 16:29   ` Mikhail Ivanov
2024-11-22 17:45     ` Günther Noack
2024-11-25 11:04       ` Mikhail Ivanov
2024-11-27 18:43         ` Mickaël Salaün
2024-11-28 12:01           ` Mikhail Ivanov
2024-11-28 20:52             ` Mickaël Salaün
2024-12-02 11:32               ` Mikhail Ivanov
2024-12-24 16:55                 ` Mikhail Ivanov
2025-01-10 11:12                   ` Günther Noack
2025-01-10 13:02                     ` Mikhail Ivanov
2025-01-10 16:27                       ` Günther Noack
2025-01-10 16:55                         ` Mikhail Ivanov
2025-01-14 18:31                           ` Mickaël Salaün
2025-01-24 12:28                             ` Mikhail Ivanov
2025-01-24 14:02                               ` Mickaël Salaün [this message]
2024-09-04 10:48 ` [RFC PATCH v3 02/19] landlock: Add hook on socket creation Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 03/19] selftests/landlock: Test basic socket restriction Mikhail Ivanov
2024-09-10  9:53   ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 04/19] selftests/landlock: Test adding a rule with each supported access Mikhail Ivanov
2024-09-10  9:53   ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 05/19] selftests/landlock: Test adding a rule for each unknown access Mikhail Ivanov
2024-09-10  9:53   ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 06/19] selftests/landlock: Test adding a rule for unhandled access Mikhail Ivanov
2024-09-10  9:22   ` Günther Noack
2024-09-11  8:19     ` Mikhail Ivanov
2024-09-13 15:04       ` Günther Noack
2024-09-13 16:15         ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 07/19] selftests/landlock: Test adding a rule for empty access Mikhail Ivanov
2024-09-18 12:42   ` Günther Noack
2024-09-18 13:03     ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 08/19] selftests/landlock: Test overlapped restriction Mikhail Ivanov
2024-09-18 12:42   ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 09/19] selftests/landlock: Test creating a ruleset with unknown access Mikhail Ivanov
2024-09-18 12:44   ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 10/19] selftests/landlock: Test adding a rule with family and type outside the range Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 11/19] selftests/landlock: Test unsupported protocol restriction Mikhail Ivanov
2024-09-18 12:54   ` Günther Noack
2024-09-18 13:36     ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 12/19] selftests/landlock: Test that kernel space sockets are not restricted Mikhail Ivanov
2024-09-04 12:45   ` Mikhail Ivanov
2024-09-18 13:00   ` Günther Noack
2024-09-19 10:53     ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 13/19] selftests/landlock: Test packet protocol alias Mikhail Ivanov
2024-09-18 13:33   ` Günther Noack
2024-09-18 14:01     ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 14/19] selftests/landlock: Test socketpair(2) restriction Mikhail Ivanov
2024-09-18 13:47   ` Günther Noack
2024-09-23 12:57     ` Mikhail Ivanov
2024-09-25 12:17       ` Mikhail Ivanov
2024-09-27  9:48       ` Günther Noack
2024-09-28 20:06         ` Günther Noack
2024-09-29 17:31           ` Mickaël Salaün
2024-10-03 17:27             ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 15/19] selftests/landlock: Test SCTP peeloff restriction Mikhail Ivanov
2024-09-27 14:35   ` Günther Noack
2024-10-03 12:15     ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 16/19] selftests/landlock: Test that accept(2) is not restricted Mikhail Ivanov
2024-09-27 14:53   ` Günther Noack
2024-10-03 12:41     ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 17/19] samples/landlock: Replace atoi() with strtoull() in populate_ruleset_net() Mikhail Ivanov
2024-09-27 15:12   ` Günther Noack
2024-10-03 12:59     ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 18/19] samples/landlock: Support socket protocol restrictions Mikhail Ivanov
2024-10-01  7:56   ` Günther Noack
2024-10-03 13:15     ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 19/19] landlock: Document socket rule type support Mikhail Ivanov
2024-10-01  7:09   ` Günther Noack
2024-10-03 14:00     ` Mikhail Ivanov
2024-10-03 16:21       ` Günther Noack
2025-04-22 17:19 ` [RFC PATCH v3 00/19] Support socket access-control Mickaël Salaün
2025-04-25 13:58   ` Günther Noack
2025-04-29 11:59     ` Mikhail Ivanov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250124.sei0Aur6aegu@digikod.net \
    --to=mic@digikod$(echo .)net \
    --cc=artem.kuzin@huawei$(echo .)com \
    --cc=gnoack3000@gmail$(echo .)com \
    --cc=gnoack@google$(echo .)com \
    --cc=ivanov.mikhail1@huawei-partners$(echo .)com \
    --cc=konstantin.meskhidze@huawei$(echo .)com \
    --cc=linux-security-module@vger$(echo .)kernel.org \
    --cc=matthieu@buffet$(echo .)re \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=netfilter-devel@vger$(echo .)kernel.org \
    --cc=willemdebruijn.kernel@gmail$(echo .)com \
    --cc=yusongping@huawei$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox