From: "Mickaël Salaün" <mic@digikod•net>
To: Mikhail Ivanov <ivanov.mikhail1@huawei-partners•com>
Cc: "Günther Noack" <gnoack3000@gmail•com>,
"Günther Noack" <gnoack@google•com>,
willemdebruijn.kernel@gmail•com,
linux-security-module@vger•kernel.org, netdev@vger•kernel.org,
netfilter-devel@vger•kernel.org, yusongping@huawei•com,
artem.kuzin@huawei•com, konstantin.meskhidze@huawei•com,
"Matthieu Buffet" <matthieu@buffet•re>
Subject: Re: [RFC PATCH v3 01/19] landlock: Support socket access-control
Date: Fri, 24 Jan 2025 15:02:47 +0100 [thread overview]
Message-ID: <20250124.sei0Aur6aegu@digikod.net> (raw)
In-Reply-To: <aac17a25-eb9e-342e-953a-094ae0e86b54@huawei-partners.com>
On Fri, Jan 24, 2025 at 03:28:02PM +0300, Mikhail Ivanov wrote:
> On 1/14/2025 9:31 PM, Mickaël Salaün wrote:
> > Happy new year!
> >
> > On Fri, Jan 10, 2025 at 07:55:10PM +0300, Mikhail Ivanov wrote:
> > > On 1/10/2025 7:27 PM, Günther Noack wrote:
> > > > On Fri, Jan 10, 2025 at 04:02:42PM +0300, Mikhail Ivanov wrote:
> > > > > Correct, but we also agreed to use bitmasks for "family" field as well:
> > > > >
> > > > > https://lore.kernel.org/all/af72be74-50c7-d251-5df3-a2c63c73296a@huawei-partners.com/
> > > >
> > > > OK
> > > >
> > > >
> > > > > On 1/10/2025 2:12 PM, Günther Noack wrote:
> > > > > > I do not understand why this convenience feature in the UAPI layer
> > > > > > requires a change to the data structures that Landlock uses
> > > > > > internally? As far as I can tell, struct landlock_socket_attr is only
> > > > > > used in syscalls.c and converted to other data structures already. I
> > > > > > would have imagined that we'd "unroll" the specified bitmasks into the
> > > > > > possible combinations in the add_rule_socket() function and then call
> > > > > > landlock_append_socket_rule() multiple times with each of these?
> >
> > I agree that UAPI should not necessarily dictate the kernel
> > implementation.
> >
> > > > >
> > > > > I thought about unrolling bitmask into multiple entries in rbtree, and
> > > > > came up with following possible issue:
> > > > >
> > > > > Imagine that a user creates a rule that allows to create sockets of all
> > > > > possible families and types (with protocol=0 for example):
> > > > > {
> > > > > .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> > > > > .families = INT64_MAX, /* 64 set bits */
> > > > > .types = INT16_MAX, /* 16 set bits */
> > > > > .protocol = 0,
> > > > > },
> > > > >
> > > > > This will add 64 x 16 = 1024 entries to the rbtree. Currently, the
> > > > > struct landlock_rule, which is used to store rules, weighs 40B. So, it
> > > > > will be 40kB by a single rule. Even if we allow rules with only
> > > > > "existing" families and types, it will be 46 x 7 = 322 entries and ~12kB
> > > > > by a single rule.
> > > > >
> > > > > I understand that this may be degenerate case and most common rule will
> > > > > result in less then 8 (or 4) entries in rbtree, but I think API should
> > > > > be as intuitive as possible. User can expect to see the same
> > > > > memory usage regardless of the content of the rule.
> > > > >
> > > > > I'm not really sure if this case is really an issue, so I'd be glad
> > > > > to hear your opinion on it.
> > > >
> > > > I think there are two separate questions here:
> > > >
> > > > (a) I think it is OK that it is *possible* to allocate 40kB of kernel
> > > > memory. At least, this is already possible today by calling
> > > > landlock_add_rule() repeatedly.
> > > >
> > > > I assume that the GFP_KERNEL_ACCOUNT flag is limiting the
> > > > potential damage to the caller here? That flag was added in the
> > > > Landlock v19 patch set [1] ("Account objects to kmemcg.").
> > > >
> > > > (b) I agree it might be counterintuitive when a single
> > > > landlock_add_rule() call allocates more space than expected.
> > > >
> > > > Mickaël, since it is already possible today (but harder), I assume
> > > > that you have thought about this problem before -- is it a problem
> > > > when users allocate more kernel memory through Landlock than they
> > > > expected?
> >
> > The imbalance between the user request and the required kernel memory is
> > interesting. It would not be a security issue thanks to the
> > GFP_KERNEL_ACCOUNT, but it can be surprising for users that for some
> > requests they can receive -ENOMEM but not for quite-similar ones (e.g.
> > with only some bits different). We should keep the principle of least
> > astonishment in mind, but also the fact that the kernel implementation
> > and the related required memory may change between two kernel versions
> > for the exact same user request (because of Landlock implementation
> > differences or other parts of the kernel). In summary, we should be
> > careful to prevent users from being able to use a large amount of memory
> > with only one syscall. This which would be an usability issue.
> >
> > > >
> > > >
> > > > Naive proposal:
> > > >
> > > > If this is an issue, how about we set a low limit to the number of
> > > > families and the number of types that can be used in a single
> > > > landlock_add_rule() invocation? (It tends to be easier to start with
> > > > a restrictive API and open it up later than the other way around.)
> > >
> > > Looks like a good approach! Better to return with an error (which almost
> > > always won't happen) and let the user to refactor the code than to
> > > allow ruleset to eat an unexpected amount of memory.
> > >
> > > >
> > > > For instance,
> > > >
> > > > * In the families field, at most 2 bits may be set.
> > > > * In the types field, at most 2 bits may be set.
> > >
> > > Or we can say that rule can contain not more than 4 combinations of
> > > family and type.
> > >
> > > BTW, 4 seems to be sufficient, at least for IP protocols.
> > >
> > > >
> > > > In my understanding, the main use case of the bit vectors is that
> > > > there is a short way to say "all IPv4+v6 stream+dgram sockets", but we
> > > > do not know scenarios where much more than that is needed? With that,
> > > > we would still keep people from accidentally allocating larger amounts
> > > > of memory, while permitting the main use case.
> > >
> > > Agreed
> > >
> > > >
> > > > Having independent limits for the family and type fields is a bit
> > > > easier to understand and document than imposing a limit on the
> > > > multiplication result.
> >
> > This is a safer approach that can indeed be documented, but it looks
> > unintuitive and an arbitrary limitation. Here is another proposal:
> >
> > Let's ignore my previous suggestion to use bitfields for families and
> > protocols. To keep it simple, we can get back to the initial struct but
> > specifically handle the (i64)-1 value (which cannot be a family,
> > protocol, nor a type):
> > {
> > .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> > .family = AF_INET,
> > .type = SOCK_STREAM,
> > .protocol = -1,
> > },
> >
> > This would read: deny socket creation except for AF_INET with
> > SOCK_STREAM (and any protocol).
> >
> > Users might need to add several rules (e.g. one for AF_INET and another
> > for AF_INET6) but that's OK. Unfortunately we cannot identify a TCP
> > socket with only protocol = IPPROTO_TCP because protocol definitions
> > are relative to network families. Specifying the protocol without the
> > family should then return an error.
> >
> > Before rule could be loaded, users define how much they want to match a
> > socket: at least the family, optionally the type, and if the type is
> > also set then the protocol can also be set. These dependencies are
> > required to transform this triplet to a key number, see below.
> >
> > A landlock_ruleset_attr.handled_socket_layers field would define how
> > much we want to match a socket:
> > - 1: family only
> > - 2: family and type
> > - 3: family, type, and protocol
> >
> > According to this ruleset's property, users will be allowed to fill the
> > family, type, or protocol fields in landlock_socket_attr rules. If a
> > socket layer is not handled, it should contain (i64)-1 for the kernel to
> > detect misuse of the API.
> >
> > This enables us to get a key from this triplet:
> >
> > family_bits = 6; // 45 values for now
> > type_bits = 3; // 7 values for now
> > protocol_bits = 5; // 28 values for now
> >
> > // attr.* are the sanitized UAPI values, including -1 replaced with 0.
> > // In this example, landlock_ruleset_attr.handled_socket_layers is 3, so
> > // the key is composed of all the 3 properties.
> > landlock_key.data = attr.family << (type_bits + protocol_bits) |
> > attr.type << protocol_bits | attr.protocol;
> >
> > For each layer of restriction in a domain, we know how precise they
> > define a socket (i.e. with how many "socket layers"). We can then look
> > for at most 3 entries in the red-black tree: one with only the family,
> > another with the family and the type, and potentially a third also
> > including the protocol. Each key would have the same significant bits
> > but with the lower bits masked according to each
> > landlock_ruleset_attr.handled_socket_layers . Composing the related
> > access masks according to the defined socket layers, we can create an
> > array of struct access_masks for the request and then check if such
> > request is allowed by the current domain. As for the currently stored
> > data, we can also identify the domain layer that blocked the request
> > (required for audit).
>
> I do not quite understand why we need socket_layers. Without it,
> user can set (i64)(-1) to type or protocol whenever he wants. While
> transforming triplet to a key we can replace (i64)(-1) with some
> constant (e.g. (1 << type_bits - 1) for the type if type_bits = 8 and
> (1 << protocol_bits - 1) for the protocol if protocol_bits = 16).
We can indead add a virtual any/wildcard type and protocol, translated
from user space's (i64)-1 . The downside is that for the same domain
layer we would need to check 4 different keys for the same triplet (i.e.
famility is always set, but type and protocol are optionals). With the
socket_layers number, we only have to check for one key per domain
layer. However, in the worse case with at least 3 domain layers having
different socket_layers value, we would still need to check for 3
different keys. So, I think it's reasonable to get rid of the
socket_layers as you suggested (while requiring the family to always be
set).
>
> >
> > With this design, each sandbox can define a socket as much as it wants.
> >
> > The downside is that we lost the bitfields and we need several calls to
> > filter more complex sockets (e.g. 4 for UDP and TCP with IPv4 and IPv6),
> > which looks OK compared to the required calls for filesystem access
> > control.
> >
> > > >
> > > > > > That being said, I am not a big fan of red-black trees for such simple
> > > > > > integer lookups either, and I also think there should be something
> > > > > > better if we make more use of the properties of the input ranges. The
> > > > > > question is though whether you want to couple that to this socket type
> > > > > > patch set, or rather do it in a follow up? (So far we have been doing
> > > > > > fine with the red black trees, and we are already contemplating the
> > > > > > possibility of changing these internal structures in [2]. We have
> > > > > > also used RB trees for the "port" rules with a similar reasoning,
> > > > > > IIRC.)
> > > > >
> > > > > I think it'll be better to have a separate series for [2] if the socket
> > > > > restriction can be implemented without rbtree refactoring.
> > > >
> > > > Sounds good to me. 👍
> > > >
> > > > [1] https://lore.kernel.org/all/20200707180955.53024-2-mic@digikod.net/
> >
> > red-black trees are a good generic data structure for the current main
> > use case (for dynamic rulesets and static domains), but we'll need to
> > use more appropriate data structures. I think this should not be a
> > blocker for this patch series. It will be required to match (port)
> > ranges though (even if the use case seems limited), and in general for
> > better performances.
>
next prev parent reply other threads:[~2025-01-24 14:12 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-04 10:48 [RFC PATCH v3 00/19] Support socket access-control Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 01/19] landlock: " Mikhail Ivanov
2024-09-06 13:09 ` Günther Noack
2024-09-09 7:23 ` Mikhail Ivanov
2024-11-11 16:29 ` Mikhail Ivanov
2024-11-22 17:45 ` Günther Noack
2024-11-25 11:04 ` Mikhail Ivanov
2024-11-27 18:43 ` Mickaël Salaün
2024-11-28 12:01 ` Mikhail Ivanov
2024-11-28 20:52 ` Mickaël Salaün
2024-12-02 11:32 ` Mikhail Ivanov
2024-12-24 16:55 ` Mikhail Ivanov
2025-01-10 11:12 ` Günther Noack
2025-01-10 13:02 ` Mikhail Ivanov
2025-01-10 16:27 ` Günther Noack
2025-01-10 16:55 ` Mikhail Ivanov
2025-01-14 18:31 ` Mickaël Salaün
2025-01-24 12:28 ` Mikhail Ivanov
2025-01-24 14:02 ` Mickaël Salaün [this message]
2024-09-04 10:48 ` [RFC PATCH v3 02/19] landlock: Add hook on socket creation Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 03/19] selftests/landlock: Test basic socket restriction Mikhail Ivanov
2024-09-10 9:53 ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 04/19] selftests/landlock: Test adding a rule with each supported access Mikhail Ivanov
2024-09-10 9:53 ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 05/19] selftests/landlock: Test adding a rule for each unknown access Mikhail Ivanov
2024-09-10 9:53 ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 06/19] selftests/landlock: Test adding a rule for unhandled access Mikhail Ivanov
2024-09-10 9:22 ` Günther Noack
2024-09-11 8:19 ` Mikhail Ivanov
2024-09-13 15:04 ` Günther Noack
2024-09-13 16:15 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 07/19] selftests/landlock: Test adding a rule for empty access Mikhail Ivanov
2024-09-18 12:42 ` Günther Noack
2024-09-18 13:03 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 08/19] selftests/landlock: Test overlapped restriction Mikhail Ivanov
2024-09-18 12:42 ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 09/19] selftests/landlock: Test creating a ruleset with unknown access Mikhail Ivanov
2024-09-18 12:44 ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 10/19] selftests/landlock: Test adding a rule with family and type outside the range Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 11/19] selftests/landlock: Test unsupported protocol restriction Mikhail Ivanov
2024-09-18 12:54 ` Günther Noack
2024-09-18 13:36 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 12/19] selftests/landlock: Test that kernel space sockets are not restricted Mikhail Ivanov
2024-09-04 12:45 ` Mikhail Ivanov
2024-09-18 13:00 ` Günther Noack
2024-09-19 10:53 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 13/19] selftests/landlock: Test packet protocol alias Mikhail Ivanov
2024-09-18 13:33 ` Günther Noack
2024-09-18 14:01 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 14/19] selftests/landlock: Test socketpair(2) restriction Mikhail Ivanov
2024-09-18 13:47 ` Günther Noack
2024-09-23 12:57 ` Mikhail Ivanov
2024-09-25 12:17 ` Mikhail Ivanov
2024-09-27 9:48 ` Günther Noack
2024-09-28 20:06 ` Günther Noack
2024-09-29 17:31 ` Mickaël Salaün
2024-10-03 17:27 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 15/19] selftests/landlock: Test SCTP peeloff restriction Mikhail Ivanov
2024-09-27 14:35 ` Günther Noack
2024-10-03 12:15 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 16/19] selftests/landlock: Test that accept(2) is not restricted Mikhail Ivanov
2024-09-27 14:53 ` Günther Noack
2024-10-03 12:41 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 17/19] samples/landlock: Replace atoi() with strtoull() in populate_ruleset_net() Mikhail Ivanov
2024-09-27 15:12 ` Günther Noack
2024-10-03 12:59 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 18/19] samples/landlock: Support socket protocol restrictions Mikhail Ivanov
2024-10-01 7:56 ` Günther Noack
2024-10-03 13:15 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 19/19] landlock: Document socket rule type support Mikhail Ivanov
2024-10-01 7:09 ` Günther Noack
2024-10-03 14:00 ` Mikhail Ivanov
2024-10-03 16:21 ` Günther Noack
2025-04-22 17:19 ` [RFC PATCH v3 00/19] Support socket access-control Mickaël Salaün
2025-04-25 13:58 ` Günther Noack
2025-04-29 11:59 ` Mikhail Ivanov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250124.sei0Aur6aegu@digikod.net \
--to=mic@digikod$(echo .)net \
--cc=artem.kuzin@huawei$(echo .)com \
--cc=gnoack3000@gmail$(echo .)com \
--cc=gnoack@google$(echo .)com \
--cc=ivanov.mikhail1@huawei-partners$(echo .)com \
--cc=konstantin.meskhidze@huawei$(echo .)com \
--cc=linux-security-module@vger$(echo .)kernel.org \
--cc=matthieu@buffet$(echo .)re \
--cc=netdev@vger$(echo .)kernel.org \
--cc=netfilter-devel@vger$(echo .)kernel.org \
--cc=willemdebruijn.kernel@gmail$(echo .)com \
--cc=yusongping@huawei$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox