public inbox for netdev@vger.kernel.org 
From: Lukas Wunner <lukas@wunner•de>
To: Daniel Borkmann <daniel@iogearbox•net>
Cc: Pablo Neira Ayuso <pablo@netfilter•org>,
	netfilter-devel@vger•kernel.org, davem@davemloft•net,
	netdev@vger•kernel.org, kuba@kernel•org, kadlec@netfilter•org,
	fw@strlen•de, ast@kernel•org, edumazet@google•com, tgraf@suug•ch,
	nevola@gmail•com, john.fastabend@gmail•com, willemb@google•com
Subject: Re: [PATCH nf-next v5 0/6] Netfilter egress hook
Date: Thu, 30 Sep 2021 08:52:38 +0200
Message-ID: <20210930065238.GA28709@wunner.de>
In-Reply-To: <e4f1700c-c299-7091-1c23-60ec329a5b8d@iogearbox.net>

On Thu, Sep 30, 2021 at 08:08:53AM +0200, Daniel Borkmann wrote:
> Hm, so in the case of SRv6 users were running into a similar issue
> and commit 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6
> data plane") [0] added a new hook along with a sysctl which defaults
> the new hook to off.
> 
> The rationale for it was given as "the hooks are enabled via
> nf_hooks_lwtunnel sysctl to make sure existing netfilter rulesets
>  do not break." [0,1]
> 
> If the suggestion to flag the skb [2] one way or another from the
> tc forwarding path (e.g. skb bit or per-cpu marker) is not
> technically feasible, then why not do a sysctl toggle like in the
> SRv6 case?

The skb flag *is* technically feasible.  I amended the patches with
the flag and was going to post them this week, but Pablo beat me to
the punch and posted his alternative version, which lacks the flag
but modularizes netfilter ingress/egress processing instead.

Honestly I think a hodge-podge of config options and sysctl toggles
is awful and I would prefer the skb flag you suggested.  I kind of
like your idea of considering tc and netfilter as layers.

FWIW the finished patches *with* the flag are on this branch:
https://github.com/l1k/linux/commits/nft_egress_v5

Below is the "git range-diff" between Pablo's patches and mine
(just the hunks which pertain to the skb flag, plus excerpts
from the commit message).

Would you find the patch set acceptable with this skb flag?

-- >8 --

    +    Alternatively or in addition to netfilter, packets can be classified
    +    with traffic control (tc).  On ingress, packets are classified first by
    +    tc, then by netfilter.  On egress, the order is reversed for symmetry.
    +    Conceptually, tc and netfilter can be thought of as layers, with
    +    netfilter layered above tc.
     
    +    Traffic control is capable of redirecting packets to another interface
    +    (man 8 tc-mirred).  E.g., an ingress packet may be redirected from the
    +    host namespace to a container via a veth connection:
    +    tc ingress (host) -> tc egress (veth host) -> tc ingress (veth container)
     
    +    In this case, netfilter egress classifying is not performed when leaving
    +    the host namespace!  That's because the packet is still on the tc layer.
    +    If tc redirects the packet to a physical interface in the host namespace
    +    such that it leaves the system, the packet is never subjected to
    +    netfilter egress classifying.  That is only logical since it hasn't
    +    passed through netfilter ingress classifying either.
     
    +    Packets can alternatively be redirected at the netfilter layer using
    +    nft fwd.  Such a packet *is* subjected to netfilter egress classifying.
    +    Internally, the skb->nf_skip_egress flag controls whether netfilter is
    +    invoked on egress by __dev_queue_xmit().
    +
    +    Interaction between tc and netfilter is possible by setting and querying
    +    skb->mark.
     
    @@ include/linux/netfilter_netdev.h: static inline int nf_hook_ingress(struct sk_bu
     +static inline struct sk_buff *nf_hook_egress(struct sk_buff *skb, int *rc,
     +					     struct net_device *dev)
     +{
    -+	struct nf_hook_entries *e = rcu_dereference(dev->nf_hooks_egress);
    ++	struct nf_hook_entries *e;
     +	struct nf_hook_state state;
     +	int ret;
     +
    ++	if (skb->nf_skip_egress)
    ++		return skb;
    ++
    ++	e = rcu_dereference(dev->nf_hooks_egress);
     +	if (!e)
     +		return skb;
     +
    @@ include/linux/netfilter_netdev.h: static inline int nf_hook_ingress(struct sk_bu
     +		return NULL;
     +	}
     +}
    ++
    ++static inline void nf_skip_egress(struct sk_buff *skb, bool skip)
    ++{
    ++	skb->nf_skip_egress = skip;
    ++}
     +#else /* CONFIG_NETFILTER_EGRESS */
     +static inline bool nf_hook_egress_active(void)
     +{
    @@ include/linux/netfilter_netdev.h: static inline int nf_hook_ingress(struct sk_bu
     +{
     +	return skb;
     +}
    ++
    ++static inline void nf_skip_egress(struct sk_buff *skb, bool skip)
    ++{
    ++}
     +#endif /* CONFIG_NETFILTER_EGRESS */
     +
      static inline void nf_hook_netdev_init(struct net_device *dev)
    @@ include/linux/netfilter_netdev.h: static inline int nf_hook_ingress(struct sk_bu
      
      #endif /* _NETFILTER_NETDEV_H_ */
     
    + ## include/linux/skbuff.h ##
    +@@ include/linux/skbuff.h: typedef unsigned char *sk_buff_data_t;
    +  *	@tc_at_ingress: used within tc_classify to distinguish in/egress
    +  *	@redirected: packet was redirected by packet classifier
    +  *	@from_ingress: packet was redirected from the ingress path
    ++ *	@nf_skip_egress: packet shall skip netfilter egress processing
    +  *	@peeked: this packet has been seen already, so stats have been
    +  *		done for it, don't do them again
    +  *	@nf_trace: netfilter packet trace flag
    +@@ include/linux/skbuff.h: struct sk_buff {
    + #ifdef CONFIG_NET_REDIRECT
    + 	__u8			from_ingress:1;
    + #endif
    ++#ifdef CONFIG_NETFILTER_EGRESS
    ++	__u8			nf_skip_egress:1;
    ++#endif
    + #ifdef CONFIG_TLS_DEVICE
    + 	__u8			decrypted:1;
    + #endif
    +
      ## include/uapi/linux/netfilter.h ##
     @@ include/uapi/linux/netfilter.h: enum nf_inet_hooks {
      
    @@ net/core/dev.c: static int __dev_queue_xmit(struct sk_buff *skb, struct net_devi
     +			if (!skb)
     +				goto out;
     +		}
    ++		nf_skip_egress(skb, true);
      		skb = sch_handle_egress(skb, &rc, dev);
      		if (!skb)
      			goto out;
    @@ net/core/dev.c: static int __dev_queue_xmit(struct sk_buff *skb, struct net_devi
      #endif
      	/* If device/qdisc don't need skb->dst, release it right now while
      	 * its hot in this cpu cache.
    +@@ net/core/dev.c: static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
    + 	if (static_branch_unlikely(&ingress_needed_key)) {
    + 		bool another = false;
    + 
    ++		nf_skip_egress(skb, true);
    + 		skb = sch_handle_ingress(skb, &pt_prev, &ret, orig_dev,
    + 					 &another);
    + 		if (another)
    +@@ net/core/dev.c: static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
    + 		if (!skb)
    + 			goto out;
    + 
    ++		nf_skip_egress(skb, false);
    + 		if (nf_ingress(skb, &pt_prev, &ret, orig_dev) < 0)
    + 			goto out;
    + 	}

