public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: Daniel Borkmann <daniel@iogearbox•net>
To: Jamal Hadi Salim <jhs@mojatatu•com>,
	Shmulik Ladkani <shmulik.ladkani@gmail•com>
Cc: "David S. Miller" <davem@davemloft•net>,
	WANG Cong <xiyou.wangcong@gmail•com>,
	Eric Dumazet <edumazet@google•com>,
	netdev@vger•kernel.org, Florian Westphal <fw@strlen•de>
Subject: Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
Date: Sun, 25 Sep 2016 18:26:47 +0200	[thread overview]
Message-ID: <57E7FAC7.6090904@iogearbox.net> (raw)
In-Reply-To: <6d2bd45a-a8a0-846d-5934-5e246522cab8@mojatatu.com>

On 09/25/2016 03:05 PM, Jamal Hadi Salim wrote:
> On 16-09-23 11:40 AM, Shmulik Ladkani wrote:
>> On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim <jhs@mojatatu•com> wrote:
>>>> Even today, one may create loops using existing 'egress redirect',
>>>> e.g. this rediculously errorneous construct:
>>>>
>>>>  # ip l add v0 type veth peer name v0p
>>>>  # tc filter add dev v0p parent ffff: basic \
>>>>     action mirred egress redirect dev v0
>>>
>>> I think we actually recover from this one by eventually
>>> dropping (theres a ttl field).
>>
>> [off topic]
>>
>> Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
>> and after one second I got:
>>
>> # ip -s l show type veth
>> 16: v0p@v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>>     link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
>>     RX: bytes  packets  errors  dropped overrun mcast
>>     71660305923 469890864 0       0       0       0
>>     TX: bytes  packets  errors  dropped carrier collsns
>>     3509       24       0       0       0       0
>> 17: v0@v0p: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>>     link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
>>     RX: bytes  packets  errors  dropped overrun mcast
>>     3509       24       0       0       0       0
>>     TX: bytes  packets  errors  dropped carrier collsns
>>     71660713017 469893555 0       0       0       0
>>
>
> I think this is still on topic!
>
> Now I realize that code we took out around 4.2.x is still useful
> for such a use case (I wasnt thinking about veth when Florian was
> slimming the skb);
> +Cc Florian W.
>
> This snippet from 4.2:
> -------------
> 3525 static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
> 3526 {
> 3527         struct net_device *dev = skb->dev;
> 3528         u32 ttl = G_TC_RTTL(skb->tc_verd);
> 3529         int result = TC_ACT_OK;
> 3530         struct Qdisc *q;
> 3531
> 3532         if (unlikely(MAX_RED_LOOP < ttl++)) {
> 3533                 net_warn_ratelimited("Redir loop detected Dropping packet (%d->%d)\n",
> 3534                                      skb->skb_iif, dev->ifindex);
> 3535                 return TC_ACT_SHOT;
> 3536         }
> 3537
> 3538         skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
> 3539         skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
> 3540
> 3541         q = rcu_dereference(rxq->qdisc);
> 3542         if (q != &noop_qdisc) {
> 3543                 spin_lock(qdisc_lock(q));
> 3544                 if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
> 3545                         result = qdisc_enqueue_root(skb, q);
> 3546                 spin_unlock(qdisc_lock(q));
> 3547         }
> 3548
> 3549         return result;
> 3550 }
> --------------------
>
> MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in
> current code. The idea above was that we would increment the rttl
> counter  once and if we saw it again upto MAX_RED_LOOP we would assume
> a loop and drop the packet (at the time i didnt think it was wise to
> let the actions be in charge of setting the RTTL; it had to be central
> core code - but it may not be neccessary)
>
> Florian, when we discussed I said it was fine to reclaim those 3 bits
> on tc verdict for RTTL at the time because i had taken out the
> feature and never added it back. Your comment at the time was we can
> add it back when someone shows up with the feature.
> Shmulik is looking to add it.

Why not just reuse xmit_recursion, which is what we did in tc cls_bpf
programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on
this in the skb.

>> Similarly to all constructs injecting skbs to device rx (bond/team,
>> vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are
>> obligated to assign 'skb2->dev' as the new rx device.
>>
>> Regarding 'skb2->skb_iif', original act_mirred code already has:
>>
>>      skb2->skb_iif = skb->dev->ifindex;   <--- THIS IS ORIG DEV IIF
>>      skb2->dev = dev;                     <--- THIS IS TARGET DEV
>>     err = dev_queue_xmit(skb2);
>>
>> I'm preserving this; OTOH the suggested modification in the patch is
>>
>> -    err = dev_queue_xmit(skb2);
>> +    if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS)
>> +        err = dev_queue_xmit(skb2);
>> +    else
>> +        netif_receive_skb(skb2);
>>
>> now, the call to 'netif_receive_skb' will eventually override skb_iif to
>> the target RX dev's index, upon entry to __netif_receive_skb_core.
>>
>> I think this IS the expected behavior - as done by other "rx injection"
>> constructs.
>
> Sounds fine.
> I am wondering if we can have a tracing feature to show the lifetime of
> the packet as it is being cycled around the kernel? It would help
> debugging if some policy misbehaves.
>
>> My doubts were around whether we should call 'dev_forward_skb' instead
>> of 'netif_receive_skb'.
>> The former does some things I assumed we're not interested of, like
>> testing 'is_skb_forwardable' and re-running 'eth_type_trans'.
>> OTOH, it DOES scrub the skb.
>> Maybe we should scrub it as well prior the netif_receive_skb call?
>
> Scrubbing the skb could be a bad idea if it gets rid of global state
> like the RTTL if you add it back.
>
> cheers,
> jamal
>

  reply	other threads:[~2016-09-25 16:26 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-22 13:21 [PATCH net-next 0/4] act_mirred: Ingress actions support Shmulik Ladkani
2016-09-22 13:21 ` [PATCH net-next 1/4] net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit Shmulik Ladkani
2016-09-27 10:30   ` Daniel Borkmann
2016-09-27 18:24     ` Shmulik Ladkani
2016-09-22 13:21 ` [PATCH net-next 2/4] net/sched: act_mirred: Refactor detection whether dev needs xmit at mac header Shmulik Ladkani
2016-09-22 13:21 ` [PATCH net-next 3/4] net/sched: tc_mirred: Rename public predicates 'is_tcf_mirred_redirect' and 'is_tcf_mirred_mirror' Shmulik Ladkani
2016-09-22 13:21 ` [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions Shmulik Ladkani
2016-09-22 14:54   ` Eric Dumazet
2016-09-22 18:27     ` Shmulik Ladkani
2016-09-22 18:42       ` Eric Dumazet
2016-09-22 23:40   ` Jamal Hadi Salim
2016-09-23  5:11     ` Shmulik Ladkani
2016-09-23 12:48       ` Jamal Hadi Salim
2016-09-23 15:40         ` Shmulik Ladkani
2016-09-25  0:20           ` Cong Wang
2016-09-25 13:05           ` Jamal Hadi Salim
2016-09-25 16:26             ` Daniel Borkmann [this message]
2016-09-25 18:33               ` Florian Westphal
2016-09-25 23:47                 ` Jamal Hadi Salim
2016-09-25 23:31               ` Jamal Hadi Salim
2016-09-25 17:33             ` Shmulik Ladkani
2016-09-25 18:31               ` Florian Westphal
2016-09-26  1:15                 ` Jamal Hadi Salim
2016-09-26  1:35                   ` Florian Westphal
2016-09-26  1:40                     ` Jamal Hadi Salim
2016-09-26 14:43                     ` Hannes Frederic Sowa
2016-09-26 14:53                       ` Daniel Borkmann
2016-09-26 15:12                         ` Hannes Frederic Sowa
2016-09-26 15:53                           ` Daniel Borkmann
2016-09-26 15:26                       ` Shmulik Ladkani
2016-09-25 23:45               ` Jamal Hadi Salim
2016-09-25  0:07       ` Cong Wang
2016-09-25 13:39         ` Jamal Hadi Salim
2016-09-26  4:55           ` Cong Wang
2016-09-25 17:59         ` Shmulik Ladkani
2016-09-26  4:56           ` Cong Wang
2016-09-24 23:50   ` Cong Wang
2016-09-27  5:56   ` David Miller
2016-09-27  8:07     ` Shmulik Ladkani
2016-09-27 10:39       ` Daniel Borkmann
2016-09-27 13:44         ` David Miller
2016-09-27 14:18           ` Shmulik Ladkani
2016-09-27 14:47             ` Daniel Borkmann
2016-09-27 14:06       ` Jamal Hadi Salim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57E7FAC7.6090904@iogearbox.net \
    --to=daniel@iogearbox$(echo .)net \
    --cc=davem@davemloft$(echo .)net \
    --cc=edumazet@google$(echo .)com \
    --cc=fw@strlen$(echo .)de \
    --cc=jhs@mojatatu$(echo .)com \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=shmulik.ladkani@gmail$(echo .)com \
    --cc=xiyou.wangcong@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox