From: Nicolas Dichtel <nicolas.dichtel@6wind•com>
To: "Eric W. Biederman" <ebiederm@xmission•com>
Cc: netdev@vger•kernel.org, davem@davemloft•net, bcrl@kvack•org,
ravi.mlists@gmail•com
Subject: Re: [RFC PATCH net-next 2/2] sit: add support of x-netns
Date: Tue, 25 Jun 2013 16:10:43 +0200
Message-ID: <51C9A4E3.2060906@6wind.com>
In-Reply-To: <874ncni114.fsf@xmission.com>

On 25/06/2013 00:42, Eric W. Biederman wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind•com> writes:
>
>> On 24/06/2013 21:28, Eric W. Biederman wrote:
>>> Nicolas Dichtel <nicolas.dichtel@6wind•com> writes:
>>>
>>>> This patch allows switching the netns when a packet is encapsulated or
>>>> decapsulated. In other words, the encapsulated packet is received in a netns,
>>>> where the lookup is done to find the tunnel. Once the tunnel is found, the
>>>> packet is decapsulated and injected into the corresponding interface, which
>>>> resides in another netns.
>>>>
>>>> When one of the two netns is removed, the tunnel is destroyed.
>>>
>>> I don't see any fundamental problems with this code. There are bugs
>>> with the cleanup noted below.
>>>
>>> The primary sit interface is marked as NETNS_LOCAL, which is good. A
>>> comment explaining the reasoning might be nice, if only for code
>>> archaeologists.
>> Ok.
>>
>>>
>>> Conditionally calling dev_cleanup_skb bugs me. The extra conditional
>>> looks like a maintenance hazard. Unless I have missed some subtle
>>> detail either we don't need the cleanup at all or actually it is a bug
>>> that we aren't scrubbing our packets as they progress through tunnels
>>> even in the same network namespace.
>>>
>>> Can we just make that function the skb scrubbing needed for packets to
>>> traverse a tunnel?
>>>
>>> My concern going into this was that we would get code that would break
>>> because it would not be tested enough. If we can remove the conditional
>>> from dev_cleanup_skb we won't have any code that is conditionally run
>>> and the logic looks simple enough not to bitrot in routine maintenance.
>> My idea was to have the same level of cleanup/scrubbing as when a packet is
>> sent from one netns to another through a veth. I cannot use
>> dev_forward_skb() because that function expects an Ethernet header, which is
>> why I split it in patch #1.
>>
>> If we leave all information attached to the skb, we may have, for example, an
>> skb with a conntrack from netns1 and a netdevice from netns2. That does not
>> seem safe, but maybe I'm wrong. And in fact, the conntrack will not be created
>> in the second netns (nf_conntrack_in() => skb->nfct is not null and not a
>> template => stats ignore++).
>> Another example is a socket from one netns and the netdevice or conntrack from
>> another netns.
>
> All of that I agree with.
>
> I just don't see any need to make that scrubbing/cleaning of the packet
> conditional.
>
> Semantically going through a tunnel is the same as crossing between
> network namespaces. So you can change
>
>>>> + if (tunnel->net != dev_net(tunnel->dev))
>>>> + dev_cleanup_skb(skb);
>
> to just:
>
> dev_cleanup_skb(skb);
>
>> I was thinking that when a packet enters a namespace, it must not be associated
>> with any object from the previous namespace; it should be as if we had just
>> received it on the host.
>
> Overall agree. Tunnels have the same properties.
>
> Which leads me to conclude either we are missing something or the
> current tunnel code is mildly buggy because it does not do this level of
> scrubbing.
I'm afraid of breaking an existing scenario, but you're probably right. Let's
remove this test.
Nicolas
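
To make the agreed change concrete, here is a minimal userspace sketch of
unconditional scrubbing on the tunnel receive path. It is not the kernel code:
struct toy_skb, toy_scrub_packet() and toy_tunnel_rcv() are hypothetical names
invented for illustration, and the model tracks only the two kinds of state
mentioned above (a conntrack entry and a socket). Namespaces are modelled as
plain integers.

```c
#include <stddef.h>

/* Toy stand-in for struct sk_buff (assumption: only the state discussed above). */
struct toy_skb {
	const void *conntrack;	/* models skb->nfct */
	const void *socket;	/* models skb->sk */
	int netns_id;		/* namespace the packet currently belongs to */
};

/* Drop every reference to objects owned by the previous namespace. */
static void toy_scrub_packet(struct toy_skb *skb)
{
	skb->conntrack = NULL;
	skb->socket = NULL;
}

/*
 * Tunnel receive path: scrub unconditionally, then hand the packet to the
 * namespace of the tunnel device. There is no "did the netns change?" test,
 * so the same code runs whether or not the tunnel crosses namespaces.
 */
static void toy_tunnel_rcv(struct toy_skb *skb, int dev_netns_id)
{
	toy_scrub_packet(skb);
	skb->netns_id = dev_netns_id;
}
```

With this shape the scrubbing is exercised on every decapsulation, including
the common single-netns case, which is the testing argument made earlier in
the thread.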
>
> Eric
>
>>>> -static void __net_exit sit_destroy_tunnels(struct sit_net *sitn, struct list_head *head)
>>>> +static void __net_exit sit_destroy_tunnels(struct net *net,
>>>> + struct list_head *head)
>>>> {
>>>> - int prio;
>>>> + struct net_device *dev, *aux;
>>>>
>>>> - for (prio = 1; prio < 4; prio++) {
>>>> - int h;
>>>> - for (h = 0; h < HASH_SIZE; h++) {
>>>> - struct ip_tunnel *t;
>>>> -
>>>> - t = rtnl_dereference(sitn->tunnels[prio][h]);
>>>> - while (t != NULL) {
>>>> - unregister_netdevice_queue(t->dev, head);
>>>> - t = rtnl_dereference(t->next);
>>>> - }
>>>> - }
>>>> - }
>>>> + for_each_netdev_safe(net, dev, aux)
>>>> + if (dev->rtnl_link_ops &&
>>>> + !strcmp(dev->rtnl_link_ops->kind, "sit"))
>>>> + unregister_netdevice_queue(dev, head);
>>>
>>> This entire idiom change is a bit ugly, and it is wrong.
>>>
>>> You need to look for two classes of tunnels to take down. Tunnels that
>>> originate in net and tunnels whose netdevice is in net.
>>>
>>> For tunnels that reside in net you should be able to just compare the
>>> rtnl_link_ops pointer, rather than an ASCII name.
>>>
>>> Tunnels that originate in this network namespace most definitely need to
>>> be taken down as among other things you wisely do not keep a reference
>>> count on the originating network namespace.
>> Yes, sure. My beta version did the right thing, but I changed this code
>> before sending the patch :/
>
> Bahahaha! The dangers of the last minute cleanup.
>
> Eric
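
The teardown rule described in the review (take down tunnels that originate in
net as well as tunnels whose netdevice is in net, and match on the
rtnl_link_ops pointer rather than on the kind string) can be sketched in
userspace as follows. This is an illustration under stated assumptions, not
the eventual patch: every toy_* name is hypothetical, namespaces are plain
integers, and "queued" stands in for the call to unregister_netdevice_queue().

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct rtnl_link_ops. */
struct toy_link_ops {
	const char *kind;
};

/* Toy stand-in for a tunnel and its netdevice. */
struct toy_tunnel {
	const struct toy_link_ops *ops;
	int link_net;	/* netns where the tunnel lookup happens (tunnel->net) */
	int dev_net;	/* netns where the netdevice lives (dev_net(tunnel->dev)) */
	bool queued;	/* true once queued for unregistration */
};

static const struct toy_link_ops toy_sit_ops = { "sit" };

/*
 * Queue every sit tunnel tied to the exiting namespace, whether it
 * originates there or its netdevice lives there. Returns how many
 * tunnels were newly queued.
 */
static size_t toy_sit_destroy_tunnels(struct toy_tunnel *t, size_t n, int net)
{
	size_t queued = 0;

	for (size_t i = 0; i < n; i++) {
		if (t[i].ops != &toy_sit_ops)	/* pointer compare, no strcmp() */
			continue;
		if (t[i].queued)		/* never queue a device twice */
			continue;
		if (t[i].link_net == net || t[i].dev_net == net) {
			t[i].queued = true;
			queued++;
		}
	}
	return queued;
}
```

A tunnel whose two namespaces both equal net matches either condition but is
queued only once, and a tunnel that touches net on neither side is left alone.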