public inbox for netdev@vger.kernel.org 
From: Nicolas Dichtel <nicolas.dichtel@6wind•com>
To: "Eric W. Biederman" <ebiederm@xmission•com>
Cc: netdev@vger•kernel.org, davem@davemloft•net, bcrl@kvack•org,
	ravi.mlists@gmail•com
Subject: Re: [RFC PATCH net-next 2/2] sit: add support of x-netns
Date: Tue, 25 Jun 2013 16:10:43 +0200
Message-ID: <51C9A4E3.2060906@6wind.com>
In-Reply-To: <874ncni114.fsf@xmission.com>

On 25/06/2013 00:42, Eric W. Biederman wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind•com> writes:
>
>> On 24/06/2013 21:28, Eric W. Biederman wrote:
>>> Nicolas Dichtel <nicolas.dichtel@6wind•com> writes:
>>>
>>>> This patch allows switching the netns when a packet is encapsulated or
>>>> decapsulated. In other words, the encapsulated packet is received in one netns,
>>>> where the lookup is done to find the tunnel. Once the tunnel is found, the
>>>> packet is decapsulated and injected into the corresponding interface, which
>>>> resides in another netns.
>>>>
>>>> When one of the two netns is removed, the tunnel is destroyed.
>>>
>>> I don't see any fundamental problems with this code.  There are bugs
>>> with the cleanup noted below.
>>>
>>> The primary sit interface is marked as NETNS_LOCAL which is good.  A
>>> comment explaining the reasoning might be nice, if only for code
>>> archaeologists.
>> Ok.
>>
>>>
>>> Conditionally calling dev_cleanup_skb bugs me.  The extra conditional
>>> looks like a maintenance hazard.   Unless I have missed some subtle
>>> detail either we don't need the cleanup at all or actually it is a bug
>>> that we aren't scrubbing our packets as they progress through tunnels
>>> even in the same network namespace.
>>>
>>> Can we just make that function the skb scrubbing needed for packets to
>>> traverse a tunnel?
>>>
>>> My concern going into this was that we would get code that would break
>>> because it would not be tested enough.  If we can remove the conditional
>>> from dev_cleanup_skb we won't have any code that is conditionally run
>>> and the logic looks simple enough not to bitrot in routine maintenance.
>> My idea was to have the same level of cleanup/scrubbing as when a packet is
>> sent from one netns to another through a veth. I cannot use
>> dev_forward_skb() because this function expects an Ethernet header, which is
>> why I split it in patch #1.
>>
>> If we leave all information attached to the skb, we may have, for example, an
>> skb with a conntrack from netns1 and a netdevice from netns2. That does not seem
>> safe, but maybe I'm wrong. And in fact, the conntrack will not be created in the
>> second netns (nf_conntrack_in() => skb->nfct is not null and not a template =>
>> stats ignore++).
>> Another example is a socket from one netns and the netdevice or conntrack from
>> another netns.
>
> All of that I agree with.
>
> I just don't see any need to make that scrubbing/cleaning of the packet
> conditional.
>
> Semantically going through a tunnel is the same as crossing between
> network namespaces.  So you can change
>
>>>> +	if (tunnel->net != dev_net(tunnel->dev))
>>>> +		dev_cleanup_skb(skb);
>
> to just:
>
> 	dev_cleanup_skb(skb);
>
>> I was thinking that when a packet enters a namespace, it must not be associated
>> with any object from the previous namespace; it should be as if we had just
>> received it on the host.
>
> Overall agree.  Tunnels have the same properties.
>
> Which leads me to conclude either we are missing something or the
> current tunnel code is mildly buggy because it does not do this level of
> scrubbing.
I'm afraid of breaking an existing scenario, but you're probably right. Let's
remove this test.
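
As a minimal userspace sketch of the unconditional scrubbing agreed on above
(this is not the kernel code: model_net, model_skb, model_scrub_packet() and
model_tunnel_rx() are hypothetical stand-ins, and a real skb carries more state
than the two references modelled here), the receive path would look roughly
like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a network namespace; not the kernel's struct net. */
struct model_net {
	int id;
};

/* Hypothetical stand-in for an skb; only the fields discussed in the thread. */
struct model_skb {
	const struct model_net *conntrack_net; /* netns of attached conntrack, or NULL */
	const struct model_net *socket_net;    /* netns of attached socket, or NULL */
	const struct model_net *dev_net;       /* netns of the current netdevice */
};

/* Drop every reference to per-namespace state, whatever namespace it came from. */
void model_scrub_packet(struct model_skb *skb)
{
	assert(skb != NULL);
	skb->conntrack_net = NULL;
	skb->socket_net = NULL;
}

/* Hand a decapsulated packet to the tunnel device living in tunnel_dev_net. */
void model_tunnel_rx(struct model_skb *skb, const struct model_net *tunnel_dev_net)
{
	/* Always scrub: no "tunnel->net != dev_net(tunnel->dev)" test. */
	model_scrub_packet(skb);
	skb->dev_net = tunnel_dev_net;
}
```

With no netns comparison left in the path, the same code runs whether or not
the tunnel crosses namespaces, which is the property Eric asked for.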


Nicolas

>
> Eric
>
>>>> -static void __net_exit sit_destroy_tunnels(struct sit_net *sitn, struct list_head *head)
>>>> +static void __net_exit sit_destroy_tunnels(struct net *net,
>>>> +					   struct list_head *head)
>>>>    {
>>>> -	int prio;
>>>> +	struct net_device *dev, *aux;
>>>>
>>>> -	for (prio = 1; prio < 4; prio++) {
>>>> -		int h;
>>>> -		for (h = 0; h < HASH_SIZE; h++) {
>>>> -			struct ip_tunnel *t;
>>>> -
>>>> -			t = rtnl_dereference(sitn->tunnels[prio][h]);
>>>> -			while (t != NULL) {
>>>> -				unregister_netdevice_queue(t->dev, head);
>>>> -				t = rtnl_dereference(t->next);
>>>> -			}
>>>> -		}
>>>> -	}
>>>> +	for_each_netdev_safe(net, dev, aux)
>>>> +		if (dev->rtnl_link_ops &&
>>>> +		    !strcmp(dev->rtnl_link_ops->kind, "sit"))
>>>> +			unregister_netdevice_queue(dev, head);
>>>
>>> This entire idiom change is a bit ugly, and it is wrong.
>>>
>>> You need to look for two classes of tunnels to take down.  Tunnels that
>>> originate in net and tunnels whose netdevice is in net.
>>>
>>> For tunnels that reside in net you should be able to just compare the
>>> rtnl_link_ops pointer, rather than an ascii name.
>>>
>>> Tunnels that originate in this network namespace most definitely need to
>>> be taken down as among other things you wisely do not keep a reference
>>> count on the originating network namespace.
>> Yes sure. My beta version was doing the right thing, but I changed this code
>> before sending the patch :/
>
> Bahahaha!  The dangers of the last minute cleanup.
>
> Eric
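
The teardown rule discussed above (take down tunnels that originate in the
exiting netns as well as tunnels whose netdevice lives there) can be sketched
in userspace as follows. This is only an illustration under stated assumptions:
toy_net, toy_tunnel and toy_destroy_tunnels() are hypothetical names, a flat
array replaces the kernel's tunnel lists, and setting a flag stands in for
unregister_netdevice_queue().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a network namespace. */
struct toy_net {
	int id;
};

/* Hypothetical stand-in for a tunnel that may span two namespaces. */
struct toy_tunnel {
	const struct toy_net *link_net; /* netns where the tunnel originates */
	const struct toy_net *dev_net;  /* netns where its netdevice resides */
	bool queued;                    /* already queued for unregistration */
};

/*
 * Queue every tunnel that touches net from either side.
 * Returns the number of tunnels newly queued by this call.
 */
size_t toy_destroy_tunnels(const struct toy_net *net,
			   struct toy_tunnel *tunnels, size_t n)
{
	size_t queued = 0;

	assert(net != NULL);
	for (size_t i = 0; i < n; i++) {
		struct toy_tunnel *t = &tunnels[i];

		if (t->queued)
			continue; /* never queue the same tunnel twice */
		if (t->link_net == net || t->dev_net == net) {
			t->queued = true;
			queued++;
		}
	}
	return queued;
}
```

Checking both sides matters because no reference is held on the originating
namespace: a tunnel left behind after that namespace exits would point at
freed state.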

