public inbox for netdev@vger.kernel.org 
From: Nicolas Dichtel <nicolas.dichtel@6wind•com>
To: Eric Dumazet <eric.dumazet@gmail•com>
Cc: netdev <netdev@vger•kernel.org>, Octavian Purdila <opurdila@ixiacom•com>
Subject: Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
Date: Tue, 28 Sep 2010 18:45:58 +0200
Message-ID: <4CA21BC6.5070300@6wind.com>
In-Reply-To: <1285691629.3154.80.camel@edumazet-laptop>

Eric Dumazet wrote:
> On Tuesday, 28 September 2010 at 17:24 +0200, Nicolas Dichtel wrote:
>> Hi,
>>
>> I face a problem when I try to remove an interface, 
>> netdev_wait_allrefs() complains about refcount.
>>
>> Here is a trivial scenario to reproduce the problem:
>> # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
>> # ./a.out tunl1
>> # ip tunnel del tunl1
>>
>> Note: the a.out binary creates an IPv4 raw socket, binds it to tunl1 
>> (SO_BINDTODEVICE), enables multicast loopback (IP_MULTICAST_LOOP), sets 
>> the multicast interface to tunl1 (IP_MULTICAST_IF), builds the IP header 
>> itself (IP_HDRINCL) and then sends a single packet (192.168.6.1 -> 224.0.0.18).
>>
>> Note 2: when a.out is executed, tunl1 has no IP address and is down.
>>
> 
> CC Octavian Purdila, the patch author.
> 
> I am just wondering why this route is created in the first place.
At first I asked myself the same question, but it seems that sending a 
packet through this kind of socket is allowed even if the interface is 
down. The packet will be dropped by the noop qdisc.
But I agree that it is strange to perform a route lookup and everything 
else only to drop the packet at the end ...
Maybe raw_sendmsg() could drop it directly ;-) ... or maybe 
ip_route_output_flow().

Any suggestions welcome.

Regards,
Nicolas

> 
> Maybe a fix would be to forbid this ?
> 
> Some machines have a giant route cache, so it's very important to avoid
> expensive scans.
> 
>> Then I get a series of "kernel:[1206699.728010] unregister_netdevice: 
>> waiting for tunl1 to become free. Usage count = 3" messages and, after 
>> some time, the interface is removed.
>>
>> The problem is that route cache entries are only invalidated on the 
>> UNREGISTER event, and not removed (introduced by commit 
>> e2ce146848c81af2f6d42e67990191c284bf0c33). We must then wait for 
>> rt_check_expire() to remove the remaining route cache entries.
>>
>> To fix the problem, I propose to remove a part of the previous commit.
>>
>> Regards,
>> Nicolas
>> attachment: file diff
>> (0001-ipv4-remove-all-rt-cache-entries-on-UNREGISTER-even.patch)
>> From 3344e2e0431fe803c4dac8757a8746908357d780 Mon Sep 17 00:00:00 2001
>> From: Nicolas Dichtel <nicolas.dichtel@6wind•com>
>> Date: Tue, 28 Sep 2010 16:38:19 +0200
>> Subject: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
>>
>> Commit e2ce146848c81af2f6d42e67990191c284bf0c33 (ipv4: factorize cache clearing
>> for batched unregister operations) added a new parameter to fib_disable_ip() to
>> only invalidate route cache entries on the unregister event.
>> This is wrong, we should ensure that all cache entries are removed on
>> unregister event, else netdev_wait_allrefs() may complain. A cache entry
>> can be created between event DOWN and UNREGISTER.
>>
>> So, I revert a part of the patch.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind•com>
>> ---
>>  net/ipv4/fib_frontend.c |   10 +++++-----
>>  1 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
>> index 7d02a9f..377e815 100644
>> --- a/net/ipv4/fib_frontend.c
>> +++ b/net/ipv4/fib_frontend.c
>> @@ -917,11 +917,11 @@ static void nl_fib_lookup_exit(struct net *net)
>>  	net->ipv4.fibnl = NULL;
>>  }
>>  
>> -static void fib_disable_ip(struct net_device *dev, int force, int delay)
>> +static void fib_disable_ip(struct net_device *dev, int force)
>>  {
>>  	if (fib_sync_down_dev(dev, force))
>>  		fib_flush(dev_net(dev));
>> -	rt_cache_flush(dev_net(dev), delay);
>> +	rt_cache_flush(dev_net(dev), 0);
>>  	arp_ifdown(dev);
>>  }
>>  
>> @@ -944,7 +944,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
>>  			/* Last address was deleted from this interface.
>>  			   Disable IP.
>>  			 */
>> -			fib_disable_ip(dev, 1, 0);
>> +			fib_disable_ip(dev, 1);
>>  		} else {
>>  			rt_cache_flush(dev_net(dev), -1);
>>  		}
>> @@ -959,7 +959,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
>>  	struct in_device *in_dev = __in_dev_get_rtnl(dev);
>>  
>>  	if (event == NETDEV_UNREGISTER) {
>> -		fib_disable_ip(dev, 2, -1);
>> +		fib_disable_ip(dev, 2);
>>  		return NOTIFY_DONE;
>>  	}
>>  
>> @@ -977,7 +977,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
>>  		rt_cache_flush(dev_net(dev), -1);
>>  		break;
>>  	case NETDEV_DOWN:
>> -		fib_disable_ip(dev, 0, 0);
>> +		fib_disable_ip(dev, 0);
>>  		break;
>>  	case NETDEV_CHANGEMTU:
>>  	case NETDEV_CHANGE:
> 
> 
