public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: ebiederm@xmission•com (Eric W. Biederman)
To: Francesco Ruggeri <fruggeri@aristanetworks•com>
Cc: "David S. Miller" <davem@davemloft•net>,
	Eric Dumazet <edumazet@google•com>, Jiri Pirko <jiri@resnulli•us>,
	Alexander Duyck <alexander.h.duyck@intel•com>,
	Cong Wang <amwang@redhat•com>,
	netdev@vger•kernel.org
Subject: Re: [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering
Date: Tue, 17 Sep 2013 02:38:56 -0700	[thread overview]
Message-ID: <87mwnb949b.fsf@xmission.com> (raw)
In-Reply-To: <CA+HUmGjg9jE1fjSMgkfVtiWVRD3d2DuwfbPU3Owrk_cTCe0FQA@mail.gmail.com> (Francesco Ruggeri's message of "Mon, 16 Sep 2013 23:54:41 -0700")

Francesco Ruggeri <fruggeri@aristanetworks•com> writes:

> On Mon, Sep 16, 2013 at 8:49 PM, Eric W. Biederman
> <ebiederm@xmission•com> wrote:

>> Francesco could you take a look at this.  I am about 99% certain this is
>> right but I am starting to fade.  So it is entirely possible I missed
>> something.
>
> Same here ...
> The logic looks right to me and I think it should address the original
> issue I ran into.
> Would it make sense to have netdev_unregistering and
> netdev_unregistering_wait be per-namespace, and have
> default_device_exit_batch only wait for the namespaces in net_list? It
> would require some extra loops and locking, but it may help avoid
> unnecessary waits.

It might be worth it, and it certainly would give a good progress
guarantee, as activity in an exiting network namespace is guaranteed to
wind down.  Progress guarantees are always good to have.

The downside of a per netns unregistering count is that we will have
extra wake ups which will could result in extra contention on the
rtnl_lock.

The wait queue definitely makes sense to be global as there only ever
can be a single waiter, because the netns_wq is single threaded and
everything we are doing is happening with the net_mutex held.

The count doesn't necessarily need to be atomic as it can be protected
by the rtnl_lock.

Like I said the logic is a little rough as I was tired. 

I am heading off to New Orleans for Linux Plumbers Conference in a
couple hours so I don't expect I will be particularly responsive
again until next week.

If you could test this patch perhaps refine it I think we are almost at
a final point of fixing this.

Just to be clear my reason for prefering this approach is that because
it adds no extra wait points (we already wait for the rtnl_lock), the
logic is unconditional and explicit and not hidden in the loopback
device's reference count.  Which should allow anyone reading the code
to discover and understand this guarantee.  Although a big fat comment
in default_device_exit_batch that we are guaranteeing we don't allow
the network namespace to exit while there are still network devices in
it (or something to that effect) is probably appropriate.

Eric

> Francesco
>
>>
>>  net/core/dev.c |   12 ++++++++++++
>>  1 files changed, 12 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 5d702fe..c25e6f3 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -5002,10 +5002,13 @@ static int dev_new_index(struct net *net)
>>
>>  /* Delayed registration/unregisteration */
>>  static LIST_HEAD(net_todo_list);
>> +static atomic_t netdev_unregistering = ATOMIC_INIT(0);
>> +static DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wait);
>>
>>  static void net_set_todo(struct net_device *dev)
>>  {
>>         list_add_tail(&dev->todo_list, &net_todo_list);
>> +       atomic_inc(&netdev_unregistering);
>>  }
>>
>>  static void rollback_registered_many(struct list_head *head)
>> @@ -5673,6 +5676,9 @@ void netdev_run_todo(void)
>>                 if (dev->destructor)
>>                         dev->destructor(dev);
>>
>> +               if (atomic_dec_and_test(&netdev_unregistering))
>> +                       wake_up(&netdev_unregistering_wait);
>> +
>>                 /* Free network device */
>>                 kobject_put(&dev->dev.kobj);
>>         }
>> @@ -6369,7 +6375,13 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
>>         struct net *net;
>>         LIST_HEAD(dev_kill_list);
>>
>> +retry:
>> +       wait_event(netdev_unregistering_wait, (atomic_read(&netdev_unregistering) == 0));
>>         rtnl_lock();
>> +       if (atomic_read(&netdev_unregistering) != 0) {
>> +               __rtnl_unlock();
>> +               goto retry;
>> +       }
>>         list_for_each_entry(net, net_list, exit_list) {
>>                 for_each_netdev_reverse(net, dev) {
>>                         if (dev->rtnl_link_ops)
>> --
>> 1.7.5.4
>>

  reply	other threads:[~2013-09-17  9:39 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1379008796-2121-1-git-send-email-fruggeri@aristanetworks.com>
2013-09-12 20:06 ` [PATCH 1/1] net: race condition when removing virtual net_device Eric W. Biederman
2013-09-12 21:48   ` Francesco Ruggeri
2013-09-12 22:02     ` Francesco Ruggeri
2013-09-13  5:50     ` Eric W. Biederman
2013-09-13 17:54       ` Francesco Ruggeri
2013-09-14  1:46         ` Eric W. Biederman
2013-09-16  2:54           ` Francesco Ruggeri
2013-09-16 10:45             ` Eric W. Biederman
2013-09-16 20:30               ` Francesco Ruggeri
2013-09-16 23:52                 ` [PATCH net-next] net loopback: Set loopback_dev to NULL when freed Eric W. Biederman
2013-09-17  0:50                   ` Eric Dumazet
2013-09-17  1:34                     ` David Miller
2013-09-17  1:41                       ` Eric Dumazet
2013-09-17  1:52                         ` Eric W. Biederman
2013-09-17 23:05                           ` David Miller
2013-09-17  0:25                 ` [PATCH 1/1] net: race condition when removing virtual net_device Eric W. Biederman
2013-09-17  5:12                   ` Francesco Ruggeri
2013-09-17  3:49                 ` [CFT][PATCH] net: Delay default_device_exit_batch until no devices are unregistering Eric W. Biederman
2013-09-17  6:54                   ` Francesco Ruggeri
2013-09-17  9:38                     ` Eric W. Biederman [this message]
2013-09-17 17:14                       ` Francesco Ruggeri
2013-09-17 23:21                   ` David Miller
2013-09-17 23:41                     ` Eric W. Biederman
2013-09-18  0:15                       ` David Miller
2013-09-18  3:50                         ` Francesco Ruggeri
2013-09-18  3:52                           ` David Miller
2013-09-18  8:19                             ` Eric W. Biederman
2013-09-20 16:34                               ` Francesco Ruggeri
2013-09-24  4:19                             ` [PATCH] net: Delay default_device_exit_batch until no devices are unregistering v2 Eric W. Biederman
2013-09-24 17:54                               ` Francesco Ruggeri
2013-09-28 22:14                               ` David Miller
2013-09-14  0:16   ` [PATCH 1/1] net: race condition when removing virtual net_device David Miller
2013-09-14  1:32     ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mwnb949b.fsf@xmission.com \
    --to=ebiederm@xmission$(echo .)com \
    --cc=alexander.h.duyck@intel$(echo .)com \
    --cc=amwang@redhat$(echo .)com \
    --cc=davem@davemloft$(echo .)net \
    --cc=edumazet@google$(echo .)com \
    --cc=fruggeri@aristanetworks$(echo .)com \
    --cc=jiri@resnulli$(echo .)us \
    --cc=netdev@vger$(echo .)kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox