From: Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public•gmane.org>
To: Christoph Lameter
	<cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public•gmane.org>,
	"Paul E. McKenney"
	<paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public•gmane.org>
Cc: Nick Piggin <nickpiggin-/E1597aS9LT0CCvOHzKKcA@public•gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public•gmane.org>,
	Ingo Molnar <mingo-X9Un+BFzKDI@public•gmane.org>,
	Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public•gmane.org>,
	David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public•gmane.org>,
	"Rafael J. Wysocki" <rjw-KKrjLPT3xs0@public•gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public•gmane.org,
	"kernel-testers-u79uwXL29TY76Z2rM5mHXA@public•gmane.org >>
	Kernel Testers List"
	<kernel-testers-u79uwXL29TY76Z2rM5mHXA@public•gmane.org>,
	Mike Galbraith <efault-Mmb7MZpHnFY@public•gmane.org>,
	Peter Zijlstra
	<a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public•gmane.org>,
	Linux Netdev List
	<netdev-u79uwXL29TY76Z2rM5mHXA@public•gmane.org>,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public•gmane.org,
	Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public•gmane.org>
Subject: Re: [PATCH v3 6/7] fs: struct file move from call_rcu() to SLAB_DESTROY_BY_RCU
Date: Fri, 12 Dec 2008 17:48:06 +0100
Message-ID: <494295C6.2020906@cosmosbay.com>
In-Reply-To: <4941EC65.5040903-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>

Eric Dumazet wrote:
> Nick Piggin wrote:
>> On Friday 12 December 2008 09:40, Eric Dumazet wrote:
>>> From: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public•gmane.org>
>>>
>>> [PATCH] fs: struct file move from call_rcu() to SLAB_DESTROY_BY_RCU
>>>
>>> Currently we schedule RCU frees for each file we free separately. That has
>>> several drawbacks against the earlier file handling (in 2.6.5 f.e.), which
>>> did not require RCU callbacks:
>>>
>>> 1. Excessive number of RCU callbacks can be generated causing long RCU
>>>   queues that in turn cause long latencies. We hit SLUB page allocation
>>>   more often than necessary.
>>>
>>> 2. The cache hot object is not preserved between free and realloc. A close
>>>   followed by another open is very fast with the RCUless approach because
>>>   the last freed object is returned by the slab allocator that is
>>>   still cache hot. RCU free means that the object is not immediately
>>>   available again. The new object is cache cold and therefore open/close
>>>   performance tests show a significant degradation with the RCU
>>>   implementation.
>>>
>>> One solution to this problem is to move the RCU freeing into the Slab
>>> allocator by specifying SLAB_DESTROY_BY_RCU as an option at slab creation
>>> time. The slab allocator will do RCU frees only when it is necessary
>>> to dispose of slabs of objects (rare). So with that approach we can cut
>>> out the RCU overhead significantly.
>>>
>>> However, the slab allocator may return the object for another use even
>>> before the RCU period has expired under SLAB_DESTROY_BY_RCU. This means
>>> there is the (unlikely) possibility that the object is going to be
>>> switched under us in sections protected by rcu_read_lock() and
>>> rcu_read_unlock(). So we need to verify that we have acquired the correct
>>> object after establishing a stable object reference (incrementing the
>>> refcounter does that).
>>>
>>>
>>> Signed-off-by: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public•gmane.org>
>>> Signed-off-by: Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public•gmane.org>
>>> Signed-off-by: Paul E. McKenney <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public•gmane.org>
>>> ---
>>>  Documentation/filesystems/files.txt |   21 ++++++++++++++--
>>>  fs/file_table.c                     |   33 ++++++++++++++++++--------
>>>  include/linux/fs.h                  |    5 ---
>>>  3 files changed, 42 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/Documentation/filesystems/files.txt b/Documentation/filesystems/files.txt
>>> index ac2facc..6916baa 100644
>>> --- a/Documentation/filesystems/files.txt
>>> +++ b/Documentation/filesystems/files.txt
>>> @@ -78,13 +78,28 @@ the fdtable structure -
>>>     that look-up may race with the last put() operation on the
>>>     file structure. This is avoided using atomic_long_inc_not_zero()
>>>     on ->f_count :
>>> +   As file structures are allocated with SLAB_DESTROY_BY_RCU,
>>> +   they can also be freed before a RCU grace period, and reused,
>>> +   but still as a struct file.
>>> +   It is necessary to check again after getting
>>> +   a stable reference (ie after atomic_long_inc_not_zero()),
>>> +   that fcheck_files(files, fd) points to the same file.
>>>
>>>  	rcu_read_lock();
>>>  	file = fcheck_files(files, fd);
>>>  	if (file) {
>>> -		if (atomic_long_inc_not_zero(&file->f_count))
>>> +		if (atomic_long_inc_not_zero(&file->f_count)) {
>>>  			*fput_needed = 1;
>>> -		else
>>> +			/*
>>> +			 * Now we have a stable reference to an object.
>>> +			 * Check if other threads freed file and reallocated it.
>>> +			 */
>>> +			if (file != fcheck_files(files, fd)) {
>>> +				*fput_needed = 0;
>>> +				put_filp(file);
>>> +				file = NULL;
>>> +			}
>>> +		} else
>>>  		/* Didn't get the reference, someone's freed */
>>>  			file = NULL;
>>>  	}
>>> @@ -95,6 +110,8 @@ the fdtable structure -
>>>     atomic_long_inc_not_zero() detects if refcounts is already zero or
>>>     goes to zero during increment. If it does, we fail
>>>     fget()/fget_light().
>>> +   The second call to fcheck_files(files, fd) checks that this filp
>>> +   was not freed, then reused by an other thread.
>>>
>>>  6. Since both fdtable and file structures can be looked up
>>>     lock-free, they must be installed using rcu_assign_pointer()
>>> diff --git a/fs/file_table.c b/fs/file_table.c
>>> index a46e880..3e9259d 100644
>>> --- a/fs/file_table.c
>>> +++ b/fs/file_table.c
>>> @@ -37,17 +37,11 @@ static struct kmem_cache *filp_cachep __read_mostly;
>>>
>>>  static struct percpu_counter nr_files __cacheline_aligned_in_smp;
>>>
>>> -static inline void file_free_rcu(struct rcu_head *head)
>>> -{
>>> -	struct file *f =  container_of(head, struct file, f_u.fu_rcuhead);
>>> -	kmem_cache_free(filp_cachep, f);
>>> -}
>>> -
>>>  static inline void file_free(struct file *f)
>>>  {
>>>  	percpu_counter_dec(&nr_files);
>>>  	file_check_state(f);
>>> -	call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
>>> +	kmem_cache_free(filp_cachep, f);
>>>  }
>>>
>>>  /*
>>> @@ -306,6 +300,14 @@ struct file *fget(unsigned int fd)
>>>  			rcu_read_unlock();
>>>  			return NULL;
>>>  		}
>>> +		/*
>>> +		 * Now we have a stable reference to an object.
>>> +		 * Check if other threads freed file and re-allocated it.
>>> +		 */
>>> +		if (unlikely(file != fcheck_files(files, fd))) {
>>> +			put_filp(file);
>>> +			file = NULL;
>>> +		}
>> This is a non-trivial change, because that put_filp may drop the last
>> reference to the file. So now we have the case where we free the file
>> from a context in which it had never been allocated.
> 
> If we get to this point, then:
> 
> 1. We found a non-NULL pointer in our fd table.
> 2. Another thread closed the file before we had added our reference.
> 3. The file was freed (kmem_cache_free(filp_cachep, file)).
> 4. The file was reused and inserted into another thread's fd table.
> 5. We added our reference to its refcount.
> 6. We checked whether this file is still ours (still in our fd table).
> 7. We found that it is no longer the file we wanted.
> 
> Calling put_filp() here is our only way to safely drop the reference on
> a truly allocated file. At this point the file is truly allocated,
> but it is no longer ours.
> Unfortunately we added a reference to it, so we must release it.
> If the other thread has already called put_filp() because it wanted to close its new file,
> we will see f_count drop to zero, and we must call __fput() to perform
> all the relevant file cleanup ourselves.
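
The sequence quoted above can be reproduced in user space. What follows is
only a minimal sketch with C11 atomics, not the kernel code: struct obj, slot,
lookup(), the hook mechanism and racing_owner() are all hypothetical names
invented for illustration, and the race is simulated on a single thread
through the two hook stages rather than with real concurrency.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: only the refcount matters here. */
struct obj {
	atomic_long count;
};

/* Hypothetical stand-in for one fd-table slot. */
static _Atomic(struct obj *) slot;

/* The object as seen by its simulated new owner after reuse. */
static struct obj *reused;

/* Hook used to simulate another thread at two points of the lookup. */
typedef void (*race_hook)(int stage);

/* Take a reference unless the count is already zero. */
static bool inc_not_zero(atomic_long *v)
{
	long old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; true means the caller dropped the last one. */
static bool dec_and_test(atomic_long *v)
{
	long prev = atomic_fetch_sub(v, 1);

	assert(prev > 0);	/* catch refcount underflow */
	return prev == 1;
}

/*
 * Simulated other thread. Stage 0: the object leaves our slot and is
 * reused elsewhere (close + free + reuse; the net count stays at 1,
 * now owned by the new user). Stage 1: the new owner drops its own
 * reference while we still hold ours.
 */
static void racing_owner(int stage)
{
	if (stage == 0)
		reused = atomic_exchange(&slot, (struct obj *)NULL);
	else
		dec_and_test(&reused->count);
}

/* Lookup, take a reference, then re-check that the slot is unchanged. */
static struct obj *lookup(race_hook hook, bool *dropped_last)
{
	struct obj *o = atomic_load(&slot);

	*dropped_last = false;
	if (!o)
		return NULL;
	if (hook)
		hook(0);	/* window before we hold a reference */
	if (!inc_not_zero(&o->count))
		return NULL;
	if (hook)
		hook(1);	/* window after we hold a reference */
	if (o != atomic_load(&slot)) {
		/* Not ours any more: give the reference back. */
		*dropped_last = dec_and_test(&o->count);
		return NULL;
	}
	return o;
}
```

Calling lookup(NULL, &last) on an installed object returns it with one extra
reference. Calling lookup(racing_owner, &last) returns NULL with last set to
true: the lookup path itself dropped the final reference on an object that
now belongs to someone else, which is the case Nick is pointing at.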

Rereading this mail, I realise we call put_filp(file) here, when the right
call could be either fput(file) or put_filp(file): we don't know which.

Damn, this patch is wrong as is.

Christoph, Paul, do you see the problem?

In fget()/fget_light() we don't know whether the other thread (the one that re-allocated the file
and tried to close it while we held a reference on it) had to call put_filp() or fput()
to release its own reference. So when our atomic_long_dec_and_test() drops the last reference,
we cannot tell which action to take: the full __fput() version, or the small one
that some systems use to 'close' a not really opened file.

void put_filp(struct file *file)
{
        if (atomic_long_dec_and_test(&file->f_count)) {
                security_file_free(file);
                file_kill(file);
                file_free(file);
        }
}

void fput(struct file *file)
{
        if (atomic_long_dec_and_test(&file->f_count))
                __fput(file);
}

I believe put_filp() is only called on the slow path (error cases).

Should we just zap it and always call fput()?
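
To make the question concrete, here is one way a single release function
could work. It is a sketch under a loud assumption: the object itself records
whether the open completed, in a hypothetical opened field that is not part
of this patch or of struct file as discussed here. With that state stored in
the object, whoever drops the last reference can choose the right teardown
without knowing who allocated the object. The names obj_put(), full_teardown()
and light_teardown() are invented for illustration; the counters exist only
so the behaviour can be checked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file. */
struct obj {
	atomic_long count;
	bool opened;		/* assumption: set once the open fully succeeded */
	int full_teardowns;	/* bookkeeping for this sketch only */
	int light_teardowns;	/* bookkeeping for this sketch only */
};

/* Plays the role of the full __fput() path. */
static void full_teardown(struct obj *o)
{
	o->full_teardowns++;
}

/* Plays the role of the small cleanup done in put_filp(). */
static void light_teardown(struct obj *o)
{
	o->light_teardowns++;
}

/*
 * Single release entry point: any holder may call it, and the last
 * one picks the teardown from state stored in the object itself.
 */
static void obj_put(struct obj *o)
{
	long prev = atomic_fetch_sub(&o->count, 1);

	assert(prev > 0);	/* catch refcount underflow */
	if (prev != 1)
		return;		/* other references remain */
	if (o->opened)
		full_teardown(o);
	else
		light_teardown(o);
}
```

With this shape, the re-check failure path in fget() would call the same
function as everyone else, and the choice between the two cleanups would no
longer depend on which thread happens to drop the last reference.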
