From: Stephen Hemminger <shemminger@vyatta•com>
To: ebiederm@xmission•com (Eric W. Biederman)
Cc: Francesco Ruggeri <fruggeri@aristanetworks•com>,
netdev@vger•kernel.org, Eric Dumazet <eric.dumazet@gmail•com>,
David Miller <davem@davemloft•ne>
Subject: Re: Race condition in ipv6 code
Date: Thu, 12 Jan 2012 17:57:58 -0800
Message-ID: <20120112175758.773c8e85@nehalam.linuxnetplumber.net>
In-Reply-To: <m1ty40tn83.fsf@fess.ebiederm.org>

On Thu, 12 Jan 2012 17:17:32 -0800
ebiederm@xmission•com (Eric W. Biederman) wrote:
> Francesco Ruggeri <fruggeri@aristanetworks•com> writes:
>
> > We have hit a race condition in ipv6 code when setting
> > /proc/sys/net/ipv6/conf/*/forwarding. This happens when the syscall
> > has to be restarted.
> >
> > I wonder if anyone else has run into the same issue.
> >
> > The current sequence in addrconf_sysctl_forward() and
> > addrconf_fixup_forwarding() is as follows:
> > - change the parameter in idev->cnf.forwarding (using proc_dointvec())
> > - try to get the rtnl lock
> > - if it cannot get the lock, restore the original value in
> > idev->cnf.forwarding and restart the syscall.
> >
> > While this is going on, the ipv6 code may access idev->cnf.forwarding
> > and get an incorrect value.
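A minimal userspace sketch of the racy ordering described above. Everything here is a hypothetical stand-in, not kernel code: struct devconf models idev->cnf, an atomic_flag models the rtnl lock, -EAGAIN stands in for restarting the syscall, and the probe callback models a lock holder (such as addrconf_ifdown) reading the field during the window between the unlocked store and the failed trylock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rtnl lock (userspace sketch only). */
static atomic_flag rtnl_sketch = ATOMIC_FLAG_INIT;

static bool rtnl_trylock_sketch(void)
{
    /* test_and_set returns the previous state; false means we took it. */
    return !atomic_flag_test_and_set(&rtnl_sketch);
}

static void rtnl_unlock_sketch(void)
{
    atomic_flag_clear(&rtnl_sketch);
}

/* Hypothetical stand-in for idev->cnf. */
struct devconf {
    int forwarding;
};

/* Models another context reading the field at one particular instant. */
typedef void (*probe_fn)(const struct devconf *cnf);

static int last_seen = -1;

static void record_probe(const struct devconf *cnf)
{
    last_seen = cnf->forwarding;
}

/* Racy ordering: store the new value first, take the lock second. */
static int racy_set_forwarding(struct devconf *cnf, int new_val, probe_fn probe)
{
    int old = cnf->forwarding;

    cnf->forwarding = new_val;   /* published without holding the lock */
    if (probe)
        probe(cnf);              /* window: a lock holder sees new_val */

    if (!rtnl_trylock_sketch()) {
        cnf->forwarding = old;   /* roll back ... */
        return -EAGAIN;          /* ... and "restart the syscall" */
    }
    /* side effects of the change would be applied here */
    rtnl_unlock_sketch();
    return 0;
}
```

If the lock is contended, the committed value never changes, yet a reader holding the lock can still observe the transient value, which is how __ipv6_ifa_notify() could read forwarding as 0.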
> > In our case we were in addrconf_ifdown (holding the rtnl lock) and
> > calling __ipv6_ifa_notify(RTM_DELADDR, ifa) on the idev->addr_list
> > entries.
> > __ipv6_ifa_notify() only invokes addrconf_leave_anycast() if
> > idev->cnf.forwarding is set. Because a process trying to set
> > forwarding to 0 was stuck in the restart_syscall sequence above
> > flipping the flag on and off, we erroneously read the flag as 0, with
> > the result that addrconf_leave_anycast() was not invoked, some
> > idev->ac_list entries were never released, idev was never freed and
> > kept a reference to its net_device, and the net_device was never freed
> > and caused the "unregister_netdevice: waiting for xxx to become free"
> > message forever. In our case this was a vlan interface that was being
> > deleted, so we ended up getting stuck in vlan_ioctl_handler() holding
> > vlan_ioctl_mutex with further bad consequences.
> > The following diffs (for 2.6.38, but the same logic seems to be used
> > in 3.2) address the issue by modifying idev->cnf.forwarding only after
> > the rtnl lock is acquired. There is a similar situation for
> > disable_ipv6.
> > Any comments are appreciated.
>
> Interesting. So ultimately the problem is not the syscall restart,
> although that exacerbates it; the problem is that we expect
> idev->cnf.forwarding to be protected by the rtnl_lock and it is not.
>
> At first read-through your patch looks good. I am a bit worried that
> we have some versions of the value (e.g.
> net->ipv6.devconf_dflt->forwarding) not protected by the rtnl_lock
> and other versions of the value protected by the rtnl_lock.
>
> That just seems confusing.
>
> We can't hold the rtnl_lock around proc_dointvec because that can sleep
> indefinitely in copy_from_user. So your change to create a temporary
> ctl_table and call proc_dointvec on it seems very reasonable,
> and necessary however we do this.
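The temporary ctl_table idea can be sketched in userspace as follows, under stated assumptions: struct ctl_table_sketch is a hypothetical, trimmed-down stand-in for struct ctl_table, and parse_int_sketch() stands in for the write path of proc_dointvec(). The point shown is only that copying the table and redirecting .data to a local means the (possibly sleeping) parse never writes the shared field.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct ctl_table. */
struct ctl_table_sketch {
    const char *procname;
    void *data;          /* where a parsed value is stored */
    int maxlen;
};

/* Stand-in for the write path of proc_dointvec(): parse one decimal
 * int (optionally newline-terminated) and store it through table->data. */
static int parse_int_sketch(const struct ctl_table_sketch *table, const char *buf)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(buf, &end, 10);
    if (end == buf || errno != 0 || v < INT_MIN || v > INT_MAX)
        return -EINVAL;
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;
    *(int *)table->data = (int)v;
    return 0;
}

/* Copy the table and point .data at a local, so parsing (which in the
 * kernel may sleep in copy_from_user) never writes the shared field. */
static int parse_into_local(const struct ctl_table_sketch *orig, const char *buf, int *out)
{
    struct ctl_table_sketch tmp = *orig;
    int val = 0;
    int ret;

    tmp.data = &val;
    tmp.maxlen = sizeof(val);
    ret = parse_int_sketch(&tmp, buf);
    if (ret == 0)
        *out = val;   /* caller publishes this later, under the lock */
    return ret;
}
```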
>
> I don't know if there are other places that need the rtnl_lock,
> but your patch below looks like it makes things better for all
> the right reasons. So on that score:
>
> Acked-by: "Eric W. Biederman" <ebiederm@xmission•com>
>
> Unless someone wants to volunteer to sort out the impedance mismatch
> between these tunables and the sysctl infrastructure, I suggest
> you resend this patch to David with [PATCH] in the subject line.
>
> I would also suggest a slightly clearer description of why
> idev->cnf.forwarding and idev->cnf.disable_ipv6 need rtnl_lock
> protection.
>
> But overall this looks like a pretty obvious bug fix for the
> problem that we need the rtnl_lock to protect idev->cnf.forwarding,
> and we currently allow updates to idev->cnf.forwarding without
> holding the rtnl_lock.
>
> Eric
>
It looks like a better helper (proc_doint_rtnl?) needs to be built
that has the locking in the right place, i.e.:
- get value from user
- get lock (with restart)
- do changes
- unlock
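The four steps above can be sketched in userspace C. This is an illustration under assumptions, not an existing kernel API: proc_doint_rtnl_sketch() only echoes the suggested name, an atomic_flag stands in for the rtnl lock, and -EAGAIN stands in for restart_syscall().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the rtnl lock. */
static atomic_flag rtnl_sketch = ATOMIC_FLAG_INIT;

static bool rtnl_trylock_sketch(void)
{
    return !atomic_flag_test_and_set(&rtnl_sketch);
}

static void rtnl_unlock_sketch(void)
{
    atomic_flag_clear(&rtnl_sketch);
}

/* Stand-in for "get value from user": parse one decimal int. */
static int get_int_sketch(const char *buf, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(buf, &end, 10);
    if (end == buf || errno != 0 || v < INT_MIN || v > INT_MAX)
        return -EINVAL;
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;
    *out = (int)v;
    return 0;
}

/* Hypothetical helper following the proposed ordering. */
static int proc_doint_rtnl_sketch(int *field, const char *buf)
{
    int val;
    int ret = get_int_sketch(buf, &val);   /* 1. get value from user */

    if (ret)
        return ret;                        /* shared field untouched */

    if (!rtnl_trylock_sketch())            /* 2. get lock (with restart) */
        return -EAGAIN;                    /* stands in for restart_syscall() */

    if (*field != val) {                   /* 3. do changes, lock held */
        *field = val;
        /* dependent updates would run here, still under the lock */
    }

    rtnl_unlock_sketch();                  /* 4. unlock */
    return 0;
}
```

Because the value is parsed before the lock is attempted, a contended lock only costs a retry; readers that hold the lock never observe an uncommitted value, unlike the store-then-trylock ordering.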