public inbox for netdev@vger.kernel.org 
From: Ding Tianhong <dingtianhong@huawei•com>
To: Veaceslav Falico <vfalico@redhat•com>
Cc: Jay Vosburgh <fubar@us•ibm.com>,
	Andy Gospodarek <andy@greyhouse•net>,
	"David S. Miller" <davem@davemloft•net>,
	Nikolay Aleksandrov <nikolay@redhat•com>,
	Netdev <netdev@vger•kernel.org>
Subject: Re: [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
Date: Mon, 28 Oct 2013 09:15:52 +0800
Message-ID: <526DBAC8.5000009@huawei.com>
In-Reply-To: <20131027225317.GB11209@redhat.com>

On 2013/10/28 6:53, Veaceslav Falico wrote:
> On Thu, Oct 24, 2013 at 11:08:35AM +0800, Ding Tianhong wrote:
>> Hi:
>>
>> Slaves are added to and removed from the slave list by bond_master_upper_dev_link() and
>> bond_upper_dev_unlink(), which call call_netdevice_notifiers(). Even though it is safe to call
>> them under the bond write lock now, we can't be sure it will stay safe later, because other
>> drivers may handle NETDEV_CHANGEUPPER in a way that sleeps, so I did not move
>> bond_upper_dev_unlink() under the bond write lock.
>>
>> Currently bond_for_each_slave() is protected only by rtnl_lock(). Using bond_for_each_slave_rcu()
>> may be a good way to protect the bond's slave list, but there is no need to convert
>> bond_for_each_slave() to bond_for_each_slave_rcu() in the slow path. So in this patchset I remove
>> the unused bond read lock from the monitor functions. This may be a better way, and I welcome any
>> replies on it.
>>
>> Thanks to Veaceslav Falico for his opinion.
>>
>> v2: add and modify commit messages for the patchset and patches; this is the first step for the whole patchset.
>>
>> Ding Tianhong (5):
>>  bonding: remove bond read lock for bond_mii_monitor()
>>  bonding: remove bond read lock for bond_alb_monitor()
>>  bonding: remove bond read lock for bond_loadbalance_arp_mon()
>>  bonding: remove bond read lock for bond_activebackup_arp_mon()
> 
> This patch introduces a regression by boot-test with active backup mode:
> 
> bond_activebackup_arp_mon() is already not holding the bond->lock, however
> it might call bond_change_active_slave(), which does (in case of new_active):
> 
>                         write_unlock_bh(&bond->curr_slave_lock);
>                         read_unlock(&bond->lock);
>
>                         call_netdevice_notifiers(NETDEV_BONDING_FAILOVER, bond->dev);
>                         if (should_notify_peers)
>                                 call_netdevice_notifiers(NETDEV_NOTIFY_PEERS,
>                                                          bond->dev);
>
>                         read_lock(&bond->lock);
>                         write_lock_bh(&bond->curr_slave_lock);
> 
> so it drops the bond->lock (which wasn't taken previously), and then takes
> it (without anyone dropping it afterwards).
> 
> I don't know how to fix it - cause a lot of other callers already take it,
> and we can't just drop them (we'd race), and we can't remove it here (cause
> we can't call notifiers while atomic).
> 
> Which begs the question - was this patchset tested at all?
> 
> [   21.796823] =====================================
> [   21.796823] [ BUG: bad unlock balance detected! ]
> [   21.796823] 3.12.0-rc6+ #305 Tainted: G          I
> [   21.796823] -------------------------------------
> [   21.796823] kworker/u8:5/59 is trying to release lock (&bond->lock) at:
> [   21.796823] [<ffffffffa00b6c38>] bond_change_active_slave+0x2c8/0x390 [bonding]
> [   21.796823] but there are no more locks to release!
> [   21.796823]
> [   21.796823] other info that might help us debug this:
> [   21.796823] 3 locks held by kworker/u8:5/59:
> [   21.796823]  #0:  (%s#4){.+.+..}, at: [<ffffffff810cfeb9>] process_one_work+0x189/0x580
> [   21.796823]  #1:  ((&(&bond->arp_work)->work)){+.+...}, at: [<ffffffff810cfeb9>] process_one_work+0x189/0x580
> [   21.796823]  #2:  (rtnl_mutex){+.+.+.}, at: [<ffffffff8169ea05>] rtnl_trylock+0x15/0x20
> [   21.796823]
> [   21.796823] stack backtrace:
> [   21.796823] CPU: 0 PID: 59 Comm: kworker/u8:5 Tainted: G          I  3.12.0-rc6+ #305
> [   21.796823] Hardware name: Hewlett-Packard HP xw4600 Workstation/0AA0h, BIOS 786F3 v01.15 08/28/2008
> [   21.796823] Workqueue: bond0 bond_activebackup_arp_mon [bonding]
> [   21.796823]  ffffffffa00b6c38 ffff880079ecdae8 ffffffff817aa048 0000000000000002
> [   21.796823]  ffff880079ec4b40 ffff880079ecdb18 ffffffff81129af9 00000000001d5400
> [   21.796823]  ffff880079ec4b40 ffff880078a36c88 ffff880079ec5440 ffff880079ecdba8
> [   21.796823] Call Trace:
> [   21.796823]  [<ffffffffa00b6c38>] ? bond_change_active_slave+0x2c8/0x390 [bonding]
> [   21.796823]  [<ffffffff817aa048>] dump_stack+0x59/0x81
> [   21.796823]  [<ffffffff81129af9>] print_unlock_imbalance_bug+0xf9/0x100
> [   21.796823]  [<ffffffff8112d67f>] lock_release_non_nested+0x26f/0x3f0
> [   21.796823]  [<ffffffff810f3aa8>] ? sched_clock_cpu+0xb8/0x120
> [   21.796823]  [<ffffffffa00b6c38>] ? bond_change_active_slave+0x2c8/0x390 [bonding]
> [   21.796823]  [<ffffffffa00b6c38>] ? bond_change_active_slave+0x2c8/0x390 [bonding]
> [   21.796823]  [<ffffffff8112d892>] __lock_release+0x92/0x1b0
> [   21.796823]  [<ffffffffa00b6c38>] ? bond_change_active_slave+0x2c8/0x390 [bonding]
> [   21.796823]  [<ffffffff8112da0b>] lock_release+0x5b/0x130
> [   21.796823]  [<ffffffff817b0553>] _raw_read_unlock+0x23/0x50
> [   21.796823]  [<ffffffffa00b6c38>] bond_change_active_slave+0x2c8/0x390 [bonding]
> [   21.796823]  [<ffffffffa00b6df7>] bond_select_active_slave+0xf7/0x1d0 [bonding]
> [   21.796823]  [<ffffffffa00b7006>] bond_ab_arp_commit+0x136/0x200 [bonding]
> [   21.796823]  [<ffffffffa00b9dd8>] bond_activebackup_arp_mon+0xc8/0xd0 [bonding]
> [   21.796823]  [<ffffffff810cff2a>] process_one_work+0x1fa/0x580
> [   21.796823]  [<ffffffff810cfeb9>] ? process_one_work+0x189/0x580
> [   21.796823]  [<ffffffff810d231f>] worker_thread+0x11f/0x3a0
> [   21.796823]  [<ffffffff810d2200>] ? manage_workers+0x170/0x170
> [   21.796823]  [<ffffffff810dbdfe>] kthread+0xee/0x100
> [   21.796823]  [<ffffffff8112d93b>] ? __lock_release+0x13b/0x1b0
> [   21.796823]  [<ffffffff810dbd10>] ? __init_kthread_worker+0x70/0x70
> [   21.796823]  [<ffffffff817ba3ec>] ret_from_fork+0x7c/0xb0
> [   21.796823]  [<ffffffff810dbd10>] ? __init_kthread_worker+0x70/0x70
> 
> 
>>  bonding: remove bond read lock for bond_3ad_state_machine_handler()
>>
>> drivers/net/bonding/bond_3ad.c  |   9 ++--
>> drivers/net/bonding/bond_alb.c  |  20 ++------
>> drivers/net/bonding/bond_main.c | 100 +++++++++++++---------------------------
>> 3 files changed, 40 insertions(+), 89 deletions(-)
>>
>> -- 
>> 1.8.2.1
>>
>>

Hi David:
Yes, exactly: I missed this and made a mistake. bond_select_active_slave() still has a locking
problem that needs to be handled. Sorry, I will send a patch to fix the bug soon.

Hi Veaceslav:
Sorry about the commit. I will pay more attention to my commits and to testing. Thanks for your
advice and for reporting the bug. I have to admit that I was too careless.
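
For readers following the thread, the imbalance Veaceslav describes can be sketched
in user-space C. This is only a toy model under stated assumptions, not the bonding
driver's code: toy_lock, toy_read_lock(), toy_read_unlock(), toy_change_active() and
the two caller functions are hypothetical names, and the counter merely imitates what
lockdep tracks. It shows a callee that drops and re-takes a read lock on the assumption
that its caller holds it, called once with the lock held and once without.

```c
#include <stdio.h>

/*
 * Toy model of the read side of a rwlock with a lockdep-style hold count.
 * All names here are hypothetical; this is not the kernel's rwlock API.
 */
struct toy_lock {
	int readers;      /* read-side holds currently outstanding */
	int bad_unlocks;  /* unlocks attempted while nothing was held */
};

static void toy_read_lock(struct toy_lock *l)
{
	l->readers++;
}

static void toy_read_unlock(struct toy_lock *l)
{
	if (l->readers == 0) {
		/* Roughly what lockdep reports as "bad unlock balance". */
		l->bad_unlocks++;
		return;
	}
	l->readers--;
}

/*
 * Mirrors the shape of the quoted snippet: the callee assumes its caller
 * holds the read lock, drops it so notifiers can run, then takes it back.
 */
static void toy_change_active(struct toy_lock *l)
{
	toy_read_unlock(l);
	/* notifiers would run here, outside atomic context */
	toy_read_lock(l);
}

/* Caller that honours the assumption: take, call, release. */
static struct toy_lock caller_with_lock(void)
{
	struct toy_lock l = { 0, 0 };

	toy_read_lock(&l);
	toy_change_active(&l);
	toy_read_unlock(&l);
	return l;
}

/* Caller that no longer takes the lock, as in the reported regression. */
static struct toy_lock caller_without_lock(void)
{
	struct toy_lock l = { 0, 0 };

	toy_change_active(&l);
	return l;
}

/* Print the final state of a toy lock for inspection. */
static void toy_report(const char *name, struct toy_lock l)
{
	printf("%s: readers left held=%d, bad unlocks=%d\n",
	       name, l.readers, l.bad_unlocks);
}
```

In this model caller_with_lock() ends with no outstanding holds and no bad unlocks,
while caller_without_lock() ends with one bad unlock and one read hold that nobody
releases, which matches the two halves of the problem in the report above.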




