public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: Nikolay Aleksandrov <nikolay@cumulusnetworks•com>
To: clsoto@linux•vnet.ibm.com, davem@davemloft•net
Cc: netdev@vger•kernel.org, brking@linux•vnet.ibm.com,
	j.vosburgh@gmail•com, gospo@cumulusnetworks•com
Subject: Re: [PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one
Date: Wed, 15 Jul 2015 19:49:11 +0200	[thread overview]
Message-ID: <55A69D17.5030800@cumulusnetworks.com> (raw)
In-Reply-To: <55A42962.9080004@cumulusnetworks.com>

On 07/13/2015 11:10 PM, Nikolay Aleksandrov wrote:
> On 07/13/2015 11:05 PM, Nikolay Aleksandrov wrote:
>> On 07/13/2015 08:57 PM, clsoto@linux•vnet.ibm.com wrote:
>>> From: Carol L Soto <clsoto@linux•vnet.ibm.com>
>>>
>>> Add function bond_remove_proc_entry at __bond_release_one to avoid stack 
>>> trace at rmmod bonding.
>>>
>>> [68830.202239] remove_proc_entry: removing non-empty directory
>>> 'net/bonding', leaking at least 'bond0'
>>> [68830.202257] ------------[ cut here ]------------
>>> [68830.202260] WARNING: at fs/proc/generic.c:562
>>> [68830.202412] NIP [c0000000002abf6c] .remove_proc_entry+0x1fc/0x240
>>> [68830.202416] LR [c0000000002abf68] .remove_proc_entry+0x1f8/0x240
>>> [68830.202419] PACATMSCRATCH [8000000000009032]
>>> [68830.202421] Call Trace:
>>> [68830.202424] [c000000179277940] [c0000000002abf68] 
>>> .remove_proc_entry+0x1f8/0x240 (unreliable)
>>> [68830.202434] [c0000001792779f0] [d0000000053229a4] 
>>> .bond_destroy_proc_dir+0x34/0x54 [bonding]
>>> [68830.202440] [c000000179277a70] [d0000000053130e0] 
>>> .bond_net_exit+0x90/0x120 [bonding]
>>> [68830.202445] [c000000179277b10] [c00000000059944c] 
>>> .ops_exit_list.isra.0+0x6c/0xd0
>>> [68830.202450] [c000000179277ba0] [c000000000599774] 
>>> .unregister_pernet_operations+0x94/0x100
>>> [68830.202454] [c000000179277c40] [c000000000599814] 
>>> .unregister_pernet_subsys+0x34/0x60
>>> [68830.202460] [c000000179277cc0] [d000000005323758] 
>>> .bonding_exit+0x48/0x2328 [bonding]
>>> [68830.202466] [c000000179277d30] [c00000000010dcc4] 
>>> .SyS_delete_module+0x1f4/0x340
>>> [68830.202471] [c000000179277e30] [c000000000009e7c] 
>>> syscall_exit+0x0/0x7c
>>> [68830.202491] ---[ end trace 9bd1d810219c9875 ]---
>>>
>>> Signed-off-by: Carol L Soto <clsoto@linux•vnet.ibm.com>
>>> ---
>>>  drivers/net/bonding/bond_main.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 19eb990..ace105a 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -1870,6 +1870,8 @@ static int __bond_release_one(struct net_device *bond_dev,
>>>  		dev_set_mac_address(slave_dev, &addr);
>>>  	}
>>>  
>>> +	bond_remove_proc_entry(bond);
>>> +
>>>  	dev_set_mtu(slave_dev, slave->original_mtu);
>>>  
>>>  	slave_dev->priv_flags &= ~IFF_BONDING;
>>>
>>
>> This is incorrect, it tries to remove the bond entry on every slave release
>> so if we have a bonding device with >= 2 slaves and release one of them then
>> the whole bond device entry will be removed from /proc/net/bonding.
> <<<<>>>>
>> You can hit this case only if you had created a bonding device while doing the
>> rmmod bonding (it's an old race condition which was fixed long time ago, but
>> the procfs was apparently missed) and only after the notifier has been
>> unregistered but before the sysfs has been removed.
>>
> Scratch this part, it should be triggered in a different way.
> Could you provide a way to reproduce ?
> 
>> Since the bonding netdevice notifier is handling the procfs creation/destruction
>> we could try moving the unregister after the pernet destruction which should
>> help avoid such problems. Could you try the following patch:
>>
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 19eb990d398c..d515ee38b77f 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -4682,12 +4682,10 @@ err_link:
>>  
>>  static void __exit bonding_exit(void)
>>  {
>> -	unregister_netdevice_notifier(&bond_netdev_notifier);
>> -
>>  	bond_destroy_debugfs();
>> -
>>  	bond_netlink_fini();
>>  	unregister_pernet_subsys(&bond_net_ops);
>> +	unregister_netdevice_notifier(&bond_netdev_notifier);
>>  
>>  #ifdef CONFIG_NET_POLL_CONTROLLER
>>  	/* Make sure we don't have an imbalance on our netpoll blocking */
>>

After we had a private discussion I was able to reproduce this issue with
tap devices (!= ARPHRD_ETHER and can be enslaved):

[14446.539000] bond0: Releasing active interface tun1
[14446.548333] bond0 (unregistering): Released all slaves
[14446.564200] ------------[ cut here ]------------
[14446.564208] WARNING: CPU: 0 PID: 6319 at fs/proc/generic.c:575 remove_proc_entry+0x112/0x160()
[14446.564211] remove_proc_entry: removing non-empty directory 'net/bonding', leaking at least 'bond0'
[14446.564212] Modules linked in: tun bonding(-) eql(O) 9p stp llc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev hid_generic usbhid hid snd_hda_codec_generic ppdev aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse pcspkr snd_hda_intel evdev snd_hda_codec snd_hwdep qxl snd_hda_core serio_raw snd_pcm snd_timer snd 9pnet_virtio 9pnet virtio_balloon soundcore i2c_piix4 drm_kms_helper ttm drm i2c_core virtio_console pvpanic parport_pc parport acpi_cpufreq processor thermal_sys button autofs4 ext4 crc16 mbcache jbd2 sg sr_mod cdrom ata_generic virtio_blk virtio_net e1000 ata_piix floppy ehci_pci uhci_hcd ehci_hcd libata scsi_mod usbcore usb_common virtio_pci virtio_ring virtio [l
 ast unloaded: bridge]

[14446.564295] CPU: 0 PID: 6319 Comm: rmmod Tainted: G           O    4.2.0-rc2+ #6
[14446.564296] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[14446.564299]  0000000000000000 ffffffff81732d41 ffffffff81525b34 ffff880035bcbda8
[14446.564302]  ffffffff8106c521 ffff8800367c5f78 ffff8800367c5f40 ffff88003e3a4280
[14446.564304]  ffffffffa05a5040 0000000000000000 ffffffff8106c59a ffffffff8172ebd0
[14446.564307] Call Trace:
[14446.564313]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
[14446.564317]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
[14446.564320]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
[14446.564323]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
[14446.564329]  [<ffffffffa059d0d6>] ? bond_destroy_proc_dir+0x26/0x30 [bonding]
[14446.564332]  [<ffffffffa058d40e>] ? bond_net_exit+0x8e/0xa0 [bonding]
[14446.564336]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
[14446.564340]  [<ffffffff8142f52d>] ? unregister_pernet_operations+0x8d/0xd0
[14446.564343]  [<ffffffff8142f58d>] ? unregister_pernet_subsys+0x1d/0x30
[14446.564346]  [<ffffffffa059d259>] ? bonding_exit+0x23/0xdca [bonding]
[14446.564350]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
[14446.564354]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
[14446.564357]  [<ffffffff8152b732>] ? entry_SYSCALL_64_fastpath+0x16/0x75
[14446.564360] ---[ end trace a911dbcedf315986 ]---

The problem is in bond_release_and_destroy() and I'll post a proper fix
for -net in a few minutes after I run some tests.

Cheers,
 Nik

  reply	other threads:[~2015-07-15 17:49 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-13 18:57 [PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one clsoto
2015-07-13 21:05 ` Nikolay Aleksandrov
2015-07-13 21:10   ` Nikolay Aleksandrov
2015-07-15 17:49     ` Nikolay Aleksandrov [this message]
2015-07-15 19:52       ` [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether Nikolay Aleksandrov
2015-07-15 21:01         ` Carol Soto
2015-07-15 22:39         ` Eric W. Biederman
2015-07-15 22:54           ` Nikolay Aleksandrov
2015-07-15 22:58             ` Carol Soto
2015-07-16  6:14             ` Veaceslav Falico
2015-07-20 19:56         ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55A69D17.5030800@cumulusnetworks.com \
    --to=nikolay@cumulusnetworks$(echo .)com \
    --cc=brking@linux$(echo .)vnet.ibm.com \
    --cc=clsoto@linux$(echo .)vnet.ibm.com \
    --cc=davem@davemloft$(echo .)net \
    --cc=gospo@cumulusnetworks$(echo .)com \
    --cc=j.vosburgh@gmail$(echo .)com \
    --cc=netdev@vger$(echo .)kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox