CPU: 0 Not tainted (3.1.9+ #1) when ifconfig rose0 down

public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed

* CPU: 0 Not tainted  (3.1.9+ #1) when ifconfig rose0 down
@ 2012-07-31 14:11 Bernard Pidoux
  2012-07-31 15:27 ` Bernard Pidoux
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Bernard Pidoux @ 2012-07-31 14:11 UTC (permalink / raw)
  To: linux-hams, Linux Netdev List

Hi,

I observe systematically a kernel panic when I try to shutdown rose0 
device using ifconfig rose0 down

This is happening on two very different ROSE implementation, one is on a 
machine with x86-64 kernel 4.6.3 on an Intel core 2 duo CPU
the other is on a RaspBerry Pi with Raspbian and 3.1.9+ wheezy kernel
recompiled with AX.25 modules (ax25, rose, netrom, 6pack, kiss) enabled.

Here is an image of the screen dump :

http://f6bvp.org/photos/rose_device_event.JPG

It can be noticed that PC is at rose_device_event and
LR is at sock_def_wakeup

One thing to be noticed is that when I close before all ROSE and AX.25 
applications, there are still a few populated sockets, probably for one 
of the program did not close the sockets properly.

I that case, does rose module should accept to shutdown rose0 device ?
However, I guess that it should not create a kernel panic due to a 
kernel NULL pointer.

I don't know what to do in order to debug that issue.

Bernard

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CPU: 0 Not tainted  (3.1.9+ #1) when ifconfig rose0 down
  2012-07-31 14:11 CPU: 0 Not tainted (3.1.9+ #1) when ifconfig rose0 down Bernard Pidoux
@ 2012-07-31 15:27 ` Bernard Pidoux
  2012-07-31 17:58   ` folkert
  2012-08-02 23:03   ` Eric W. Biederman
  2012-07-31 17:56 ` folkert
  2012-08-02 22:55 ` Eric W. Biederman
  2 siblings, 2 replies; 10+ messages in thread
From: Bernard Pidoux @ 2012-07-31 15:27 UTC (permalink / raw)
  To: linux-hams, Linux Netdev List

Here is a complementary observation.
Trying to remove rose module with rmmod rose did not create any kernel 
panic.
However, there is an endless message from the kernel saying :

Message from syslogd@raspberrypi at Jul 31 17:22:40 ...
  kernel:[  831.579007] unregister_netdevice: waiting for rose0 to 
become free. Usage count = 23

Message from syslogd@raspberrypi at Jul 31 17:22:50 ...
  kernel:[  841.739390] unregister_netdevice: waiting for rose0 to 
become free. Usage count = 23

Message from syslogd@raspberrypi at Jul 31 17:23:00 ...
  kernel:[  851.899758] unregister_netdevice: waiting for rose0 to 
become free. Usage count = 23
.....

As observed at many occasions, count number seems to be random ! and
the same message keeps going without any change of count number.
At the same time, there is no possibility to recover the command line on 
any console.
However I could loggin via ssh and I noticed that rose0 device is 
actually no more in the ifconfig list.

If I try to remove rose with rmmod rose I get :

root@raspberrypi:/home/pi# rmmod rose
libkmod: ERROR ../libkmod/libkmod-module.c:753 
kmod_module_remove_module: could not remove 'rose': Device or resource busy
Error: could not remove module rose: Device or resource busy

Does this help ?

On 31/07/2012 16:11, Bernard Pidoux wrote:
> Hi,
>
> I observe systematically a kernel panic when I try to shutdown rose0
> device using ifconfig rose0 down
>
> This is happening on two very different ROSE implementation, one is on a
> machine with x86-64 kernel 4.6.3 on an Intel core 2 duo CPU
> the other is on a RaspBerry Pi with Raspbian and 3.1.9+ wheezy kernel
> recompiled with AX.25 modules (ax25, rose, netrom, 6pack, kiss) enabled.
>
> Here is an image of the screen dump :
>
> http://f6bvp.org/photos/rose_device_event.JPG
>
> It can be noticed that PC is at rose_device_event and
> LR is at sock_def_wakeup
>
> One thing to be noticed is that when I close before all ROSE and AX.25
> applications, there are still a few populated sockets, probably for one
> of the program did not close the sockets properly.
>
> I that case, does rose module should accept to shutdown rose0 device ?
> However, I guess that it should not create a kernel panic due to a
> kernel NULL pointer.
>
> I don't know what to do in order to debug that issue.
>
> Bernard
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CPU: 0 Not tainted  (3.1.9+ #1) when ifconfig rose0 down
  2012-07-31 15:27 ` Bernard Pidoux
@ 2012-07-31 17:58   ` folkert
  2012-08-02 23:03   ` Eric W. Biederman
  1 sibling, 0 replies; 10+ messages in thread
From: folkert @ 2012-07-31 17:58 UTC (permalink / raw)
  To: Bernard Pidoux; +Cc: linux-hams, Linux Netdev List

> Here is a complementary observation.
> Trying to remove rose module with rmmod rose did not create any
> kernel panic.

A while ago I wrote a script that at random removed and inserted modules
into the system and that gave surprising results: not only crashing
system, also corrupt filesystems etc. Filed a bug-report and I was told
that removing a module might work but there's no guarantee.



Folkert van Heusden

-- 
Feeling generous? -> http://www.vanheusden.com/wishlist.php
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CPU: 0 Not tainted  (3.1.9+ #1) when ifconfig rose0 down
  2012-07-31 15:27 ` Bernard Pidoux
  2012-07-31 17:58   ` folkert
@ 2012-08-02 23:03   ` Eric W. Biederman
  1 sibling, 0 replies; 10+ messages in thread
From: Eric W. Biederman @ 2012-08-02 23:03 UTC (permalink / raw)
  To: Bernard Pidoux; +Cc: linux-hams, Linux Netdev List

Bernard Pidoux <bernard.pidoux@free•fr> writes:

> Here is a complementary observation.
> Trying to remove rose module with rmmod rose did not create any kernel panic.
> However, there is an endless message from the kernel saying :
>
> Message from syslogd@raspberrypi at Jul 31 17:22:40 ...
>  kernel:[  831.579007] unregister_netdevice: waiting for rose0 to become
> free. Usage count = 23
>
> Message from syslogd@raspberrypi at Jul 31 17:22:50 ...
>  kernel:[  841.739390] unregister_netdevice: waiting for rose0 to become
> free. Usage count = 23
>
> Message from syslogd@raspberrypi at Jul 31 17:23:00 ...
>  kernel:[  851.899758] unregister_netdevice: waiting for rose0 to become
> free. Usage count = 23
> .....
>
> As observed at many occasions, count number seems to be random ! and
> the same message keeps going without any change of count number.
> At the same time, there is no possibility to recover the command line on any
> console.
> However I could loggin via ssh and I noticed that rose0 device is actually no
> more in the ifconfig list.
>
> If I try to remove rose with rmmod rose I get :
>
> root@raspberrypi:/home/pi# rmmod rose
> libkmod: ERROR ../libkmod/libkmod-module.c:753 kmod_module_remove_module: could
> not remove 'rose': Device or resource busy
> Error: could not remove module rose: Device or resource busy
>
>
> Does this help ?

Something presumably in the rose code is leaking a reference to the rose
device.  That is probably a separate bug than your other one, but it may
be related.  These are some of my least favorite bugs to track down.

That said in rose there is only one call to dev_hold and only one call
to dev_put.

Hmm.  Looking at it dev_hold happens in rose_dev_get, and of which there
are several callers and only one of those callers calls dev_put.  So
it should just be a matter of adding some more dev_put calls in the
appropriate places.

The usage in rose_loopback_timer looks easy to fix, the usage in
rose_bind seems more of a challenge.

Are you enough of a developer to take that observation and look at
the code and fix it?

Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CPU: 0 Not tainted  (3.1.9+ #1) when ifconfig rose0 down
  2012-07-31 14:11 CPU: 0 Not tainted (3.1.9+ #1) when ifconfig rose0 down Bernard Pidoux
  2012-07-31 15:27 ` Bernard Pidoux
@ 2012-07-31 17:56 ` folkert
  2012-08-02 22:55 ` Eric W. Biederman
  2 siblings, 0 replies; 10+ messages in thread
From: folkert @ 2012-07-31 17:56 UTC (permalink / raw)
  To: Bernard Pidoux; +Cc: linux-hams, Linux Netdev List

> I observe systematically a kernel panic when I try to shutdown rose0
> device using ifconfig rose0 down

That's funny: I always get a kernel panic(!) when doing an ifconfig
rose0 up!


Folkert van Heusden

-- 
Nagios user? Check out CoffeeSaint - the versatile Nagios status
viewer! http://www.vanheusden.com/java/CoffeeSaint/
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CPU: 0 Not tainted  (3.1.9+ #1) when ifconfig rose0 down
  2012-07-31 14:11 CPU: 0 Not tainted (3.1.9+ #1) when ifconfig rose0 down Bernard Pidoux
  2012-07-31 15:27 ` Bernard Pidoux
  2012-07-31 17:56 ` folkert
@ 2012-08-02 22:55 ` Eric W. Biederman
  2012-08-03 14:32   ` folkert
  2 siblings, 1 reply; 10+ messages in thread
From: Eric W. Biederman @ 2012-08-02 22:55 UTC (permalink / raw)
  To: Bernard Pidoux; +Cc: linux-hams, Linux Netdev List

Bernard Pidoux <bernard.pidoux@free•fr> writes:

> Hi,
>
> I observe systematically a kernel panic when I try to shutdown rose0 device
> using ifconfig rose0 down
>
> This is happening on two very different ROSE implementation, one is on a machine
> with x86-64 kernel 4.6.3 on an Intel core 2 duo CPU
> the other is on a RaspBerry Pi with Raspbian and 3.1.9+ wheezy kernel
> recompiled with AX.25 modules (ax25, rose, netrom, 6pack, kiss) enabled.
>
> Here is an image of the screen dump :
>
> http://f6bvp.org/photos/rose_device_event.JPG

For some reason I can not connect f6bvp.org.

> It can be noticed that PC is at rose_device_event and
> LR is at sock_def_wakeup
>
> One thing to be noticed is that when I close before all ROSE and AX.25
> applications, there are still a few populated sockets, probably for one of the
> program did not close the sockets properly.
>
> I that case, does rose module should accept to shutdown rose0 device ?
> However, I guess that it should not create a kernel panic due to a kernel NULL
> pointer.

No.  The kernel should not panic.

> I don't know what to do in order to debug that issue.

I assume that rose is interesting to you and you would like for it to
work better?

In general you can read the code and figure out what it is doing ref
counting wise.  It doesn't look like anyone has done much with the rose
code for years except very basic maintenance across the kernel.  It
scares me to know that I was the last one to touch the rose code.

Part of what is happening at the time of the panic is
unregister_netdevice_notifier now generates synthetic removal for all of
the network devices in the system to remove the need for a special path
to handle network device removal in modules.

Unfortunately it looks like one of those modules is a problem.

You might want to simply try moving unregister_netdevice_notifier a bit
earlier in rose_exit and see if that helps.  Otherwise I would recommend
instrumenting the code up with some printk so you can understand what
part of unregistration is failing.

Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CPU: 0 Not tainted  (3.1.9+ #1) when ifconfig rose0 down
  2012-08-02 22:55 ` Eric W. Biederman
@ 2012-08-03 14:32   ` folkert
  2012-08-05 16:23     ` Bernard Pidoux
  0 siblings, 1 reply; 10+ messages in thread
From: folkert @ 2012-08-03 14:32 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Bernard Pidoux, linux-hams, Linux Netdev List

> You might want to simply try moving unregister_netdevice_notifier a bit
> earlier in rose_exit and see if that helps.  Otherwise I would recommend
> instrumenting the code up with some printk so you can understand what
> part of unregistration is failing.

Or bisect! Give e.g. the oldest 2.6 kernel a try to find a last known
good and then do a bisect upto a known bad kernel.


Folkert van Heusden

-- 
To MultiTail einai ena polymorfiko ergaleio gia ta logfiles kai tin
eksodo twn entolwn. Prosferei: filtrarisma, xrwmatismo, sygxwneysi,
diaforetikes provoles. http://www.vanheusden.com/multitail/
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CPU: 0 Not tainted  (3.1.9+ #1) when ifconfig rose0 down
  2012-08-03 14:32   ` folkert
@ 2012-08-05 16:23     ` Bernard Pidoux
  2012-08-05 17:23       ` Eric W. Biederman
  0 siblings, 1 reply; 10+ messages in thread
From: Bernard Pidoux @ 2012-08-05 16:23 UTC (permalink / raw)
  To: folkert; +Cc: Eric W. Biederman, linux-hams, Linux Netdev List

Thanks for suggesting a bissect. However I guess that this bug has 
always been there !

I am not professionaly involved in programming, however I committed a 
few patches for ROSE, AX.25 and NetRom modules since a few years.

I reactivated netconfig and here is the report showing that kernel panic 
occurs when rose_device_event is triggered when issuing command
ifconfig rose0 down

[ 1215.153302] rose_kill_by_device() rose->neighbour->use 0
[ 1215.153316] BUG: unable to handle kernel NULL pointer dereference at 
000000000000002a
[ 1215.153321] IP: [<ffffffffa065e37d>] rose_device_event+0x11d/0x160 [rose]
[ 1215.153333] PGD 36340067 PUD 359fa067 PMD 0
[ 1215.153338] Oops: 0002 [#1] SMP
[ 1215.153343] CPU 1
[ 1215.153344] Modules linked in: af_packet rose mkiss ax25 nfsd 
exportfs nfs nfs_acl auth_rpcgss fscache lockd sunrpc netconsole 
configfs bnep bluetooth rfkill snd_hda_codec_idt snd_hda_intel 
snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd 
i82975x_edac soundcore e1000e ppdev parport_pc parport edac_core 
iTCO_wdt iTCO_vendor_support serio_raw i2c_i801 processor coretemp evdev 
ipv6 autofs4 usbhid hid ext4 crc16 jbd2 sd_mod crc_t10dif firewire_ohci 
firewire_core ehci_hcd crc_itu_t uhci_hcd usbcore usb_common nouveau 
button video mxm_wmi wmi i2c_algo_bit drm_kms_helper ttm drm i2c_core 
ahci libahci ata_piix pata_marvell libata scsi_mod [last unloaded: 
microcode]
[ 1215.153395]
[ 1215.153398] Pid: 18637, comm: ifconfig Not tainted 3.4.7 #8 
/D975XBX2
[ 1215.153404] RIP: 0010:[<ffffffffa065e37d>]  [<ffffffffa065e37d>] 
rose_device_event+0x11d/0x160 [rose]
[ 1215.153411] RSP: 0000:ffff880035271ca8  EFLAGS: 00010296
[ 1215.153414] RAX: 0000000000000000 RBX: ffff88003d0c2838 RCX: 
0000000230924000
[ 1215.153417] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 
ffffffffa0665f28
[ 1215.153420] RBP: ffff880035271cb8 R08: 0000000000000002 R09: 
0000000000000000
[ 1215.153422] R10: 0000000000000003 R11: 0000000000000000 R12: 
ffff88003a9e1000
[ 1215.153425] R13: 00000000fffffff1 R14: ffffffffa05b0000 R15: 
0000000000000000
[ 1215.153429] FS:  00007fb7a318f700(0000) GS:ffff88003fa80000(0000) 
knlGS:0000000000000000
[ 1215.153433] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1215.153435] CR2: 000000000000002a CR3: 00000000393fd000 CR4: 
00000000000007e0
[ 1215.153438] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[ 1215.153441] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
[ 1215.153445] Process ifconfig (pid: 18637, threadinfo 
ffff880035270000, task ffff88003f5f5d90)
[ 1215.153448] Stack:
[ 1215.153450]  0000000000000002 ffff88003a9e1000 ffff880035271cf8 
ffffffff8146d86d
[ 1215.153455]  ffff880036040d00 0000000000000002 ffff88003a9e1000 
0000000000000000
[ 1215.153459]  00000000ffffff9d 0000000000000000 ffff880035271d08 
ffffffff8107a2b6
[ 1215.153464] Call Trace:
[ 1215.153470]  [<ffffffff8146d86d>] notifier_call_chain+0x4d/0x70
[ 1215.153476]  [<ffffffff8107a2b6>] raw_notifier_call_chain+0x16/0x20
[ 1215.153483]  [<ffffffff81395136>] call_netdevice_notifiers+0x36/0x60
[ 1215.153487]  [<ffffffff8139b9ea>] __dev_notify_flags+0x6a/0x90
[ 1215.153491]  [<ffffffff8139ba55>] dev_change_flags+0x45/0x70
[ 1215.153496]  [<ffffffff81403aed>] devinet_ioctl+0x61d/0x7b0
[ 1215.153500]  [<ffffffff81403f05>] inet_ioctl+0x75/0x90
[ 1215.153505]  [<ffffffff8137fbd0>] sock_do_ioctl+0x30/0x70
[ 1215.153509]  [<ffffffff8137fc89>] sock_ioctl+0x79/0x2f0
[ 1215.153514]  [<ffffffff811829d8>] do_vfs_ioctl+0x98/0x560
[ 1215.153517]  [<ffffffff81182f31>] sys_ioctl+0x91/0xa0
[ 1215.153522]  [<ffffffff81471b39>] system_call_fastpath+0x16/0x1b
[ 1215.153525] Code: e0 5b 41 5c 31 c0 5d c3 48 8d 7b c8 31 c9 ba 09 00 
00 00 be 65 00 00 00 e8 a1 5c 00 00 48 8b 83 b8 04 00 00 48 c7 c7 28 5f 
66 a0 <66> 83 68 2a 01 48 8b 83 b8 04 00 00 0f b7 70 2a 31 c0 e8 d4 10
[ 1215.153561] RIP  [<ffffffffa065e37d>] rose_device_event+0x11d/0x160 
[rose]
[ 1215.153567]  RSP <ffff880035271ca8>
[ 1215.153569] CR2: 000000000000002a
[ 1215.177577] ---[ end trace d23a7ddff228876c ]---
[ 1215.177589] Kernel panic - not syncing: Fatal exception in interrupt
[ 1215.177662] panic occurred, switching back to text console
[ 1215.177717] Rebooting in 60 seconds..

I inserted some printk into rose_device_event() and commented calls to 
subroutines.
Without calling subroutines, there is no more kernel panic.
Same results when replacing rose_kill_by_device() in net/rose/af_rose.c, 
rose_link_device_down() and rose_rt_device() in
net/rose_route.c by a dummy functions with just a printk inside.

I am glad that I found make parameters that shorten the debugging
cycle :

make modules SUBDIRS=net/rose
make modules_install SUBDIRS=net/rose

Now I have to go further into each subroutines step by step in order to 
find out the falty code !


Bernard


On 03/08/2012 16:32, folkert wrote:
>> You might want to simply try moving unregister_netdevice_notifier a bit
>> earlier in rose_exit and see if that helps.  Otherwise I would recommend
>> instrumenting the code up with some printk so you can understand what
>> part of unregistration is failing.
>
> Or bisect! Give e.g. the oldest 2.6 kernel a try to find a last known
> good and then do a bisect upto a known bad kernel.
>
>
> Folkert van Heusden
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CPU: 0 Not tainted  (3.1.9+ #1) when ifconfig rose0 down
  2012-08-05 16:23     ` Bernard Pidoux
@ 2012-08-05 17:23       ` Eric W. Biederman
  0 siblings, 0 replies; 10+ messages in thread
From: Eric W. Biederman @ 2012-08-05 17:23 UTC (permalink / raw)
  To: Bernard Pidoux; +Cc: folkert, linux-hams, Linux Netdev List

Bernard Pidoux <bernard.pidoux@free•fr> writes:

> Thanks for suggesting a bissect. However I guess that this bug has always been
> there !

One thing that has changed recently is I modified
unregister_netdev_notifier to create synthetic events so drivers don't
need a separate network device tear down path in module unload.

For rose it looks like this has triggered latent bugs.

> I am not professionaly involved in programming, however I committed a few
> patches for ROSE, AX.25 and NetRom modules since a few years.
>
> I reactivated netconfig and here is the report showing that kernel panic occurs
> when rose_device_event is triggered when issuing command
> ifconfig rose0 down

Oh ouch!  This is from an ioctl not even at module unload time.

> [ 1215.153302] rose_kill_by_device() rose->neighbour->use 0
> [ 1215.153316] BUG: unable to handle kernel NULL pointer dereference at
> 000000000000002a
> [ 1215.153321] IP: [<ffffffffa065e37d>] rose_device_event+0x11d/0x160 [rose]
> [ 1215.153333] PGD 36340067 PUD 359fa067 PMD 0
> [ 1215.153338] Oops: 0002 [#1] SMP
> [ 1215.153343] CPU 1
> [ 1215.153344] Modules linked in: af_packet rose mkiss ax25 nfsd exportfs nfs
> nfs_acl auth_rpcgss fscache lockd sunrpc netconsole configfs bnep bluetooth
> rfkill snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm
> snd_page_alloc snd_timer snd i82975x_edac soundcore e1000e ppdev parport_pc
> parport edac_core iTCO_wdt iTCO_vendor_support serio_raw i2c_i801 processor
> coretemp evdev ipv6 autofs4 usbhid hid ext4 crc16 jbd2 sd_mod crc_t10dif
> firewire_ohci firewire_core ehci_hcd crc_itu_t uhci_hcd usbcore usb_common
> nouveau button video mxm_wmi wmi i2c_algo_bit drm_kms_helper ttm drm i2c_core
> ahci libahci ata_piix pata_marvell libata scsi_mod [last unloaded: microcode]
> [ 1215.153395]
> [ 1215.153398] Pid: 18637, comm: ifconfig Not tainted 3.4.7 #8 /D975XBX2
> [ 1215.153404] RIP: 0010:[<ffffffffa065e37d>]  [<ffffffffa065e37d>]
> rose_device_event+0x11d/0x160 [rose]
> [ 1215.153411] RSP: 0000:ffff880035271ca8  EFLAGS: 00010296
> [ 1215.153414] RAX: 0000000000000000 RBX: ffff88003d0c2838 RCX: 0000000230924000
> [ 1215.153417] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffffffffa0665f28
> [ 1215.153420] RBP: ffff880035271cb8 R08: 0000000000000002 R09: 0000000000000000
> [ 1215.153422] R10: 0000000000000003 R11: 0000000000000000 R12: ffff88003a9e1000
> [ 1215.153425] R13: 00000000fffffff1 R14: ffffffffa05b0000 R15: 0000000000000000
> [ 1215.153429] FS:  00007fb7a318f700(0000) GS:ffff88003fa80000(0000)
> knlGS:0000000000000000
> [ 1215.153433] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 1215.153435] CR2: 000000000000002a CR3: 00000000393fd000 CR4: 00000000000007e0
> [ 1215.153438] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1215.153441] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 1215.153445] Process ifconfig (pid: 18637, threadinfo ffff880035270000, task
> ffff88003f5f5d90)
> [ 1215.153448] Stack:
> [ 1215.153450]  0000000000000002 ffff88003a9e1000 ffff880035271cf8
> ffffffff8146d86d
> [ 1215.153455]  ffff880036040d00 0000000000000002 ffff88003a9e1000
> 0000000000000000
> [ 1215.153459]  00000000ffffff9d 0000000000000000 ffff880035271d08
> ffffffff8107a2b6
> [ 1215.153464] Call Trace:
> [ 1215.153470]  [<ffffffff8146d86d>] notifier_call_chain+0x4d/0x70
> [ 1215.153476]  [<ffffffff8107a2b6>] raw_notifier_call_chain+0x16/0x20
> [ 1215.153483]  [<ffffffff81395136>] call_netdevice_notifiers+0x36/0x60
> [ 1215.153487]  [<ffffffff8139b9ea>] __dev_notify_flags+0x6a/0x90
> [ 1215.153491]  [<ffffffff8139ba55>] dev_change_flags+0x45/0x70
> [ 1215.153496]  [<ffffffff81403aed>] devinet_ioctl+0x61d/0x7b0
> [ 1215.153500]  [<ffffffff81403f05>] inet_ioctl+0x75/0x90
> [ 1215.153505]  [<ffffffff8137fbd0>] sock_do_ioctl+0x30/0x70
> [ 1215.153509]  [<ffffffff8137fc89>] sock_ioctl+0x79/0x2f0
> [ 1215.153514]  [<ffffffff811829d8>] do_vfs_ioctl+0x98/0x560
> [ 1215.153517]  [<ffffffff81182f31>] sys_ioctl+0x91/0xa0
> [ 1215.153522]  [<ffffffff81471b39>] system_call_fastpath+0x16/0x1b
> [ 1215.153525] Code: e0 5b 41 5c 31 c0 5d c3 48 8d 7b c8 31 c9 ba 09 00 00 00 be
> 65 00 00 00 e8 a1 5c 00 00 48 8b 83 b8 04 00 00 48 c7 c7 28 5f 66 a0 <66> 83 68
> 2a 01 48 8b 83 b8 04 00 00 0f b7 70 2a 31 c0 e8 d4 10
> [ 1215.153561] RIP  [<ffffffffa065e37d>] rose_device_event+0x11d/0x160 [rose]
> [ 1215.153567]  RSP <ffff880035271ca8>
> [ 1215.153569] CR2: 000000000000002a
> [ 1215.177577] ---[ end trace d23a7ddff228876c ]---
> [ 1215.177589] Kernel panic - not syncing: Fatal exception in interrupt
> [ 1215.177662] panic occurred, switching back to text console
> [ 1215.177717] Rebooting in 60 seconds..
>
> I inserted some printk into rose_device_event() and commented calls to
> subroutines.
> Without calling subroutines, there is no more kernel panic.
> Same results when replacing rose_kill_by_device() in net/rose/af_rose.c,
> rose_link_device_down() and rose_rt_device() in
> net/rose_route.c by a dummy functions with just a printk inside.
>
> I am glad that I found make parameters that shorten the debugging
> cycle :
>
> make modules SUBDIRS=net/rose
> make modules_install SUBDIRS=net/rose
>
> Now I have to go further into each subroutines step by step in order to find out
> the falty code !

Argh inline functions obscuring the backtraces again.

If you are any good at reading assembly you might be able to cut the
tracing time down by using gdb to get a disassembly and looking in the
disassembly.

Although in something easily reproducible what I tend to do is I have
a printk that does.

printk(KERN_EMERG "@%s.%s.%d\n", __FILE__, __func__, __LINE__);

Which I sprinkle all over the code so I know where I fail, and don't
need to cycle too many times.

Sometimes it can be productive to use kvm or a second machine so your
cycle times aren't as long.

In any event I wish you luck tracking this down.

Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: CPU: 0 Not tainted  (3.1.9+ #1) when ifconfig rose0 down
@ 2012-08-05 17:08 Folkert van Heusden
  0 siblings, 0 replies; 10+ messages in thread
From: Folkert van Heusden @ 2012-08-05 17:08 UTC (permalink / raw)
  To: Bernard Pidoux, folkert; +Cc: Eric W. Biederman, linux-hams, Linux Netdev List

Good luck!
I'm to busy myself with debugging the baycom driver.
Can do at most 2 steps a day and it are 15 steps... (54k changesets)

Bernard Pidoux <bernard.pidoux@free•fr> wrote:

>Thanks for suggesting a bissect. However I guess that this bug has 
>always been there !
>
>I am not professionaly involved in programming, however I committed a 
>few patches for ROSE, AX.25 and NetRom modules since a few years.
>
>I reactivated netconfig and here is the report showing that kernel panic 
>occurs when rose_device_event is triggered when issuing command
>ifconfig rose0 down
>
>[ 1215.153302] rose_kill_by_device() rose->neighbour->use 0
>[ 1215.153316] BUG: unable to handle kernel NULL pointer dereference at 
>000000000000002a
>[ 1215.153321] IP: [<ffffffffa065e37d>] rose_device_event+0x11d/0x160 [rose]
>[ 1215.153333] PGD 36340067 PUD 359fa067 PMD 0
>[ 1215.153338] Oops: 0002 [#1] SMP
>[ 1215.153343] CPU 1
>[ 1215.153344] Modules linked in: af_packet rose mkiss ax25 nfsd 
>exportfs nfs nfs_acl auth_rpcgss fscache lockd sunrpc netconsole 
>configfs bnep bluetooth rfkill snd_hda_codec_idt snd_hda_intel 
>snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd 
>i82975x_edac soundcore e1000e ppdev parport_pc parport edac_core 
>iTCO_wdt iTCO_vendor_support serio_raw i2c_i801 processor coretemp evdev 
>ipv6 autofs4 usbhid hid ext4 crc16 jbd2 sd_mod crc_t10dif firewire_ohci 
>firewire_core ehci_hcd crc_itu_t uhci_hcd usbcore usb_common nouveau 
>button video mxm_wmi wmi i2c_algo_bit drm_kms_helper ttm drm i2c_core 
>ahci libahci ata_piix pata_marvell libata scsi_mod [last unloaded: 
>microcode]
>[ 1215.153395]
>[ 1215.153398] Pid: 18637, comm: ifconfig Not tainted 3.4.7 #8 
>/D975XBX2
>[ 1215.153404] RIP: 0010:[<ffffffffa065e37d>]  [<ffffffffa065e37d>] 
>rose_device_event+0x11d/0x160 [rose]
>[ 1215.153411] RSP: 0000:ffff880035271ca8  EFLAGS: 00010296
>[ 1215.153414] RAX: 0000000000000000 RBX: ffff88003d0c2838 RCX: 
>0000000230924000
>[ 1215.153417] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 
>ffffffffa0665f28
>[ 1215.153420] RBP: ffff880035271cb8 R08: 0000000000000002 R09: 
>0000000000000000
>[ 1215.153422] R10: 0000000000000003 R11: 0000000000000000 R12: 
>ffff88003a9e1000
>[ 1215.153425] R13: 00000000fffffff1 R14: ffffffffa05b0000 R15: 
>0000000000000000
>[ 1215.153429] FS:  00007fb7a318f700(0000) GS:ffff88003fa80000(0000) 
>knlGS:0000000000000000
>[ 1215.153433] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>[ 1215.153435] CR2: 000000000000002a CR3: 00000000393fd000 CR4: 
>00000000000007e0
>[ 1215.153438] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>0000000000000000
>[ 1215.153441] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
>0000000000000400
>[ 1215.153445] Process ifconfig (pid: 18637, threadinfo 
>ffff880035270000, task ffff88003f5f5d90)
>[ 1215.153448] Stack:
>[ 1215.153450]  0000000000000002 ffff88003a9e1000 ffff880035271cf8 
>ffffffff8146d86d
>[ 1215.153455]  ffff880036040d00 0000000000000002 ffff88003a9e1000 
>0000000000000000
>[ 1215.153459]  00000000ffffff9d 0000000000000000 ffff880035271d08 
>ffffffff8107a2b6
>[ 1215.153464] Call Trace:
>[ 1215.153470]  [<ffffffff8146d86d>] notifier_call_chain+0x4d/0x70
>[ 1215.153476]  [<ffffffff8107a2b6>] raw_notifier_call_chain+0x16/0x20
>[ 1215.153483]  [<ffffffff81395136>] call_netdevice_notifiers+0x36/0x60
>[ 1215.153487]  [<ffffffff8139b9ea>] __dev_notify_flags+0x6a/0x90
>[ 1215.153491]  [<ffffffff8139ba55>] dev_change_flags+0x45/0x70
>[ 1215.153496]  [<ffffffff81403aed>] devinet_ioctl+0x61d/0x7b0
>[ 1215.153500]  [<ffffffff81403f05>] inet_ioctl+0x75/0x90
>[ 1215.153505]  [<ffffffff8137fbd0>] sock_do_ioctl+0x30/0x70
>[ 1215.153509]  [<ffffffff8137fc89>] sock_ioctl+0x79/0x2f0
>[ 1215.153514]  [<ffffffff811829d8>] do_vfs_ioctl+0x98/0x560
>[ 1215.153517]  [<ffffffff81182f31>] sys_ioctl+0x91/0xa0
>[ 1215.153522]  [<ffffffff81471b39>] system_call_fastpath+0x16/0x1b
>[ 1215.153525] Code: e0 5b 41 5c 31 c0 5d c3 48 8d 7b c8 31 c9 ba 09 00 
>00 00 be 65 00 00 00 e8 a1 5c 00 00 48 8b 83 b8 04 00 00 48 c7 c7 28 5f 
>66 a0 <66> 83 68 2a 01 48 8b 83 b8 04 00 00 0f b7 70 2a 31 c0 e8 d4 10
>[ 1215.153561] RIP  [<ffffffffa065e37d>] rose_device_event+0x11d/0x160 
>[rose]
>[ 1215.153567]  RSP <ffff880035271ca8>
>[ 1215.153569] CR2: 000000000000002a
>[ 1215.177577] ---[ end trace d23a7ddff228876c ]---
>[ 1215.177589] Kernel panic - not syncing: Fatal exception in interrupt
>[ 1215.177662] panic occurred, switching back to text console
>[ 1215.177717] Rebooting in 60 seconds..
>
>I inserted some printk into rose_device_event() and commented calls to 
>subroutines.
>Without calling subroutines, there is no more kernel panic.
>Same results when replacing rose_kill_by_device() in net/rose/af_rose.c, 
>rose_link_device_down() and rose_rt_device() in
>net/rose_route.c by a dummy functions with just a printk inside.
>
>I am glad that I found make parameters that shorten the debugging
>cycle :
>
>make modules SUBDIRS=net/rose
>make modules_install SUBDIRS=net/rose
>
>Now I have to go further into each subroutines step by step in order to 
>find out the falty code !
>
>
>Bernard
>
>
>On 03/08/2012 16:32, folkert wrote:
>>> You might want to simply try moving unregister_netdevice_notifier a bit
>>> earlier in rose_exit and see if that helps.  Otherwise I would recommend
>>> instrumenting the code up with some printk so you can understand what
>>> part of unregistration is failing.
>>
>> Or bisect! Give e.g. the oldest 2.6 kernel a try to find a last known
>> good and then do a bisect upto a known bad kernel.
>>
>>
>> Folkert van Heusden
>>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-08-05 17:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-07-31 14:11 CPU: 0 Not tainted (3.1.9+ #1) when ifconfig rose0 down Bernard Pidoux
2012-07-31 15:27 ` Bernard Pidoux
2012-07-31 17:58   ` folkert
2012-08-02 23:03   ` Eric W. Biederman
2012-07-31 17:56 ` folkert
2012-08-02 22:55 ` Eric W. Biederman
2012-08-03 14:32   ` folkert
2012-08-05 16:23     ` Bernard Pidoux
2012-08-05 17:23       ` Eric W. Biederman
  -- strict thread matches above, loose matches on Subject: below --
2012-08-05 17:08 Folkert van Heusden

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox