public inbox for netdev@vger.kernel.org 
From: Eric Dumazet <dada1@cosmosbay•com>
To: Brian Bloniarz <bmb@athenacr•com>
Cc: David Miller <davem@davemloft•net>,
	kchang@athenacr•com, netdev@vger•kernel.org,
	cl@linux-foundation•org
Subject: Re: Multicast packet loss
Date: Sun, 05 Apr 2009 15:49:14 +0200
Message-ID: <49D8B6DA.7050902@cosmosbay.com>
In-Reply-To: <49D66379.7070106@athenacr.com>

Brian Bloniarz wrote:
> Hi Eric,
> 
> We've been experimenting with this softirq-delay patch in production, and
> have seen some hard-to-reproduce crashes. We finally managed to capture a
> kexec crashdump this morning.
> 
> This is the dmesg:
> 
> [53417.592868] Unable to handle kernel NULL pointer dereference at
> 0000000000000000 RIP:
> [53417.598377]  [<ffffffff80243643>] __do_softirq+0xc3/0x150
> [53417.606300] PGD 32abb8067 PUD 32faf5067 PMD 0
> [53417.610829] Oops: 0000 [1] SMP
> [53417.614032] CPU 2
> [53417.616083] Modules linked in: nfs lockd nfs_acl sunrpc openafs(P)
> autofs4 ipv6 ac sbs sbshc video output dock battery container
> iptable_filter ip_tables x_tables parport_pc lp parport loop joydev
> iTCO_wdt iTCO_vendor_support evdev button i5000_edac psmouse serio_raw
> pcspkr shpchp pci_hotplug edac_core ext3 jbd mbcache sr_mod cdrom
> ata_generic usbhid hid ata_piix sg sd_mod ehci_hcd pata_acpi uhci_hcd
> libata bnx2 aacraid usbcore scsi_mod thermal processor fan fbcon
> tileblit font bitblit softcursor fuse
> [53417.662067] Pid: 13039, comm: gball Tainted: P       
> 2.6.24-19acr2-generic #1
> [53417.669219] RIP: 0010:[<ffffffff80243643>]  [<ffffffff80243643>]
> __do_softirq+0xc3/0x150
> [53417.677368] RSP: 0018:ffff8103314f3f20  EFLAGS: 00010297
> [53417.682697] RAX: ffff810084a1b000 RBX: ffffffff805ba530 RCX:
> 0000000000000000
> [53417.689843] RDX: ffff8103305811e0 RSI: 0000000000000282 RDI:
> ffff810332ada580
> [53417.696993] RBP: 0000000000000000 R08: ffff81032fad9f08 R09:
> ffff810332382000
> [53417.704144] R10: 0000000000000000 R11: ffffffff80316ec0 R12:
> ffffffff8062b3d8
> [53417.711294] R13: ffffffff8062b480 R14: 0000000000000002 R15:
> 000000000000000a
> [53417.718447] FS:  00007fab0d7b8750(0000) GS:ffff810334401b80(0000)
> knlGS:0000000000000000
> [53417.726568] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [53417.732332] CR2: 0000000000000000 CR3: 0000000329e2d000 CR4:
> 00000000000006e0
> [53417.739476] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [53417.746637] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [53417.753787] Process gball (pid: 13039, threadinfo ffff81032adde000,
> task ffff810329ff77d0)
> [53417.761991] Stack:  ffffffff8062b3d8 0000000000000046
> ffff8103314f3f68 0000000000000000
> [53417.770146]  00000000000000a0 ffff81032addfee8 0000000000000000
> ffffffff8020d50c
> [53417.777660]  ffff8103314f3f68 00000000000000c1 ffffffff8020ed25
> ffffffff8062c870
> [53417.784961] Call Trace:
> [53417.787635]  <IRQ>  [<ffffffff8020d50c>] call_softirq+0x1c/0x30
> [53417.793597]  [<ffffffff8020ed25>] do_softirq+0x35/0x90
> [53417.798747]  [<ffffffff80243578>] irq_exit+0x88/0x90
> [53417.803727]  [<ffffffff8020ef70>] do_IRQ+0x80/0x100
> [53417.808624]  [<ffffffff8020c891>] ret_from_intr+0x0/0xa
> [53417.813862]  <EOI>  [<ffffffff803e53c8>] skb_release_all+0x18/0x150
> [53417.820164]  [<ffffffff803e4ad9>] __kfree_skb+0x9/0x90
> [53417.825327]  [<ffffffff80437612>] udp_recvmsg+0x222/0x260
> [53417.830744]  [<ffffffff80231264>] source_load+0x34/0x70
> [53417.835984]  [<ffffffff80232a9a>] find_busiest_group+0x1fa/0x850
> [53417.842019]  [<ffffffff803e0100>] sock_common_recvmsg+0x30/0x50
> [53417.847958]  [<ffffffff803de1ca>] sock_recvmsg+0x14a/0x160
> [53417.853462]  [<ffffffff80231c21>] update_curr+0x71/0x100
> [53417.858789]  [<ffffffff802320fd>] __dequeue_entity+0x3d/0x50
> [53417.864469]  [<ffffffff80253ab0>] autoremove_wake_function+0x0/0x30
> [53417.870758]  [<ffffffff8046662f>] thread_return+0x3a/0x57b
> [53417.876262]  [<ffffffff803df73e>] sys_recvfrom+0xfe/0x190
> [53417.881680]  [<ffffffff802e2a95>] sys_epoll_wait+0x245/0x4e0
> [53417.887358]  [<ffffffff80233e20>] default_wake_function+0x0/0x10
> [53417.893384]  [<ffffffff8020c37e>] system_call+0x7e/0x83
> [53417.898628]
> [53417.900134]
> [53417.900134] Code: 48 8b 11 48 89 cf 65 48 8b 04 25 08 00 00 00 4a 89
> 14 20 ff
> [53417.909430] RIP  [<ffffffff80243643>] __do_softirq+0xc3/0x150
> [53417.915210]  RSP <ffff8103314f3f20>
> 
> The disassembly where it crashed:
> /local/home/bmb/doc/kernels/linux-hardy-eric/kernel/softirq.c:273
> ffffffff8024361b:       d1 ed                   shr    %ebp
> rcu_bh_qsctr_inc():
> /local/home/bmb/doc/kernels/linux-hardy-eric/include/linux/rcupdate.h:130
> ffffffff8024361d:       48 8b 40 08             mov    0x8(%rax),%rax
> ffffffff80243621:       41 c7 44 05 08 01 00    movl  
> $0x1,0x8(%r13,%rax,1)
> ffffffff80243628:       00 00
> __do_softirq():
> /local/home/bmb/doc/kernels/linux-hardy-eric/kernel/softirq.c:273
> ffffffff8024362a:       75 d8                   jne    ffffffff80243604
> <__do_softirq+0x84>
> softirq_delay_exec():
> /local/home/bmb/doc/kernels/linux-hardy-eric/kernel/softirq.c:225
> ffffffff8024362c:       48 8b 14 24             mov    (%rsp),%rdx
> ffffffff80243630:       65 48 8b 04 25 08 00    mov    %gs:0x8,%rax
> ffffffff80243637:       00 00
> ffffffff80243639:       48 8b 0c 10             mov    (%rax,%rdx,1),%rcx
> ffffffff8024363d:       48 83 f9 01             cmp    $0x1,%rcx
> ffffffff80243641:       74 29                   je     ffffffff8024366c
> <__do_softirq+0xec>
> /local/home/bmb/doc/kernels/linux-hardy-eric/kernel/softirq.c:226
> ffffffff80243643:       48 8b 11                mov    (%rcx),%rdx
> /local/home/bmb/doc/kernels/linux-hardy-eric/kernel/softirq.c:227
> ffffffff80243646:       48 89 cf                mov    %rcx,%rdi
> /local/home/bmb/doc/kernels/linux-hardy-eric/kernel/softirq.c:226
> ffffffff80243649:       65 48 8b 04 25 08 00    mov    %gs:0x8,%rax
> ffffffff80243650:       00 00
> ffffffff80243652:       4a 89 14 20             mov    %rdx,(%rax,%r12,1)
> /local/home/bmb/doc/kernels/linux-hardy-eric/kernel/softirq.c:227
> ffffffff80243656:       ff 51 08                callq  *0x8(%rcx)
> /local/home/bmb/doc/kernels/linux-hardy-eric/kernel/softirq.c:225
> ffffffff80243659:       65 48 8b 04 25 08 00    mov    %gs:0x8,%rax
> ffffffff80243660:       00 00
> ffffffff80243662:       4a 8b 0c 20             mov    (%rax,%r12,1),%rcx
> ffffffff80243666:       48 83 f9 01             cmp    $0x1,%rcx
> ffffffff8024366a:       75 d7                   jne    ffffffff80243643
> <__do_softirq+0xc3>
> raw_local_irq_disable():
> /local/home/bmb/doc/kernels/linux-hardy-eric/debian/build/build-generic/include2/asm/irqflags_64.h:76
> 
> ffffffff8024366c:       fa                      cli
> 
> And softirq.c line numbers:
>    218   * Because locking is provided by subsystem, please note
>    219   * that sdel->func(sdel) is responsible for setting sdel->next to NULL
>    220   */
>    221  static void softirq_delay_exec(void)
>    222  {
>    223          struct softirq_delay *sdel;
>    224
>    225          while ((sdel = __get_cpu_var(softirq_delay_head)) != SOFTIRQ_DELAY_END) {
>    226                  __get_cpu_var(softirq_delay_head) = sdel->next;
>    227                  sdel->func(sdel);       /*      sdel->next = NULL;*/
>    228          }
>    229  }
> 
> So it's crashing because __get_cpu_var(softirq_delay_head) is NULL
> somehow.
> 
> We aren't running a recent kernel -- we're running Ubuntu Hardy's
> 2.6.24-19, with a backported version of this patch. One more atypical
> thing is that we run openafs, 1.4.6.dfsg1-2.
> 
> Like I said, I have a full vmcore (3, actually) and would be happy to
> post any more information you'd like to know.
> 
> Thanks,
> Brian Bloniarz

Hi Brian

2.6.24-19 kernel... hmm...

Could you please send me the diff of your backport against this kernel?

I take it you use Ubuntu Hardy 8.04 LTS server edition?

The pointer being NULL might tell us that we managed to call
inet_def_readable() without the socket lock held...
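
To make that hypothesis concrete, here is a small user-space model of the
delay list. It is only a sketch under assumptions: the exec loop follows the
softirq.c listing quoted above and the sentinel value 1 comes from the
"cmp $0x1,%rcx" in the disassembly, but delay_queue(), the two-entry
delay_head[] array standing in for the per-cpu variable, and the wake()
callback are hypothetical reconstructions, not the code from the patch or
the backport. It shows one way the head can become NULL: if sdel->next is
reset without the lock that is supposed to serialize queueing, the same
entry can sit on two lists, and the second list then pops a stale NULL
->next into its head.

```c
#include <assert.h>
#include <stddef.h>

struct softirq_delay {
	struct softirq_delay *next;
	void (*func)(struct softirq_delay *);
};

/* End-of-list sentinel; 1 matches the "cmp $0x1,%rcx" in the oops. */
#define SOFTIRQ_DELAY_END ((struct softirq_delay *)1L)
#define NR_MODEL_CPUS 2

/* Hypothetical stand-in for the per-cpu softirq_delay_head. */
static struct softirq_delay *delay_head[NR_MODEL_CPUS] = {
	SOFTIRQ_DELAY_END, SOFTIRQ_DELAY_END
};

/* Assumed queue helper: next == NULL means "not queued anywhere". */
static int delay_queue(int cpu, struct softirq_delay *sdel)
{
	assert(sdel != NULL);
	if (sdel->next != NULL)
		return 0;		/* already on some list */
	sdel->next = delay_head[cpu];
	delay_head[cpu] = sdel;
	return 1;
}

/*
 * Same loop as the quoted softirq_delay_exec(), plus a NULL check so the
 * model reports the bad state (-1) instead of faulting like the kernel did.
 * Returns the number of callbacks run otherwise.
 */
static int delay_exec(int cpu)
{
	struct softirq_delay *sdel;
	int ran = 0;

	while ((sdel = delay_head[cpu]) != SOFTIRQ_DELAY_END) {
		if (sdel == NULL)
			return -1;	/* kernel: mov (%rcx),%rdx with rcx == 0 */
		delay_head[cpu] = sdel->next;
		sdel->func(sdel);
		ran++;
	}
	return ran;
}

/* Callback contract from the quoted comment: func resets ->next to NULL. */
static void wake(struct softirq_delay *sdel)
{
	sdel->next = NULL;
}

/* Properly serialized use: the second queue attempt is refused. */
int simulate_clean_run(void)
{
	struct softirq_delay s = { NULL, wake };

	delay_head[0] = delay_head[1] = SOFTIRQ_DELAY_END;
	delay_queue(0, &s);
	delay_queue(0, &s);		/* returns 0, entry already queued */
	return delay_exec(0);
}

/* Entry reset without the lock, then queued on a second cpu's list. */
int simulate_double_queue(void)
{
	struct softirq_delay s = { NULL, wake };

	delay_head[0] = delay_head[1] = SOFTIRQ_DELAY_END;
	delay_queue(0, &s);		/* cpu 0 list: s -> END */
	s.next = NULL;			/* unlocked reset while still queued */
	delay_queue(1, &s);		/* cpu 1 list: s -> END as well */
	delay_exec(0);			/* runs wake(), s.next is NULL again */
	return delay_exec(1);		/* head becomes s.next == NULL */
}
```

In this model simulate_clean_run() returns 1 and simulate_double_queue()
returns -1, the latter corresponding to the NULL dereference at
softirq.c:226. Whether the backport actually allows this interleaving can
only be told from the diff.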

