public inbox for netdev@vger.kernel.org 
From: ebiederm@xmission•com (Eric W. Biederman)
To: David Dillow <dave@thedillows•org>
Cc: "Michael Riepe" <michael.riepe@googlemail•com>,
	"Michael Buesch" <mb@bu3sch•de>,
	"Francois Romieu" <romieu@fr•zoreil.com>,
	"Rui Santos" <rsantos@grupopie•com>,
	"Michael Büker" <m.bueker@berlin•de>,
	linux-kernel@vger•kernel.org, netdev@vger•kernel.org
Subject: Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
Date: Fri, 21 Aug 2009 13:57:49 -0700
Message-ID: <m1skfkrik2.fsf@fess.ebiederm.org>
In-Reply-To: <1243042174.3580.23.camel@obelisk.thedillows.org> (David Dillow's message of "Fri, 22 May 2009 21:29:34 -0400")

David Dillow <dave@thedillows•org> writes:

> The 8169 chip only generates MSI interrupts when all enabled event
> sources are quiescent and one or more sources transition to active. If
> not all of the active events are acknowledged, or a new event becomes
> active while the existing ones are cleared in the handler, we will not
> see a new interrupt.
>
> The current interrupt handler masks off the Rx and Tx events once the
> NAPI handler has been scheduled, which opens a race window in which we
> can get another Rx or Tx event and never ACK it, stopping all
> activity until the link is reset (ifconfig down/up). Fix this by always
> ACK'ing all event sources and looping in the handler until all
> sources are quiescent.
>
> Signed-off-by: David Dillow <dave@thedillows•org>
> ---
> This fixes the lockups I've seen. Both MSI and level-triggered interrupt
> configurations survive over an hour of testing, where they previously
> locked up in under 90 seconds. I am certain of the analysis of the root cause,
> but there may be better ways to fix it. There may also be a theoretical
> race window between the end of a NAPI poll cycle and an incoming link
> change interrupt, but I'm not sure it would matter.
>
> Some variant of this should also be applied to the currently running
> stable trees, as the problem is long-standing.

I have what at first glance looks like a problem caused by this
patch.  For the last month, since I upgraded one of my machines from
2.6.28 to 2.6.30, it has been becoming inaccessible from the
network, and I have a few:

NETDEV WATCHDOG: eth0 (r8169): transmit timed out

in my logs, along with a lot of soft lockups that always show
rtl8169_interrupt as the running function.  I suspect your patch has
introduced a near-infinite loop in the interrupt handler that is causing
these soft lockups.

Any ideas?

Eric

BUG: soft lockup - CPU#3 stuck for 61s! [swapper:0]
CPU 3:
Pid: 0, comm: swapper Tainted: G        W  2.6.30-170263.2006.Arora.fc11.x86_64 #1 G33M-S2
RIP: 0010:[<ffffffffa01deacd>]  [<ffffffffa01deacd>] rtl8169_interrupt+0x26f/0x2b7 [r8169]
RSP: 0018:ffff880028070cb0  EFLAGS: 00000206
RAX: 0000000000000050 RBX: ffff880028070d10 RCX: ffff88002807b9e0
RDX: ffffc2000065c03e RSI: ffff88012d79a000 RDI: 0000000000000246
RBP: ffffffff8100c9d3 R08: ffff88012fae0000 R09: ffff880028070ec0
R10: 077321422cb06619 R11: 000000003c5efb73 R12: ffff880028070c30
R13: ffff88012d79a000 R14: ffff88012d79a600 R15: 077321422cb06619
FS:  0000000000000000(0000) GS:ffff88002806d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fc10010c000 CR3: 0000000000201000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>  [<ffffffff81093f0b>] ? handle_IRQ_event+0x6a/0x13f
 [<ffffffff810219fa>] ? apic_write+0x24/0x3a
 [<ffffffff8109607a>] ? handle_edge_irq+0xdb/0x138
 [<ffffffff81012fbd>] ? native_sched_clock+0x2d/0x54
 [<ffffffff8100e996>] ? handle_irq+0x95/0xb7
 [<ffffffff8100df42>] ? do_IRQ+0x6a/0xe9
 [<ffffffff8100c853>] ? ret_from_intr+0x0/0x11
 [<ffffffff8104ba16>] ? __do_softirq+0x5e/0x1b0
 [<ffffffff8100cfcc>] ? call_softirq+0x1c/0x28
 [<ffffffff8100e721>] ? do_softirq+0x51/0xae
 [<ffffffff8104b6d2>] ? irq_exit+0x52/0xa3
 [<ffffffff81020f11>] ? smp_apic_timer_interrupt+0x94/0xb8
 [<ffffffff8100c9d3>] ? apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff81014096>] ? mwait_idle+0x9b/0xcc
 [<ffffffff81014038>] ? mwait_idle+0x3d/0xcc
 [<ffffffff8100ae08>] ? enter_idle+0x33/0x49
 [<ffffffff8100aece>] ? cpu_idle+0xb0/0xf3
 [<ffffffff8136f30c>] ? start_secondary+0x19c/0x1b7


Thread overview: 77+ messages
     [not found] <200903041828.49972.m.bueker@berlin.de>
     [not found] ` <20090322211159.GA23042@electric-eye.fr.zoreil.com>
     [not found]   ` <49CA1822.6050902@grupopie.com>
     [not found]     ` <200904041950.04324.mb@bu3sch.de>
     [not found]       ` <4A06D8D2.4010505@googlemail.com>
2009-05-11  0:29         ` 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too David Dillow
2009-05-11 20:48           ` Michael Buesch
2009-05-11 21:10             ` Michael Buesch
2009-05-11 21:29               ` David Dillow
2009-05-11 21:59                 ` Michael Buesch
2009-05-12 20:29                 ` Michael Riepe
2009-05-14  2:38                   ` David Dillow
2009-05-14 18:37                     ` Michael Riepe
2009-05-14 19:14                       ` David Dillow
2009-05-14 19:42                         ` Michael Riepe
2009-05-23  1:29                           ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts David Dillow
2009-05-23  9:24                             ` Michael Buesch
2009-05-23 14:35                               ` Michael Riepe
2009-05-23 14:44                                 ` Michael Buesch
2009-05-23 15:01                                   ` Michael Riepe
2009-05-23 16:40                                     ` Michael Buesch
2009-05-23 14:51                                 ` David Dillow
2009-05-23 16:12                                   ` Michael Riepe
2009-05-23 16:45                                     ` Michael Buesch
2009-05-23 16:46                                     ` David Dillow
2009-05-23 16:50                                       ` Michael Buesch
2009-05-23 16:53                                       ` Michael Riepe
2009-05-23 17:03                                         ` David Dillow
2009-05-24 21:15                             ` Francois Romieu
2009-05-24 22:55                               ` David Dillow
2009-05-26  5:55                             ` David Miller
2009-05-26 18:22                               ` Michael Buesch
2009-05-26 21:52                                 ` David Miller
2009-05-26 22:14                                   ` David Miller
2009-05-26 22:40                                     ` Michael Riepe
2009-05-26 22:43                                       ` David Miller
2009-05-26 23:10                                         ` David Miller
2009-05-27 16:19                                     ` Michael Buesch
2009-06-16 19:32                                     ` Rui Santos
2009-08-21 20:57                             ` Eric W. Biederman [this message]
2009-08-21 21:22                               ` Michael Riepe
2009-08-21 22:59                               ` David Dillow
2009-08-21 23:34                                 ` David Dillow
2009-08-22  0:24                                   ` Eric W. Biederman
2009-08-22 11:48                                   ` Eric W. Biederman
2009-08-22 12:07                                     ` Eric W. Biederman
2009-08-22 20:43                                       ` David Dillow
2009-08-23 17:17                                         ` Jarek Poplawski
2009-08-23 17:43                                           ` Michal Soltys
2009-08-23 17:54                                             ` Jarek Poplawski
2009-08-24  2:37                                         ` Eric W. Biederman
2009-08-25  0:51                                         ` Eric W. Biederman
2009-08-25  2:59                                           ` David Dillow
2009-08-25 20:22                                             ` Eric W. Biederman
2009-08-25 20:40                                               ` David Dillow
2009-08-25 21:24                                                 ` Eric W. Biederman
2009-08-25 21:46                                                   ` David Dillow
2009-08-25 22:19                                                   ` Francois Romieu
2009-08-26  3:47                                                     ` Eric W. Biederman
2009-08-26  7:58                                                     ` [PATCH] r8169: Reduce looping in the interrupt handler Eric W. Biederman
2009-08-26 13:56                                                       ` David Dillow
2009-08-26 13:59                                                         ` David Dillow
2009-08-26 20:02                                                           ` Eric W. Biederman
2009-08-26 21:30                                                             ` Francois Romieu
2009-08-26 21:40                                                               ` Eric W. Biederman
2009-08-27  5:24                                                                 ` Francois Romieu
2009-08-27  5:38                                                                   ` Eric W. Biederman
2009-08-27 23:20                                                                     ` Francois Romieu
2009-08-28  1:17                                                                       ` Eric W. Biederman
2009-08-28  1:29                                                                         ` David Dillow
2009-08-30 20:37                                                                           ` Francois Romieu
2009-08-30 20:53                                                                             ` Eric W. Biederman
2009-09-01  3:33                                                                               ` David Dillow
2009-09-01  9:20                                                                                 ` Francois Romieu
2009-08-25 21:37                                             ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts Eric W. Biederman
2009-08-25 21:54                                               ` David Dillow
2009-08-25 23:11                                                 ` Francois Romieu
2009-05-12 11:10             ` 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too Krzysztof Halasa
2009-05-12 21:45               ` Michael Riepe
2009-05-13  6:11                 ` Francois Romieu
2009-05-13  6:27                   ` Michael Riepe
2009-05-13 19:34                 ` Krzysztof Halasa
