* oops/warning report for the week of November 26, 2008
@ 2008-11-26 23:11 Arjan van de Ven
2008-11-27 0:05 ` Jesse Barnes
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: Arjan van de Ven @ 2008-11-26 23:11 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o,
Alan Cox, jesse Barnes
In collecting this report, oopses and warnings with versions prior to 2.6.27 are ignored.
This week, a total of 5450 oopses and warnings have been reported for versions 2.6.27 and later,
compared to 2198 reports in the previous week.
This report is a bit different than the previous weeks; all 2.6.26 and earlier issues are no
longer used, which means the top 12 has shuffled quite a bit, with some new star appearances.
Also I've reworked the "are these two backtraces the same" algorithm; the website should now
be presenting a more compact/concise view due to having the backtraces consolidated in a much
more logical (for the human) way.
Per file statistics
936 external/virtualbox/module
602 drivers/pci/slot.c
455 drivers/net/wireless/iwlwifi/iwl-tx.c
364 kernel/power/main.c
274 drivers/net/r8169.c
231 drivers/net/wireless/iwlwifi/iwl-3945-rs.c
231 fs/jbd/journal.c
227 arch/x86/include/asm/mtrr.h
147 drivers/ata/libata-sff.c
137 drivers/net/sis900.c
71 net/ipv4/tcp.c
62 drivers/gpu/drm/radeon/radeon_cp.c
Rank 1: VBoxDrvLinuxIOCtl (warning)
Reported 934 times (1635 total reports)
[external] bug in the VirtualBox drivers
This warning was last seen in version 2.6.28-rc3, and first seen in 2.6.25.11.
More info: http://www.kerneloops.org/searchweek.php?search=VBoxDrvLinuxIOCtl
Rank 2: pci_create_slot (warning)
Reported 603 times (639 total reports)
BIOS provided duplicated slot names, the PCI layer blindly passes to sysfs
This warning was last seen in version 2.6.27.5, and first seen in 2.6.27-rc7-git1.
More info: http://www.kerneloops.org/searchweek.php?search=pci_create_slot
Rank 3: iwl_tx_cmd_complete (warning)
Reported 455 times (693 total reports)
Bug in the IWL wireless driver; partial fix available
This warning was last seen in version 2.6.28-rc4, and first seen in 2.6.27-rc9.
More info: http://www.kerneloops.org/searchweek.php?search=iwl_tx_cmd_complete
Rank 4: suspend_test_finish (warning)
Reported 362 times (1202 total reports)
Fedora is shipping with the suspend test on, and it's failing everywhere.
The patch to report what fails is in 2.6.28-rc6 and later.
This warning was last seen in version 2.6.28-rc1, and first seen in 2.6.27-rc0-git14.
More info: http://www.kerneloops.org/searchweek.php?search=suspend_test_finish
Rank 5: dev_watchdog(r8169) (oops)
Reported 274 times (1414 total reports)
Network driver not handling timeouts itself.
This oops was last seen in version 2.6.28-rc4, and first seen in 2.6.26.6.
More info: http://www.kerneloops.org/searchweek.php?search=dev_watchdog(r8169)
Rank 6: rs_get_rate (oops)
Reported 232 times (1152 total reports)
Bug in the Intel IWL wireless drivers
This oops was last seen in version 2.6.27.5, and first seen in 2.6.25-rc2-git5.
More info: http://www.kerneloops.org/searchweek.php?search=rs_get_rate
Rank 7: journal_update_superblock (warning)
Reported 231 times (6506 total reports)
Likely caused by the user removing a USB stick while mounted
This warning was last seen in version 2.6.27.7, and first seen in 2.6.24-rc6-git1.
More info: http://www.kerneloops.org/searchweek.php?search=journal_update_superblock
Rank 8: mtrr_trim_uncached_memory (warning)
Reported 227 times (619 total reports)
There is a high number of machines where our MTRR checks trigger. I suspect we are too
picky in accepting the MTRR configuration.
This warning was last seen in version 2.6.27.5, and first seen in 2.6.24.
More info: http://www.kerneloops.org/searchweek.php?search=mtrr_trim_uncached_memory
Rank 9: __atapi_pio_bytes (warning)
Reported 146 times (224 total reports)
Alan said this was due to some other layer giving the libata drivers a weird
scatter gather list. It just happens a lot, and somehow it mostly happens in
virtualized environments
This warning was last seen in version 2.6.27.5, and first seen in 2.6.27.4.
More info: http://www.kerneloops.org/searchweek.php?search=__atapi_pio_bytes
Rank 10: dev_watchdog(sis900) (oops)
Reported 137 times (1538 total reports)
This oops was last seen in version 2.6.27.6, and first seen in 2.6.26-rc4-git2.
More info: http://www.kerneloops.org/searchweek.php?search=dev_watchdog(sis900)
Rank 11: tcp_recvmsg (warning)
Reported 71 times (167 total reports)
This warning was last seen in version 2.6.27.5, and first seen in 2.6.25.
More info: http://www.kerneloops.org/searchweek.php?search=tcp_recvmsg
Rank 12: dev_watchdog(atl1) (oops)
Reported 56 times (109 total reports)
This oops was last seen in version 2.6.27.5, and first seen in 2.6.26.6.
More info: http://www.kerneloops.org/searchweek.php?search=dev_watchdog(atl1)
Rank 13: nv_set_page_attrib_cached (warning)
Reported 56 times (65 total reports)
[external] bug in the binary nvidia driver
warning only shows up in tainted kernels
This warning was last seen in version 2.6.27.5, and first seen in 2.6.27.5.
More info: http://www.kerneloops.org/searchweek.php?search=nv_set_page_attrib_cached

* Re: oops/warning report for the week of November 26, 2008
From: Jesse Barnes @ 2008-11-27 0:05 UTC
To: Arjan van de Ven
Cc: Linux Kernel Mailing List, Linus Torvalds, NetDev, x86,
Andrew Morton, Theodore Ts'o, Alan Cox

On Wednesday, November 26, 2008 3:11 pm Arjan van de Ven wrote:
> Rank 2: pci_create_slot (warning)
> Reported 603 times (639 total reports)
> BIOS provided duplicated slot names, the PCI layer blindly passes to sysfs
> This warning was last seen in version 2.6.27.5, and first seen in
> 2.6.27-rc7-git1. More info:
> http://www.kerneloops.org/searchweek.php?search=pci_create_slot

IIRC we fixed this one post-2.6.27. I didn't send the patches back to -stable
because they were a bit big, but if someone were sufficiently motivated I'm
sure the backport wouldn't be that hard...

Jesse

* Re: oops/warning report for the week of November 26, 2008
From: Ingo Molnar @ 2008-11-27 11:48 UTC
To: Jesse Barnes
Cc: Arjan van de Ven, Linux Kernel Mailing List, Linus Torvalds,
NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox

* Jesse Barnes <jbarnes@virtuousgeek•org> wrote:

> On Wednesday, November 26, 2008 3:11 pm Arjan van de Ven wrote:
> > Rank 2: pci_create_slot (warning)
> > Reported 603 times (639 total reports)
> > BIOS provided duplicated slot names, the PCI layer blindly passes to sysfs
> > This warning was last seen in version 2.6.27.5, and first seen in
> > 2.6.27-rc7-git1. More info:
> > http://www.kerneloops.org/searchweek.php?search=pci_create_slot
>
> IIRC we fixed this one post-2.6.27. I didn't send the patches back
> to -stable because they were a bit big, but if someone were
> sufficiently motivated I'm sure the backport wouldn't be that
> hard...

having the commit IDs mentioned here would be nice, should anyone feel
motivated.

Ingo

* Re: oops/warning report for the week of November 26, 2008
From: Alex Chiang @ 2008-11-27 19:42 UTC
To: Jesse Barnes
Cc: Arjan van de Ven, Linux Kernel Mailing List, Linus Torvalds,
NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox

* Jesse Barnes <jbarnes@virtuousgeek•org>:
> On Wednesday, November 26, 2008 3:11 pm Arjan van de Ven wrote:
> > Rank 2: pci_create_slot (warning)
> > Reported 603 times (639 total reports)
> > BIOS provided duplicated slot names, the PCI layer blindly passes to sysfs
> > This warning was last seen in version 2.6.27.5, and first seen in
> > 2.6.27-rc7-git1. More info:
> > http://www.kerneloops.org/searchweek.php?search=pci_create_slot
>
> IIRC we fixed this one post-2.6.27. I didn't send the patches back to -stable
> because they were a bit big, but if someone were sufficiently motivated I'm
> sure the backport wouldn't be that hard...

I can do this backport. A few questions though...

We're seeing a proliferation of this one presumably because
Fedora10 uses 2.6.27.5 as a starting point? If I just backport
the fixes against Greg's latest tree, do I have to do anything
special to make sure they get into the Fedora kernel?

Also, does kerneloops capture any of the machine information,
like DMI output, etc. or does it just get the oops? It would be
nice to see which machines out there have the broken BIOS that
causes this oops.

Thanks.

/ac

* Re: oops/warning report for the week of November 26, 2008
From: Arjan van de Ven @ 2008-11-27 19:49 UTC
To: Alex Chiang
Cc: Jesse Barnes, Linux Kernel Mailing List, Linus Torvalds, NetDev,
x86, Andrew Morton, Theodore Ts'o, Alan Cox

On Thu, 27 Nov 2008 12:42:10 -0700
Alex Chiang <achiang@hp•com> wrote:

> * Jesse Barnes <jbarnes@virtuousgeek•org>:
> > On Wednesday, November 26, 2008 3:11 pm Arjan van de Ven wrote:
> > > Rank 2: pci_create_slot (warning)
> > > Reported 603 times (639 total reports)
> > > BIOS provided duplicated slot names, the PCI layer
> > > blindly passes to sysfs This warning was last seen in version
> > > 2.6.27.5, and first seen in 2.6.27-rc7-git1. More info:
> > > http://www.kerneloops.org/searchweek.php?search=pci_create_slot
> >
> > IIRC we fixed this one post-2.6.27. I didn't send the patches back
> > to -stable because they were a bit big, but if someone were
> > sufficiently motivated I'm sure the backport wouldn't be that
> > hard...
>
> I can do this backport. A few questions though...
>
> We're seeing a proliferation of this one presumably because
> Fedora10 uses 2.6.27.5 as a starting point? If I just backport
> the fixes against Greg's latest tree, do I have to do anything
> special to make sure they get into the Fedora kernel?

Fedora tends to follow -stable quite closely so that ought to be enough

>
> Also, does kerneloops capture any of the machine information,
> like DMI output, etc. or does it just get the oops? It would be
> nice to see which machines out there have the broken BIOS that
> causes this oops.

right now we do this for oopses, but not for warnings ;(
I'll make a patch to add this; it's generally useful.

--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

* Re: oops/warning report for the week of November 26, 2008
From: Ingo Molnar @ 2008-11-27 11:52 UTC
To: Arjan van de Ven, Yinghai Lu
Cc: Linux Kernel Mailing List, Linus Torvalds, NetDev, x86,
Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes

* Arjan van de Ven <arjan@linux•intel.com> wrote:

> Rank 8: mtrr_trim_uncached_memory (warning)
> Reported 227 times (619 total reports)
> There is a high number of machines where our MTRR checks
> trigger. I suspect we are too picky in accepting the MTRR
> configuration.

the warning here means: "the BIOS messed up but we fixed it up for
you just fine".

Should we print a DMI descriptor so that it can be tracked back to the
bad BIOSen in question? Or should we (partially) silence the warning
itself? Those BIOS bugs need fixing really: older kernels will boot up
with bad MTRR settings - resulting in a super-slow system or other
weirdnesses. We can tone down the message so that it doesn't show up in
kerneloops.org. It's up to you.

Ingo

* Re: oops/warning report for the week of November 26, 2008
From: Jesse Barnes @ 2008-11-27 17:02 UTC
To: Ingo Molnar
Cc: Arjan van de Ven, Yinghai Lu, Linux Kernel Mailing List,
Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox

On Thursday, November 27, 2008 3:52 am Ingo Molnar wrote:
> * Arjan van de Ven <arjan@linux•intel.com> wrote:
> > Rank 8: mtrr_trim_uncached_memory (warning)
> > Reported 227 times (619 total reports)
> > There is a high number of machines where our MTRR checks
> > trigger. I suspect we are too picky in accepting the MTRR
> > configuration.
>
> the warning here means: "the BIOS messed up but we fixed it up for
> you just fine".
>
> Should we print a DMI descriptor so that it can be tracked back to the
> bad BIOSen in question? Or should we (partially) silence the warning
> itself? Those BIOS bugs need fixing really: older kernels will boot up
> with bad MTRR settings - resulting in a super-slow system or other
> weirdnesses. We can tone down the message so that it doesn't show up in
> kerneloops.org. It's up to you.

I actually think we're doing something wrong here, since so many platforms have
this behavior. It's likely that there's an undocumented, additional check
needed to determine whether a slot is hot pluggable. Matthew Garrett recently
posted a patch to check for ACPI _RMV methods, which should be an improvement.
I'll be putting that into linux-next soon for testing.

--
Jesse Barnes, Intel Open Source Technology Center

* Re: oops/warning report for the week of November 26, 2008
From: Arjan van de Ven @ 2008-11-27 18:01 UTC
To: Ingo Molnar
Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev,
x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes

Ingo Molnar wrote:
> * Arjan van de Ven <arjan@linux•intel.com> wrote:
>
>> Rank 8: mtrr_trim_uncached_memory (warning)
>> Reported 227 times (619 total reports)
>> There is a high number of machines where our MTRR checks
>> trigger. I suspect we are too picky in accepting the MTRR
>> configuration.
>
> the warning here means: "the BIOS messed up but we fixed it up for
> you just fine".

I don't believe that right now. we see so many of these, including
many "there's no MTRRs at all", that I am seriously suspecting that
our code is just incorrect somehow and triggering too much.

* Re: oops/warning report for the week of November 26, 2008
From: Ingo Molnar @ 2008-11-27 20:18 UTC
To: Arjan van de Ven
Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev,
x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes

* Arjan van de Ven <arjan@linux•intel.com> wrote:

> Ingo Molnar wrote:
>> * Arjan van de Ven <arjan@linux•intel.com> wrote:
>>
>>> Rank 8: mtrr_trim_uncached_memory (warning)
>>> Reported 227 times (619 total reports)
>>> There is a high number of machines where our MTRR checks trigger. I
>>> suspect we are too picky in accepting the MTRR configuration.
>>
>> the warning here means: "the BIOS messed up but we fixed it up for you
>> just fine".
>
> I don't believe that right now. we see so many of these, including
> many "there's no MTRRs at all", that I am seriously suspecting that
> our code is just incorrect somehow and triggering too much.

well we looked at existing reports and Linux was right to fix them up.
Show us one that is incorrect, then we can fix it up.

the "no MTRR's" are vmware/(also qemu?) guests not implementing a full
CPU emulation.

Ingo

* Re: oops/warning report for the week of November 26, 2008
From: Arjan van de Ven @ 2008-11-27 20:28 UTC
To: Ingo Molnar
Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev,
x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes

On Thu, 27 Nov 2008 21:18:36 +0100
Ingo Molnar <mingo@elte•hu> wrote:

>
> * Arjan van de Ven <arjan@linux•intel.com> wrote:
>
> > Ingo Molnar wrote:
> >> * Arjan van de Ven <arjan@linux•intel.com> wrote:
> >>
> >>> Rank 8: mtrr_trim_uncached_memory (warning)
> >>> Reported 227 times (619 total reports)
> >>> There is a high number of machines where our MTRR checks
> >>> trigger. I suspect we are too picky in accepting the MTRR
> >>> configuration.
> >>
> >> the warning here means: "the BIOS messed up but we fixed it up for
> >> you just fine".
> >
> > I don't believe that right now. we see so many of these, including
> > many "there's no MTRRs at all", that I am seriously suspecting that
> > our code is just incorrect somehow and triggering too much.
>
> well we looked at existing reports and Linux was right to fix them
> up. Show us one that is incorrect, then we can fix it up.
>
> the "no MTRR's" are vmware/(also qemu?) guests not implementing a
> full CPU emulation.

... and it's still our fault in part, since we don't even check to see
if a cpu claims to support MTRR before complaining about it...

easy to fix though:

From 7e987ae541c41ce908b414fee9d8e2fd2099a083 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux•intel.com>
Date: Thu, 27 Nov 2008 12:25:47 -0800
Subject: [PATCH] x86: make sure the CPU advertizes MTRR support before complaining about the lack thereoff...

We complain loudly if a CPU does not have MTRR support... but we don't check if the CPU
exposes MTRR support in the CPUID flags first. While this might not fix all of the
broken virtualization systems out there, it will at least fix those that properly don't
advertize things they don't support.

Signed-off-by: Arjan van de Ven <arjan@linux•intel.com>
---
 arch/x86/kernel/cpu/mtrr/main.c | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 1159e26..0044e61 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -1567,6 +1567,8 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
 	 * Make sure we only trim uncachable memory on machines that
 	 * support the Intel MTRR architecture:
 	 */
+	if (!cpu_has_mtrr)
+		return 0;
 	if (!is_cpu(INTEL) || disable_mtrr_trim)
 		return 0;
 	rdmsr(MTRRdefType_MSR, def, dummy);
--
1.6.0.4

--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
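
[Editor's sketch] The patch above bails out early when the CPU does not advertise MTRR support in its CPUID flags. As a rough user-space illustration of that guard, assuming only the architectural fact that CPUID leaf 1 reports MTRR in bit 12 of EDX: the helper names `cpuid_edx_has_mtrr` and `trim_check_proceeds` below are hypothetical and are not kernel functions, and the second helper merely models the order of the two early returns in the diff.

```c
#include <stdint.h>

/* CPUID leaf 1 reports the MTRR feature in bit 12 of EDX. */
#define CPUID_1_EDX_MTRR (UINT32_C(1) << 12)

/* Hypothetical helper: 1 if a leaf-1 EDX value advertises MTRRs, else 0. */
int cpuid_edx_has_mtrr(uint32_t edx)
{
	return (edx & CPUID_1_EDX_MTRR) != 0;
}

/*
 * Hypothetical model of the patched early-exit logic. Returns 1 when the
 * trim check would go on to inspect the MTRRs, 0 when it returns quietly.
 * intel_style_if stands in for is_cpu(INTEL) and is an assumption of
 * this sketch, as is passing disable_mtrr_trim as a parameter.
 */
int trim_check_proceeds(uint32_t edx, int intel_style_if, int disable_mtrr_trim)
{
	if (!cpuid_edx_has_mtrr(edx))
		return 0;	/* the check the patch adds */
	if (!intel_style_if || disable_mtrr_trim)
		return 0;	/* the pre-existing check */
	return 1;
}
```

With these definitions, `trim_check_proceeds(0, 1, 0)` returns 0 (no MTRR bit, so no complaint), while `trim_check_proceeds(1u << 12, 1, 0)` returns 1.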

* Re: oops/warning report for the week of November 26, 2008
From: Ingo Molnar @ 2008-11-27 20:47 UTC
To: Arjan van de Ven
Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev,
x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes,
H. Peter Anvin

* Arjan van de Ven <arjan@linux•intel.com> wrote:

> On Thu, 27 Nov 2008 21:18:36 +0100
> Ingo Molnar <mingo@elte•hu> wrote:
>
> >
> > * Arjan van de Ven <arjan@linux•intel.com> wrote:
> >
> > > Ingo Molnar wrote:
> > >> * Arjan van de Ven <arjan@linux•intel.com> wrote:
> > >>
> > >>> Rank 8: mtrr_trim_uncached_memory (warning)
> > >>> Reported 227 times (619 total reports)
> > >>> There is a high number of machines where our MTRR checks
> > >>> trigger. I suspect we are too picky in accepting the MTRR
> > >>> configuration.
> > >>
> > >> the warning here means: "the BIOS messed up but we fixed it up for
> > >> you just fine".
> > >
> > > I don't believe that right now. we see so many of these, including
> > > many "there's no MTRRs at all", that I am seriously suspecting that
> > > our code is just incorrect somehow and triggering too much.
> >
> > well we looked at existing reports and Linux was right to fix them
> > up. Show us one that is incorrect, then we can fix it up.
> >
> > the "no MTRR's" are vmware/(also qemu?) guests not implementing a
> > full CPU emulation.
>
> ... and it's still our fault in part, since we don't even check to
> see if a cpu claims to support MTRR before complaining about it...
>
> easy to fix though:

IIRC the problem is that vmware _does_ claim that it supports MTRRs.

Ingo

* Re: oops/warning report for the week of November 26, 2008
From: Arjan van de Ven @ 2008-11-27 20:53 UTC
To: Ingo Molnar
Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev,
x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes,
H. Peter Anvin

On Thu, 27 Nov 2008 21:47:14 +0100
Ingo Molnar <mingo@elte•hu> wrote:

> IIRC the problem is that vmware _does_ claim that it supports MTRRs.

it might.
but even if they would fix that, we would still WARN (
at least we should do our side correctly...

--
Arjan van de Ven
Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

* Re: oops/warning report for the week of November 26, 2008
From: Ingo Molnar @ 2008-11-28 8:34 UTC
To: Arjan van de Ven
Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds, NetDev,
x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes,
H. Peter Anvin

* Arjan van de Ven <arjan@linux•intel.com> wrote:

> On Thu, 27 Nov 2008 21:47:14 +0100
> Ingo Molnar <mingo@elte•hu> wrote:
>
> > IIRC the problem is that vmware _does_ claim that it supports MTRRs.
>
> it might.
> but even if they would fix that, we would still WARN (
> at least we should do our side correctly...

As pointed out in other parts of the thread, that is not the case.

Anyway, as i said at the outset, if you think we should remove the
warning altogether, or tweak it, we can do that - it is important to
have relevant warnings show up in kerneloops.org.

To sum it up: the only remaining MTRR warnings we know of are either:

1) apparently genuine BIOS bugs that do cause problems if the (new)
   kernel does not fix them up. The MTRR warning is relevant and
   correct in those cases.

or:

2) sucky virtualization solutions that cheat the guest OS by faking
   "MTRR support" in the CPUID info, but not actually showing any
   MTRRs. These virtualization solutions do not even properly identify
   themselves to the kernel. The MTRR warning is unnecessary in this
   case.

So what we did in the x86 tree to remove the warning in the second
case is to properly identify vmware (and in general, virtualization)
guests. It was not a simple oneliner:

earth4:~/tip> gll linus..x86/detect-hyper
4e42ebd: x86: hypervisor - fix sparse warnings
c450d78: x86: vmware - fix sparse warnings
fd8cd7e: x86: vmware: look for DMI string in the product serial key
6bdbfe9: x86: VMware: Fix vmware_get_tsc code
395628e: x86: Skip verification by the watchdog for TSC clocksource.
eca0cd0: x86: Add a synthetic TSC_RELIABLE feature bit.
88b094f: x86: Hypervisor detection and get tsc_freq from hypervisor
49ab56a: x86: add X86_FEATURE_HYPERVISOR feature bit
b2bcc7b: x86: add a synthetic TSC_RELIABLE feature bit

and it will benefit vmware guests in many more areas than just a
sharper MTRR warning message. That code is queued up for v2.6.29.

Ingo

* Re: oops/warning report for the week of November 26, 2008
From: H. Peter Anvin @ 2008-11-27 21:18 UTC
To: Arjan van de Ven
Cc: Ingo Molnar, Yinghai Lu, Linux Kernel Mailing List,
Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox,
jesse Barnes

Arjan van de Ven wrote:
> +	if (!cpu_has_mtrr)
> +		return 0;
> 	if (!is_cpu(INTEL) || disable_mtrr_trim)
> 		return 0;
> 	rdmsr(MTRRdefType_MSR, def, dummy);

cpu_has_mtrr there should presumably replace is_cpu(INTEL).

I'm not sure if this can be replaced by use_intel(); in particular
use_intel() relies on mtrr_if having been initialized.

Looking...

	-hpa (out of town for Thanksgiving)

* Re: oops/warning report for the week of November 26, 2008
From: Yinghai Lu @ 2008-11-27 21:18 UTC
To: Arjan van de Ven
Cc: Ingo Molnar, Linux Kernel Mailing List, Linus Torvalds, NetDev,
x86, Andrew Morton, Theodore Ts'o, Alan Cox, jesse Barnes

Arjan van de Ven wrote:
> On Thu, 27 Nov 2008 21:18:36 +0100
> Ingo Molnar <mingo@elte•hu> wrote:
>
>> * Arjan van de Ven <arjan@linux•intel.com> wrote:
>>
>>> Ingo Molnar wrote:
>>>> * Arjan van de Ven <arjan@linux•intel.com> wrote:
>>>>
>>>>> Rank 8: mtrr_trim_uncached_memory (warning)
>>>>> Reported 227 times (619 total reports)
>>>>> There is a high number of machines where our MTRR checks
>>>>> trigger. I suspect we are too picky in accepting the MTRR
>>>>> configuration.
>>>> the warning here means: "the BIOS messed up but we fixed it up for
>>>> you just fine".
>>> I don't believe that right now. we see so many of these, including
>>> many "there's no MTRRs at all", that I am seriously suspecting that
>>> our code is just incorrect somehow and triggering too much.
>> well we looked at existing reports and Linux was right to fix them
>> up. Show us one that is incorrect, then we can fix it up.
>>
>> the "no MTRR's" are vmware/(also qemu?) guests not implementing a
>> full CPU emulation.
>
> ... and it's still our fault in part, since we don't even check to see
> if a cpu claims to support MTRR before complaining about it...
>
> easy to fix though:
>
> From 7e987ae541c41ce908b414fee9d8e2fd2099a083 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <arjan@linux•intel.com>
> Date: Thu, 27 Nov 2008 12:25:47 -0800
> Subject: [PATCH] x86: make sure the CPU advertizes MTRR support before complaining about the lack thereoff...
>
> We complain loudly if a CPU does not have MTRR support... but we don't check if the CPU
> exposes MTRR support in the CPUID flags first. While this might not fix all of the
> broken virtualization systems out there, it will at least fix those that properly don't
> advertize things they don't support.
>
> Signed-off-by: Arjan van de Ven <arjan@linux•intel.com>
> ---
>  arch/x86/kernel/cpu/mtrr/main.c | 2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index 1159e26..0044e61 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -1567,6 +1567,8 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
> 	 * Make sure we only trim uncachable memory on machines that
> 	 * support the Intel MTRR architecture:
> 	 */
> +	if (!cpu_has_mtrr)
> +		return 0;

that is not needed, we already check that in mtrr_bp_init before this
function is called, and it will assign mtrr_if

and
#define is_cpu(vnd) (mtrr_if && mtrr_if->vendor == X86_VENDOR_##vnd)

will make sure MTRR is there.

ps: here INTEL means any CPU that has the same interface as Intel CPUs

YH

> 	if (!is_cpu(INTEL) || disable_mtrr_trim)
> 		return 0;
> 	rdmsr(MTRRdefType_MSR, def, dummy);

* Re: oops/warning report for the week of November 26, 2008
From: H. Peter Anvin @ 2008-11-27 21:42 UTC
To: Arjan van de Ven
Cc: Ingo Molnar, Yinghai Lu, Linux Kernel Mailing List,
Linus Torvalds, NetDev, x86, Andrew Morton, Theodore Ts'o, Alan Cox,
jesse Barnes

Arjan van de Ven wrote:
>
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index 1159e26..0044e61 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -1567,6 +1567,8 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
> 	 * Make sure we only trim uncachable memory on machines that
> 	 * support the Intel MTRR architecture:
> 	 */
> +	if (!cpu_has_mtrr)
> +		return 0;
> 	if (!is_cpu(INTEL) || disable_mtrr_trim)
> 		return 0;
> 	rdmsr(MTRRdefType_MSR, def, dummy);

Okay... is_cpu() here is defined as:

#define is_cpu(vnd) (mtrr_if && mtrr_if->vendor == X86_VENDOR_##vnd)

... so an MTRR interface has been identified. Therefore testing
cpu_has_mtrr is redundant.

As far as use_intel() versus is_cpu(INTEL), it looks to me as though the
two are identical in the current code -- mtrr_if->vendor is never set in
the generic code, and so defaults to 0 - meaning X86_VENDOR_INTEL.

All in all, it looks like the vendor ID stuff is a bad case of "works by
accident" in the MTRR code, however, *given the current code* I conclude
that is_cpu(INTEL) == use_intel() and that neither can be true without
MTRRs enabled.

	-hpa
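
[Editor's sketch] The "works by accident" point above can be reproduced in a small user-space model. The `is_cpu()` macro is copied from the message; everything else is an assumption made for illustration: the two-field `struct mtrr_ops`, the `use_intel()` definition, the `generic_ops` and `amd_ops` instances and the `model_*` helpers are simplified stand-ins, not the kernel's actual definitions. The vendor constants follow the message's statement that 0 means X86_VENDOR_INTEL.

```c
#include <stddef.h>

#define X86_VENDOR_INTEL 0	/* per the message: 0 means Intel */
#define X86_VENDOR_AMD   2	/* any non-zero vendor works for this sketch */

/* Simplified stand-in for the MTRR ops structure (assumed fields only). */
struct mtrr_ops {
	unsigned int vendor;
	unsigned int use_intel_if;
};

/* NULL until an MTRR interface has been selected. */
const struct mtrr_ops *mtrr_if;

#define is_cpu(vnd)	(mtrr_if && mtrr_if->vendor == X86_VENDOR_##vnd)
#define use_intel()	(mtrr_if && mtrr_if->use_intel_if == 1)

/* Generic ops never set .vendor, so it is zero-initialized, i.e. "Intel". */
const struct mtrr_ops generic_ops = { .use_intel_if = 1 };
/* A vendor-specific interface that is not Intel-style. */
const struct mtrr_ops amd_ops = { .vendor = X86_VENDOR_AMD };

/* Hypothetical helpers: select an interface, then evaluate each macro. */
int model_is_cpu_intel(const struct mtrr_ops *ops)
{
	mtrr_if = ops;
	return is_cpu(INTEL);
}

int model_use_intel(const struct mtrr_ops *ops)
{
	mtrr_if = ops;
	return use_intel();
}
```

In this model both helpers return 0 when no interface is set (a NULL `mtrr_if`), both return 1 for `generic_ops` even though its vendor field was never assigned, and both return 0 for `amd_ops`, which matches the conclusion that `is_cpu(INTEL) == use_intel()` under the stated assumptions.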

* Re: oops/warning report for the week of November 26, 2008
From: Jay Cliburn @ 2008-11-28 17:18 UTC
To: Arjan van de Ven
Cc: NetDev

[trimmed the cc list down to netdev only]

On Wed, 26 Nov 2008 15:11:14 -0800
Arjan van de Ven <arjan@linux•intel.com> wrote:

> Rank 12: dev_watchdog(atl1) (oops)
> Reported 56 times (109 total reports)
> This oops was last seen in version 2.6.27.5, and first seen
> in 2.6.26.6. More info:
> http://www.kerneloops.org/searchweek.php?search=dev_watchdog(atl1)

I can't reproduce this, so I've launched a request at fedoraforum.org
hoping I can snag a Fedora user who's encountering the bug and willing
to test.

The tx timeout reports at kerneloops.org appear to be happening on a
startling variety of network drivers (startling to me, anyway): r8169,
atl1, atl2, sis900, cdc_ether, orinoco_cs, tg3, ne2k-pci, via-rhine,
8139too, ath_pci, e1000, gl620a, sky2, hso, fealnx, forcedeth;
probably others, but I quit looking.

Is it correct to assume all these drivers are showing symptoms of the
poor timeout handling you mentioned in your r8169 comment, or is the
occasional tx timeout to be expected, and the leaders in this category
(r8169, sis900, atl1) are the only ones suffering from deficient
timeout handling?

Jay
* Re: oops/warning report for the week of November 26, 2008
2008-11-28 17:18 ` Jay Cliburn
@ 2008-11-28 17:32 ` Arjan van de Ven
2008-11-28 18:36 ` Jay Cliburn
2008-11-28 19:50 ` Francois Romieu
0 siblings, 2 replies; 25+ messages in thread
From: Arjan van de Ven @ 2008-11-28 17:32 UTC (permalink / raw)
To: Jay Cliburn; +Cc: NetDev

On Fri, 28 Nov 2008 11:18:27 -0600
Jay Cliburn <jcliburn@gmail•com> wrote:

> [trimmed the cc list down to netdev only]
>
> > Rank 12: dev_watchdog(atl1) (oops)
> > Reported 56 times (109 total reports)
> > This oops was last seen in version 2.6.27.5, and first seen
> > in 2.6.26.6. More info:
> > http://www.kerneloops.org/searchweek.php?search=dev_watchdog(atl1)
>
> The tx timeout reports at kerneloops.org appear to be happening on a
> startling variety of network drivers (startling to me, anyway): r8169,
> atl1, atl2, sis900, cdc_ether, orinoco_cs, tg3, ne2k-pci, via-rhine,
> 8139too, ath_pci, e1000, gl620a, sky2, hso, fealnx, forcedeth;
> probably others, but I quit looking.

to be specific in counts, the data I have so far is:

 count |           guilty
-------+----------------------------
  1599 | dev_watchdog(sis900)
  1501 | dev_watchdog(r8169)
   280 | dev_watchdog(via-rhine)
   264 | dev_watchdog(cdc_ether)
   213 | dev_watchdog(usbnet)
   192 | dev_watchdog(8139too)
   164 | dev_watchdog(8390)
   158 | dev_watchdog(via_rhine)
   129 | dev_watchdog(ne2k-pci)
   122 | dev_watchdog(atl1)
   102 | dev_watchdog(atl2)
   101 | dev_watchdog(orinoco)

and then a long tail of sub-100, omitted to keep this mail not too long;
if anyone wants data on his/her driver not in the list, let me know.
(please don't read too much in the word "guilty"; it's just the name of
the column in the kerneloops.org database used for identifying which
function was the prime suspect of a backtrace)

>
> Is it correct to assume all these drivers are showing symptoms of the
> poor timeout handling you mentioned in your r8169 comment, or is the
> occasional tx timeout to be expected, and the leaders in this category
> (r8169, sis900, atl1) are the only ones suffering from deficient
> timeout handling?

For me, sis900 and r8169 stand out; if you look at the data in the
table above, both of these are an order of magnitude more frequent than
the rest of the pack. ATL1 isn't doing all that bad in this regard,
although your driver is still a little higher than other popular cards
like tg3, e1000, e1000e etc. (those are all sub-50).

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: oops/warning report for the week of November 26, 2008
2008-11-28 17:32 ` Arjan van de Ven
@ 2008-11-28 18:36 ` Jay Cliburn
2008-11-28 18:50 ` Arjan van de Ven
2008-11-28 19:50 ` Francois Romieu
1 sibling, 1 reply; 25+ messages in thread
From: Jay Cliburn @ 2008-11-28 18:36 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: NetDev

On Fri, 28 Nov 2008 09:32:17 -0800
Arjan van de Ven <arjan@linux•intel.com> wrote:

> to be specific in counts, the data I have so far is:
>
>  count |           guilty
> -------+----------------------------
>   1599 | dev_watchdog(sis900)
>   1501 | dev_watchdog(r8169)
>    280 | dev_watchdog(via-rhine)
>    264 | dev_watchdog(cdc_ether)
>    213 | dev_watchdog(usbnet)
>    192 | dev_watchdog(8139too)
>    164 | dev_watchdog(8390)
>    158 | dev_watchdog(via_rhine)
>    129 | dev_watchdog(ne2k-pci)
>    122 | dev_watchdog(atl1)
>    102 | dev_watchdog(atl2)
>    101 | dev_watchdog(orinoco)

> ATL1 isn't doing all that bad in this
> regard, although your driver is still a little higher than other
> popular cards like tg3, e1000, e1000e etc.

...And that's what troubles me: the L1 chip isn't what I'd characterize
as "popular" -- it's LOM only, and it's found in only about 25
mainboards that I know of (from voluntary user reports) -- yet its
prevalence in the tx timeout list seems to be quickly rising.

Can you produce a list from your database for me that includes the
kernel version for each of the 122 reported atl1 dev_watchdog warnings?
I'd like to see if I can correlate an increase in the warnings with a
particular change we made.

Thanks.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: oops/warning report for the week of November 26, 2008
2008-11-28 18:36 ` Jay Cliburn
@ 2008-11-28 18:50 ` Arjan van de Ven
2008-11-28 21:12 ` atl1 transmit timeout Was: " Jay Cliburn
0 siblings, 1 reply; 25+ messages in thread
From: Arjan van de Ven @ 2008-11-28 18:50 UTC (permalink / raw)
To: Jay Cliburn; +Cc: NetDev

>
> > ATL1 isn't doing all that bad in this
> > regard, although your driver is still a little higher than other
> > popular cards like tg3, e1000, e1000e etc.
>
> ...And that's what troubles me: the L1 chip isn't what I'd
> characterize as "popular" -- it's LOM only, and it's found in only
> about 25 mainboards that I know of (from voluntary user reports) --
> yet its prevalence in the tx timeout list seems to be quickly rising.
>
> Can you produce a list from your database for me that includes the
> kernel version for each of the 122 reported atl1 dev_watchdog
> warnings? I'd like to see if I can correlate an increase in the
> warnings with a particular change we made.

=> select count(version), version from oopses where
guilty='dev_watchdog(atl1)' group by version order by version desc;
 count |     version
-------+-----------------
    93 | 2.6.27.5
     6 | 2.6.27.4
     1 | 2.6.27.3
     1 | 2.6.27.2
     1 | 2.6.27-rc9
     1 | 2.6.27-rc7-git1
     1 | 2.6.27-rc7
     1 | 2.6.27-rc6
     6 | 2.6.27-rc3
     7 | 2.6.27
     4 | 2.6.26.6
(11 rows)

or in more detail:

=> select count(full_version), full_version from oopses where \
guilty='dev_watchdog(atl1)' group by full_version order by \
full_version desc;
 count |           full_version
-------+-----------------------------------
     2 | 2.6.27.5-94.fc10.x86_64
    26 | 2.6.27.5-41.fc9.x86_64
    12 | 2.6.27.5-41.fc9.i686
    15 | 2.6.27.5-37.fc9.x86_64
    19 | 2.6.27.5-37.fc9.i686
    11 | 2.6.27.5-117.fc10.x86_64
     4 | 2.6.27.5-117.fc10.i686
     1 | 2.6.27.5-109.fc10.x86_64
     2 | 2.6.27.5-109.fc10.i686.PAE
     1 | 2.6.27.5-109.fc10.i686
     1 | 2.6.27.4-79.fc10.i686
     1 | 2.6.27.4-68.fc10.x86_64
     3 | 2.6.27.4-68.fc10.i686
     1 | 2.6.27.4-26.fc9.x86_64
     1 | 2.6.27.3-34.rc1.fc10.i686.PAE
     1 | 2.6.27.2-23.rc1.fc10.x86_64
     4 | 2.6.27-wl
     1 | 2.6.27-rc7
     1 | 2.6.27-rc6-wl-AUS32
     6 | 2.6.27-rc3-wl-8KS-UVC
     1 | 2.6.27-7-generic
     1 | 2.6.27-0.398.rc9.fc10.x86_64
     1 | 2.6.27-0.352.rc7.git1.fc10.x86_64
     2 | 2.6.27
     3 | 2.6.26.6-79.fc9.x86_64
     1 | 2.6.26.6-79.fc9.i686
(26 rows)

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 25+ messages in thread
* atl1 transmit timeout Was: Re: oops/warning report for the week of November 26, 2008
2008-11-28 18:50 ` Arjan van de Ven
@ 2008-11-28 21:12 ` Jay Cliburn
2008-11-28 21:22 ` Arjan van de Ven
0 siblings, 1 reply; 25+ messages in thread
From: Jay Cliburn @ 2008-11-28 21:12 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: NetDev

On Fri, 28 Nov 2008 10:50:44 -0800
Arjan van de Ven <arjan@linux•intel.com> wrote:

> => select count(version), version from oopses where
> guilty='dev_watchdog(atl1)' group by version order by version desc;
>  count |     version
> -------+-----------------
>     93 | 2.6.27.5
>      6 | 2.6.27.4
>      1 | 2.6.27.3
>      1 | 2.6.27.2
>      1 | 2.6.27-rc9
>      1 | 2.6.27-rc7-git1
>      1 | 2.6.27-rc7
>      1 | 2.6.27-rc6
>      6 | 2.6.27-rc3
>      7 | 2.6.27
>      4 | 2.6.26.6
> (11 rows)

Wow. 4 hits in 2.6.26, then 118 in 2.6.27.

A history of changes between 2.6.26 and 2.6.27.5 shows a mere six
changes to the driver.

commit    event
======    =====
788a5f3f  2.6.27.5
8dc186c1    atl1: fix vlan tag regression
056c7145  2.6.27.4
322df44b  2.6.27.3
6bcd6d77  2.6.27.2
bc5b8bb6  2.6.27.1
3fa8749e  2.6.27
4330ed8e  2.6.27-rc9
94aca1da  2.6.27-rc8
72d31053  2.6.27-rc7
adee14b2  2.6.27-rc6
24342c34  2.6.27-rc5
82c26a9d    atl1: disable TSO by default
6a55617e  2.6.27-rc4
30a2f3c6  2.6.27-rc3
c2ac3ef3    atl1: deal with hardware rx checksum bug
0967d61e  2.6.27-rc2
6e86841d  2.6.27-rc1
39d48157    atl1: Do not wake queue before queue has been started.
b102df14    atl1: use netdev_alloc_skb
d63ddcec    misc drivers/net endianness noise
bce7f793  2.6.26

The only one that jumps out at me is 39d48157, which contains, in part:

commit 39d48157ac1a0ff3ec81212e5451bfd1bf5f50db
Author: David S. Miller <davem@davemloft•net>
Date: Mon Jul 21 08:28:37 2008 -0700

    atl1: Do not wake queue before queue has been started.

    Based upon a bug report by Alexey Dobriyan, the patch is also tested
    by him and confirmed to fix the problem.

    Packet flow during link state events should not be done by waking
    and stopping the TX queue anyways, that is handled transparently by
    netif_carrier_{on,off}().

    So, remove the netif_{wake,stop}_queue() calls in the link check
    code, and add the necessary netif_start_queue() call to atl1_up().

    Signed-off-by: David S. Miller <davem@davemloft•net>

[...]
@@ -2627,6 +2625,7 @@ static s32 atl1_up(struct atl1_adapter *adapter)
         mod_timer(&adapter->watchdog_timer, jiffies);
         atlx_irq_enable(adapter);
         atl1_check_link(adapter);
+        netif_start_queue(netdev);
         return 0;

Would it be reasonable to increase the above mod_timer() expiry to
jiffies + (5 * HZ)?
^ permalink raw reply [flat|nested] 25+ messages in thread
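The "4 hits in 2.6.26, then 118 in 2.6.27" split in the message above can be
re-derived from the per-version counts Arjan posted. The following is a
minimal sketch, assuming the eleven rows of that table are simply hard-coded;
struct version_count, atl1_reports and reports_for_series() are hypothetical
names introduced here for illustration, and the grouping is a plain string
prefix match rather than real version parsing.

```c
#include <stddef.h>
#include <string.h>

/* One row of the "count | version" table quoted in this thread. */
struct version_count {
    const char *version;
    int count;
};

/* The eleven rows for guilty='dev_watchdog(atl1)', copied from the thread. */
static const struct version_count atl1_reports[] = {
    { "2.6.27.5",        93 },
    { "2.6.27.4",         6 },
    { "2.6.27.3",         1 },
    { "2.6.27.2",         1 },
    { "2.6.27-rc9",       1 },
    { "2.6.27-rc7-git1",  1 },
    { "2.6.27-rc7",       1 },
    { "2.6.27-rc6",       1 },
    { "2.6.27-rc3",       6 },
    { "2.6.27",           7 },
    { "2.6.26.6",         4 },
};

/*
 * Sum the report counts of every version string that starts with the
 * given series prefix, e.g. "2.6.27" covers 2.6.27, 2.6.27.x and the
 * 2.6.27-rc kernels. This is a simple prefix match, not version parsing.
 */
int reports_for_series(const char *series)
{
    size_t prefix_len = strlen(series);
    size_t rows = sizeof(atl1_reports) / sizeof(atl1_reports[0]);
    int total = 0;

    for (size_t i = 0; i < rows; i++) {
        if (strncmp(atl1_reports[i].version, series, prefix_len) == 0)
            total += atl1_reports[i].count;
    }
    return total;
}
```

Calling reports_for_series("2.6.27") and reports_for_series("2.6.26") on this
data gives the two figures quoted in the message, and together they account
for all 122 atl1 reports.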
* Re: atl1 transmit timeout Was: Re: oops/warning report for the week of November 26, 2008
2008-11-28 21:12 ` atl1 transmit timeout Was: " Jay Cliburn
@ 2008-11-28 21:22 ` Arjan van de Ven
0 siblings, 0 replies; 25+ messages in thread
From: Arjan van de Ven @ 2008-11-28 21:22 UTC (permalink / raw)
To: Jay Cliburn; +Cc: NetDev

Jay Cliburn wrote:
> On Fri, 28 Nov 2008 10:50:44 -0800
> Arjan van de Ven <arjan@linux•intel.com> wrote:
>
>> => select count(version), version from oopses where
>> guilty='dev_watchdog(atl1)' group by version order by version desc;
>>  count |     version
>> -------+-----------------
>>     93 | 2.6.27.5
>>      6 | 2.6.27.4
>>      1 | 2.6.27.3
>>      1 | 2.6.27.2
>>      1 | 2.6.27-rc9
>>      1 | 2.6.27-rc7-git1
>>      1 | 2.6.27-rc7
>>      1 | 2.6.27-rc6
>>      6 | 2.6.27-rc3
>>      7 | 2.6.27
>>      4 | 2.6.26.6
>> (11 rows)
>
> Wow. 4 hits in 2.6.26, then 118 in 2.6.27.
>
> A history of changes between 2.6.26 and 2.6.27.5 shows a mere six
> changes to the driver.

one thing to note is that for the .26 kernel, there was not very good
data collection of this issue yet.
(Although.. more than 4% I would say; Fedora had the patches to report
the driver info backported for quite some time)
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: oops/warning report for the week of November 26, 2008
2008-11-28 17:32 ` Arjan van de Ven
2008-11-28 18:36 ` Jay Cliburn
@ 2008-11-28 19:50 ` Francois Romieu
2008-11-28 20:12 ` Arjan van de Ven
2008-11-30 8:58 ` Roger Luethi
1 sibling, 2 replies; 25+ messages in thread
From: Francois Romieu @ 2008-11-28 19:50 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Jay Cliburn, NetDev

Arjan van de Ven <arjan@linux•intel.com> :
[...]
> For me, sis900 and r8169 stand out; if you look at the data in the
> table above, both of these are an order of magnitude more frequent than
> the rest of the pack.

via-rhine + via_rhine = 438: it does not look too good either.

Is there an (ideally automated) way to retrieve more information ?

The r8169 driver handles three different chipsets and a plethora of
phys. The "XID" line printed by the driver could hint at some specific
PHY for instance.

--
Ueimor
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: oops/warning report for the week of November 26, 2008
2008-11-28 19:50 ` Francois Romieu
@ 2008-11-28 20:12 ` Arjan van de Ven
2008-11-30 8:58 ` Roger Luethi
1 sibling, 0 replies; 25+ messages in thread
From: Arjan van de Ven @ 2008-11-28 20:12 UTC (permalink / raw)
To: Francois Romieu; +Cc: Jay Cliburn, NetDev

On Fri, 28 Nov 2008 20:50:18 +0100
Francois Romieu <romieu@fr•zoreil.com> wrote:

> Arjan van de Ven <arjan@linux•intel.com> :
> [...]
> > For me, sis900 and r8169 stand out; if you look at the data in the
> > table above, both of these are an order of magnitude more frequent
> > than the rest of the pack.
>
> via-rhine + via_rhine = 438: it does not look too good either.
>
> Is there an (ideally automated) way to retrieve more information ?

this will need help from the driver and a bit of the core
infrastructure.

the code that generates the warning is in net/sched/sch_generic.c:

        char drivername[64];
        WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
                  dev->name, netdev_drivername(dev, drivername, 64));
        dev->tx_timeout(dev);

> The r8169 driver handles three different chipsets and a plethora of
> phys. The "XID" line printed by the driver could hint at some specific
> PHY for instance.

anything you add to that WARN_ONCE will end up on kerneloops.org...
it could be as simple as storing some information in the net dev... or
having a function pointer that can print some useful diagnostics
information.

In addition, I'm trying to get a patch into .29 that prints, on x86,
some basic DMI information in every WARN_ON class message; but this
won't give you the details about the actual NIC, at most which
motherboard is in use.

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 25+ messages in thread
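The "function pointer that can print some useful diagnostics" idea from the
message above could look roughly like the userspace mock-up below. Nothing
here is an existing kernel interface: struct fake_netdev, the tx_timeout_diag
hook, r8169_like_diag() and format_watchdog_warning() are all hypothetical
names invented for this sketch, and snprintf() stands in for WARN_ONCE(). It
only illustrates how an optional per-driver callback could append something
like the r8169 XID to the watchdog message, while drivers without the hook
keep the message format quoted in the thread.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily reduced stand-in for a network device. */
struct fake_netdev {
    const char *name;       /* interface name, e.g. "eth0" */
    const char *drivername; /* driver name, e.g. "r8169" */
    /*
     * Hypothetical optional hook: a driver fills buf with a short
     * diagnostic string. NULL means the driver provides nothing.
     */
    int (*tx_timeout_diag)(const struct fake_netdev *dev,
                           char *buf, size_t len);
    unsigned int xid;       /* made-up chip identifier for the demo */
};

/* Example hook for an r8169-like driver: report the chip's XID. */
int r8169_like_diag(const struct fake_netdev *dev, char *buf, size_t len)
{
    return snprintf(buf, len, "XID %08x", dev->xid);
}

/*
 * Build the watchdog message. The base text follows the format string
 * quoted in the thread; the bracketed suffix is the hypothetical addition.
 */
int format_watchdog_warning(const struct fake_netdev *dev,
                            char *out, size_t len)
{
    char diag[64] = "";

    if (dev->tx_timeout_diag)
        dev->tx_timeout_diag(dev, diag, sizeof(diag));

    if (diag[0] != '\0')
        return snprintf(out, len,
                        "NETDEV WATCHDOG: %s (%s): transmit timed out [%s]\n",
                        dev->name, dev->drivername, diag);

    return snprintf(out, len,
                    "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
                    dev->name, dev->drivername);
}
```

Keeping the hook optional means a driver that supplies no diagnostics still
produces the unchanged message, so in this model existing reports would keep
grouping the same way while only opted-in drivers gain the extra detail.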
* Re: oops/warning report for the week of November 26, 2008
2008-11-28 19:50 ` Francois Romieu
2008-11-28 20:12 ` Arjan van de Ven
@ 2008-11-30 8:58 ` Roger Luethi
1 sibling, 0 replies; 25+ messages in thread
From: Roger Luethi @ 2008-11-30 8:58 UTC (permalink / raw)
To: Francois Romieu; +Cc: Arjan van de Ven, Jay Cliburn, NetDev

On Fri, 28 Nov 2008 20:50:18 +0100, Francois Romieu wrote:
> Arjan van de Ven <arjan@linux•intel.com> :
> [...]
> > For me, sis900 and r8169 stand out; if you look at the data in the
> > table above, both of these are an order of magnitude more frequent than
> > the rest of the pack.
>
> via-rhine + via_rhine = 438: it does not look too good either.

Agreed. I was kinda hoping I'd get some clues for free when other drivers
get fixed :-).

> Is there an (ideally automated) way to retrieve more information ?
>
> The r8169 driver handles three different chipsets and a plethora of
> phys. The "XID" line printed by the driver could hint at some specific
> PHY for instance.

For the Rhine, knowing the PCI rev would help identify problems tied to
specific models.

Roger
^ permalink raw reply [flat|nested] 25+ messages in thread