public inbox for netdev@vger.kernel.org 
* r8169 on 3.8.13, 3.9.2, 3.10-rc1, was Re: [ 00/73] 3.8.13-stable review
       [not found] ` <pan.2013.05.10.10.54.26.943195@googlemail.com>
@ 2013-05-15  0:07   ` Ken Moffat
  2013-05-15  6:14     ` Francois Romieu
  2013-05-15  8:31     ` Holger Hoffstaette
  0 siblings, 2 replies; 6+ messages in thread
From: Ken Moffat @ 2013-05-15  0:07 UTC (permalink / raw)
  To: Holger Hoffstaette; +Cc: linux-kernel, stable, netdev

On Fri, May 10, 2013 at 12:54:27PM +0200, Holger Hoffstaette wrote:

 Cc'ing to netdev because I don't think this has had a response, and
I care because I *might* be seeing the same problem on both 3.9.2
and 3.10-rc1, but my take on the problem is slightly different [
details after Holger's posting ]

> On Thu, 09 May 2013 15:31:23 -0700, Greg Kroah-Hartman wrote:
> 
> > This is the start of the stable review cycle for the 3.8.13 release. There
> 
> This patchset broke my internet, with all sorts of weird effects like
> Samba clients having problems talking to the server and only partially
> working DNS resolution (CDNs broken, Amazon unreachable).
> 
> After two reboots to/from .12/.13 (to rule out temporary internet
> brokenness) the problem has been identified as:
> 
> > Stefan Bader <stefan.bader@canonical•com>
> >     r8169: fix 8168evl frame padding.
> 
> After reverting only this patch (turning r8169 back to 3.8.12) things
> again behave as expected with the rest of .13. So far no other regressions
> detected.
> 
> This patch should probably be removed from 3.9.2-rc as well.
> 
> -h
> 
 For me, I'm seeing what might be a similar problem in about 2/5 of
my boots of 3.9.2 and 3.10-rc1 : in my case, r8169 is a module [ most
things are built in ], I use dhclient to get an ip address, and I
have separate nfs shares for /sources [ in /etc/mtab ] and my user's
~/notes [ mounted from ~/.bashrc ].

 On a good boot, everything mounts.  On a bad boot, /sources is NOT
mounted because eth0 is not up, but by the time anyone logs in it
_is_ up so mounting ~/notes [and manually mounting /sources as root]
works.  What seems to be happening is that eth0 is coming up
slightly later on some occasions (about 11 seconds from booting,
instead of 10.5 seconds) and somehow the dhclient script seems to
have ended *before* that.

 For me, this isn't bisectable (so far, 10 boots of 3.9.2 and
3.10-rc1 on this box, and 4 were problematic).  On this box, 3.9.0
itself was perfect.

Holger : apologies if I've hijacked this thread with what turns out
to be a different problem.

ken
-- 
das eine Mal als Tragödie, das andere Mal als Farce
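
The symptom described above looks like a race between the link coming up and the boot script that needs it. A minimal sketch of one way to wait for the link before mounting network shares follows. It assumes the interface has already been set administratively up and that the kernel exposes /sys/class/net/<ifname>/operstate; the function names, the 15 second timeout and the polling interval are illustrative choices, not anything taken from this thread.

```python
import time
from pathlib import Path
from typing import Callable


def read_operstate(ifname: str) -> str:
    """Return the kernel's operational state for ifname, e.g. 'up' or 'down'."""
    return Path(f"/sys/class/net/{ifname}/operstate").read_text().strip()


def wait_for_link(
    ifname: str,
    timeout: float = 15.0,
    interval: float = 0.25,
    read_state: Callable[[str], str] = read_operstate,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until ifname reports 'up'; return False once timeout has elapsed."""
    waited = 0.0
    while True:
        try:
            if read_state(ifname) == "up":
                return True
        except OSError:
            # The interface may not be registered yet (module still loading).
            pass
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    # Hypothetical usage: only attempt the NFS mounts once eth0 has a link.
    print("link up" if wait_for_link("eth0") else "gave up waiting for eth0")
```

The reader and sleep functions are parameters only so that the loop can be exercised without real hardware.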


* Re: r8169 on 3.8.13, 3.9.2, 3.10-rc1, was Re: [ 00/73] 3.8.13-stable review
  2013-05-15  0:07   ` r8169 on 3.8.13, 3.9.2, 3.10-rc1, was Re: [ 00/73] 3.8.13-stable review Ken Moffat
@ 2013-05-15  6:14     ` Francois Romieu
  2013-05-15 17:09       ` Ken Moffat
  2013-05-15 20:39       ` David Miller
  2013-05-15  8:31     ` Holger Hoffstaette
  1 sibling, 2 replies; 6+ messages in thread
From: Francois Romieu @ 2013-05-15  6:14 UTC (permalink / raw)
  To: Ken Moffat; +Cc: Holger Hoffstaette, linux-kernel, stable, netdev

Ken Moffat <zarniwhoop@ntlworld•com> :
[...]
>  Cc'ing to netdev because I don't think this has had a response, and

A patch was sent to netdev a few hours ago. It needs more work,
especially testing (hint, hint) as I don't have a proven test case yet.

Please note:
- if you don't use a 8168evl (check your dmesg for the XID line emitted
  by the r8169 driver), you are not experiencing the same bug.
- if you don't enable Tx checksum offload (distro/vendor dependent, though
  disabled by default in the vanilla driver; see ethtool -k eth0 and
  ethtool -K eth0 tx on sg on), you are not experiencing the same
  bug.
- if you are experiencing the same bug, 3.10-rc1 should work again
  after reverting e5195c1f31f399289347e043d6abf3ffa80f0005

If someone comes up with a failing network capture and a working one, it
will save time. A packet of at most 64 bytes is not correctly transmitted.

-- 
Ueimor
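
As background for the short-packet remark above: an Ethernet frame has a minimum length of 60 bytes before the 4-byte FCS (ETH_ZLEN in the Linux headers), so anything shorter has to be padded by either software or hardware. The sketch below only illustrates that general rule. It is not the r8169 driver's logic, and the function name is made up for the example.

```python
ETH_ZLEN = 60  # minimum Ethernet frame length in bytes, excluding the FCS


def pad_frame(frame: bytes, min_len: int = ETH_ZLEN) -> bytes:
    """Return frame extended with zero bytes up to min_len; longer frames pass through."""
    if len(frame) >= min_len:
        return frame
    return frame + bytes(min_len - len(frame))


if __name__ == "__main__":
    short = bytes(42)  # e.g. 14-byte Ethernet header plus a 28-byte payload
    print(len(short), "->", len(pad_frame(short)))
```

When comparing a failing capture with a working one, frames at or below this size are the ones to look at first.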


* Re: r8169 on 3.8.13, 3.9.2, 3.10-rc1, was Re: [ 00/73] 3.8.13-stable review
  2013-05-15  0:07   ` r8169 on 3.8.13, 3.9.2, 3.10-rc1, was Re: [ 00/73] 3.8.13-stable review Ken Moffat
  2013-05-15  6:14     ` Francois Romieu
@ 2013-05-15  8:31     ` Holger Hoffstaette
  1 sibling, 0 replies; 6+ messages in thread
From: Holger Hoffstaette @ 2013-05-15  8:31 UTC (permalink / raw)
  To: stable; +Cc: linux-kernel, netdev

On Wed, 15 May 2013 01:07:15 +0100, Ken Moffat wrote:

>  Cc'ing to netdev because I don't think this has had a response, and
> I care because I *might* be seeing the same problem on both 3.9.2 and
> 3.10-rc1, but my take on the problem is slightly different [ details after
> Holger's posting ]

I later isolated the regression to tx offloading, which allows one to
reproduce the problem on the fly (enable: b0rken internets, disable: all
fine). So unless you have tx offloading enabled, whatever you are
seeing likely has nothing to do with *this* problem.

Francois posted a patch to -netdev which I am running as we speak, and the
problem seems fixed for me; tx offloading (in addition to rx/gro) works
again, with no side effects. Please help test!

-h
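
For anyone who wants to check the offload state from a script rather than by eye, here is a rough sketch that parses the text printed by ethtool -k. The sample text is illustrative and was not captured from any machine in this thread; the helper names are made up, and the parser assumes the usual "name: on|off" layout with an optional trailing "[fixed]".

```python
from typing import Dict

# Illustrative sample only; real output has many more lines.
SAMPLE = """\
Features for eth0:
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
generic-receive-offload: on
"""


def parse_ethtool_features(text: str) -> Dict[str, bool]:
    """Map each 'name: on|off' line of `ethtool -k` output to a boolean."""
    features: Dict[str, bool] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # header line such as "Features for eth0:" or a blank line
        state = value.split()[0]  # drops a trailing "[fixed]" marker if present
        if state in ("on", "off"):
            features[name.strip()] = state == "on"
    return features


def tx_offload_enabled(text: str) -> bool:
    """True if tx-checksumming is reported as on; False if off or absent."""
    return parse_ethtool_features(text).get("tx-checksumming", False)


if __name__ == "__main__":
    print("tx checksum offload enabled:", tx_offload_enabled(SAMPLE))
```

On a live system the text would come from running ethtool -k eth0, for example via subprocess.run with capture_output=True and text=True.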


* Re: r8169 on 3.8.13, 3.9.2, 3.10-rc1, was Re: [ 00/73] 3.8.13-stable review
  2013-05-15  6:14     ` Francois Romieu
@ 2013-05-15 17:09       ` Ken Moffat
  2013-05-15 20:39       ` David Miller
  1 sibling, 0 replies; 6+ messages in thread
From: Ken Moffat @ 2013-05-15 17:09 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Holger Hoffstaette, linux-kernel, stable, netdev

On Wed, May 15, 2013 at 08:14:01AM +0200, Francois Romieu wrote:
> Ken Moffat <zarniwhoop@ntlworld•com> :
> [...]
> >  Cc'ing to netdev because I don't think this has had a response, and
> 
> A patch was sent to netdev a few hours ago. It needs more work,
> especially testing (hint, hint) as I don't have a proven test case yet.
> 
> Please note:
> - if you don't use a 8168evl (check your dmesg for the XID line emitted
>   by the r8169 driver), you are not experiencing the same bug.
[    3.174180] r8169 0000:02:00.0 eth0: RTL8168evl/8111evl at 0xffffc90000008000, c8:60:00:97:07:35, XID 0c900800 IRQ 41

> - if you don't enable Tx checksum offload (distro/vendor dependent, though
>   disabled by default in the vanilla driver; see ethtool -k eth0 and
>   ethtool -K eth0 tx on sg on), you are not experiencing the same
>   bug.

 If it is disabled by default then my problem is different (I don't
have an ethtool program)

> - if you are experiencing the same bug, 3.10-rc1 should work again
>   after reverting e5195c1f31f399289347e043d6abf3ffa80f0005
> 
> If someone comes up with a failing network capture and a working one, it
> will save time. A packet of at most 64 bytes is not correctly transmitted.
> 
> -- 
> Ueimor

 Thanks for the detailed comments, looks as if my intermittent
problem is something else.

ken
-- 
das eine Mal als Tragödie, das andere Mal als Farce
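
The r8169 probe line quoted in the message above can also be checked mechanically. The following is a small sketch that pulls the chip name and XID out of such a line. The regular expression is inferred from that single line only, so other kernel versions may format it differently, and the function names are made up for the example.

```python
import re
from typing import Dict, Optional

# Pattern inferred from the one probe line quoted in this thread.
PROBE_RE = re.compile(
    r"r8169 (?P<pci>\S+) (?P<ifname>\w+): (?P<chip>.+?) at 0x[0-9a-fA-F]+, "
    r"(?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}), "
    r"XID (?P<xid>[0-9a-fA-F]+) IRQ (?P<irq>\d+)"
)


def parse_r8169_probe(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of an r8169 probe line, or None if it does not match."""
    match = PROBE_RE.search(line)
    return match.groupdict() if match else None


def is_8168evl(line: str) -> bool:
    """True if the probe line names an 8168evl chip."""
    info = parse_r8169_probe(line)
    return info is not None and "8168evl" in info["chip"].lower()


if __name__ == "__main__":
    line = (
        "[    3.174180] r8169 0000:02:00.0 eth0: RTL8168evl/8111evl at "
        "0xffffc90000008000, c8:60:00:97:07:35, XID 0c900800 IRQ 41"
    )
    print(parse_r8169_probe(line))
    print("8168evl:", is_8168evl(line))
```

A match here only satisfies the first of the conditions listed in the quoted checklist; the offload setting still has to be checked separately.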


* Re: r8169 on 3.8.13, 3.9.2, 3.10-rc1, was Re: [ 00/73] 3.8.13-stable review
  2013-05-15  6:14     ` Francois Romieu
  2013-05-15 17:09       ` Ken Moffat
@ 2013-05-15 20:39       ` David Miller
  2013-05-15 23:15         ` David Miller
  1 sibling, 1 reply; 6+ messages in thread
From: David Miller @ 2013-05-15 20:39 UTC (permalink / raw)
  To: romieu; +Cc: zarniwhoop, holger.hoffstaette, linux-kernel, stable, netdev

From: Francois Romieu <romieu@fr•zoreil.com>
Date: Wed, 15 May 2013 08:14:01 +0200

> Ken Moffat <zarniwhoop@ntlworld•com> :
> [...]
>>  Cc'ing to netdev because I don't think this has had a response, and
> 
> A patch was sent to netdev a few hours ago. It needs more work,
> especially testing (hint, hint) as I don't have a proven test case yet.
> 
> Please note:
> - if you don't use a 8168evl (check your dmesg for the XID line emitted
>   by the r8169 driver), you are not experiencing the same bug.
> - if you don't enable Tx checksum offload (distro/vendor dependent, though
>   disabled by default in the vanilla driver; see ethtool -k eth0 and
>   ethtool -K eth0 tx on sg on), you are not experiencing the same
>   bug.
> - if you are experiencing the same bug, 3.10-rc1 should work again
>   after reverting e5195c1f31f399289347e043d6abf3ffa80f0005
> 
> If someone comes up with a failing network capture and a working one, it
> will save time. A packet of at most 64 bytes is not correctly transmitted.

FWIW, I was about to submit the regression-causing commit to -stable,
but now I'm going to hold off until we get the fix for it to Linus.


* Re: r8169 on 3.8.13, 3.9.2, 3.10-rc1, was Re: [ 00/73] 3.8.13-stable review
  2013-05-15 20:39       ` David Miller
@ 2013-05-15 23:15         ` David Miller
  0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2013-05-15 23:15 UTC (permalink / raw)
  To: romieu; +Cc: zarniwhoop, holger.hoffstaette, linux-kernel, stable, netdev

From: David Miller <davem@davemloft•net>
Date: Wed, 15 May 2013 13:39:13 -0700 (PDT)

> From: Francois Romieu <romieu@fr•zoreil.com>
> Date: Wed, 15 May 2013 08:14:01 +0200
> 
>> Ken Moffat <zarniwhoop@ntlworld•com> :
>> [...]
>>>  Cc'ing to netdev because I don't think this has had a response, and
>> 
>> A patch was sent to netdev a few hours ago. It needs more work,
>> especially testing (hint, hint) as I don't have a proven test case yet.
>> 
>> Please note:
>> - if you don't use a 8168evl (check your dmesg for the XID line emitted
>>   by the r8169 driver), you are not experiencing the same bug.
>> - if you don't enable Tx checksum offload (distro/vendor dependent, though
>>   disabled by default in the vanilla driver; see ethtool -k eth0 and
>>   ethtool -K eth0 tx on sg on), you are not experiencing the same
>>   bug.
>> - if you are experiencing the same bug, 3.10-rc1 should work again
>>   after reverting e5195c1f31f399289347e043d6abf3ffa80f0005
>> 
>> If someone comes up with a failing network capture and a working one, it
>> will save time. A packet of at most 64 bytes is not correctly transmitted.
> 
> FWIW, I was about to submit the regression-causing commit to -stable,
> but now I'm going to hold off until we get the fix for it to Linus.

Oh, I see, someone merged it behind my back.

Well, guys, now do you see why I let patches cook in Linus's tree for a
week or two before I submit them to -stable?

It's so that bugs like this are less likely to propagate, and we find
such problems and fix them before a change hits -stable and
therefore has an effect on an even larger number of users.

Please, use the networking -stable submission process.  If you want a
patch merged, tell me, and I'll queue it up in patchwork.

