public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: Zoltan Kiss <zoltan.kiss@citrix•com>
To: Sander Eikelenboom <linux@eikelenboom•it>
Cc: Ian Campbell <Ian.Campbell@citrix•com>,
	"David S. Miller" <davem@davemloft•net>, <netdev@vger•kernel.org>,
	<xen-devel@lists•xen.org>
Subject: Re: [3.15-rc3] Bisected: xen-netback mangles packets between two guests on a bridge since merge of "TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy" series.
Date: Thu, 1 May 2014 16:46:01 +0100	[thread overview]
Message-ID: <53626C39.3030304@citrix.com> (raw)
In-Reply-To: <1810270947.20140501155911@eikelenboom.it>

On 01/05/14 14:59, Sander Eikelenboom wrote:
>
> Thursday, May 1, 2014, 3:37:41 PM, you wrote:
>
>> On 30/04/14 23:25, Sander Eikelenboom wrote:
>>>
>>> Wednesday, April 30, 2014, 10:53:39 PM, you wrote:
>>>
>>>> On 30/04/14 11:45, Sander Eikelenboom wrote:
>>>>> Hi Zoltan,
>>>>>
>>>>> Your series "TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy", merged into mainline with merge commit 4caeccb4de76440e433a15009636e77d003eb3d6,
>>>>> seem to introduce a subtle bug on network traffic between 2 guests on a bridge on the same host.
>>>>> I have one guest running apache as webdav server with SSL and another guest that is using that is uploading large files to that webdav server.
>>>>> Small requests (some get's and propfind's) seem to work ok, but when the bulk uploading begins it fails with:
>>>>>
>>>>> Attempt 1 failed. SSLError: [Errno 1] _ssl.c:1415: error:140943FC:SSL routines:SSL3_READ_BYTES:sslv3 alert bad record mac
>>>>> Attempt 2 failed. SSLError: [Errno 1] _ssl.c:1415: error:140943FC:SSL routines:SSL3_READ_BYTES:sslv3 alert bad record mac
>>>>> Attempt 3 failed. SSLError: [Errno 1] _ssl.c:1415: error:140943FC:SSL routines:SSL3_READ_BYTES:sslv3 alert bad record mac
>>>>> Attempt 4 failed. SSLError: [Errno 1] _ssl.c:1415: error:140943FC:SSL routines:SSL3_READ_BYTES:sslv3 alert bad record mac
>>>>>
>>>>> So some how large (probably fragmented) packets can get mangled when from guest to guest on the same host.
>>>>> I don't see this with clients that upload large files from external sources.
>>>>> Probably if SSL wasn't complaining it would probably be unnoticed for longer and doing some silent corruption.
>>>>>
>>>>> I first blamed openssl, since it started around all the latest openssl mayhem and updates, but it turns out it is all xen-netback related again.
>>>>>
>>>>> Since these commits break bisectabillity:
>>>>>        - 1bb332af4cd889e4b64dacbf4a793ceb3a70445d  (note in commit message && kernel panic)
>>>>>        - 62bad3199a4c20505fc36c169deef20b25e17c5f  (kernel panic)
>>>>> i stopped bisecting at this point.
>>>>>
>>>>> The upside is .. it's 100% reproduceable :-)
>>>> That's good :) Can you take tcpdump captures along the way (sending
>>>> guest, dom0, receiving guest), and try to work out which packets are
>>>> different, and where? Although taking captures in Dom0 might change your
>>>> result, as it triggers the pages to be copied and unmapped before they
>>>> reach their target.
>>>
>>>> Thanks,
>>>> Zoli
>>>
>>>
>>> Hrrmm that sounds like a lot of data and a lot of work ..
>> If you could make captures in the sending and receiving guest with
>> tcpdump (take care of increasing snaplen so the whole packet is there,
>> and filter to the SSH connection itself), and upload it somewhere for
>> me, that would be enough for start. I will try to work out where the
>> corruption happens.
>> Also, do you have timestamps for the above mentioned log entries? I
>> guess they appear on the receiving side.
>> And some info about the componenets on the server, so I can work out
>> where is that _ssl.c:1415, and which part of the packet it actually
>> looks for.
>
> They appear on the sending side (duplicity) .. the receiving side (apache +
> mod_dav + ssl | gnu_tls) gives a "Could not get next bucket brigade (URI:"
I will try to repro this case in house. What versions of these 
components you used?

Zoli

>
>
>>>
>>> how ever .. could it be just a type and would the following make sense ?
>>>
>>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>>> index 7666540..abeea10 100644
>>> --- a/drivers/net/xen-netback/netback.c
>>> +++ b/drivers/net/xen-netback/netback.c
>>> @@ -1366,7 +1366,7 @@ static int xenvif_handle_frag_list(struct xenvif *vif, struct sk_buff *skb)
>>>
>>>           xenvif_fill_frags(vif, nskb);
>>>           /* Subtract frags size, we will correct it later */
>>> -       skb->truesize -= skb->data_len;
>>> +       skb->truesize -= nskb->data_len;
>>>           skb->len += nskb->len;
>>>           skb->data_len += nskb->len;
>
>> Nope, that's correct there: after that skb->truesize will be the size of
>> the struct plus the linear buffer itself. The code is just about the
>> ditch the original fragments plus the skb on the frag_list. When the new
>> pages are created, it will update it again.
>
> Well i just went a head and tried this .. and the uploading does seem to work fine with this change
> .. (that obviously doesn't say anything about correctness)
>
>> Also, this code path runs only if the guest sends more slots we can
>> handle (so we put the extra one to the frag_list until we can get rid of
>> it). On Linux it can only happen with 3.2 or older guest kernels, and
>> only occasionally. As you said, this is 100% reproducible, so I would
>> doubt the problem is with this part of the code.
>
> Well this assumption seems to be incorrect:
>          - both dom0 and guest kernels are 3.15-rc3's.
>          - but we do end up in this code path
>
>> Zoli
>
>

  reply	other threads:[~2014-05-01 15:48 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-30 10:45 [3.15-rc3] Bisected: xen-netback mangles packets between two guests on a bridge since merge of "TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy" series Sander Eikelenboom
2014-04-30 15:24 ` Eric Dumazet
2014-04-30 20:40   ` Zoltan Kiss
2014-04-30 20:53 ` Zoltan Kiss
2014-04-30 22:25   ` Sander Eikelenboom
2014-05-01 13:37     ` Zoltan Kiss
2014-05-01 13:59       ` Sander Eikelenboom
2014-05-01 15:46         ` Zoltan Kiss [this message]
2014-05-01 17:39           ` Sander Eikelenboom
2014-05-01 17:46             ` Eric Dumazet
2014-05-01 19:39             ` [Xen-devel] " Sander Eikelenboom
2014-05-02 14:00               ` Zoltan Kiss
2014-05-02 14:06                 ` Sander Eikelenboom
2014-05-02 14:47                   ` Zoltan Kiss
2014-05-02 15:21                     ` Eric Dumazet
2014-05-02 15:26                       ` Zoltan Kiss
2014-05-02 16:28                         ` Sander Eikelenboom
2014-05-02 16:45                           ` Zoltan Kiss
2014-05-05 10:19                             ` Sander Eikelenboom
2014-05-06 17:07                               ` Steven Haigh
2014-05-06 17:13                                 ` Zoltan Kiss
2014-05-06 17:37                                   ` Sander Eikelenboom
2014-05-06 18:07                                     ` Steven Haigh
2014-05-07  8:16                                       ` [Xen-devel] " David Vrabel
2014-05-16  2:13                                         ` Steven Haigh
2014-05-06 17:08                               ` [Xen-devel] " Zoltan Kiss
2014-05-06 17:10                               ` Zoltan Kiss
2014-05-06 17:33                                 ` Sander Eikelenboom
2014-05-01 13:49 ` Zoltan Kiss
2014-05-01 14:05   ` Sander Eikelenboom
2014-05-01 15:16     ` Zoltan Kiss
2014-05-01 15:40       ` Sander Eikelenboom
2014-05-02 15:35         ` Eric Dumazet
2014-05-02 22:18           ` Sander Eikelenboom
2014-05-09 22:19           ` Neal Cardwell
2014-05-09 21:02 ` Zoltan Kiss
2014-05-13 13:40   ` Zoltan Kiss

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53626C39.3030304@citrix.com \
    --to=zoltan.kiss@citrix$(echo .)com \
    --cc=Ian.Campbell@citrix$(echo .)com \
    --cc=davem@davemloft$(echo .)net \
    --cc=linux@eikelenboom$(echo .)it \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=xen-devel@lists$(echo .)xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox