From: Florian Westphal <fw@strlen•de>
To: Florian Westphal <fw@strlen•de>
Cc: netdev@vger•kernel.org, kaber@trash•net, jesse@nicira•com
Subject: Re: [PATCH -next 3/3] ipv4: don't remove df bit when refragmenting
Date: Sun, 12 Apr 2015 10:51:22 +0200 [thread overview]
Message-ID: <20150412085122.GA14720@breakpoint.cc> (raw)
In-Reply-To: <1428704189-31247-4-git-send-email-fw@strlen.de>
Florian Westphal <fw@strlen•de> wrote:
> We always send fragments without DF bit set.
>
> Thus, given following setup:
>
> mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280
> A R1 R2 B
>
> Where R1 and R2 run linux with netfilter defragmentation/conntrack
> enabled, then if Host A sent a fragmented packet _with_ DF set to B, R1
> will respond with icmp too big error if one of these fragments exceeded
> 1400 bytes. So far, so good.
>
> However, the host A will never learn about the lower 1280 link.
> The next packet presumably sent by A would use 1400 as the new fragment
> size, but R1 will strip DF bit when refragmenting.
>
> Whats worse: If R1 receives fragment sizes 1200 and 100, it would
> forward the reassembled packet without refragmenting, i.e.
> R2 would send an icmp error in response to a packet that was never sent,
> citing mtu that the original sender never exceeded.
>
> In order to 'replay' the original fragments to preserve their semantics,
> one solution is to
>
> 1. set DF bit on the new fragments if it was set on original ones.
> 2. set the size of the new fragments generated by R1 during
> refragmentation to the largest size seen when defragmenting.
>
> R2 will then notice the problem and will send the expected
> 'too big, use 1280' icmp error, and further fragments of this size
> would not grow anymore to 1400 link mtu when R1 refragments.
>
> There is however, one important caveat. We cannot just use existing
> IPCB(skb)->frag_max_size as upper boundary for refragmentation.
>
> We have to consider a case where we receive a large fragment without DF,
> followed by a small fragment with DF set.
>
> In such scenario we must not generate a large spew of small DF-fragments
> (else we induce packet/traffic amplification).
>
> This modifies ip_fragment so that we track largest fragment size seen
> both for DF and non-DF packets.
>
> Then, when we find that we had at least one DF fragment AND the largest
> non-DF fragment did not exceed one with DF set, let ip_fragment know that
> it should refragment using original frag size and also set DF bit on the
> newly created fragments.
Seems Jesse would prefer if we'd set max_frag_size unconditionally.
The advantage of doing that would be that we can easily get rid of the
nf_bridge_mtu_reduction() kludge we have right now since we'd just pick
the largest original fragment size.
Only caveat is that we'd have to check IPSKB_FRAG_PMTU flag too before
deciding to reject if out mtu is too small, which is easy to do.
So, before I respin patch #3: What do others think?
next prev parent reply other threads:[~2015-04-12 8:51 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-10 22:16 [PATCH -next 0/3] net: cap size to original frag size when refragmenting Florian Westphal
2015-04-10 22:16 ` [PATCH -next 1/3] ipv4: reject too-big defragmented DF-skb when forwarding Florian Westphal
2015-04-10 22:51 ` Hannes Frederic Sowa
2015-04-10 22:16 ` [PATCH -next 2/3] ipv6: don't increase size when refragmenting forwarded skbs Florian Westphal
2015-04-10 22:16 ` [PATCH -next 3/3] ipv4: don't remove df bit when refragmenting Florian Westphal
2015-04-12 8:51 ` Florian Westphal [this message]
2015-04-13 17:53 ` [PATCH -next 0/3] net: cap size to original frag size " David Miller
2015-04-13 18:40 ` Hannes Frederic Sowa
2015-04-13 18:56 ` Hannes Frederic Sowa
2015-04-13 20:16 ` David Miller
2015-04-13 20:33 ` Hannes Frederic Sowa
2015-04-16 4:56 ` Herbert Xu
2015-04-16 5:24 ` Patrick McHardy
2015-04-16 5:29 ` Herbert Xu
2015-04-16 5:42 ` Patrick McHardy
2015-04-16 12:11 ` Hannes Frederic Sowa
2015-04-16 15:43 ` David Miller
2015-04-16 16:13 ` Hannes Frederic Sowa
2015-04-16 20:47 ` Herbert Xu
2015-04-16 20:56 ` Patrick McHardy
2015-04-16 23:16 ` Hannes Frederic Sowa
2015-04-16 23:44 ` Patrick McHardy
2015-04-16 20:32 ` Patrick McHardy
2015-04-16 15:32 ` David Miller
2015-04-16 20:55 ` Patrick McHardy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150412085122.GA14720@breakpoint.cc \
--to=fw@strlen$(echo .)de \
--cc=jesse@nicira$(echo .)com \
--cc=kaber@trash$(echo .)net \
--cc=netdev@vger$(echo .)kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox