Re: [RFC PATCH v6 net-next 3/4] tcp: add in_flight to tcp_skb_cb

public inbox for netdev@vger.kernel.org 

From: Lawrence Brakmo <brakmo@fb•com>
To: Yuchung Cheng <ycheng@google•com>
Cc: netdev <netdev@vger•kernel.org>, Kernel Team <Kernel-team@fb•com>,
	"Neal Cardwell" <ncardwell@google•com>,
	Eric Dumazet <eric.dumazet@gmail•com>,
	Stephen Hemminger <stephen@networkplumber•org>,
	Kenneth Klette Jonassen <kennetkl@ifi•uio.no>
Subject: Re: [RFC PATCH v6 net-next 3/4] tcp: add in_flight to tcp_skb_cb
Date: Thu, 27 Aug 2015 23:27:54 +0000
Message-ID: <D204E9B0.7469%brakmo@fb.com>
In-Reply-To: <CAK6E8=fNk-x6=QLHT0rjFRh3bARfkErA+VW_PgxRUcu-3fm9pQ@mail.gmail.com>



On 8/27/15, 3:57 PM, "Yuchung Cheng" <ycheng@google•com> wrote:

>On Thu, Aug 27, 2015 at 3:54 PM, Yuchung Cheng <ycheng@google•com> wrote:
>> On Thu, Aug 27, 2015 at 3:44 PM, Lawrence Brakmo <brakmo@fb•com> wrote:
>>> Yuchung, thank you for reviewing these patches. Response inline below.
>>>
>>> On 8/27/15, 3:00 PM, "Yuchung Cheng" <ycheng@google•com> wrote:
>>>
>>>>On Tue, Aug 25, 2015 at 4:33 PM, Lawrence Brakmo <brakmo@fb•com> wrote:
>>>>> Add in_flight (bytes in flight when packet was sent) field
>>>>> to tx component of tcp_skb_cb and make it available to
>>>>> congestion modules' pkts_acked() function through the
>>>>> ack_sample function argument.
>>>>>
>>>>> Signed-off-by: Lawrence Brakmo <brakmo@fb•com>
>> Acked-by: Yuchung Cheng <ycheng@google•com>
>>
>>>>> ---
>>>>>  include/net/tcp.h     | 2 ++
>>>>>  net/ipv4/tcp_input.c  | 5 ++++-
>>>>>  net/ipv4/tcp_output.c | 4 +++-
>>>>>  3 files changed, 9 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>>>>> index a086a98..cdd93e5 100644
>>>>> --- a/include/net/tcp.h
>>>>> +++ b/include/net/tcp.h
>>>>> @@ -757,6 +757,7 @@ struct tcp_skb_cb {
>>>>>         union {
>>>>>                 struct {
>>>>>                         /* There is space for up to 20 bytes */
>>>>> +                       __u32 in_flight;/* Bytes in flight when packet sent */
>>>>>                 } tx;   /* only used for outgoing skbs */
>>>>>                 union {
>>>>>                         struct inet_skb_parm    h4;
>>>>> @@ -842,6 +843,7 @@ union tcp_cc_info;
>>>>>  struct ack_sample {
>>>>>         u32 pkts_acked;
>>>>>         s32 rtt_us;
>>>>> +       u32 in_flight;
>>>>>  };
>>>>>
>>>>>  struct tcp_congestion_ops {
>>>>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>>>>> index f506a0a..338e6bb 100644
>>>>> --- a/net/ipv4/tcp_input.c
>>>>> +++ b/net/ipv4/tcp_input.c
>>>>> @@ -3069,6 +3069,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
>>>>>         long ca_rtt_us = -1L;
>>>>>         struct sk_buff *skb;
>>>>>         u32 pkts_acked = 0;
>>>>> +       u32 last_in_flight = 0;
>>>>>         bool rtt_update;
>>>>>         int flag = 0;
>>>>>
>>>>> @@ -3108,6 +3109,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
>>>>>                         if (!first_ackt.v64)
>>>>>                                 first_ackt = last_ackt;
>>>>>
>>>>> +                       last_in_flight = TCP_SKB_CB(skb)->tx.in_flight;
>>>>>                         reord = min(pkts_acked, reord);
>>>>>                         if (!after(scb->end_seq, tp->high_seq))
>>>>>                                 flag |= FLAG_ORIG_SACK_ACKED;
>>>>> @@ -3197,7 +3199,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
>>>>>         }
>>>>>
>>>>>         if (icsk->icsk_ca_ops->pkts_acked) {
>>>>> -               struct ack_sample sample = {pkts_acked, ca_rtt_us};
>>>>> +               struct ack_sample sample = {pkts_acked, ca_rtt_us,
>>>>> +                                           last_in_flight};
>>>>>
>>>>>                 icsk->icsk_ca_ops->pkts_acked(sk, &sample);
>>>>>         }
>>>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>>>> index 444ab5b..244d201 100644
>>>>> --- a/net/ipv4/tcp_output.c
>>>>> +++ b/net/ipv4/tcp_output.c
>>>>> @@ -920,9 +920,12 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
>>>>>         int err;
>>>>>
>>>>>         BUG_ON(!skb || !tcp_skb_pcount(skb));
>>>>> +       tp = tcp_sk(sk);
>>>>>
>>>>>         if (clone_it) {
>>>>>                 skb_mstamp_get(&skb->skb_mstamp);
>>>>> +               TCP_SKB_CB(skb)->tx.in_flight = TCP_SKB_CB(skb)->end_seq
>>>>> +                       - tp->snd_una;
>>>>what if skb is a retransmitted packet? e.g. the first retransmission
>>>>in fast recovery would always record an inflight of 1 packet?
>>>
>>> Yes.
>>> This does not affect NV for 2 reasons: 1) NV does not use ACKs when
>>> ca_state is not Open or Disorder to determine congestion state, 2) even
>>> if we used it, the small inflight means that the computed throughput
>>> will be small so it will not cause a non-congestion signal, but will
>>> not cause a congestion signal either because NV needs many (~60)
>>> measurements before determining there is congestion.
>>>
>>> However, other consumers may prefer a different value. From a
>>> congestion avoidance perspective, it is unclear we will be able to
>>> compute an accurate throughput when retransmitting, so we may as well
>>> give a lower bound.
>> I see. Then this is OK for now since only NV uses it. We can enhance
>> and track tput even during other CA states later. Would that be a
>> useful feature for NV as well?
>For example, we (at Google servers) have seen some flows staying in
>very long CA_Recovery due to rate limiter or CA_Disorder state due to
>high path reordering. It'd be beneficial to have CC continue to
>operate in these circumstances in the future.

Hopefully congestion avoidance in NV would adapt to a rate limiter and
prevent losses or large queues.
However, for the time being NV is only recommended for data center traffic
since 1) I've only tested it in small-RTT environments and 2) it cannot
compete fairly with Cubic (at least in small-RTT environments).

>
>>
>>>
>>> What do you think?
>>>
>>>>
>>>>>
>>>>>                 if (unlikely(skb_cloned(skb)))
>>>>>                         skb = pskb_copy(skb, gfp_mask);
>>>>> @@ -933,7 +936,6 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
>>>>>         }
>>>>>
>>>>>         inet = inet_sk(sk);
>>>>> -       tp = tcp_sk(sk);
>>>>>         tcb = TCP_SKB_CB(skb);
>>>>>         memset(&opts, 0, sizeof(opts));
>>>>>
>>>>> --
>>>>> 1.8.1
>>>>>
>>>
