From: Xin Long
To: network dev, quic@lists.linux.dev
Cc: davem@davemloft.net, kuba@kernel.org, Eric Dumazet, Paolo Abeni,
 Simon Horman, Stefan Metzmacher, Moritz Buhl, Tyler Fanelli, Pengtao He,
 Thomas Dreibholz, linux-cifs@vger.kernel.org, Steve French, Namjae Jeon,
 Paulo Alcantara, Tom Talpey, kernel-tls-handshake@lists.linux.dev,
 Chuck Lever, Jeff Layton, Steve Dickson, Hannes Reinecke, Alexander Aring,
 David Howells, Matthieu Baerts, John Ericson, Cong Wang, "D . Wythe",
 Jason Baron, illiliti, Sabrina Dubroca, Marcelo Ricardo Leitner,
 Daniel Stenberg, Andy Gospodarek
Subject: [PATCH net-next v6 09/16] quic: add congestion control
Date: Mon, 5 Jan 2026 09:04:35 -0500
Message-ID: <43f8ffca6c157386502e4f6d339de5f1156475a8.1767621882.git.lucien.xin@gmail.com>

This patch introduces 'struct quic_cong' for RTT measurement and
congestion control, and 'struct quic_cong_ops' to define the hooks that
a congestion control algorithm implements.

It implements the RFC 9002 congestion control state machine with slow
start, congestion avoidance and recovery phases; currently only the
New Reno algorithm is provided.
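To make the New Reno transitions concrete, here is a minimal userspace sketch (not the kernel code in this patch) modelling the slow start, recovery and congestion avoidance handling of quic_reno_on_packet_acked() and quic_reno_on_packet_lost()/quic_reno_on_process_ecn(). The struct reno_sim type, its fields and the helper names are hypothetical; the persistent congestion check and pacing are omitted, and the congestion avoidance increase uses a 64-bit intermediate to avoid overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of the New Reno hooks; field names mirror
 * struct quic_cong, but this is not the kernel structure. */
enum reno_state { SLOW_START, RECOVERY_PERIOD, CONGESTION_AVOIDANCE };

struct reno_sim {
	enum reno_state state;
	uint64_t now;           /* current time in us, like cong->time */
	uint64_t recovery_time; /* when the current recovery period began */
	uint32_t window;        /* congestion window in bytes */
	uint32_t ssthresh;      /* slow start threshold in bytes */
	uint32_t min_window;    /* floor for the window after a reduction */
	uint32_t max_window;    /* cap for slow start growth */
	uint32_t mss;           /* maximum segment size in bytes */
};

static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }
static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Loss or ECN-CE increase: enter recovery and halve the window,
 * at most once per recovery period. */
static void reno_on_congestion_event(struct reno_sim *s)
{
	if (s->state == RECOVERY_PERIOD)
		return;

	s->recovery_time = s->now;
	s->state = RECOVERY_PERIOD;
	s->ssthresh = max_u32(s->window / 2, s->min_window);
	s->window = s->ssthresh;
	assert(s->window >= s->min_window);
}

/* 'sent_time' is the send time of the packet that was just acknowledged. */
static void reno_on_packet_acked(struct reno_sim *s, uint64_t sent_time, uint32_t bytes)
{
	switch (s->state) {
	case SLOW_START:
		/* Grow by the acknowledged bytes, capped at max_window. */
		s->window = min_u32(s->window + bytes, s->max_window);
		if (s->window >= s->ssthresh)
			s->state = CONGESTION_AVOIDANCE;
		break;
	case RECOVERY_PERIOD:
		/* Only a packet sent after recovery began ends the period. */
		if (s->recovery_time < sent_time)
			s->state = CONGESTION_AVOIDANCE;
		break;
	case CONGESTION_AVOIDANCE:
		/* About one MSS of growth per window's worth of acked bytes. */
		s->window += (uint32_t)((uint64_t)s->mss * bytes / s->window);
		break;
	}
}
```

A second loss reported while still in the recovery period leaves the window unchanged, so the window is reduced at most once per recovery period; the floor value used in any test of this sketch is arbitrary.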
The implementation updates RTT estimates when packets are acknowledged,
reacts to loss and ECN signals, and adjusts the congestion window and
pacing accordingly during packet transmission and acknowledgment
processing.

- quic_cong_rtt_update(): Performs RTT measurement; invoked when the
  packet with the largest packet number in an ACK frame is acknowledged.

- quic_cong_on_packet_acked(): Invoked when a packet is acknowledged.

- quic_cong_on_packet_lost(): Invoked when a packet is marked as lost.

- quic_cong_on_process_ecn(): Invoked when an ACK_ECN frame is received.

- quic_cong_on_packet_sent(): Invoked when a packet is transmitted.

- quic_cong_on_ack_recv(): Invoked when an ACK frame is received.

Signed-off-by: Xin Long
---
v4:
  - Remove CUBIC congestion algorithm support from this version
    (suggested by Paolo).
v5:
  - Do not update the pacing rate in quic_cong_pace_update() when
    !cong->smoothed_rtt (suggested by Paolo).
  - Change timestamp variables from u32 to u64: RTT is measured in
    microseconds, and u64 provides sufficient range for microsecond
    timestamps (a u32 microsecond counter wraps after about 71.6 minutes).
---
 net/quic/Makefile |   3 +-
 net/quic/cong.c   | 307 ++++++++++++++++++++++++++++++++++++++++++++++
 net/quic/cong.h   | 120 ++++++++++++++++++
 net/quic/socket.c |   1 +
 net/quic/socket.h |   7 ++
 5 files changed, 437 insertions(+), 1 deletion(-)
 create mode 100644 net/quic/cong.c
 create mode 100644 net/quic/cong.h

diff --git a/net/quic/Makefile b/net/quic/Makefile
index 1565fb5cef9d..4d4a42c6d565 100644
--- a/net/quic/Makefile
+++ b/net/quic/Makefile
@@ -5,4 +5,5 @@
 
 obj-$(CONFIG_IP_QUIC) += quic.o
 
-quic-y := common.o family.o protocol.o socket.o stream.o connid.o path.o
+quic-y := common.o family.o protocol.o socket.o stream.o connid.o path.o \
+	  cong.o
diff --git a/net/quic/cong.c b/net/quic/cong.c
new file mode 100644
index 000000000000..ec3bc22dfa67
--- /dev/null
+++ b/net/quic/cong.c
@@ -0,0 +1,307 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* QUIC kernel implementation
+ * (C) Copyright Red Hat Corp. 2023
+ *
+ * This file is part of the QUIC kernel implementation
+ *
+ * RTT measurement and congestion control for QUIC.
+ *
+ * Written or modified by:
+ *    Xin Long
+ */
+
+#include
+
+#include "common.h"
+#include "cong.h"
+
+static int quic_cong_check_persistent_congestion(struct quic_cong *cong, u64 time)
+{
+	u32 duration;
+
+	/* rfc9002#section-7.6.1:
+	 *   (smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay) *
+	 *      kPersistentCongestionThreshold
+	 */
+	duration = cong->smoothed_rtt + max(4 * cong->rttvar, QUIC_KGRANULARITY);
+	duration = (duration + cong->max_ack_delay) * QUIC_KPERSISTENT_CONGESTION_THRESHOLD;
+	if (cong->time - time <= duration)
+		return 0;
+
+	pr_debug("%s: persistent congestion, cwnd: %u, ssthresh: %u\n",
+		 __func__, cong->window, cong->ssthresh);
+	cong->min_rtt_valid = 0;
+	cong->window = cong->min_window;
+	cong->state = QUIC_CONG_SLOW_START;
+	return 1;
+}
+
+/* NEW RENO APIs */
+static void quic_reno_on_packet_lost(struct quic_cong *cong, u64 time, u32 bytes, s64 number)
+{
+	if (quic_cong_check_persistent_congestion(cong, time))
+		return;
+
+	switch (cong->state) {
+	case QUIC_CONG_SLOW_START:
+		pr_debug("%s: slow_start -> recovery, cwnd: %u, ssthresh: %u\n",
+			 __func__, cong->window, cong->ssthresh);
+		break;
+	case QUIC_CONG_RECOVERY_PERIOD:
+		return;
+	case QUIC_CONG_CONGESTION_AVOIDANCE:
+		pr_debug("%s: cong_avoid -> recovery, cwnd: %u, ssthresh: %u\n",
+			 __func__, cong->window, cong->ssthresh);
+		break;
+	default:
+		pr_debug("%s: wrong congestion state: %d\n", __func__, cong->state);
+		return;
+	}
+
+	cong->recovery_time = cong->time;
+	cong->state = QUIC_CONG_RECOVERY_PERIOD;
+	cong->ssthresh = max(cong->window >> 1U, cong->min_window);
+	cong->window = cong->ssthresh;
+}
+
+static void quic_reno_on_packet_acked(struct quic_cong *cong, u64 time, u32 bytes, s64 number)
+{
+	switch (cong->state) {
+	case QUIC_CONG_SLOW_START:
+		cong->window = min_t(u32, cong->window + bytes, cong->max_window);
+		if (cong->window >= cong->ssthresh) {
+			cong->state = QUIC_CONG_CONGESTION_AVOIDANCE;
+			pr_debug("%s: slow_start -> cong_avoid, cwnd: %u, ssthresh: %u\n",
+				 __func__, cong->window, cong->ssthresh);
+		}
+		break;
+	case QUIC_CONG_RECOVERY_PERIOD:
+		if (cong->recovery_time < time) {
+			cong->state = QUIC_CONG_CONGESTION_AVOIDANCE;
+			pr_debug("%s: recovery -> cong_avoid, cwnd: %u, ssthresh: %u\n",
+				 __func__, cong->window, cong->ssthresh);
+		}
+		break;
+	case QUIC_CONG_CONGESTION_AVOIDANCE:
+		cong->window += cong->mss * bytes / cong->window;
+		break;
+	default:
+		pr_debug("%s: wrong congestion state: %d\n", __func__, cong->state);
+		return;
+	}
+}
+
+static void quic_reno_on_process_ecn(struct quic_cong *cong)
+{
+	switch (cong->state) {
+	case QUIC_CONG_SLOW_START:
+		pr_debug("%s: slow_start -> recovery, cwnd: %u, ssthresh: %u\n",
+			 __func__, cong->window, cong->ssthresh);
+		break;
+	case QUIC_CONG_RECOVERY_PERIOD:
+		return;
+	case QUIC_CONG_CONGESTION_AVOIDANCE:
+		pr_debug("%s: cong_avoid -> recovery, cwnd: %u, ssthresh: %u\n",
+			 __func__, cong->window, cong->ssthresh);
+		break;
+	default:
+		pr_debug("%s: wrong congestion state: %d\n", __func__, cong->state);
+		return;
+	}
+
+	cong->recovery_time = cong->time;
+	cong->state = QUIC_CONG_RECOVERY_PERIOD;
+	cong->ssthresh = max(cong->window >> 1U, cong->min_window);
+	cong->window = cong->ssthresh;
+}
+
+static void quic_reno_on_init(struct quic_cong *cong)
+{
+}
+
+static struct quic_cong_ops quic_congs[] = {
+	{ /* QUIC_CONG_ALG_RENO */
+		.on_packet_acked = quic_reno_on_packet_acked,
+		.on_packet_lost = quic_reno_on_packet_lost,
+		.on_process_ecn = quic_reno_on_process_ecn,
+		.on_init = quic_reno_on_init,
+	},
+};
+
+/* COMMON APIs */
+void quic_cong_on_packet_lost(struct quic_cong *cong, u64 time, u32 bytes, s64 number)
+{
+	cong->ops->on_packet_lost(cong, time, bytes, number);
+}
+
+void quic_cong_on_packet_acked(struct quic_cong *cong, u64 time, u32 bytes, s64 number)
+{
+	cong->ops->on_packet_acked(cong, time, bytes, number);
+}
+
+void quic_cong_on_process_ecn(struct quic_cong *cong)
+{
+	cong->ops->on_process_ecn(cong);
+}
+
+/* Update Probe Timeout (PTO) and loss detection delay based on RTT stats. */
+static void quic_cong_pto_update(struct quic_cong *cong)
+{
+	u32 pto, loss_delay;
+
+	/* rfc9002#section-6.2.1:
+	 *   PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay
+	 */
+	pto = cong->smoothed_rtt + max(4 * cong->rttvar, QUIC_KGRANULARITY);
+	cong->pto = pto + cong->max_ack_delay;
+
+	/* rfc9002#section-6.1.2:
+	 *   max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity)
+	 */
+	loss_delay = QUIC_KTIME_THRESHOLD(max(cong->smoothed_rtt, cong->latest_rtt));
+	cong->loss_delay = max(loss_delay, QUIC_KGRANULARITY);
+
+	pr_debug("%s: update pto: %u\n", __func__, cong->pto);
+}
+
+/* Update pacing timestamp after sending 'bytes' bytes.
+ *
+ * This function tracks when the next packet is allowed to be sent based on pacing rate.
+ */
+static void quic_cong_update_pacing_time(struct quic_cong *cong, u32 bytes)
+{
+	u64 prior_time, credit, len_ns, rate = READ_ONCE(cong->pacing_rate);
+
+	if (!rate)
+		return;
+
+	prior_time = cong->pacing_time;
+	cong->pacing_time = max(cong->pacing_time, ktime_get_ns());
+	credit = cong->pacing_time - prior_time;
+
+	/* take into account OS jitter */
+	len_ns = div64_ul((u64)bytes * NSEC_PER_SEC, rate);
+	len_ns -= min_t(u64, len_ns / 2, credit);
+	cong->pacing_time += len_ns;
+}
+
+/* Compute and update the pacing rate based on congestion window and smoothed RTT. */
+static void quic_cong_pace_update(struct quic_cong *cong, u32 bytes, u64 max_rate)
+{
+	u64 rate;
+
+	if (unlikely(!cong->smoothed_rtt))
+		return;
+
+	/* rate = N * congestion_window / smoothed_rtt, with N = 2 */
+	rate = div64_ul((u64)cong->window * USEC_PER_SEC * 2, cong->smoothed_rtt);
+
+	WRITE_ONCE(cong->pacing_rate, min_t(u64, rate, max_rate));
+	pr_debug("%s: update pacing rate: %llu, max rate: %llu, srtt: %u\n",
+		 __func__, cong->pacing_rate, max_rate, cong->smoothed_rtt);
+}
+
+void quic_cong_on_packet_sent(struct quic_cong *cong, u64 time, u32 bytes, s64 number)
+{
+	if (!bytes)
+		return;
+	if (cong->ops->on_packet_sent)
+		cong->ops->on_packet_sent(cong, time, bytes, number);
+	quic_cong_update_pacing_time(cong, bytes);
+}
+
+void quic_cong_on_ack_recv(struct quic_cong *cong, u32 bytes, u64 max_rate)
+{
+	if (!bytes)
+		return;
+	if (cong->ops->on_ack_recv)
+		cong->ops->on_ack_recv(cong, bytes, max_rate);
+	quic_cong_pace_update(cong, bytes, max_rate);
+}
+
+/* rfc9002#section-5: Estimating the Round-Trip Time */
+void quic_cong_rtt_update(struct quic_cong *cong, u64 time, u32 ack_delay)
+{
+	u32 adjusted_rtt, rttvar_sample;
+
+	/* Ignore RTT sample if ACK delay is suspiciously large. */
+	if (ack_delay > cong->max_ack_delay * 2)
+		return;
+
+	/* rfc9002#section-5.1: latest_rtt = ack_time - send_time_of_largest_acked */
+	cong->latest_rtt = cong->time - time;
+
+	/* rfc9002#section-5.2: Estimating min_rtt */
+	if (!cong->min_rtt_valid) {
+		cong->min_rtt = cong->latest_rtt;
+		cong->min_rtt_valid = 1;
+	}
+	if (cong->min_rtt > cong->latest_rtt)
+		cong->min_rtt = cong->latest_rtt;
+
+	if (!cong->is_rtt_set) {
+		/* rfc9002#section-5.3:
+		 *   smoothed_rtt = latest_rtt
+		 *   rttvar = latest_rtt / 2
+		 */
+		cong->smoothed_rtt = cong->latest_rtt;
+		cong->rttvar = cong->smoothed_rtt / 2;
+		quic_cong_pto_update(cong);
+		cong->is_rtt_set = 1;
+		return;
+	}
+
+	/* rfc9002#section-5.3:
+	 *   adjusted_rtt = latest_rtt
+	 *   if (latest_rtt >= min_rtt + ack_delay):
+	 *     adjusted_rtt = latest_rtt - ack_delay
+	 *   smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
+	 *   rttvar_sample = abs(smoothed_rtt - adjusted_rtt)
+	 *   rttvar = 3/4 * rttvar + 1/4 * rttvar_sample
+	 */
+	adjusted_rtt = cong->latest_rtt;
+	if (cong->latest_rtt >= cong->min_rtt + ack_delay)
+		adjusted_rtt = cong->latest_rtt - ack_delay;
+
+	cong->smoothed_rtt = (cong->smoothed_rtt * 7 + adjusted_rtt) / 8;
+	if (cong->smoothed_rtt >= adjusted_rtt)
+		rttvar_sample = cong->smoothed_rtt - adjusted_rtt;
+	else
+		rttvar_sample = adjusted_rtt - cong->smoothed_rtt;
+	cong->rttvar = (cong->rttvar * 3 + rttvar_sample) / 4;
+	quic_cong_pto_update(cong);
+
+	if (cong->ops->on_rtt_update)
+		cong->ops->on_rtt_update(cong);
+}
+
+void quic_cong_set_algo(struct quic_cong *cong, u8 algo)
+{
+	if (algo >= QUIC_CONG_ALG_MAX)
+		algo = QUIC_CONG_ALG_RENO;
+
+	cong->state = QUIC_CONG_SLOW_START;
+	cong->ssthresh = U32_MAX;
+	cong->ops = &quic_congs[algo];
+	cong->ops->on_init(cong);
+}
+
+void quic_cong_set_srtt(struct quic_cong *cong, u32 srtt)
+{
+	/* rfc9002#section-5.3:
+	 *   smoothed_rtt = kInitialRtt
+	 *   rttvar = kInitialRtt / 2
+	 */
+	cong->latest_rtt = srtt;
+	cong->smoothed_rtt = cong->latest_rtt;
+	cong->rttvar = cong->smoothed_rtt / 2;
+	quic_cong_pto_update(cong);
+}
+
+void quic_cong_init(struct quic_cong *cong)
+{
+	cong->max_ack_delay = QUIC_DEF_ACK_DELAY;
+	cong->max_window = S32_MAX / 2;
+	quic_cong_set_algo(cong, QUIC_CONG_ALG_RENO);
+	quic_cong_set_srtt(cong, QUIC_RTT_INIT);
+}
diff --git a/net/quic/cong.h b/net/quic/cong.h
new file mode 100644
index 000000000000..e6cfb0fa1b6c
--- /dev/null
+++ b/net/quic/cong.h
@@ -0,0 +1,120 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* QUIC kernel implementation
+ * (C) Copyright Red Hat Corp. 2023
+ *
+ * This file is part of the QUIC kernel implementation
+ *
+ * Written or modified by:
+ *    Xin Long
+ */
+
+#define QUIC_KPERSISTENT_CONGESTION_THRESHOLD	3
+#define QUIC_KPACKET_THRESHOLD			3
+#define QUIC_KTIME_THRESHOLD(rtt)		((rtt) * 9 / 8)
+#define QUIC_KGRANULARITY			1000U
+
+#define QUIC_RTT_INIT		333000U
+#define QUIC_RTT_MAX		2000000U
+#define QUIC_RTT_MIN		QUIC_KGRANULARITY
+
+/* rfc9002#section-7.3: Congestion Control States
+ *
+ *                  New path or      +------------+
+ *             persistent congestion |   Slow     |
+ *         (O)---------------------->|   Start    |
+ *                                   +------------+
+ *                                         |
+ *                                 Loss or |
+ *                         ECN-CE increase |
+ *                                         v
+ *  +------------+     Loss or       +------------+
+ *  | Congestion |  ECN-CE increase  |  Recovery  |
+ *  | Avoidance  |------------------>|   Period   |
+ *  +------------+                   +------------+
+ *            ^                            |
+ *            |                            |
+ *            +----------------------------+
+ *               Acknowledgment of packet
+ *                 sent during recovery
+ */
+enum quic_cong_state {
+	QUIC_CONG_SLOW_START,
+	QUIC_CONG_RECOVERY_PERIOD,
+	QUIC_CONG_CONGESTION_AVOIDANCE,
+};
+
+struct quic_cong {
+	/* RTT tracking */
+	u32 max_ack_delay;	/* max_ack_delay from rfc9000#section-18.2 */
+	u32 smoothed_rtt;	/* Smoothed RTT */
+	u32 latest_rtt;		/* Latest RTT sample */
+	u32 min_rtt;		/* Lowest observed RTT */
+	u32 rttvar;		/* RTT variation */
+	u32 pto;		/* Probe timeout */
+
+	/* Timing & pacing */
+	u64 recovery_time;	/* Recovery period start timestamp */
+	u64 pacing_rate;	/* Packet sending rate (bytes/sec) */
+	u64 pacing_time;	/* Next scheduled send timestamp (ns) */
+	u64 time;		/* Cached current timestamp */
+
+	/* Congestion window */
+	u32 max_window;		/* Max growth cap */
+	u32 min_window;		/* Min window limit */
+	u32 loss_delay;		/* Time before marking loss */
+	u32 ssthresh;		/* Slow start threshold */
+	u32 window;		/* Bytes in flight allowed */
+	u32 mss;		/* QUIC MSS (excl. UDP) */
+
+	/* Algorithm-specific */
+	struct quic_cong_ops *ops;
+	u64 priv[8];		/* Algo private data */
+
+	/* Flags & state */
+	u8 min_rtt_valid;	/* min_rtt initialized */
+	u8 is_rtt_set;		/* RTT samples exist */
+	u8 state;		/* State machine in rfc9002#section-7.3 */
+};
+
+/* Hooks for congestion control algorithms */
+struct quic_cong_ops {
+	void (*on_packet_acked)(struct quic_cong *cong, u64 time, u32 bytes, s64 number);
+	void (*on_packet_lost)(struct quic_cong *cong, u64 time, u32 bytes, s64 number);
+	void (*on_process_ecn)(struct quic_cong *cong);
+	void (*on_init)(struct quic_cong *cong);
+
+	/* Optional callbacks */
+	void (*on_packet_sent)(struct quic_cong *cong, u64 time, u32 bytes, s64 number);
+	void (*on_ack_recv)(struct quic_cong *cong, u32 bytes, u64 max_rate);
+	void (*on_rtt_update)(struct quic_cong *cong);
+};
+
+static inline void quic_cong_set_mss(struct quic_cong *cong, u32 mss)
+{
+	if (cong->mss == mss)
+		return;
+
+	/* rfc9002#section-7.2: Initial and Minimum Congestion Window */
+	cong->mss = mss;
+	cong->min_window = max(min(mss * 10, 14720U), mss * 2);
+
+	if (cong->window < cong->min_window)
+		cong->window = cong->min_window;
+}
+
+static inline void *quic_cong_priv(struct quic_cong *cong)
+{
+	return (void *)cong->priv;
+}
+
+void quic_cong_on_packet_acked(struct quic_cong *cong, u64 time, u32 bytes, s64 number);
+void quic_cong_on_packet_lost(struct quic_cong *cong, u64 time, u32 bytes, s64 number);
+void quic_cong_on_process_ecn(struct quic_cong *cong);
+
+void quic_cong_on_packet_sent(struct quic_cong *cong, u64 time, u32 bytes, s64 number);
+void quic_cong_on_ack_recv(struct quic_cong *cong, u32 bytes, u64 max_rate);
+void quic_cong_rtt_update(struct quic_cong *cong, u64 time, u32 ack_delay);
+
+void quic_cong_set_srtt(struct quic_cong *cong, u32 srtt);
+void quic_cong_set_algo(struct quic_cong *cong, u8 algo);
+void quic_cong_init(struct quic_cong *cong);
diff --git a/net/quic/socket.c b/net/quic/socket.c
index d135f24c175a..46f1df978604 100644
--- a/net/quic/socket.c
+++ b/net/quic/socket.c
@@ -43,6 +43,7 @@ static int quic_init_sock(struct sock *sk)
 
 	quic_conn_id_set_init(quic_source(sk), 1);
 	quic_conn_id_set_init(quic_dest(sk), 0);
+	quic_cong_init(quic_cong(sk));
 
 	if (quic_stream_init(quic_streams(sk)))
 		return -ENOMEM;
diff --git a/net/quic/socket.h b/net/quic/socket.h
index 0553caaa0237..c5684cf7378d 100644
--- a/net/quic/socket.h
+++ b/net/quic/socket.h
@@ -16,6 +16,7 @@
 #include "stream.h"
 #include "connid.h"
 #include "path.h"
+#include "cong.h"
 
 #include "protocol.h"
 
@@ -42,6 +43,7 @@ struct quic_sock {
 	struct quic_conn_id_set source;
 	struct quic_conn_id_set dest;
 	struct quic_path_group paths;
+	struct quic_cong cong;
 };
 
 struct quic6_sock {
@@ -104,6 +106,11 @@ static inline bool quic_is_serv(const struct sock *sk)
 	return !!sk->sk_max_ack_backlog;
 }
 
+static inline struct quic_cong *quic_cong(const struct sock *sk)
+{
+	return &quic_sk(sk)->cong;
+}
+
 static inline bool quic_is_establishing(struct sock *sk)
 {
 	return sk->sk_state == QUIC_SS_ESTABLISHING;
-- 
2.47.1
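The RTT estimator in quic_cong_rtt_update() and the timers derived in quic_cong_pto_update() can be exercised outside the kernel. The following is a minimal userspace sketch under stated assumptions: struct rtt_est and its helpers are hypothetical names, all times are microseconds as in the patch, and the update order (smoothed_rtt before rttvar_sample) mirrors the rfc9002#section-5.3 comment in the patch. Unlike the patch, it does not track min_rtt validity separately from the first sample.

```c
#include <assert.h>
#include <stdint.h>

#define K_GRANULARITY 1000u /* kGranularity: 1 ms, expressed in us */

/* Hypothetical userspace mirror of the RTT fields of struct quic_cong. */
struct rtt_est {
	uint32_t latest_rtt, min_rtt, smoothed_rtt, rttvar; /* all in us */
	uint32_t max_ack_delay; /* peer's max_ack_delay in us */
	uint32_t pto;           /* probe timeout in us */
	uint32_t loss_delay;    /* time threshold for declaring loss in us */
	int has_sample;         /* set once the first RTT sample is taken */
};

static uint32_t u32_max(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Same formulas as quic_cong_pto_update() in the patch. */
static void rtt_update_timers(struct rtt_est *e)
{
	assert(e->has_sample);
	e->pto = e->smoothed_rtt + u32_max(4 * e->rttvar, K_GRANULARITY) + e->max_ack_delay;
	e->loss_delay = u32_max(u32_max(e->smoothed_rtt, e->latest_rtt) * 9 / 8, K_GRANULARITY);
}

/* Feed one RTT sample; returns 0 if the sample was discarded. */
static int rtt_on_sample(struct rtt_est *e, uint64_t ack_time, uint64_t sent_time,
			 uint32_t ack_delay)
{
	uint32_t adjusted, sample;

	/* Like the patch, drop samples whose ack_delay exceeds 2 * max_ack_delay. */
	if (ack_delay > e->max_ack_delay * 2)
		return 0;

	e->latest_rtt = (uint32_t)(ack_time - sent_time);
	if (!e->has_sample || e->latest_rtt < e->min_rtt)
		e->min_rtt = e->latest_rtt;

	if (!e->has_sample) {
		/* First sample: smoothed_rtt = latest_rtt, rttvar = latest_rtt / 2. */
		e->smoothed_rtt = e->latest_rtt;
		e->rttvar = e->latest_rtt / 2;
		e->has_sample = 1;
	} else {
		/* Subtract ack_delay only if the result stays >= min_rtt. */
		adjusted = e->latest_rtt;
		if (e->latest_rtt >= e->min_rtt + ack_delay)
			adjusted = e->latest_rtt - ack_delay;
		e->smoothed_rtt = (e->smoothed_rtt * 7 + adjusted) / 8;
		sample = e->smoothed_rtt >= adjusted ? e->smoothed_rtt - adjusted
						     : adjusted - e->smoothed_rtt;
		e->rttvar = (e->rttvar * 3 + sample) / 4;
	}
	rtt_update_timers(e);
	return 1;
}
```

For instance, with max_ack_delay = 25000, a first sample of 100 ms yields smoothed_rtt = 100000, rttvar = 50000 and pto = 100000 + 200000 + 25000 = 325000 us.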