Christian Hopps writes:

> [[PGP Signed Part:Good signature from 2E1D830ED7B83025 Christian Hopps (trust ultimate) created at 2024-07-17T23:02:33-0700 using RSA]]
>
> Simon Horman via Devel writes:
>
>> On Sun, Jul 14, 2024 at 04:22:39PM -0400, Christian Hopps wrote:
>>> From: Christian Hopps
>>>
>>> Add support for tunneling user (inner) packets that are larger than the
>>> tunnel's path MTU (outer) using IP-TFS fragmentation.
>>>
>>> Signed-off-by: Christian Hopps
>>> ---
>>>  net/xfrm/xfrm_iptfs.c | 401 +++++++++++++++++++++++++++++++++++++++---
>>>  1 file changed, 375 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/net/xfrm/xfrm_iptfs.c b/net/xfrm/xfrm_iptfs.c
>>
>> ...
>>
>>> +static int iptfs_copy_create_frags(struct sk_buff **skbp,
>>> +				   struct xfrm_iptfs_data *xtfs, u32 mtu)
>>> +{
>>> +	struct skb_seq_state skbseq;
>>> +	struct list_head sublist;
>>> +	struct sk_buff *skb = *skbp;
>>> +	struct sk_buff *nskb = *skbp;
>>> +	u32 copy_len, offset;
>>> +	u32 to_copy = skb->len - mtu;
>>> +	u32 blkoff = 0;
>>> +	int err = 0;
>>> +
>>> +	INIT_LIST_HEAD(&sublist);
>>> +
>>> +	BUG_ON(skb->len <= mtu);
>>> +	skb_prepare_seq_read(skb, 0, skb->len, &skbseq);
>>> +
>>> +	/* A trimmed `skb` will be sent as the first fragment, later. */
>>> +	offset = mtu;
>>> +	to_copy = skb->len - offset;
>>> +	while (to_copy) {
>>> +		/* Send all but last fragment to allow agg. append */
>>> +		list_add_tail(&nskb->list, &sublist);
>>> +
>>> +		/* FUTURE: if the packet has an odd/non-aligning length we could
>>> +		 * send less data in the penultimate fragment so that the last
>>> +		 * fragment then ends on an aligned boundary.
>>> +		 */
>>> +		copy_len = to_copy <= mtu ? to_copy : mtu;
>>
>> nit: this looks like it could be expressed using min()
>>
>> Flagged by Coccinelle
>
> Changed.
>
>>
>>> +		nskb = iptfs_copy_create_frag(&skbseq, offset, copy_len);
>>> +		if (IS_ERR(nskb)) {
>>> +			XFRM_INC_STATS(dev_net(skb->dev),
>>> +				       LINUX_MIB_XFRMOUTERROR);
>>> +			skb_abort_seq_read(&skbseq);
>>> +			err = PTR_ERR(nskb);
>>> +			nskb = NULL;
>>> +			break;
>>> +		}
>>> +		iptfs_output_prepare_skb(nskb, to_copy);
>>> +		offset += copy_len;
>>> +		to_copy -= copy_len;
>>> +		blkoff = to_copy;
>>
>> blkoff is set but otherwise unused in this function.
>>
>> Flagged by W=1 x86_64 allmodconfig builds with gcc-14 and clang 18.
>
> This value is used in a trace point call in this function.

Moved to the later tracepoint layered commit.

Thanks,
Chris.

>
>>
>>> +	}
>>> +	skb_abort_seq_read(&skbseq);
>>> +
>>> +	/* return last fragment that will be unsent (or NULL) */
>>> +	*skbp = nskb;
>>> +
>>> +	/* trim the original skb to MTU */
>>> +	if (!err)
>>> +		err = pskb_trim(skb, mtu);
>>> +
>>> +	if (err) {
>>> +		/* Free all frags. Don't bother sending a partial packet we will
>>> +		 * never complete.
>>> +		 */
>>> +		kfree_skb(nskb);
>>> +		list_for_each_entry_safe(skb, nskb, &sublist, list) {
>>> +			skb_list_del_init(skb);
>>> +			kfree_skb(skb);
>>> +		}
>>> +		return err;
>>> +	}
>>> +
>>> +	/* prepare the initial fragment with an iptfs header */
>>> +	iptfs_output_prepare_skb(skb, 0);
>>> +
>>> +	/* Send all but last fragment, if we fail to send a fragment then free
>>> +	 * the rest -- no point in sending a packet that can't be reassembled.
>>> +	 */
>>> +	list_for_each_entry_safe(skb, nskb, &sublist, list) {
>>> +		skb_list_del_init(skb);
>>> +		if (!err)
>>> +			err = xfrm_output(NULL, skb);
>>> +		else
>>> +			kfree_skb(skb);
>>> +	}
>>> +	if (err)
>>> +		kfree_skb(*skbp);
>>> +	return err;
>>> +}
>>> +
>>> +/**
>>> + * iptfs_first_should_copy() - determine if we should copy packet data.
>>> + * @first_skb: the first skb in the packet
>>> + * @mtu: the MTU.
>>> + *
>>> + * Determine if we should create subsequent skbs to hold the remaining data from
>>> + * a large inner packet by copying the packet data, or cloning the original skb
>>> + * and adjusting the offsets.
>>> + */
>>> +static bool iptfs_first_should_copy(struct sk_buff *first_skb, u32 mtu)
>>> +{
>>> +	u32 frag_copy_max;
>>> +
>>> +	/* If we have less than frag_copy_max for remaining packet we copy
>>> +	 * those tail bytes as it is more efficient.
>>> +	 */
>>> +	frag_copy_max = mtu <= IPTFS_FRAG_COPY_MAX ? mtu : IPTFS_FRAG_COPY_MAX;
>>
>> Likewise, it looks like min could be used here too.
>
> Changed.
>
> Thanks!
> Chris.
>>
>>> +	if ((int)first_skb->len - (int)mtu < (int)frag_copy_max)
>>> +		return true;
>>> +
>>> +	/* If we have non-linear skb just use copy */
>>> +	if (skb_is_nonlinear(first_skb))
>>> +		return true;
>>> +
>>> +	/* So we have a simple linear skb, easy to clone and share */
>>> +	return false;
>>> +}
>>
>> ...
>
> [[End of PGP Signed Part]]
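
For readers following the thread, here is a rough user-space sketch of the length arithmetic in the iptfs_copy_create_frags() loop quoted above, with the ternary written as min() as suggested in the review. It is not the patch code: `min_u32`, `struct frag_plan` and `plan_frags` are hypothetical names made up for illustration. The sketch models only offsets and lengths, and it ignores skbs, the IP-TFS header that iptfs_output_prepare_skb() adds, and all error handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's min() macro. */
static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* One planned fragment: bytes [offset, offset + len) of the inner packet. */
struct frag_plan {
	uint32_t offset;
	uint32_t len;
};

/*
 * Plan how a pkt_len-byte packet is split when mtu bytes fit per fragment.
 * Entry 0 models the original skb trimmed to mtu; the remaining entries
 * model the copies created in the quoted while loop.
 * Returns the number of fragments planned, or 0 if max_frags is too small.
 */
static size_t plan_frags(uint32_t pkt_len, uint32_t mtu,
			 struct frag_plan *out, size_t max_frags)
{
	uint32_t offset, to_copy;
	size_t n = 0;

	/* Mirrors BUG_ON(skb->len <= mtu) in the quoted patch. */
	assert(mtu > 0 && pkt_len > mtu);

	if (max_frags == 0)
		return 0;

	/* The trimmed original is sent as the first fragment. */
	out[n].offset = 0;
	out[n].len = mtu;
	n++;

	offset = mtu;
	to_copy = pkt_len - offset;
	while (to_copy) {
		/* Same result as: to_copy <= mtu ? to_copy : mtu */
		uint32_t copy_len = min_u32(to_copy, mtu);

		if (n == max_frags)
			return 0;

		out[n].offset = offset;
		out[n].len = copy_len;
		n++;

		offset += copy_len;
		to_copy -= copy_len;
	}

	return n;
}
```

Under these assumptions, a 3500-byte packet with mtu 1500 is planned as three fragments covering 0..1499, 1500..2999 and 3000..3499. Only the last fragment can be shorter than mtu, which is why the patch holds it back for aggregation.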