From: Jakub Sitnicki <jakub@cloudflare•com>
To: John Fastabend <john.fastabend@gmail•com>
Cc: daniel@iogearbox•net, lmb@isovalent•com, edumazet@google•com,
	bpf@vger•kernel.org, netdev@vger•kernel.org, ast@kernel•org,
	andrii@kernel•org, will@isovalent•com
Subject: Re: [PATCH bpf v7 08/13] bpf: sockmap, incorrectly handling copied_seq
Date: Fri, 05 May 2023 14:14:12 +0200
Message-ID: <87zg6jvtnx.fsf@cloudflare.com>
In-Reply-To: <20230502155159.305437-9-john.fastabend@gmail.com>

On Tue, May 02, 2023 at 08:51 AM -07, John Fastabend wrote:
> The read_skb() logic is incrementing the tcp->copied_seq, which is used,
> among other things, for calculating how many outstanding bytes can be
> read by the application. This results in application errors: if the
> application does an ioctl(FIONREAD), we return zero because this is
> calculated from the copied_seq value.
>
> To fix this we move tcp->copied_seq accounting into the recv handler so
> that we update these when the recvmsg() hook is called and data is in
> fact copied into user buffers. This gives an accurate FIONREAD value
> as expected and improves ACK handling. Before we were calling the
> tcp_rcv_space_adjust() which would update 'number of bytes copied to
> user in last RTT' which is wrong for programs returning SK_PASS. The
> bytes are only copied to the user when recvmsg is handled.
>
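
To make the accounting problem described above concrete, here is a toy
userspace model of it. It is only a sketch: the struct and every toy_*
name are hypothetical, and it assumes FIONREAD is roughly
rcv_nxt - copied_seq, ignoring the urgent-data and FIN adjustments the
real code makes.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the receive-side counters; NOT kernel code.
 * All names below are hypothetical stand-ins.
 */
struct toy_tcp {
	uint32_t rcv_nxt;    /* next sequence number expected from the peer */
	uint32_t copied_seq; /* first sequence number not yet read by the app */
};

/* Roughly what ioctl(FIONREAD) reports: bytes received but not yet read.
 * Unsigned subtraction keeps working across sequence wraparound.
 */
static uint32_t toy_fionread(const struct toy_tcp *tp)
{
	return tp->rcv_nxt - tp->copied_seq;
}

/* A segment of len bytes arrives from the network. */
static void toy_receive(struct toy_tcp *tp, uint32_t len)
{
	tp->rcv_nxt += len;
}

/* Old behaviour per the commit message: read_skb() advances copied_seq
 * as soon as the skb is handed to the verdict path.
 */
static void toy_read_skb_old(struct toy_tcp *tp, uint32_t len)
{
	tp->copied_seq += len;
}

/* New behaviour: read_skb() leaves copied_seq alone. */
static void toy_read_skb_new(struct toy_tcp *tp, uint32_t len)
{
	(void)tp;
	(void)len;
}

/* New behaviour: copied_seq moves only when recvmsg() really copies
 * len bytes into the user buffer.
 */
static void toy_recvmsg_new(struct toy_tcp *tp, uint32_t len)
{
	assert(len <= toy_fionread(tp));
	tp->copied_seq += len;
}
```

With 100 bytes queued and an SK_PASS verdict, the old ordering leaves
toy_fionread() at 0 before the application has read anything, while the
new ordering reports 100 until toy_recvmsg_new() consumes some of it.
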
> Doing the fix for recvmsg is straightforward, but fixing redirect and
> SK_DROP pkts is a bit trickier. Build a tcp_eat_skb() helper and then
> call this from skmsg handlers. This fixes another issue where a broken
> socket with a BPF program doing a resubmit could hang the receiver. This
> happened because although read_skb() consumed the skb through sock_drop()
> it did not update the copied_seq. Now if a single recv socket is
> redirecting to many sockets (for example for lb), the receiver sk will be
> hung even though we might expect it to continue. The hang comes from
> not updating the copied_seq numbers and memory pressure resulting from
> that.
>
> We have a slight layer problem of calling tcp_eat_skb even if it's not
> a TCP socket. To fix we could refactor and create per type receiver
> handlers. I decided this is more work than we want in the fix and we
> already have some small tweaks depending on caller that use the
> helper skb_bpf_strparser(). So we extend that a bit and always set
> the strparser bit when it is in use and then we can gate the
> copied_seq updates on this.
>
> Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
> Signed-off-by: John Fastabend <john.fastabend@gmail•com>
> ---
>  include/net/tcp.h  | 10 ++++++++++
>  net/core/skmsg.c   |  7 +++++--
>  net/ipv4/tcp.c     | 10 +---------
>  net/ipv4/tcp_bpf.c | 28 +++++++++++++++++++++++++++-
>  4 files changed, 43 insertions(+), 12 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index db9f828e9d1e..76bf0a11bdc7 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1467,6 +1467,8 @@ static inline void tcp_adjust_rcv_ssthresh(struct sock *sk)
>  }
>  
>  void tcp_cleanup_rbuf(struct sock *sk, int copied);
> +void __tcp_cleanup_rbuf(struct sock *sk, int copied);
> +
>  
>  /* We provision sk_rcvbuf around 200% of sk_rcvlowat.
>   * If 87.5 % (7/8) of the space has been consumed, we want to override
> @@ -2323,6 +2325,14 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
>  void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
>  #endif /* CONFIG_BPF_SYSCALL */
>  
> +#ifdef CONFIG_INET
> +void tcp_eat_skb(struct sock *sk, struct sk_buff *skb);
> +#else
> +static inline void tcp_eat_skb(struct sock *sk, struct sk_buff *skb)
> +{
> +}
> +#endif
> +
>  int tcp_bpf_sendmsg_redir(struct sock *sk, bool ingress,
>  			  struct sk_msg *msg, u32 bytes, int flags);
>  #endif /* CONFIG_NET_SOCK_MSG */
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 3c0663f5cc3e..18c4f4015559 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1017,11 +1017,14 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
>  		}
>  		break;
>  	case __SK_REDIRECT:
> +		tcp_eat_skb(psock->sk, skb);
>  		err = sk_psock_skb_redirect(psock, skb);
>  		break;
>  	case __SK_DROP:
>  	default:
>  out_free:
> +		tcp_eat_skb(psock->sk, skb);
> +		skb_bpf_redirect_clear(skb);
>  		sock_drop(psock->sk, skb);
>  	}
>  

I have a feeling you wanted to factor out the common
skb_bpf_redirect_clear() into out_free: block, but maybe forgot to
update the jump sites?
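
Something along these lines is what I had in mind. This is a userspace
mock, not the real function: the toy_* names are made up, and I am
assuming the error paths that jump to out_free currently clear the
redirect state themselves, which the hunk above does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for an skb; all names here are hypothetical. */
struct toy_skb {
	bool redirect_set; /* models the BPF redirect state kept on the skb */
	bool dropped;      /* set once the skb has been dropped */
	int clear_calls;   /* counts how often the redirect state was cleared */
};

static void toy_redirect_clear(struct toy_skb *skb)
{
	skb->redirect_set = false;
	skb->clear_calls++;
}

static void toy_sock_drop(struct toy_skb *skb)
{
	skb->dropped = true;
}

enum toy_verdict { TOY_PASS, TOY_REDIRECT, TOY_DROP };

/* pass_fails models any failure on the PASS path that ends in a drop. */
static int toy_verdict_apply(struct toy_skb *skb, enum toy_verdict verdict,
			     bool pass_fails)
{
	int err = 0;

	assert(skb != NULL);

	switch (verdict) {
	case TOY_PASS:
		if (pass_fails) {
			err = -EIO;
			/* No clear here; out_free does it once for everyone. */
			goto out_free;
		}
		break;
	case TOY_REDIRECT:
		break;
	case TOY_DROP:
	default:
out_free:
		toy_redirect_clear(skb);
		toy_sock_drop(skb);
	}

	return err;
}
```

With the clear living only under out_free:, both the failing PASS path
and a plain DROP clear the redirect state exactly once.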

[...]