From: Brian Bloniarz <bmb@athenacr•com>
To: Eric Dumazet <eric.dumazet@gmail•com>
Cc: David Miller <davem@davemloft•net>,
therbert@google•com, netdev@vger•kernel.org, rick.jones2@hp•com
Subject: Re: [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account
Date: Wed, 28 Apr 2010 11:41:12 -0400 [thread overview]
Message-ID: <4BD85718.5000404@athenacr.com> (raw)
In-Reply-To: <1272399662.2343.12.camel@edumazet-laptop>
Eric Dumazet wrote:
> Le mardi 27 avril 2010 à 19:37 +0200, Eric Dumazet a écrit :
>
>> We might use the ticket spinlock paradigm to let writers go in parallel
>> and let the user the socket lock
>>
>> Instead of having the bh_lock_sock() to protect receive_queue *and*
>> backlog, writers get a unique slot in a table, that 'user' can handle
>> later.
>>
>> Or serialize writers (before they try to bh_lock_sock()) with a
>> dedicated lock, so that user has 50% chances to get the sock lock,
>> contending with at most one writer.
>
> Following patch fixes the issue for me, with little performance hit on
> fast path.
>
> Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
> receiver can now process ~200.000 pps (instead of ~100 pps before the
> patch) on my dev machine.
>
> Thanks !
>
> [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account
>
> Current socket backlog limit is not enough to really stop DDOS attacks,
> because user thread spend many time to process a full backlog each
> round, and user might crazy spin on socket lock.
>
> We should add backlog size and receive_queue size (aka rmem_alloc) to
> pace writers, and let user run without being slow down too much.
>
> Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in
> stress situations.
>
> Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
> receiver can now process ~200.000 pps (instead of ~100 pps before the
> patch) on a 8 core machine.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail•com>
Wow that was awesome.
> ---
> include/net/sock.h | 13 +++++++++++--
> net/core/sock.c | 5 ++++-
> net/ipv4/udp.c | 4 ++++
> net/ipv6/udp.c | 8 ++++++++
> 4 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 86a8ca1..4b0097d 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -255,7 +255,6 @@ struct sock {
> struct sk_buff *head;
> struct sk_buff *tail;
> int len;
> - int limit;
> } sk_backlog;
> wait_queue_head_t *sk_sleep;
> struct dst_entry *sk_dst_cache;
> @@ -604,10 +603,20 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
> skb->next = NULL;
> }
>
> +/*
> + * Take into account size of receive queue and backlog queue
> + */
> +static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb)
> +{
> + unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);
> +
> + return qsize + skb->truesize > sk->sk_rcvbuf;
> +}
> +
Reading sk_backlog.len without the socket lock held seems to
contradict the comment in sock.h:
* @sk_backlog: always used with the per-socket spinlock held
...
struct sock {
...
/*
* The backlog queue is special, it is always used with
* the per-socket spinlock held and requires low latency
* access. Therefore we special case it's implementation.
*/
struct { ... } sk_backlog;
Is this just a doc mismatch or does sk_backlog.len need to use
atomic_read & atomic_set?
> /* The per-socket spinlock must be held here. */
> static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *skb)
> {
> - if (sk->sk_backlog.len >= max(sk->sk_backlog.limit, sk->sk_rcvbuf << 1))
> + if (sk_rcvqueues_full(sk, skb))
> return -ENOBUFS;
>
> __sk_add_backlog(sk, skb);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 58ebd14..5104175 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -327,6 +327,10 @@ int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested)
>
> skb->dev = NULL;
>
> + if (sk_rcvqueues_full(sk, skb)) {
> + atomic_inc(&sk->sk_drops);
> + goto discard_and_relse;
> + }
> if (nested)
> bh_lock_sock_nested(sk);
> else
> @@ -1885,7 +1889,6 @@ void sock_init_data(struct socket *sock, struct sock *sk)
> sk->sk_allocation = GFP_KERNEL;
> sk->sk_rcvbuf = sysctl_rmem_default;
> sk->sk_sndbuf = sysctl_wmem_default;
> - sk->sk_backlog.limit = sk->sk_rcvbuf << 1;
> sk->sk_state = TCP_CLOSE;
> sk_set_socket(sk, sock);
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 1e18f9c..776c844 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1372,6 +1372,10 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> goto drop;
> }
>
> +
> + if (sk_rcvqueues_full(sk, skb))
> + goto drop;
> +
> rc = 0;
>
> bh_lock_sock(sk);
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 2850e35..3ead20a 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -584,6 +584,10 @@ static void flush_stack(struct sock **stack, unsigned int count,
>
> sk = stack[i];
> if (skb1) {
> + if (sk_rcvqueues_full(sk, skb)) {
> + kfree_skb(skb1);
> + goto drop;
> + }
> bh_lock_sock(sk);
> if (!sock_owned_by_user(sk))
> udpv6_queue_rcv_skb(sk, skb1);
> @@ -759,6 +763,10 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
>
> /* deliver */
>
> + if (sk_rcvqueues_full(sk, skb)) {
> + sock_put(sk);
> + goto discard;
> + }
> bh_lock_sock(sk);
> if (!sock_owned_by_user(sk))
> udpv6_queue_rcv_skb(sk, skb);
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger•kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2010-04-28 15:41 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-04-23 5:54 [PATCH] bnx2x: add support for receive hashing Tom Herbert
2010-04-23 7:11 ` David Miller
2010-04-26 17:20 ` Eric Dumazet
2010-04-26 17:38 ` Tom Herbert
2010-04-26 17:47 ` Ben Hutchings
2010-04-26 17:56 ` David Miller
2010-04-26 18:04 ` David Miller
2010-04-26 18:19 ` Tom Herbert
2010-04-26 18:22 ` David Miller
2010-04-26 20:19 ` Rick Jones
2010-04-26 20:40 ` David Miller
2010-04-26 20:48 ` Rick Jones
2010-04-26 20:53 ` David Miller
2010-04-26 21:12 ` Rick Jones
2010-04-27 18:31 ` Eilon Greenstein
2010-04-27 19:30 ` Eric Dumazet
2010-04-26 20:58 ` Stephen Hemminger
2010-04-26 21:11 ` jamal
2010-04-26 21:14 ` jamal
2010-04-26 23:27 ` Brian Bloniarz
2010-04-27 13:37 ` Brian Bloniarz
2010-04-27 14:30 ` Eric Dumazet
2010-04-27 15:44 ` Brian Bloniarz
2010-04-27 16:51 ` David Miller
2010-04-27 17:02 ` Brian Bloniarz
2010-04-27 17:06 ` David Miller
2010-04-27 17:13 ` Eric Dumazet
2010-04-27 17:20 ` David Miller
2010-04-27 17:37 ` Eric Dumazet
2010-04-27 20:21 ` [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account Eric Dumazet
2010-04-27 21:43 ` David Miller
2010-04-27 22:11 ` David Miller
2010-04-27 22:19 ` Eric Dumazet
2010-04-28 15:41 ` Brian Bloniarz [this message]
2010-04-28 15:52 ` Eric Dumazet
2010-04-27 17:31 ` [PATCH] bnx2x: add support for receive hashing Tom Herbert
2010-04-27 17:36 ` Eric Dumazet
2010-07-04 16:36 ` Vladislav Zolotarov
2010-07-04 16:46 ` Vladislav Zolotarov
2010-07-06 7:16 ` Vladislav Zolotarov
2010-07-07 19:17 ` Tom Herbert
2010-07-08 8:40 ` Vladislav Zolotarov
2010-07-11 2:12 ` David Miller
2010-07-11 10:02 ` Vladislav Zolotarov
2010-07-11 13:16 ` Vladislav Zolotarov
2010-07-11 16:45 ` Eric Dumazet
2010-07-11 17:22 ` Vladislav Zolotarov
2010-07-11 17:41 ` Eric Dumazet
2010-07-11 18:18 ` Vladislav Zolotarov
2010-07-11 22:34 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4BD85718.5000404@athenacr.com \
--to=bmb@athenacr$(echo .)com \
--cc=davem@davemloft$(echo .)net \
--cc=eric.dumazet@gmail$(echo .)com \
--cc=netdev@vger$(echo .)kernel.org \
--cc=rick.jones2@hp$(echo .)com \
--cc=therbert@google$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox