public inbox for netdev@vger.kernel.org 
From: Patrick McHardy <kaber@trash•net>
To: Pablo Neira Ayuso <pablo@netfilter•org>
Cc: netdev@vger•kernel.org, davem@davemloft•net, eric.dumazet@gmail•com
Subject: Re: [PATCHv2 net-next] netlink: allow large data transfers from user-space
Date: Mon, 3 Jun 2013 19:01:37 +0200
Message-ID: <20130603170136.GA23920@macbook.localnet>
In-Reply-To: <1370277599-27072-1-git-send-email-pablo@netfilter.org>

On Mon, Jun 03, 2013 at 06:39:59PM +0200, Pablo Neira Ayuso wrote:
> I can hit ENOBUFS in the sendmsg() path with a large batch that is
> composed of many netlink messages. Here that limit is 8 MBytes of
> skbuff data area as kmalloc does not manage to get more than that.
> 
> While discussing atomic rule-set for nftables with Patrick McHardy,
> we decided to put all rule-set updates that need to be applied
> atomically in one single batch to simplify the existing approach.
> However, as explained above, the existing netlink code limits us
> to a maximum of ~20000 rules that fit in one single batch without
> hitting ENOBUFS. iptables does not have such limitation as it is
> using vmalloc.
> 
> This patch adds netlink_alloc_large_skb() which is only used in
> the netlink_sendmsg() path. It uses alloc_skb if the memory
> requested is <= one memory page, that should be the common case
> for most subsystems, else vmalloc for higher memory allocations.

I know I suggested to do this - just wondering right now, how will we
indicate to userspace that a change has been applied atomically when
sending notifications? Not sure whether it matters unless userspace
will be able to get a dump while we're in the middle of updating
the ruleset. I guess that won't be possible, right?

> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter•org>
> ---
> v1: initial version
> v2: Use NLMSG_GOODSIZE instead of PAGE_SIZE, suggested by Eric Dumazet.
> 
>  net/netlink/af_netlink.c |   37 +++++++++++++++++++++++++++++++++++--
>  1 file changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 12ac6b4..7c71d07 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -750,6 +750,10 @@ static void netlink_skb_destructor(struct sk_buff *skb)
>  		skb->data = NULL;
>  	}
>  #endif
> +	if (is_vmalloc_addr(skb->head)) {
> +		vfree(skb->head);
> +		skb->data = NULL;
> +	}
>  	if (skb->sk != NULL)
>  		sock_rfree(skb);
>  }
> @@ -1420,6 +1424,35 @@ struct sock *netlink_getsockbyfilp(struct file *filp)
>  	return sock;
>  }
>  
> +static struct sk_buff *netlink_alloc_large_skb(unsigned int size)
> +{
> +	struct sk_buff *skb;
> +	void *data;
> +
> +	if (size <= NLMSG_GOODSIZE)
> +		return alloc_skb(size, GFP_KERNEL);
> +
> +	skb = alloc_skb_head(GFP_KERNEL);
> +	if (skb == NULL)
> +		return NULL;
> +
> +	data = vmalloc(size);
> +	if (data == NULL)
> +		goto err;
> +
> +	skb->head	= data;
> +	skb->data	= data;
> +	skb_reset_tail_pointer(skb);
> +	skb->end	= skb->tail + size;
> +	skb->len	= 0;
> +	skb->destructor = netlink_skb_destructor;
> +
> +	return skb;
> +err:
> +	kfree_skb(skb);
> +	return NULL;
> +}
> +
>  /*
>   * Attach a skb to a netlink socket.
>   * The caller must hold a reference to the destination socket. On error, the
> @@ -1510,7 +1543,7 @@ static struct sk_buff *netlink_trim(struct sk_buff *skb, gfp_t allocation)
>  		return skb;
>  
>  	delta = skb->end - skb->tail;
> -	if (delta * 2 < skb->truesize)
> +	if (is_vmalloc_addr(skb->head) || delta * 2 < skb->truesize)
>  		return skb;
>  
>  	if (skb_shared(skb)) {
> @@ -2096,7 +2129,7 @@ static int netlink_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  	if (len > sk->sk_sndbuf - 32)
>  		goto out;
>  	err = -ENOBUFS;
> -	skb = alloc_skb(len, GFP_KERNEL);
> +	skb = netlink_alloc_large_skb(len);
>  	if (skb == NULL)
>  		goto out;
>  
> -- 
> 1.7.10.4
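
For readers following the allocation strategy in the quoted patch, here is a
minimal user-space sketch of the same idea: pick the allocation path by size,
and release through the path that matches. It is not the kernel code.
SMALL_LIMIT, struct msg_buf, msg_buf_alloc() and msg_buf_free() are
hypothetical names, and plain malloc() stands in for both alloc_skb() and
vmalloc(). The sketch stores an origin tag where the patch checks
is_vmalloc_addr(skb->head).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cutoff standing in for NLMSG_GOODSIZE. */
#define SMALL_LIMIT 4096u

/* Records which path allocated the buffer. The patch derives this from
 * the address with is_vmalloc_addr(); user space has no such check, so
 * this sketch stores a tag instead. */
enum buf_origin { BUF_NONE, BUF_SMALL, BUF_LARGE };

struct msg_buf {
	unsigned char *head;
	size_t size;
	enum buf_origin origin;
};

/* Allocate a buffer, choosing the path by size. Returns false on failure. */
static bool msg_buf_alloc(struct msg_buf *b, size_t size)
{
	assert(b != NULL);

	/* malloc() stands in for both alloc_skb() and vmalloc() here. */
	b->head = malloc(size);
	if (b->head == NULL) {
		b->size = 0;
		b->origin = BUF_NONE;
		return false;
	}
	b->size = size;
	b->origin = size <= SMALL_LIMIT ? BUF_SMALL : BUF_LARGE;
	return true;
}

/* Release a buffer through the path that matches its origin. */
static void msg_buf_free(struct msg_buf *b)
{
	assert(b != NULL);

	switch (b->origin) {
	case BUF_SMALL:	/* kernel analogue: the regular skb free path */
	case BUF_LARGE:	/* kernel analogue: vfree(skb->head) in the destructor */
		free(b->head);
		break;
	case BUF_NONE:
		break;
	}
	b->head = NULL;
	b->size = 0;
	b->origin = BUF_NONE;
}
```

A caller would pair msg_buf_alloc(&b, len) with msg_buf_free(&b) once the
message has been processed, much as netlink_sendmsg() relies on the skb
destructor in the patch.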
