public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay•com>
To: Vitaly Mayatskikh <v.mayatskih@gmail•com>
Cc: David Miller <davem@davemloft•net>, netdev@vger•kernel.org
Subject: Re: speed regression in udp_lib_lport_inuse()
Date: Fri, 23 Jan 2009 12:45:14 +0100	[thread overview]
Message-ID: <4979ADCA.2080106@cosmosbay.com> (raw)
In-Reply-To: <m3bpty1hq7.wl%vmayatsk@redhat.com>

Vitaly Mayatskikh a écrit :
> At Fri, 23 Jan 2009 01:14:58 +0100, Eric Dumazet wrote:
> 
>> Could you try following patch ?
>>
>> Thank you
>>
>> [PATCH] udp: optimize bind(0) if many ports are in use
>>
>> commit 9088c5609584684149f3fb5b065aa7f18dcb03ff
>> (udp: Improve port randomization) introduced a regression for UDP bind() syscall
>> to null port (getting a random port) in case lot of ports are already in use.
>>
>> This is because we do about 28000 scans of very long chains (220 sockets per chain),
>> with many spin_lock_bh()/spin_unlock_bh() calls.
>>
>> Fix this using a bitmap (64 bytes for current value of UDP_HTABLE_SIZE)
>> so that we scan chains at most once.
>>
>> Instead of 250 ms per bind() call, we get after patch a time of 2.9 ms 
> 
> It's much better, thanks. FPS in glxgears now drops down only 2x
> harder if compare with 2.6.28. However, this again kills randomness :)
> Now number distribution is k*x^2 with x-axis zero in the (high - low)
> / 2. Try this program, it produces input file for Octave + Gnuplot.
> 
> #include <stdio.h>
> #include <errno.h>
> #include <string.h>
> #include <sys/types.h>
> #include <sys/socket.h>
> #include <netinet/in.h>
> 
> #define PORTS 65536
> 
> int main()
> {
> 	int s, err, i, j;
> 	char buf[256];
> 	struct sockaddr_in sa;
> 	int optval = 1, port;
> 	unsigned int p[PORTS] = { 0 };
> 
> 	for (i = 0; i < PORTS * 100; ++i) {
> 		s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
> 		memset(&sa, 0, sizeof(sa));
> 		sa.sin_addr.s_addr = htonl(INADDR_ANY);
> 		sa.sin_family = AF_INET;
> 		sa.sin_port = 0;
> 		setsockopt(s, IPPROTO_UDP, SO_REUSEADDR, &optval, sizeof(optval));
> 		err = bind(s, (const struct sockaddr*)&sa, sizeof(sa));
> 		getsockname(s, (struct sockaddr*)&sa, &j);
> 		port = ntohs(sa.sin_port);
> 		p[port]++;
> 		close(s);
> 	}
> 	printf("x = 32766:1:65535;\ny = [-100; ");
> 	for (i = 32767; i < PORTS; i++)
> 		printf("%d%s", p[i], (i + 1 < PORTS ? "; " : ""));
> 	printf("];\nplot(x,y,'.');pause;");
> }
> 
> I was thinking about bitmap also, but in a bit different approach. It
> is also uses bias (delta) value instead of exact port number. When we
> get next random port value (from rng or in the next iteration), we can
> calculate byte offset in that bitmap:
> 
>     A        B        C        D
> 76543210 76543210 7654321 076543210
> 11110111 11011110 1001111 011110111
>             ^
> 
> We here land in the byte B in the marked bit position, but it is
> already busy. If there're any free bits in this byte B, we can stop
> further iterations and use any free bit. I don't think it can kill
> randomness too much, because average bias will be small. May be it
> only needs some more complicated logic for searching free bit in the
> byte, because it's not good to do scanning always from the beginning.
> 

Interesting... Please note I dont search in the bitmap from its begining,
but from a random point.

Maybe we should study lib/random32.c and discover it has said distribution :)

Since my algo uses net_random() (random32() to get 32 bits number, that we
split in two "16 bits numbers" (bias & rand).

One (bias) to select the starting chain and starting slot in chain (so it really should be random)
first = bias + 0  (slotn=0)

One (rand, forced to be odd) to select next slot in chain in case current slot is already in use.
I feel this is the problem, because when we hit a slot outside of ip_local_port_range, it seems we
escape from this range with the distribution you got. maybe we should get rand depending on ip_local_port_range




  reply	other threads:[~2009-01-23 12:00 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-22 18:49 speed regression in udp_lib_lport_inuse() Vitaly Mayatskikh
2009-01-22 22:06 ` Eric Dumazet
2009-01-22 22:14   ` Evgeniy Polyakov
2009-01-23  0:20     ` Eric Dumazet
2009-01-22 22:40   ` Vitaly Mayatskikh
2009-01-23  0:14     ` Eric Dumazet
2009-01-23  9:42       ` Vitaly Mayatskikh
2009-01-23 11:45         ` Eric Dumazet [this message]
2009-01-23 13:44           ` Eric Dumazet
2009-01-23 14:56             ` Vitaly Mayatskikh
2009-01-23 16:05               ` Eric Dumazet
2009-01-23 16:14                 ` Vitaly Mayatskikh
2009-01-26  8:20             ` [PATCH] udp: optimize bind(0) if many ports are in use Eric Dumazet
2009-01-27  5:35               ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4979ADCA.2080106@cosmosbay.com \
    --to=dada1@cosmosbay$(echo .)com \
    --cc=davem@davemloft$(echo .)net \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=v.mayatskih@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox