* network load 8245 vs. 8347E
@ 2007-06-21 14:15 Marc Leeman
2007-06-21 15:33 ` Kumar Gala
0 siblings, 1 reply; 12+ messages in thread
From: Marc Leeman @ 2007-06-21 14:15 UTC (permalink / raw)
To: linuxppc-dev
[-- Attachment #1.1: Type: text/plain, Size: 804 bytes --]
Ok, I guess it's comparing apples to lemons here, but I'll have a go at
it anyway.
I'm trying to figure out why partial network decoding on an 8347E is
disappointingly slow wrt an older 8245 processor.
When simply receiving (cf. att) a multicast stream of 12 Mbps, an
8245/uclibc 0.9.28 @350 MHz, ppc arch , the system runs smoothly at a
load of around 4% on a e100 based MAC (pci: 8086:1209 ).
When doing the same thing on 8347e/uclibc 0.9.28 @400 MHz, powerpc arch, the
system is loaded at around 34%.
any clues?
This program just takes in data and does nothing with it, to limit the
search area :)
--
greetz, marc
You know until today, I never really realized how much I love my feet.
Chiana - Vitas Mortis
chiana 2.6.18-4-ixp4xx #1 Tue Mar 27 18:01:56 BST 2007 GNU/Linux
[-- Attachment #1.2: recv_mcast.c --]
[-- Type: text/x-csrc, Size: 3067 bytes --]
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
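#include <string.h>
#include <strings.h> /* bzero */
#include <unistd.h> /* close */
#include <arpa/inet.h> /* inet_ntoa */
#define BUFSIZE (64*1024) /* Maximal packet content is 64k bytes */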
#define MAXPARAMLEN 80
void error(char *msg);
int main(int argc, char *argv[])
{
int socket_val, bind_val, recvfromval, ctrlboardlen, rc, recvbuff;
socklen_t optlen, cameralen; /* getsockopt() and recvfrom() expect socklen_t */
u_short ctrlboardPort;
struct ip_mreq mreq;
struct sockaddr_in ctrlboard, camera;
struct in_addr mcast_address;
struct hostent *h;
unsigned char buffer[BUFSIZE], ctrlboardip[16], sendseq[11];
FILE *fc,*fp;
/* Initialize the buffer */
bzero(buffer,BUFSIZE);
if(argc!=3) {
fprintf(stdout,"usage : %s <mcast address> <mcast port>\n",argv[0]);
exit(0);
}
/* Get mcast address to listen to */
h=gethostbyname(argv[1]);
if(h==NULL) {
fprintf(stdout,"Unknown group %s\n",argv[1]); /* ctrlboardip was never filled in */
exit(1);
}
ctrlboardPort=atoi(argv[2]);
memcpy(&mcast_address, h->h_addr_list[0],h->h_length);
/* Check given address is multicast */
if(!IN_MULTICAST(ntohl(mcast_address.s_addr))) {
fprintf(stdout,"Given address '%s' is not multicast\n",
inet_ntoa(mcast_address));
exit(1);
}
/* Create socket for incoming connections */
socket_val=socket(AF_INET, SOCK_DGRAM, 0);
if (socket_val < 0)
error("Error opening socket");
/* Set content of ctrlboard */
ctrlboardlen = sizeof(ctrlboard);
bzero(&ctrlboard,ctrlboardlen);
/* Fill in the UDP Receiver properties */
ctrlboard.sin_family=AF_INET;
ctrlboard.sin_addr.s_addr=htonl(INADDR_ANY);
ctrlboard.sin_port=htons(ctrlboardPort);
/* Set socket options */
recvbuff=128*1024;
if (setsockopt(socket_val,SOL_SOCKET,SO_RCVBUF,(char *) &recvbuff, sizeof(recvbuff)) < 0)
error("Error setting socket options");
/* Get socket options */
optlen=sizeof(recvbuff);
if (getsockopt(socket_val,SOL_SOCKET,SO_RCVBUF,(char *) &recvbuff, &optlen) < 0)
error("Error getting socket options");
fprintf(stdout,"Receive buffer size: %d \n",recvbuff);
/* Bind to associate port number with the socket */
bind_val = bind(socket_val,(struct sockaddr *)&ctrlboard,ctrlboardlen);
if (bind_val < 0)
error("Error bind");
/* join multicast group */
mreq.imr_multiaddr.s_addr=mcast_address.s_addr;
mreq.imr_interface.s_addr=htonl(INADDR_ANY);
rc = setsockopt(socket_val,IPPROTO_IP,IP_ADD_MEMBERSHIP, (void *) &mreq, sizeof(mreq));
if(rc<0)
{
fprintf(stdout,"Cannot join multicast group '%s'", inet_ntoa(mcast_address));
exit(1);
}
else
fprintf(stdout,"Listening to mgroup %s:%d\n", inet_ntoa(mcast_address), ctrlboardPort);
/* Fill in length of struct sockaddr_in camera */
cameralen = sizeof(camera);
/* Loop */
while (1) {
recvfromval = recvfrom(socket_val,buffer,BUFSIZE,0,(struct sockaddr *)&camera,&cameralen);
if (recvfromval < 0) error("Error recvfrom");
}
close(socket_val); /* not reached: the receive loop never exits */
return 0;
}
void error(char *msg) {
perror(msg);
exit(EXIT_FAILURE); /* report failure to the caller instead of success */
}
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: network load 8245 vs. 8347E
2007-06-21 14:15 network load 8245 vs. 8347E Marc Leeman
@ 2007-06-21 15:33 ` Kumar Gala
2007-06-21 15:56 ` Marc Leeman
0 siblings, 1 reply; 12+ messages in thread
From: Kumar Gala @ 2007-06-21 15:33 UTC (permalink / raw)
To: Marc Leeman; +Cc: linuxppc-dev
On Jun 21, 2007, at 9:15 AM, Marc Leeman wrote:
> Ok, I guess it's comparing apples to lemons here, but I'll have a
> go at
> it anyway.
>
> I'm trying to figure out why partial network decoding on an 8347E is
> disappointingly slow wrt an older 8245 processor.
>
> When simply receiving (cf. att) a multicast stream of 12 Mbps, an
> 8245/uclibc 0.9.28 @350 MHz, ppc arch , the system runs smoothly at a
> load of around 4% on a e100 based MAC (pci: 8086:1209 ).
>
> When doing the same thing on 8347e/0.9.28 @400 Mhz, powerpc arch, the
> system is loaded at around 34%.
>
> any clues?
How are you measuring load? I'm assuming the 8245 and 8347 are using
the same kernel.
- k
* Re: network load 8245 vs. 8347E
2007-06-21 15:33 ` Kumar Gala
@ 2007-06-21 15:56 ` Marc Leeman
2007-06-21 17:13 ` Marc Leeman
0 siblings, 1 reply; 12+ messages in thread
From: Marc Leeman @ 2007-06-21 15:56 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev
[-- Attachment #1: Type: text/plain, Size: 1769 bytes --]
> How are you measuring load? I'm assuming the 8245 and 8347 are using
> the same kernel.
Simply the load of the processor. The 8245 is running ppc/2.6.17, the
8347e is running powerpc/2.6.21.1.
The 8245 kernel is not being upgraded anymore since
1. we're not using them anymore on the new designs
2. I didn't bother fixing the board support after the interrupt
handling changed in 2.6.18 because of 1.
Disabling NAPI seems to improve the situation a bit, but there's still a
load difference of 25% on a marginally faster processor.
Mem: 10788K used, 116940K free, 0K shrd, 0K buff, 4212K cached
Load average: 0.39 0.43 0.29
PID USER STATUS VSZ PPID %CPU %MEM COMMAND
361 barco SW 152 270 30.8 0.1 recv
2222 barco RW 1124 270 0.3 0.8 top
119 root SW 1216 1 0.0 0.9 dropbear
270 barco SW 1132 1 0.0 0.8 sh
94 root SW 1132 1 0.0 0.8 syslogd
1 root SW 1128 0 0.0 0.8 init
95 root SW 1112 1 0.0 0.8 klogd
Mem: 8144K used, 20944K free, 0K shrd, 888K buff, 2384K cached
Load average: 0.00 0.00 0.00 (Status: S=sleeping R=running, W=waiting)
PID USER STATUS RSS PPID %CPU %MEM COMMAND
407 barco R 124 1 6.4 0.4 recv
510 root S 668 130 0.7 2.2 dropbear
13889 root R 368 7377 0.3 1.2 top
511 barco S 476 510 0.0 1.6 sh
52 root S 376 1 0.0 1.2 syslogd
1 root S 352 0 0.0 1.2 init
59 root S 340 1 0.0 1.1 klogd
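For reference, a minimal sketch of how a system-wide busy percentage can be derived from two samples of the first line of /proc/stat. The struct and function names are made up for illustration. It assumes the 2.6 field order user, nice, system, idle, iowait, irq, softirq; a 2.4 kernel only supplies the first four, which leaves the others at zero. Counting irq and softirq as busy matters for this kind of test, since much of the receive path runs there.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one sample of the aggregate "cpu" line. */
struct cpu_sample {
    unsigned long long user, nice, system, idle, iowait, irq, softirq;
};

/* Parse a line such as "cpu  10 0 5 85 0 0 0".
 * Returns 0 on success, -1 if the tag is wrong or fewer than the four
 * counters present on every kernel were found. */
int parse_cpu_line(const char *line, struct cpu_sample *s)
{
    char tag[16];
    int n;

    memset(s, 0, sizeof(*s));
    n = sscanf(line, "%15s %llu %llu %llu %llu %llu %llu %llu", tag,
               &s->user, &s->nice, &s->system, &s->idle,
               &s->iowait, &s->irq, &s->softirq);
    if (n < 5 || strcmp(tag, "cpu") != 0)
        return -1;
    return 0;
}

/* Busy percentage between two samples; idle and iowait count as not busy. */
double busy_percent(const struct cpu_sample *a, const struct cpu_sample *b)
{
    unsigned long long busy_a = a->user + a->nice + a->system + a->irq + a->softirq;
    unsigned long long busy_b = b->user + b->nice + b->system + b->irq + b->softirq;
    unsigned long long total_a = busy_a + a->idle + a->iowait;
    unsigned long long total_b = busy_b + b->idle + b->iowait;

    if (total_b <= total_a || busy_b < busy_a)
        return 0.0;
    return 100.0 * (double)(busy_b - busy_a) / (double)(total_b - total_a);
}
```

Sampling the line once per second and feeding consecutive samples to busy_percent() gives a per-interval figure rather than an average since boot.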
--
greetz, marc
Oh no, no, no, no I don't boogie.
Crichton - Won't Get Fooled Again
chiana 2.6.18-4-ixp4xx #1 Tue Mar 27 18:01:56 BST 2007 GNU/Linux
* Re: network load 8245 vs. 8347E
2007-06-21 15:56 ` Marc Leeman
@ 2007-06-21 17:13 ` Marc Leeman
[not found] ` <467ABCDC.8060401@genesi-usa.com>
2007-06-28 18:06 ` 2.4/2.6/ppc/powerpc/8245/8347e Marc Leeman
0 siblings, 2 replies; 12+ messages in thread
From: Marc Leeman @ 2007-06-21 17:13 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev
[-- Attachment #1: Type: text/plain, Size: 574 bytes --]
> Simply the load of the processor. The 8245 is running ppc/2.6.17, the
> 8347e is running powerpc/2.6.21.1.
Sorry, I'm confusing my boards: this particular board with the 8245
processor is running 2.4.34 [1].
so
8245: 2.4.34
8347e: 2.6.21.1
[1] the 2.4 line balanced the load better of multiple streams being
taken in wrt to the 2.6 kernels; this is the reason we stuck with 2.4
for this platform.
--
greetz, marc
He claims to be a human from a planet called Erp.
Aeryn - Premiere
chiana 2.6.18-4-ixp4xx #1 Tue Mar 27 18:01:56 BST 2007 GNU/Linux
* Re: network load 8245 vs. 8347E
[not found] ` <467ABCDC.8060401@genesi-usa.com>
@ 2007-06-21 18:33 ` Marc Leeman
2007-06-21 18:53 ` Matt Sealey
0 siblings, 1 reply; 12+ messages in thread
From: Marc Leeman @ 2007-06-21 18:33 UTC (permalink / raw)
To: Matt Sealey; +Cc: Linuxppc-dev
[-- Attachment #1: Type: text/plain, Size: 1611 bytes --]
> Isn't it just because the Intel chipset is a REALLY nice ethernet
> controller and the integrated one in the 8347E isn't as good? :)
Well, that's something I'm afraid of: that for any kind of decent
network performance we'd need an external chipset. But there are some
other hiccups we need to investigate with the 8347.
I just hope it's either a configuration problem or a not so efficient
driver implementation -> no re-design :)
> You could try turning the interrupt coalescing off in the e100
> driver and see how well it does, then. Or knock the bundling
> threshold or timer down to something similar to what the 8347E is using..
> or turn the ones on the 8347E up to match the ones the e100 driver
> is using :)
>
> All these options used to be modprobe options but the latest
> e100 driver seems to hardcode a bunch of them probably for best
> performance. Anyway, putting them on a level peg would mean at
> least you are comparing onboard apples with pci apples.
I started doing this this evening, but at first glance, changing sysfs
params (on the gianfar driver) didn't change much.
I'll start comparing 8245/e100/2.4.34, 8245/e100/2.6.17 and
8347E/gianfar/2.6.21.1 tomorrow.
Anyway a load of 30% for a single process where an older processor only
takes around 5% seems too much of a difference to be solved with simple
parameter settings.
--
greetz, marc
Open your ears, or your tentacles, or whatever orifice it is you
listen with!
Crichton - Back and Back and Back to the Future
chiana 2.6.18-4-ixp4xx #1 Tue Mar 27 18:01:56 BST 2007 GNU/Linux
* Re: network load 8245 vs. 8347E
2007-06-21 18:33 ` Marc Leeman
@ 2007-06-21 18:53 ` Matt Sealey
2007-06-21 19:09 ` Marc Leeman
0 siblings, 1 reply; 12+ messages in thread
From: Matt Sealey @ 2007-06-21 18:53 UTC (permalink / raw)
To: Marc Leeman; +Cc: Linuxppc-dev
Sure; the 2.4 ethernet driver for the e100 might be extremely efficient
on the CPU but not very good on bandwidth. They've (Intel) rewritten it
a short while back according to the documentation, so it may be that
the newer version on 2.4 may be just as CPU intensive as the gianfar
driver, and you're lucky to be using the old one. I haven't looked at
the 2.4.x version you're running to see what driver it's using, but Intel
do have their latest driver version backported to the 2.4 kernel series
on their site.
You might just be lucky that 2.4 is less bloated and has to handle fewer
features than 2.6, which lowers the CPU usage :)
So, I suppose, comparing at least 2.6 series kernels is a start, making
sure you compare the old 2.4 e100 driver to the new 2.6 e100 driver is
another idea. I think there are too many variables. That said, usually
inbuilt ethernets on SoC's.. at least in my experience.. tend to be
more efficient in some ways but not in others. In any case an Intel
network card has always kicked the pants off it.. :(
--
Matt Sealey <matt@genesi-usa•com>
Genesi, Manager, Developer Relations
* Re: network load 8245 vs. 8347E
2007-06-21 18:53 ` Matt Sealey
@ 2007-06-21 19:09 ` Marc Leeman
0 siblings, 0 replies; 12+ messages in thread
From: Marc Leeman @ 2007-06-21 19:09 UTC (permalink / raw)
To: Matt Sealey; +Cc: Linuxppc-dev
[-- Attachment #1: Type: text/plain, Size: 624 bytes --]
> You might just be lucky that 2.4 is less bloated and has to handle less
> features than 2.6 and it's lowering the CPU usage :)
Actually, I backported the e100 driver from the 2.6 series to the 2.4
series. It performed better. Perhaps I should re-sync again with what is
currently in the 2.6, but as I said, not much work is put in the older
platforms due to a load of proto 8347e boards being ported..
--
greetz, marc
Long enough for me to see your blue backside meditating, but not long
enough for you to touch me.
Rygel - PK Tech Girl
chiana 2.6.18-4-ixp4xx #1 Tue Mar 27 18:01:56 BST 2007 GNU/Linux
* 2.4/2.6/ppc/powerpc/8245/8347e
2007-06-21 17:13 ` Marc Leeman
[not found] ` <467ABCDC.8060401@genesi-usa.com>
@ 2007-06-28 18:06 ` Marc Leeman
2007-06-29 14:59 ` 2.4/2.6/ppc/powerpc/8245/8347e Marc Leeman
1 sibling, 1 reply; 12+ messages in thread
From: Marc Leeman @ 2007-06-28 18:06 UTC (permalink / raw)
To: linuxppc-dev
[-- Attachment #1.1: Type: text/plain, Size: 3224 bytes --]
a small update:
> 8245: 2.4.34
> 8347e: 2.6.21.1
I've tried the following setup:
multicast stream @8192 kbps, one process taking in and dumping the data
on each board [1].
a) 8245/2.4.34/e100: 2.3.43-k1, @400 MHz
b) 8245/2.6.17/e100: 2.3.43-k1 [2] @350 MHz
c) 8347e/2.6.21.1/gianfar @400 MHz
d) XScale-IXP42x/2.6.18-4/ixp4xx @266 MHz (NSLU2)
(2.3.43-k1 is the e100 driver version).
The process load for taking in the data is:
a) 4-5% [3]
b) 10-11%
c) 13-14%
d) 4-5%
While the current 8347/gianfar platform is the worst performer, the
2.6 kernel with the 2.4 e100 (before the rewrite) seems to perform
poorly too [4].
So the 834x performs worse wrt the 8245 based configuration even though
it is clocked slightly higher.
It seems as if I bumped into the problem that led me to stick with 2.4
in the first place for this 8245 platform, but I never got round to
investigating it. I find these results especially intriguing when
considering that an ARM platform (NSLU2 device) that I had around, clocked at
only 66% of the 8347 and at 80% of the 8245, performs on par
with the latter...
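To compare boards that run at different clocks, the load figures above can be turned into an approximate CPU cost per received bit. This is only a rough sketch: the function name is made up for illustration, a kbit is taken as 1000 bits, and the clock rates are the ones listed in the setup above.

```c
/* Approximate CPU cycles spent per received bit.
 * load_pct:     process load in percent (e.g. 13.5 for "13-14%")
 * clock_mhz:    core clock in MHz
 * bitrate_kbps: stream bitrate in kbit/s (1 kbit taken as 1000 bits) */
double cycles_per_bit(double load_pct, double clock_mhz, double bitrate_kbps)
{
    if (bitrate_kbps <= 0.0)
        return 0.0;
    return (load_pct / 100.0) * (clock_mhz * 1e6) / (bitrate_kbps * 1e3);
}
```

Taking the midpoints of the reported ranges, cycles_per_bit(13.5, 400, 8192) for the 8347e comes out at roughly 6.6 cycles per bit, against roughly 1.5 for cycles_per_bit(4.5, 266, 8192) on the NSLU2.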
Even though I will need to recheck this (results to follow), a quick
test didn't reveal any significant difference between a ppc and powerpc
arch in the kernel.
It does look like, on our 8245/83xx platforms, the 2.6.x kernel performs
worse wrt the 2.4 ppc kernels and the 83xx configuration is worse wrt
the 8245 based configuration [5]. In retrospect, we had signals that
there was a problem with the 8245/83xx performance over the network last
year when investigating gstreamer, but due to time pressure we assumed
it was due to gstreamer and not the processor. This came as a surprise to
some of the people on the gstreamer mailing list who reported performant
ports to ARM architectures.
The results with the NSLU2 will certainly put heat on us from management
when redesigning or for follow up designs :(
Anyhow, I'm currently extending my test setups since this is an
important problem and set back.
If anyone has a hint to explaining what is going on here, please do
since solving this will certainly beat redesigning (esp. considering the
timeframe we've been assigned).
I've only found one relevant reference to 2.4/2.6 network performance
decrease at this point [6].
[1] sources attached
mcrecv -a 225.1.2.3 -p 12345
mcsend -a 225.1.2.3 -p 12345 -b 8192
I'm preparing more tests in the next days, in trying to figure out
what really is going on here.
[2] 2.4 driver ported to 2.6 kernel.
[3] This figure is read from top and not from the app since it seems to
be an underestimate (./fs/proc/array.c).
[4] I believe I ported the 2.4 e100 to the 2.6 2 years ago because it
performed much better, but I'll verify that in the next days.
[5] Obviously, testing 834x against the 2.4 kernel is not really an
option :)
[6] http://www.mail-archive.com/linux-net@vger.kernel.org/msg01283.html
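Regarding [3], a sketch of the per-process calculation the attached programs perform: utime and stime (fields 14 and 15 of /proc/self/stat) are sampled twice and divided by the elapsed ticks. The function names are hypothetical. These counters only cover ticks charged to the process itself, so receive work done in interrupt or softirq context while another task (or idle) is current would not show up; that is one possible, unverified, reason for the underestimate.

```c
#include <stdio.h>
#include <string.h>

/* Extract utime and stime (fields 14 and 15) from the text of a
 * /proc/<pid>/stat line. Parsing starts after the last ')' so that a
 * command name containing spaces cannot shift the fields.
 * Returns 0 on success, -1 on a malformed line. */
int parse_proc_times(const char *stat, unsigned long *utime, unsigned long *stime)
{
    const char *p = strrchr(stat, ')');

    if (p == NULL)
        return -1;
    /* skip the state character plus the ten fields from ppid to cmajflt */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
               utime, stime) != 2)
        return -1;
    return 0;
}

/* CPU share in percent of a process over one interval, given the summed
 * utime+stime ticks and a wall-clock tick counter at both ends. */
double proc_cpu_percent(unsigned long used_before, unsigned long used_after,
                        unsigned long long ticks_before,
                        unsigned long long ticks_after)
{
    if (ticks_after <= ticks_before || used_after < used_before)
        return 0.0;
    return 100.0 * (double)(used_after - used_before)
                 / (double)(ticks_after - ticks_before);
}
```

Parsing from the last ')' avoids the long sscanf format over all fields and keeps working if the kernel appends new fields at the end of the line.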
--
greetz, marc
Don't think I'm going to miss you, any of you. I'm not. Well, maybe
a little bit.
Rygel - Into the Lion's Den - Wolf in Sheep's Clothing
chiana 2.6.18-4-ixp4xx #1 Tue Mar 27 18:01:56 BST 2007 GNU/Linux
[-- Attachment #1.2: recv_mcast.c --]
[-- Type: text/x-csrc, Size: 9890 bytes --]
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <time.h>
#include <stdint.h> /* uint16_t, uint32_t, uint64_t */
#include <strings.h> /* bzero */
#define BUFSIZE (64*1024) /* Maximal packet content is 64k bytes */
#define MAXPARAMLEN 80
#define STATFILE "/proc/stat"
#define SELFSTATFILE "/proc/self/stat"
void error(char *msg);
void usage(const char* program)
{
fprintf(stdout,"usage : %s -a address -p port\n",program);
}
float getsysload()
{
unsigned long user = 0ul, nice = 0ul, system = 0ul, idle = 0ul;
char buf[BUFSIZ];
FILE *fp;
char buffer[BUFSIZ];
memset(buffer,0x0,BUFSIZ);
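if((fp=fopen(STATFILE,"r"))==NULL){
fprintf(stderr, "Problem opening %s\n", STATFILE);
return 0.0f; /* returning EXIT_FAILURE here would read as a load of 100% */
}
fread(buffer, sizeof(char), BUFSIZ-1, fp); /* keep the trailing NUL */
fclose(fp);
/* The counters are cumulative, so this is the average load since boot. */
if(sscanf(buffer,"%s %lu %lu %lu %lu",buf, &user, &nice, &system, &idle)==5){
return (float)(user+system)/(user+system+nice+idle);
}
else{
fprintf(stdout, "no matching strings found\n");
return 0.0f;
}
}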
unsigned long getcurrjiffies()
{
FILE *fp;
char buffer[BUFSIZ];
memset(buffer, 0x0, BUFSIZ);
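/* /proc/self/stat is opened by the cat child, so field 22 (starttime) is that
 * child's start time in clock ticks since boot: effectively the current time. */
fp=popen("cat /proc/self/stat | cut -d \\ -f 22","r");
if(fp==NULL)
return 0;
fread(buffer, sizeof(char), BUFSIZ-1, fp); /* keep the trailing NUL */
pclose(fp);
return strtoul(buffer, NULL, 10);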
}
unsigned long scan_stat(unsigned long *user, unsigned long *kernel)
{
FILE *fp;
char buffer[BUFSIZ];
uint32_t scanned = 0u;
signed int pid = 0;
char tcomm[BUFSIZ];
char state = 0x0;
signed int ppid = 0;
signed int pgid = 0;
signed int sid = 0;
signed int tty_nr = 0;
signed int tty_pgrp = 0;
unsigned long flags = 0ul;
unsigned long min_flt = 0ul;
unsigned long cmin_flt = 0ul;
unsigned long maj_flt = 0ul;
unsigned long cmaj_flt = 0ul;
unsigned long utime = 0ul;
unsigned long stime = 0ul;
signed long cutime = 0l;
signed long cstime = 0l;
signed long priority = 0l;
signed long nice = 0l;
signed int num_threads = 0;
unsigned long long start_time = 0ul;
unsigned long vsize = 0ul;
signed long rss = 0l;
unsigned long rsslim = 0ul;
unsigned long start_code = 0ul;
unsigned long end_code = 0ul;
unsigned long start_stack = 0ul;
unsigned long esp = 0ul;
unsigned long eip = 0ul;
/* The signal information here is obsolete.
* It must be decimal for Linux 2.0 compatibility.
* Use /proc/#/status for real-time signals.
*/
unsigned long signalpending = 0ul;
unsigned long signalblocked = 0ul;
unsigned long sigign = 0ul;
unsigned long sigcatch = 0ul;
unsigned long wchan = 0ul;
unsigned long dummy0 = 0ul;
unsigned long dummy1 = 0ul;
int exit_signal = 0;
int task_cpu = 0;
unsigned long rt_priority = 0ul;
unsigned long policy = 0ul;
unsigned long long delayticks = 0ull;
memset(buffer,0x0,BUFSIZ);
memset(tcomm,0x0,BUFSIZ);
if(!(fp=fopen(SELFSTATFILE,"r"))){
fprintf(stderr,"Error opening \"%s\".\n",SELFSTATFILE);
return 0;
}
fread(buffer, sizeof(char), BUFSIZ, fp);
fclose(fp);
#if 0
fprintf(stdout,"Reference:\n");
fprintf(stdout,"-------\n");
fprintf(stdout,"%s\n",buffer);
fprintf(stdout,"-------\n");
#endif
scanned = sscanf(buffer,"%d %s %c %d %d %d %d %d %lu %lu \
%lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu\n",
&pid,
tcomm,
&state,
&ppid,
&pgid,
&sid,
&tty_nr,
&tty_pgrp,
&flags,
&min_flt,
&cmin_flt,
&maj_flt,
&cmaj_flt,
&utime,
&stime,
&cutime,
&cstime,
&priority,
&nice,
&num_threads,
&start_time,
&vsize,
&rss,
&rsslim,
&start_code,
&end_code,
&start_stack,
&esp,
&eip,
/* The signal information here is obsolete.
* It must be decimal for Linux 2.0 compatibility.
* Use /proc/#/status for real-time signals.
*/
&signalpending,
&signalblocked,
&sigign,
&sigcatch,
&wchan,
&dummy0,
&dummy1,
&exit_signal,
&task_cpu,
&rt_priority,
&policy,
&delayticks);
#if 0
fprintf(stdout,"scanned %u items.\n",scanned);
fprintf(stdout,"%d %s %c %d %d %d %d %d %lu %lu \
%lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu\n",
pid,
tcomm,
state,
ppid,
pgid,
sid,
tty_nr,
tty_pgrp,
flags,
min_flt,
cmin_flt,
maj_flt,
cmaj_flt,
utime,
stime,
cutime,
cstime,
priority,
nice,
num_threads,
start_time,
vsize,
rss,
rsslim,
start_code,
end_code,
start_stack,
esp,
eip,
/* The signal information here is obsolete.
* It must be decimal for Linux 2.0 compatibility.
* Use /proc/#/status for real-time signals.
*/
signalpending,
signalblocked,
sigign,
sigcatch,
wchan,
dummy0,
dummy1,
exit_signal,
task_cpu,
rt_priority,
policy,
delayticks);
#endif
*user = utime;
*kernel = stime;
return utime + stime;
}
int main(int argc, char *argv[])
{
extern int getopt();
extern int optind;
extern char *optarg;
int c_opt;
int socket_val, bind_val, recvfromval, ctrlboardlen,
rc, recvbuff;
unsigned optlen, cameralen;
uint16_t mc_port = 0u; /* stays 0 unless -p is given */
char mc_addr[16];
struct ip_mreq mreq;
struct sockaddr_in ctrlboard, camera;
struct in_addr mcast_address;
struct hostent *h;
unsigned char buffer[BUFSIZE];
FILE *fp;
uint32_t ucnt = 0u,i,cnt=0u;
uint64_t usecs[8];
uint64_t received[8];
uint64_t treceived = 0ul;
struct timeval newtime;
unsigned long p_jiffies, c_jiffies;
unsigned long puser, pkernel;
unsigned long cuser, ckernel;
unsigned long long p_start, c_start;
/* Initialize the buffer */
memset(mc_addr, 0x0, 16);
memset(buffer,0x0,BUFSIZE);
for(i=0;i<8;i++){
usecs[i] = 0ul;
received[i] = 0ul; /* was received[8], one past the end of the array */
}
/* handling of command line options */
while ((c_opt = getopt(argc, argv, "a:b:p:")) != EOF) {
switch (c_opt) {
case 'a':
strncpy(mc_addr,optarg,sizeof(mc_addr)-1); /* mc_addr was zeroed, so it stays NUL-terminated */
break;
case 'p':
mc_port = (atoi(optarg));
break;
default:
fprintf(stderr, "%s: Bad Option -%c\n", argv[0], c_opt);
exit(EXIT_FAILURE);
}
}
if(!mc_addr[0] || !mc_port){ /* an array name is never NULL; test the first character */
usage(argv[0]);
return EXIT_FAILURE;
}
/* Get mcast address to listen to */
h=gethostbyname(mc_addr);
if(h==NULL) {
fprintf(stdout,"Unknown group %s\n",mc_addr);
exit(1);
}
memcpy(&mcast_address, h->h_addr_list[0],h->h_length);
/* Check given address is multicast */
if(!IN_MULTICAST(ntohl(mcast_address.s_addr))) {
fprintf(stdout,"Given address '%s' is not multicast\n",
inet_ntoa(mcast_address));
exit(1);
}
/* Create socket for incoming connections */
socket_val=socket(AF_INET, SOCK_DGRAM, 0);
if (socket_val < 0)
error("Error opening socket");
/* Set content of ctrlboard */
ctrlboardlen = sizeof(ctrlboard);
bzero(&ctrlboard,ctrlboardlen);
/* Fill in the UDP Receiver properties */
ctrlboard.sin_family=AF_INET;
ctrlboard.sin_addr.s_addr=htonl(INADDR_ANY);
ctrlboard.sin_port=htons(mc_port);
/* Set socket options */
recvbuff=128*1024;
if (setsockopt(socket_val,SOL_SOCKET,SO_RCVBUF,(char *) &recvbuff, sizeof(recvbuff)) < 0)
error("Error setting socket options");
/* Get socket options */
optlen=sizeof(recvbuff);
if (getsockopt(socket_val,SOL_SOCKET,SO_RCVBUF,(char *) &recvbuff, &optlen) < 0)
error("Error getting socket options");
fprintf(stdout,"Receive buffer size: %d \n",recvbuff);
/* Bind to associate port number with the socket */
bind_val = bind(socket_val,(struct sockaddr *)&ctrlboard,ctrlboardlen);
if (bind_val < 0)
error("Error bind");
/* join multicast group */
mreq.imr_multiaddr.s_addr=mcast_address.s_addr;
mreq.imr_interface.s_addr=htonl(INADDR_ANY);
rc = setsockopt(socket_val,IPPROTO_IP,IP_ADD_MEMBERSHIP, (void *) &mreq, sizeof(mreq));
if(rc<0){
fprintf(stdout,"Cannot join multicast group '%s'", inet_ntoa(mcast_address));
exit(1);
}
else
fprintf(stdout,"Listening to mgroup %s:%d\n", inet_ntoa(mcast_address), mc_port);
/* Fill in length of struct sockaddr_in camera */
cameralen = sizeof(camera);
gettimeofday(&newtime,NULL);
p_jiffies = scan_stat(&puser, &pkernel);
p_start = getcurrjiffies();
/* Loop */
while (1) {
recvfromval = recvfrom(socket_val,buffer,BUFSIZE,0,(struct sockaddr *)&camera,&cameralen);
if (recvfromval < 0) error("Error recvfrom");
treceived += recvfromval; /* only count successful reads */
if(!(cnt&0xff)){
double cbitrate = 0.0;
uint8_t ccnt = ucnt&0x7;
uint8_t pcnt = ccnt?ccnt-1:0x7;
uint64_t dt;
c_jiffies = scan_stat(&cuser,&ckernel);
c_start = getcurrjiffies();
gettimeofday(&newtime,NULL);
usecs[ccnt] = (uint64_t)(newtime.tv_sec*1e6+newtime.tv_usec);
received[ccnt] = treceived;
dt = usecs[ccnt] - usecs[pcnt];
if(usecs[ccnt]<usecs[pcnt]){
ucnt = 0;
}
else{
uint64_t lcnt = received[ccnt]; /* bytes received during this interval, matching dt */
if(dt){
cbitrate = (((double)lcnt*8)/((double)dt))*1e3;
fprintf(stdout,"Approx bitrate is %2.2f kbps, system load is %2.2f%%, process %2.2f%% (u %2.2f%%, s %2.2f%%).\n",
cbitrate,
getsysload()*100,
(double)(c_jiffies - p_jiffies)*100/(c_start - p_start),
(double)(cuser-puser)*100/(c_start - p_start),
(double)(ckernel-pkernel)*100/(c_start - p_start)
);
}
}
ucnt++;
treceived = 0;
p_jiffies = c_jiffies;
p_start = c_start;
pkernel = ckernel;
puser = cuser;
}
cnt++;
}
close(socket_val); /* not reached: the receive loop never exits */
return 0;
}
void error(char *msg) {
perror(msg);
exit(EXIT_FAILURE); /* report failure to the caller instead of success */
}
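The bitrate printed by the receive loop above boils down to the bytes received in one reporting interval divided by the elapsed microseconds. A small sketch of just that step; the function name is hypothetical, and a kbit is taken as 1000 bits, as the 1e3 factor in the attachment implies.

```c
#include <stdint.h>

/* Bitrate in kbit/s for `bytes` received over `dt_usec` microseconds.
 * Bits per microsecond equals Mbit/s, so the 1e3 factor gives kbit/s.
 * Returns 0.0 for an empty interval instead of dividing by zero. */
double bitrate_kbps(uint64_t bytes, uint64_t dt_usec)
{
    if (dt_usec == 0)
        return 0.0;
    return ((double)bytes * 8.0 / (double)dt_usec) * 1e3;
}
```

For example, 1024000 bytes received in one second (1000000 usec) corresponds to about 8192 kbit/s, the rate used in the tests above.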
[-- Attachment #1.3: send_mcast.c --]
[-- Type: text/x-csrc, Size: 9247 bytes --]
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <netdb.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>
#include <stdint.h> /* uint8_t, uint16_t, uint32_t */
#define STATFILE "/proc/stat"
#define SELFSTATFILE "/proc/self/stat"
void usage(const char* program)
{
fprintf(stdout,"Usage: %s -a address -p port -b bitrate\n",program);
fprintf(stdout," bitrate in kbps.\n");
}
float getsysload()
{
unsigned long user = 0ul, nice = 0ul, system = 0ul, idle = 0ul;
char buf[BUFSIZ];
FILE *fp;
char buffer[BUFSIZ];
memset(buffer,0x0,BUFSIZ);
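if((fp=fopen(STATFILE,"r"))==NULL){
fprintf(stderr, "Problem opening %s\n", STATFILE);
return 0.0f; /* returning EXIT_FAILURE here would read as a load of 100% */
}
fread(buffer, sizeof(char), BUFSIZ-1, fp); /* keep the trailing NUL */
fclose(fp);
/* The counters are cumulative, so this is the average load since boot. */
if(sscanf(buffer,"%s %lu %lu %lu %lu",buf, &user, &nice, &system, &idle)==5){
return (float)(user+system)/(user+system+nice+idle);
}
else{
fprintf(stdout, "no matching strings found\n");
return 0.0f;
}
}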
unsigned long getcurrjiffies()
{
FILE *fp;
char buffer[BUFSIZ];
memset(buffer, 0x0, BUFSIZ);
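/* /proc/self/stat is opened by the cat child, so field 22 (starttime) is that
 * child's start time in clock ticks since boot: effectively the current time. */
fp=popen("cat /proc/self/stat | cut -d \\ -f 22","r");
if(fp==NULL)
return 0;
fread(buffer, sizeof(char), BUFSIZ-1, fp); /* keep the trailing NUL */
pclose(fp);
return strtoul(buffer, NULL, 10);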
}
unsigned long scan_stat(unsigned long *user, unsigned long *kernel)
{
FILE *fp;
char buffer[BUFSIZ];
uint32_t scanned = 0u;
signed int pid = 0;
char tcomm[BUFSIZ];
char state = 0x0;
signed int ppid = 0;
signed int pgid = 0;
signed int sid = 0;
signed int tty_nr = 0;
signed int tty_pgrp = 0;
unsigned long flags = 0ul;
unsigned long min_flt = 0ul;
unsigned long cmin_flt = 0ul;
unsigned long maj_flt = 0ul;
unsigned long cmaj_flt = 0ul;
unsigned long utime = 0ul;
unsigned long stime = 0ul;
signed long cutime = 0l;
signed long cstime = 0l;
signed long priority = 0l;
signed long nice = 0l;
signed int num_threads = 0;
unsigned long long start_time = 0ul;
unsigned long vsize = 0ul;
signed long rss = 0l;
unsigned long rsslim = 0ul;
unsigned long start_code = 0ul;
unsigned long end_code = 0ul;
unsigned long start_stack = 0ul;
unsigned long esp = 0ul;
unsigned long eip = 0ul;
/* The signal information here is obsolete.
* It must be decimal for Linux 2.0 compatibility.
* Use /proc/#/status for real-time signals.
*/
unsigned long signalpending = 0ul;
unsigned long signalblocked = 0ul;
unsigned long sigign = 0ul;
unsigned long sigcatch = 0ul;
unsigned long wchan = 0ul;
unsigned long dummy0 = 0ul;
unsigned long dummy1 = 0ul;
int exit_signal = 0;
int task_cpu = 0;
unsigned long rt_priority = 0ul;
unsigned long policy = 0ul;
unsigned long long delayticks = 0ull;
memset(buffer,0x0,BUFSIZ);
memset(tcomm,0x0,BUFSIZ);
if(!(fp=fopen(SELFSTATFILE,"r"))){
fprintf(stderr,"Error opening \"%s\".\n",SELFSTATFILE);
return 0;
}
fread(buffer, sizeof(char), BUFSIZ, fp);
fclose(fp);
#if 0
fprintf(stdout,"Reference:\n");
fprintf(stdout,"-------\n");
fprintf(stdout,"%s\n",buffer);
fprintf(stdout,"-------\n");
#endif
scanned = sscanf(buffer,"%d %s %c %d %d %d %d %d %lu %lu \
%lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu\n",
&pid,
tcomm,
&state,
&ppid,
&pgid,
&sid,
&tty_nr,
&tty_pgrp,
&flags,
&min_flt,
&cmin_flt,
&maj_flt,
&cmaj_flt,
&utime,
&stime,
&cutime,
&cstime,
&priority,
&nice,
&num_threads,
&start_time,
&vsize,
&rss,
&rsslim,
&start_code,
&end_code,
&start_stack,
&esp,
&eip,
/* The signal information here is obsolete.
* It must be decimal for Linux 2.0 compatibility.
* Use /proc/#/status for real-time signals.
*/
&signalpending,
&signalblocked,
&sigign,
&sigcatch,
&wchan,
&dummy0,
&dummy1,
&exit_signal,
&task_cpu,
&rt_priority,
&policy,
&delayticks);
#if 0
fprintf(stdout,"scanned %u items.\n",scanned);
fprintf(stdout,"%d %s %c %d %d %d %d %d %lu %lu \
%lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu\n",
pid,
tcomm,
state,
ppid,
pgid,
sid,
tty_nr,
tty_pgrp,
flags,
min_flt,
cmin_flt,
maj_flt,
cmaj_flt,
utime,
stime,
cutime,
cstime,
priority,
nice,
num_threads,
start_time,
vsize,
rss,
rsslim,
start_code,
end_code,
start_stack,
esp,
eip,
/* The signal information here is obsolete.
* It must be decimal for Linux 2.0 compatibility.
* Use /proc/#/status for real-time signals.
*/
signalpending,
signalblocked,
sigign,
sigcatch,
wchan,
dummy0,
dummy1,
exit_signal,
task_cpu,
rt_priority,
policy,
delayticks);
#endif
*user = utime;
*kernel = stime;
return utime + stime;
}
int main(int argc, char *argv[])
{
extern int getopt();
extern int optind;
extern char *optarg;
int c_opt;
unsigned int mc_server_socket;
struct sockaddr_in mc_addr_sockaddr;
uint8_t TTL = 0u;
uint8_t buffer[BUFSIZ];
int retval;
uint16_t mc_port = 0u;
char mc_addr[16];
uint32_t mc_bitrate = 0u, i, cnt = 0u, ucnt=0u;
float delay = 0;
unsigned long usecs[8];
unsigned long p_jiffies, c_jiffies;
unsigned long puser, pkernel;
unsigned long cuser, ckernel;
unsigned long long p_start, c_start;
struct timeval newtime;
/* Init */
memset(mc_addr, 0x0, 16);
for(i=0;i<8;i++){
usecs[i] = 0ul;
}
for(i=0;i<BUFSIZ>>2;i++){
((uint32_t*)buffer)[i]=0xbadc0ffe;
}
/* handling of command line options */
while ((c_opt = getopt(argc, argv, "a:b:p:")) != EOF) {
switch (c_opt) {
case 'a':
strncpy(mc_addr,optarg,sizeof(mc_addr)-1); /* mc_addr was zeroed, so it stays NUL-terminated */
break;
case 'b':
mc_bitrate = (atoi(optarg));
break;
case 'p':
mc_port = (atoi(optarg));
break;
default:
fprintf(stderr, "%s: Bad Option -%c\n", argv[0], c_opt);
exit(EXIT_FAILURE);
}
}
if(!mc_addr[0] || !mc_port || !mc_bitrate){ /* an array name is never NULL; test the first character */
usage(argv[0]);
return EXIT_FAILURE;
}
/* Create a multicast socket */
mc_server_socket=socket(AF_INET, SOCK_DGRAM,0);
/* Create multicast group address information */
mc_addr_sockaddr.sin_family = AF_INET;
mc_addr_sockaddr.sin_addr.s_addr = inet_addr(mc_addr);
mc_addr_sockaddr.sin_port = htons(mc_port);
/* Set the TTL for the sends using a setsockopt() */
TTL = 1;
retval = setsockopt(mc_server_socket, IPPROTO_IP, IP_MULTICAST_TTL, (char *)&TTL, sizeof(TTL));
if (retval < 0){
fprintf(stdout,"ERROR setsockopt() failed with %d \n", retval);
return EXIT_FAILURE;
}
/* get estimated us delay */
delay = 1e6/((float)(mc_bitrate<<10)/(sizeof(buffer)<<3));
/* Send MC message */
fprintf(stdout,"Multicast to socket %s:%u.\n",mc_addr, mc_port);
fprintf(stdout,"Requested bitrate is %u kbps.\n",mc_bitrate);
fprintf(stdout,"Need %2.2f packets of %u bytes per second.\n",((float)(mc_bitrate<<10)/(sizeof(buffer)<<3)),(unsigned)sizeof(buffer));
fprintf(stdout,"Setting interpacket delay at %2.2f usec.\n",delay);
gettimeofday(&newtime,NULL);
p_jiffies = scan_stat(&puser, &pkernel);
p_start = getcurrjiffies();
// usecs[ucnt++] = newtime.tv_sec*1e6+newtime.tv_usec;
while(1){
/* Send buffer as a datagram to the multicast group */
sendto(mc_server_socket, buffer, sizeof(buffer), 0,
(struct sockaddr*)&mc_addr_sockaddr, sizeof(mc_addr_sockaddr));
usleep(delay);
if(!(cnt&0xff)){
double cbitrate = 0.0;
uint8_t ccnt = ucnt&0x7;
uint8_t pcnt = ccnt?ccnt-1:0x7;
unsigned long dt;
c_jiffies = scan_stat(&cuser,&ckernel);
c_start = getcurrjiffies();
gettimeofday(&newtime,NULL);
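usecs[ccnt] = (unsigned long)newtime.tv_sec*1000000ul+(unsigned long)newtime.tv_usec; /* integer math: wraps around instead of overflowing a double-to-integer conversion */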
dt = usecs[ccnt] - usecs[pcnt];
if(usecs[ccnt]<usecs[pcnt]){
ucnt = 0;
}
else{
cbitrate = ((((double)sizeof(buffer)*8)*0x100)/((double)dt))*1e3; /* 256 packets are sent between two reports */
fprintf(stdout,"Approx bitrate is %2.2lf kbps, system load is %2.2f%%, process %2.2f%% (u %2.2f%%, s %2.2f%%).\n",
cbitrate,
getsysload()*100,
(double)(c_jiffies - p_jiffies)*100/(c_start - p_start),
(double)(cuser-puser)*100/(c_start - p_start),
(double)(ckernel-pkernel)*100/(c_start - p_start)
);
if(((cbitrate+256)<mc_bitrate)){
delay -= 500;
fprintf(stdout,"Interpacket delay adjusted to %2.2f usec\n",delay);
}
else if ((cbitrate-256)>mc_bitrate){
delay += 500;
fprintf(stdout,"Interpacket delay adjusted to %2.2f usec\n",delay);
}
if(delay<0){
fprintf(stderr,"Cannot send data out fast enough.\n");
mc_bitrate -= 1024;
delay = 0;
fprintf(stdout,"Limiting data to %u kbps.\n",mc_bitrate);
}
ucnt++;
}
p_jiffies = c_jiffies;
p_start = c_start;
pkernel = ckernel;
puser = cuser;
}
cnt++;
}
/* Close and clean-up */
close(mc_server_socket);
return EXIT_SUCCESS;
}
* Re: 2.4/2.6/ppc/powerpc/8245/8347e
2007-06-28 18:06 ` 2.4/2.6/ppc/powerpc/8245/8347e Marc Leeman
@ 2007-06-29 14:59 ` Marc Leeman
2007-07-09 15:47 ` 2.4/2.6/ppc/powerpc/8245/8347e Marc Leeman
0 siblings, 1 reply; 12+ messages in thread
From: Marc Leeman @ 2007-06-29 14:59 UTC (permalink / raw)
To: linuxppc-dev
[-- Attachment #1: Type: text/plain, Size: 3657 bytes --]
More platforms and higher bitrate tests (I've left the previous post in
comment):
> a) 8245/2.4.34/e100: 2.3.43-k1, @400 MHz
> b) 8245/2.6.17/e100: 2.3.43-k1 [2] @350 MHz
> c) 8347e/2.6.21.1/gianfar @400 Mhz
> d) XScale-IXP42x/2.6.18-4/ixp4xx @266 MHz (NSLU2)
e) 8245/2.6.17/e100: 3.5.10-k2-NAPI @350 MHz
f) 405/2.6.22-rc6/smsc9117 @200 MHz
g) 405/2.4.32/IBM OCP EMAC: 2.0 @266 MHz
h) Coppermine/2.6.18/e100: 3.5.10-k2-NAPI @930 MHz
> (2.3.43-k1 is the e100 driver version).
Platform h is just an old server, used as a reference to see whether a
2.6.x kernel scales as badly with an e100 on a different architecture.
>
> The process load for taking in the data is:
>
> a) 4-5% [3]
> b) 10-11%
> c) 13-14%
> d) 4-5%
e) 10-11%
f) 2-3%
g) 5%
h) 0%
This situation is even (a lot) worse when increasing the bitrate. When
a bitrate of 12 Mbps is used, we get the following results:
a) 4-5%
b) 18%
c) 35%
d) 4-7%
e) 18%
f) -
g) -
h) 1-2 %
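As a side note on how percentages like the ones above can be obtained: the
sketch below derives a process load from two jiffy deltas, in the spirit of
what mcsend prints. The helper names total_jiffies_from_line() and
process_load_percent() are made up for illustration, and the /proc/stat
parsing is an assumption, not necessarily what the attached sources do.

```c
#include <assert.h>
#include <stdio.h>

/* Sum the counters on the aggregate "cpu" line of /proc/stat.
 * Only pass the first line; per-CPU lines ("cpu0 ...") are not handled.
 * Returns 0 if fewer than four counters could be parsed. */
unsigned long long total_jiffies_from_line(const char *line)
{
    unsigned long long v[8] = { 0 };
    unsigned long long sum = 0;
    int i, n;

    assert(line != NULL);
    n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
               &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);
    if (n < 4)
        return 0;
    for (i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Share of the elapsed jiffies consumed by the process, in percent.
 * proc_delta is the change in utime + stime, total_delta the change in
 * the summed "cpu" line over the same interval. */
double process_load_percent(unsigned long long proc_delta,
                            unsigned long long total_delta)
{
    if (total_delta == 0)
        return 0.0;
    return (double)proc_delta * 100.0 / (double)total_delta;
}
```

Sampling both counters at the start and end of an interval and feeding the
two differences to process_load_percent() gives a figure comparable to the
ones listed here.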
> While the current 8347/gianfar platform is the worst performer, the
> 2.6 kernel with the 2.4 e100 (before the rewrite) seems to perform
> poorly too [4].
>
> So the 834x performs worse wrt the 8245 based configuration even though
> it is slightly higher clocked.
>
> It seems as if I bumped into the problem that led me to stick with the 2.4
> in the first place for this 8245 platform; but never got round to
> investigating. I find these results especially intriguing when
> considering that an ARM platform (NSLU2 device) that I had around, clocked
> at only 66% of the 8347 and at 80% of the 8245, certainly performs on par
> with the last one...
The load even worsens non-linearly as the bitrate goes up (I could not
test all the platforms for this, since not all the embedded platforms
are located in our network; I've rallied some colleagues from across the
company to get some other platforms tested, so I will probably have more
data next week).
> Even though I will need to recheck this (results to follow), a quick
> test didn't reveal any significant difference between a ppc and powerpc
> arch in the kernel.
>
> It does look like, on our 8245/83xx platforms, the 2.6.x kernel performs
> worse wrt the 2.4 ppc kernels and the 83xx configuration is worse wrt
> the 8245 based configuration [5]. In retrospect, we had signals that
> there was a problem with the 8245/83xx performance over the network last
> year when investigating gstreamer, but due to time pressure we assumed
> it was due to gstreamer and not the processor. This came as a surprise to
> some of the ppl on the gstreamer mailing list that reported performant
> ports to ARM architectures.
If I look at platform (f), 405/2.6.22-rc6, it doesn't seem to be a
general powerpc problem, but rather an 824x/83xx or platform issue.
> The results with the NSLU2 will certainly put heat on us from management
> when redesigning or for follow up designs :(
>
> Anyhow, I'm currently extending my test setups since this is an
> important problem and set back.
>
> If anyone has a hint to explaining what is going on here, please do
> since solving this will certainly beat redesigning (esp. considering the
> timeframe we've been assigned).
>
>
> I've only found one relevant reference to 2.4/2.6 network performance
> decrease at this point [6].
>
>
> [1] sources attached
mcrecv -a 225.1.2.3 -p 12345
mcsend -a 225.1.2.3 -p 12345 -b 8192
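For reference, the initial inter-packet delay that mcsend derives from the -b
value can be sketched as below. This mirrors the formula in the attached
source (kbps taken as 1024 bits per second, one datagram per sizeof(buffer)
bytes); the function names are illustrative only, and the packet size is a
parameter here because BUFSIZ differs between C libraries.

```c
#include <assert.h>
#include <stddef.h>

/* Datagrams per second needed to reach kbps (1 kbit = 1024 bits, as
 * mcsend does with mc_bitrate<<10) when each datagram carries
 * packet_bytes bytes. */
double packets_per_second(unsigned kbps, size_t packet_bytes)
{
    assert(packet_bytes > 0);
    return ((double)kbps * 1024.0) / ((double)packet_bytes * 8.0);
}

/* Initial sleep between two sendto() calls, in microseconds.
 * Returns 0 when no bitrate is requested. */
double interpacket_delay_usec(unsigned kbps, size_t packet_bytes)
{
    double pps = packets_per_second(kbps, packet_bytes);

    return pps > 0.0 ? 1e6 / pps : 0.0;
}
```

Assuming 8192-byte datagrams, -b 8192 works out to 128 datagrams per second,
i.e. a starting delay of 7812.5 usec, which the sender then adjusts in steps
of 500 usec.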
--
greetz, marc
Aeryn, did I say or do anything to piss you off? I mean other than
caving in the side of your head?
Crichton - Die Me, Dichotomy
chiana 2.6.18-4-ixp4xx #1 Tue Mar 27 18:01:56 BST 2007 GNU/Linux
* Re: 2.4/2.6/ppc/powerpc/8245/8347e
2007-06-29 14:59 ` 2.4/2.6/ppc/powerpc/8245/8347e Marc Leeman
@ 2007-07-09 15:47 ` Marc Leeman
2007-07-09 19:56 ` 2.4/2.6/ppc/powerpc/8245/8347e Linas Vepstas
0 siblings, 1 reply; 12+ messages in thread
From: Marc Leeman @ 2007-07-09 15:47 UTC (permalink / raw)
To: linuxppc-dev
[-- Attachment #1: Type: text/plain, Size: 834 bytes --]
> More platforms and higher bitrate tests (I've left the previous post in
> comment):
I was finally able to figure out the culprit:
CONFIG_SLOB=y
instead of
CONFIG_SLAB=y
--------
CONFIG_SLAB:
Disabling this replaces the advanced SLAB allocator and
kmalloc support with the drastically simpler SLOB allocator.
SLOB is more space efficient but does not scale well and is
more susceptible to fragmentation.
--------
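To double check which allocator a given kernel configuration selects, a small
helper along these lines can scan the text of a .config file. It is only a
sketch: selected_allocator() and option_enabled() are hypothetical names, and
it assumes the usual "CONFIG_FOO=y" / "# CONFIG_FOO is not set" layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if "opt=y" appears as a complete line in text, else 0. */
static int option_enabled(const char *text, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = text;

    while ((p = strstr(p, opt)) != NULL) {
        int at_line_start = (p == text) || (p[-1] == '\n');

        if (at_line_start && strncmp(p + len, "=y", 2) == 0 &&
            (p[len + 2] == '\n' || p[len + 2] == '\0'))
            return 1;
        p += len;   /* keep searching after this occurrence */
    }
    return 0;
}

/* Name of the allocator enabled in config_text, or "unknown". */
const char *selected_allocator(const char *config_text)
{
    assert(config_text != NULL);
    if (option_enabled(config_text, "CONFIG_SLAB"))
        return "SLAB";
    if (option_enabled(config_text, "CONFIG_SLOB"))
        return "SLOB";
    if (option_enabled(config_text, "CONFIG_SLUB"))
        return "SLUB";
    return "unknown";
}
```

Fed with the contents of the .config that was used to build the running
kernel, this returns "SLOB" for the configuration described above.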
I was expecting a lower DMM performance but wasn't expecting such a
drain on kernel/network load.
The original reason for this change was a fixed flash map and a larger
2.6 kernel that didn't fit in this region (backwards compatible).
--
greetz, marc
I feel like I had a spiritual enema.
Jool - Losing Time
chiana 2.6.18-4-ixp4xx #1 Tue Mar 27 18:01:56 BST 2007 GNU/Linux
* Re: 2.4/2.6/ppc/powerpc/8245/8347e
2007-07-09 15:47 ` 2.4/2.6/ppc/powerpc/8245/8347e Marc Leeman
@ 2007-07-09 19:56 ` Linas Vepstas
2007-07-10 7:55 ` 2.4/2.6/ppc/powerpc/8245/8347e Marc Leeman
0 siblings, 1 reply; 12+ messages in thread
From: Linas Vepstas @ 2007-07-09 19:56 UTC (permalink / raw)
To: Marc Leeman; +Cc: linuxppc-dev
On Mon, Jul 09, 2007 at 05:47:23PM +0200, Marc Leeman wrote:
>
> Disabling this replaces the advanced SLAB allocator and
> kmalloc support with the drastically simpler SLOB allocator.
> SLOB is more space efficient but does not scale well and is
> more susceptible to fragmentation.
> --------
>
> I was expecting a lower DMM performance but wasn't expecting such a
> drain on kernel/network load.
OK, to be clear: you seem to be saying that using the SLOB instead
of the SLAB allocator results in such terrible memory fragmentation
that network performance is degraded by large factors (2x or 5x or
something like that, if I remember your earlier emails). Is that right?
I thought I heard about some memory-defrag patches being posted.
What happens if these are used together with SLOB? Does one regain the
lost performance? Perhaps maybe one gets even better performance?
--linas
* Re: 2.4/2.6/ppc/powerpc/8245/8347e
2007-07-09 19:56 ` 2.4/2.6/ppc/powerpc/8245/8347e Linas Vepstas
@ 2007-07-10 7:55 ` Marc Leeman
0 siblings, 0 replies; 12+ messages in thread
From: Marc Leeman @ 2007-07-10 7:55 UTC (permalink / raw)
To: Linas Vepstas; +Cc: linuxppc-dev
[-- Attachment #1: Type: text/plain, Size: 1539 bytes --]
> > I was expecting a lower DMM performance but wasn't expecting such a
> > drain on kernel/network load.
>
> OK, to be clear: you seem to be saying that using the SLOB instead
> of the SLAB allocator results in such terrible memory fragmentation
> that network performance is degraded by large factors (2x or 5x or
> something like that, if I remember your earlier emails). Is that right?
Yep, I thought I would at least post my findings after harassing the
list with my posts.
Well, I don't really know if it is the fragmentation that comes into
play, or if it is simply the implementation of the slob allocator that
is much more inefficient in allocating free blocks of memory; but that's
about right.
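To illustrate the second possibility only: the toy code below contrasts a
first-fit walk over one free list (roughly the idea behind a simple
list-of-blocks allocator) with a lookup in per-size free lists (roughly the
idea behind size-class caches). It is not kernel code, the names and size
classes are invented, and it says nothing about which effect dominates on
the platforms tested here.

```c
#include <assert.h>
#include <stddef.h>

struct free_block {
    size_t size;
    struct free_block *next;
};

/* First fit: walk a single free list until a block is large enough.
 * *steps receives the number of blocks inspected. */
struct free_block *first_fit(struct free_block *head, size_t want,
                             unsigned *steps)
{
    *steps = 0;
    for (; head != NULL; head = head->next) {
        (*steps)++;
        if (head->size >= want)
            return head;
    }
    return NULL;
}

#define NUM_CLASSES 4
static const size_t class_size[NUM_CLASSES] = { 32, 64, 128, 256 };

/* Size classes: pick the smallest class that fits and take the head of
 * its list. Exactly one list head is inspected, so *steps is 1 whenever
 * a class fits (the head may still be NULL if that list is empty). */
struct free_block *class_fit(struct free_block *lists[NUM_CLASSES],
                             size_t want, unsigned *steps)
{
    *steps = 0;
    for (int i = 0; i < NUM_CLASSES; i++) {
        if (class_size[i] >= want) {
            *steps = 1;
            return lists[i];
        }
    }
    return NULL;
}
```

With many small free blocks in front of a large one, first_fit() inspects
every one of them for each large request, while class_fit() does a constant
amount of work per request.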
> I thought I heard about some memory-defrag patches being posted.
> What happens if these are used together with SLOB? Does one regain the
> lost performance? Perhaps maybe one gets even better performance?
In the ChangeLog of the 2.6.22, I saw something about a slub allocator
that I want to test; I'll give your suggestion a go too, though I would
not expect significant improvements: I suspect it's the slob
implementation that is slower.
But I had a small problem with my flash not being detected anymore when
quickly booting 2.6.22. I'll look into it today; there was a note in
the ChangeLog for powerpc about this, IIRC.
--
greetz, marc
Better wed than dead.
Crichton - Look at the Princess - A Kiss is Just a Kiss
chiana 2.6.18-4-ixp4xx #1 Tue Mar 27 18:01:56 BST 2007 GNU/Linux
end of thread, other threads:[~2007-07-10 7:55 UTC | newest]
Thread overview: 12+ messages
2007-06-21 14:15 network load 8245 vs. 8347E Marc Leeman
2007-06-21 15:33 ` Kumar Gala
2007-06-21 15:56 ` Marc Leeman
2007-06-21 17:13 ` Marc Leeman
[not found] ` <467ABCDC.8060401@genesi-usa.com>
2007-06-21 18:33 ` Marc Leeman
2007-06-21 18:53 ` Matt Sealey
2007-06-21 19:09 ` Marc Leeman
2007-06-28 18:06 ` 2.4/2.6/ppc/powerpc/8245/8347e Marc Leeman
2007-06-29 14:59 ` 2.4/2.6/ppc/powerpc/8245/8347e Marc Leeman
2007-07-09 15:47 ` 2.4/2.6/ppc/powerpc/8245/8347e Marc Leeman
2007-07-09 19:56 ` 2.4/2.6/ppc/powerpc/8245/8347e Linas Vepstas
2007-07-10 7:55 ` 2.4/2.6/ppc/powerpc/8245/8347e Marc Leeman