public inbox for linuxppc-dev@ozlabs.org 
From: Matt Sealey <matt@genesi-usa•com>
To: David Jander <david.jander@protonic•nl>
Cc: linuxppc-dev@ozlabs•org
Subject: Re: Efficient memcpy()/memmove() for G2/G3 cores...
Date: Mon, 25 Aug 2008 12:00:10 +0100	[thread overview]
Message-ID: <48B290BA.7060202@genesi-usa.com> (raw)
In-Reply-To: <200808251131.02071.david.jander@protonic.nl>

Hi David,

The focus has definitely been on VMX but that's not to say lower power
processors were forgotten :)

Gunnar von Boehn did some benchmarking with an assembly-optimized routine
for Cell, 603e and so on (basically the whole gamut from embedded up to
server-class IBM chips) and got some pretty good results:

http://www.powerdeveloper.org/forums/viewtopic.php?t=1426

It is definitely something that needs fixing. The generic routine in glibc
just copies words with no benefit of knowing the cache line size or any
cache block buffers in the chip, and certainly no use of cache control or
data streaming on higher end chips.

With knowledge of the right way to unroll the loops, how many copies to
do at once to try and get a burst, reducing cache usage etc. you can get
very impressive performance (as you can see, 50MB up to 78MB at the
smallest size, the basic improvement is 2x performance).
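
As a rough illustration of the cache-line idea (this is not Gunnar's routine, just a minimal sketch in C), the loop below copies one cache line per iteration and hints that the next source line will be needed soon. LINE_BYTES, word_t and copy_lines() are hypothetical names; the 32-byte line size is an assumption that matches 603e-class cores, and __builtin_prefetch and the may_alias attribute are GCC/Clang extensions. Both pointers are assumed to be at least word aligned, ideally line aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes (32 on 603e-class cores). */
#define LINE_BYTES 32
#define WORDS_PER_LINE (LINE_BYTES / sizeof(uint32_t))

/* GCC/Clang extension: a 32-bit word type that may alias other types. */
typedef uint32_t __attribute__((__may_alias__)) word_t;

/*
 * Hypothetical helper: copy nlines whole cache lines from src to dst.
 * Both pointers must be at least 4-byte aligned; aligning them to
 * LINE_BYTES is what lets each iteration map onto one cache line.
 */
void copy_lines(void *dst, const void *src, size_t nlines)
{
    word_t *d = dst;
    const word_t *s = src;

    while (nlines--) {
        /* Hint only: ask for the next source line ahead of its use. */
        __builtin_prefetch(s + WORDS_PER_LINE, 0, 0);

        /* Move one full line; the compiler is free to unroll this. */
        for (size_t i = 0; i < WORDS_PER_LINE; i++)
            d[i] = s[i];

        d += WORDS_PER_LINE;
        s += WORDS_PER_LINE;
    }
}
```

Whether the prefetch hint helps at all depends on the core and the compiler; on chips with cache control instructions a hand-written assembly routine can go further than this.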

I hope that helps you a little bit. Gunnar posted code to this list not
long after. I have a copy of the "e300 optimized" routine, but I thought
it best that he post it here rather than me.

I think there is a lot of scope for optimizing several areas (glibc,
kernel, some applications) for embedded processors, which nobody is
really taking on. But not many people want to do this kind of work.

-- 
Matt Sealey <matt@genesi-usa•com>
Genesi, Manager, Developer Relations

David Jander wrote:
> Hello,
> 
> I was wondering if there is a good replacement for the glibc memcpy() function 
> that doesn't have the horrendous performance on embedded PowerPC processors 
> that the glibc one has.
> 
> I did some simple benchmarks with this implementation on our custom MPC5121 
> based board (Freescale e300 core, something like a PPC603e, G2, without VMX):
> 
> ...
> unsigned long int a,b,c,d;
> unsigned long int a1,b1,c1,d1;
> ...
> while (len >= 32)
> {
>     a =  plSrc[0];
>     b =  plSrc[1];
>     c =  plSrc[2];
>     d =  plSrc[3];
>     a1 = plSrc[4];
>     b1 = plSrc[5];
>     c1 = plSrc[6];
>     d1 = plSrc[7];
>     plSrc += 8;
>     plDst[0] = a;
>     plDst[1] = b;
>     plDst[2] = c;
>     plDst[3] = d;
>     plDst[4] = a1;
>     plDst[5] = b1;
>     plDst[6] = c1;
>     plDst[7] = d1;
>     plDst += 8;
>     len -= 32;
> }
> ...
> 
> And the results are more than telling: by linking this with LD_PRELOAD, 
> some programs get an enormous performance boost.
> For example a small test program that copies frames into video memory (just 
> RAM) improved throughput from 13.2 MiB/s to 69.5 MiB/s.
> I have googled for this issue, but most optimized versions of memcpy() and 
> friends seem to focus on AltiVec/VMX, which this processor does not have.
> Now I am certain that most of the G2/G3 users on this list _must_ have a 
> better solution for this. Any suggestions?
> 
> Btw, the tests are done on Ubuntu/PowerPC 7.10, don't know if that matters 
> though...
> 
> Best regards,
> 
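
The quoted loop elides its setup and tail handling. The following is a minimal sketch of how such a loop might be completed into a whole routine; it is not David's actual code. The name sketch_memcpy() and the word_t typedef are hypothetical, uint32_t is used so that eight words are 32 bytes regardless of the size of unsigned long, and the may_alias attribute is a GCC/Clang extension. The sketch also assumes it is acceptable to fall back to a byte loop when source and destination cannot both be word aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a 32-bit word type that may alias other types. */
typedef uint32_t __attribute__((__may_alias__)) word_t;

/* Hypothetical name; same contract as memcpy() (no overlap allowed). */
void *sketch_memcpy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: copy bytes until the destination is word aligned. */
    while (len > 0 && ((uintptr_t)d & 3) != 0) {
        *d++ = *s++;
        len--;
    }

    /* Word copies are only used if the source is now aligned as well. */
    if (((uintptr_t)s & 3) == 0) {
        word_t *wd = (word_t *)d;
        const word_t *ws = (const word_t *)s;

        /* Body: eight loads, then eight stores, 32 bytes per pass. */
        while (len >= 32) {
            word_t w0 = ws[0], w1 = ws[1], w2 = ws[2], w3 = ws[3];
            word_t w4 = ws[4], w5 = ws[5], w6 = ws[6], w7 = ws[7];
            ws += 8;
            wd[0] = w0; wd[1] = w1; wd[2] = w2; wd[3] = w3;
            wd[4] = w4; wd[5] = w5; wd[6] = w6; wd[7] = w7;
            wd += 8;
            len -= 32;
        }

        /* Remaining whole words. */
        while (len >= 4) {
            *wd++ = *ws++;
            len -= 4;
        }

        d = (unsigned char *)wd;
        s = (const unsigned char *)ws;
    }

    /* Tail (or the whole copy, if the source was misaligned). */
    while (len > 0) {
        *d++ = *s++;
        len--;
    }

    return dst;
}
```

Grouping the loads ahead of the stores, as in the quoted loop, is what gives the core a chance to move a whole 32-byte block at once. To use a routine like this as a drop-in via LD_PRELOAD it would have to be exported under the name memcpy from a shared object, which this sketch deliberately does not do.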
