public inbox for linuxppc-dev@ozlabs.org 
From: Gabriel Paubert <paubert@iram•es>
To: David Laight <David.Laight@ACULAB•COM>
Cc: James Yang <James.Yang@freescale•com>,
	Chris Proctor <cproctor@csc•com.au>,
	Stephen N Chivers <schivers@csc•com.au>,
	"linuxppc-dev@lists•ozlabs.org" <linuxppc-dev@lists•ozlabs.org>
Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
Date: Mon, 10 Feb 2014 13:21:38 +0100
Message-ID: <20140210122138.GA30356@visitor2.iram.es>
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D0F6BBCE9@AcuExch.aculab.com>

On Mon, Feb 10, 2014 at 11:17:38AM +0000, David Laight wrote:
> > > However, your other solutions are better.
> > >
> > >
> > > > > >
> > > > > > mask = (FM & 1);
> > > > > > mask |= (FM << 3) & 0x10;
> > > > > > mask |= (FM << 6) & 0x100;
> > > > > > mask |= (FM << 9) & 0x1000;
> > > > > > mask |= (FM << 12) & 0x10000;
> > > > > > mask |= (FM << 15) & 0x100000;
> > > > > > mask |= (FM << 18) & 0x1000000;
> > > > > > mask |= (FM << 21) & 0x10000000;
> > > > > > mask *= 15;
> > > > > >
> > > > > > should do the job, in less code space and without a single branch.
> ...
> > > > > > > Another way of optimizing this could be:
> > > > > >
> > > > > > mask = (FM & 0x0f) | ((FM << 12) & 0x000f0000);
> > > > > > mask = (mask & 0x00030003) | ((mask << 6) & 0x03030303);
> > > > > > mask = (mask & 0x01010101) | ((mask << 3) & 0x10101010);
> > > > > > mask *= 15;
> ...
> > Ok, if you have measured that method1 is faster than method2, let us go for it.
> > I believe method2 would be faster if you had a large out-of-order execution
> > window, because more parallelism can be extracted from it, but this is probably
> > only true for high end cores, which do not need FPU emulation in the first place.
> 
> FWIW the second has a long dependency chain on 'mask', whereas the first can execute
> the shift/and in any order and then merge the results.
> So on most superscalar cpu, or one with result delays for arithmetic, the first
> is likely to be faster.

I disagree, perhaps mostly because the compiler is not clever enough. Right
now the code for solution 1 reads as follows (I have rewritten it as a single
expression, to avoid sequence points in case they hamper the compiler):

	mask = (FM & 1)
			| ((FM << 3) & 0x10)
			| ((FM << 6) & 0x100)
			| ((FM << 9) & 0x1000)
			| ((FM << 12) & 0x10000)
			| ((FM << 15) & 0x100000)
			| ((FM << 18) & 0x1000000)
			| ((FM << 21) & 0x10000000);

and the output is:

        rlwinm 10,3,3,27,27      # D.11621, FM,,
        rlwinm 9,3,6,23,23       # D.11621, FM,,
        or 9,10,9        #, D.11621, D.11621, D.11621
        rlwinm 10,3,0,31,31      # D.11621, FM,
        or 9,9,10        #, D.11621, D.11621, D.11621
        rlwinm 10,3,9,19,19      # D.11621, FM,,
        or 9,9,10        #, D.11621, D.11621, D.11621
        rlwinm 10,3,12,15,15     # D.11621, FM,,
        or 9,9,10        #, D.11621, D.11621, D.11621
        rlwinm 10,3,15,11,11     # D.11621, FM,,
        or 9,9,10        #, D.11621, D.11621, D.11621
        rlwinm 10,3,18,7,7       # D.11621, FM,,
        or 9,9,10        #, D.11621, D.11621, D.11621
        rlwinm 3,3,21,3,3        # D.11621, FM,,
        or 9,9,3         #, mask, D.11621, D.11621
        mulli 9,9,15     # mask, mask,
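
For readers less used to rlwinm, here is a minimal C sketch of what the
listing computes. It assumes the usual Power definition (rotate left by SH,
then AND with a mask running from bit MB to bit ME, where bit 0 is the most
significant bit). The names rotl32, ppc_mask, rlwinm_model and
mask_from_listing are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
	n &= 31;
	return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Ones from bit mb through bit me, IBM numbering (bit 0 is the MSB). */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
	uint32_t from_mb = 0xFFFFFFFFu >> mb;
	uint32_t upto_me = 0xFFFFFFFFu << (31 - me);

	/* mb > me means the mask wraps around the word. */
	return mb <= me ? (from_mb & upto_me) : (from_mb | upto_me);
}

/* Hypothetical model of "rlwinm rA,rS,SH,MB,ME". */
static uint32_t rlwinm_model(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
	return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* Replays the listing in order; r9 is the accumulator, as in the output. */
static uint32_t mask_from_listing(uint32_t FM)
{
	uint32_t r9 = rlwinm_model(FM, 3, 27, 27) | rlwinm_model(FM, 6, 23, 23);

	r9 |= rlwinm_model(FM, 0, 31, 31);
	r9 |= rlwinm_model(FM, 9, 19, 19);
	r9 |= rlwinm_model(FM, 12, 15, 15);
	r9 |= rlwinm_model(FM, 15, 11, 11);
	r9 |= rlwinm_model(FM, 18, 7, 7);
	r9 |= rlwinm_model(FM, 21, 3, 3);
	return r9 * 15;		/* mulli 9,9,15 */
}

static void check_listing(void)
{
	/* "rlwinm 10,3,3,27,27" moves FM bit 1 to bit 4, i.e. (FM << 3) & 0x10. */
	assert(rlwinm_model(0x02, 3, 27, 27) == 0x10);
	/* FM bits 0, 2, 5 and 7 each expand to a full nibble. */
	assert(mask_from_listing(0xA5) == 0xF0F00F0Fu);
}
```

Each rlwinm here only reads r3 (FM), so the rlwinm instructions are
independent of each other; only the or instructions that accumulate into
r9 are serialized.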

Note that in this output r9 is both an input and an output operand of 7
consecutive or instructions, plus once the output of an rlwinm. This gives
a dependency chain at least 8 instructions long.
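
One possible way to shorten that chain in the source, sketched here as an
untested assumption rather than a measured result, is to group the ORs as a
balanced tree so that the partial results can be merged in three levels
instead of seven. Whether a given compiler keeps this shape is not
guaranteed. The names fm_mask_tree and fm_mask_reference are hypothetical,
and FM is assumed to be an 8-bit field held in a 32-bit unsigned value.

```c
#include <assert.h>
#include <stdint.h>

/* Same shift/and terms as solution 1, but ORed pairwise as a tree. */
static uint32_t fm_mask_tree(uint32_t FM)
{
	uint32_t a = (FM & 1) | ((FM << 3) & 0x10);
	uint32_t b = ((FM << 6) & 0x100) | ((FM << 9) & 0x1000);
	uint32_t c = ((FM << 12) & 0x10000) | ((FM << 15) & 0x100000);
	uint32_t d = ((FM << 18) & 0x1000000) | ((FM << 21) & 0x10000000);

	return ((a | b) | (c | d)) * 15;
}

/* Bit-by-bit reference: FM bit i selects nibble i of the mask. */
static uint32_t fm_mask_reference(uint32_t FM)
{
	uint32_t mask = 0;
	unsigned i;

	for (i = 0; i < 8; i++)
		if (FM & (1u << i))
			mask |= 0xFu << (4 * i);
	return mask;
}

/* Exhaustive check over all 256 possible FM values. */
static void check_tree(void)
{
	uint32_t FM;

	for (FM = 0; FM < 256; FM++)
		assert(fm_mask_tree(FM) == fm_mask_reference(FM));
}
```

If the compiler preserves the grouping, the longest path would be one
rlwinm, three or instructions and the mulli.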

In the other case (I've deleted the code) the dependency chain was
significantly shorter. In any case that version needs fewer instructions,
which is good for occasional use.
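
For anyone who wants to convince themselves that the two solutions quoted
above produce the same mask, a small exhaustive check is enough, since FM
only has 8 bits. This is a sketch assuming FM is held in a 32-bit unsigned
value; mask_method1, mask_method2 and check_methods_agree are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>

/* Solution 1: move each FM bit to the bottom of its nibble, then widen. */
static uint32_t mask_method1(uint32_t FM)
{
	uint32_t mask = (FM & 1)
			| ((FM << 3) & 0x10)
			| ((FM << 6) & 0x100)
			| ((FM << 9) & 0x1000)
			| ((FM << 12) & 0x10000)
			| ((FM << 15) & 0x100000)
			| ((FM << 18) & 0x1000000)
			| ((FM << 21) & 0x10000000);

	return mask * 15;
}

/* Solution 2: spread the bits in three halving steps, then widen. */
static uint32_t mask_method2(uint32_t FM)
{
	uint32_t mask = (FM & 0x0f) | ((FM << 12) & 0x000f0000);

	mask = (mask & 0x00030003) | ((mask << 6) & 0x03030303);
	mask = (mask & 0x01010101) | ((mask << 3) & 0x10101010);
	return mask * 15;
}

/* Compare both solutions for every possible 8-bit FM value. */
static void check_methods_agree(void)
{
	uint32_t FM;

	for (FM = 0; FM < 256; FM++)
		assert(mask_method1(FM) == mask_method2(FM));
}
```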

	Gabriel
