From: Gabriel Paubert <paubert@iram•es>
To: Benjamin Herrenschmidt <benh@kernel•crashing.org>
Cc: Tejun Heo <htejun@gmail•com>,
linuxppc-dev@lists•ozlabs.org, Christoph Lameter <cl@gentwo•org>
Subject: Re: power and percpu: Could we move the paca into the percpu area?
Date: Wed, 11 Jun 2014 23:03:51 +0200
Message-ID: <20140611210351.GA7155@visitor2.iram.es>
In-Reply-To: <1402518131.14780.60.camel@pasglop>
On Thu, Jun 12, 2014 at 06:22:11AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2014-06-11 at 14:37 -0500, Christoph Lameter wrote:
> > Looking at arch/powerpc/include/asm/percpu.h I see that the per cpu offset
> > comes from a local_paca field and local_paca is in r13. That means that
> > for all percpu operations we first have to determine the address through a
> > memory access.
> >
> > Would it be possible to put the paca at the beginning of the percpu data
> > area and then have r31 point to the percpu area?
> >
> > power has these nice instructions that fetch from an offset relative to a
> > base register which could be used throughout for percpu operations in the
> > kernel (similar to x86 segment registers).
> >
> > With that we may also be able to use the atomic ops for fast percpu access
> > so that we can avoid the irq enable/disable sequence that is now required
> > for percpu atomics. Would result in fast and reliable percpu
> > counters for powerpc.
>
> So.... this is complicated :) And it's something I did want to tackle
> for a while but haven't had a chance.
>
> The issues off the top of my head are:
>
> - The PACA must be accessible in real mode, which means that when
> running under a hypervisor, it must be allocated in the "RMA" which is
> the low part of memory up to a limit that depends on the hypervisor, but
> can be as low as 128M on some older machines.
>
> - However, we use percpu more than paca in normal kernel C code, the
> PACA is mostly used during exception entry/exit, KVM, and for interrupt
> soft-enable/disable. So it might make sense to change things so that r13
> contains the per-cpu offset instead. However, doing that change and
> updating the asm to cope isn't a trivial undertaking.
>
> - Direct offset from r13 in asm ... works as long as the offset is
> within the signed 32k range. Otherwise we need at least one more addis
> instruction. Anton mentioned the linker may have some smarts however for
> removing that addis if the high part of the offset happens to be 0.
>
> - For atomics, the jury is still out as to whether it would be faster
> or not. The atomic ops (lwarx/stwcx.) are expensive. They flush the
> value out of the L1 (to L2) among others. On the other hand we have
> interrupts soft-disable so masking interrupts isn't very expensive.
> Unmasking, while cheap, is currently out of line however. I have been
> wondering if we could move some of the soft-irq state instead to a CR
> field and mark that -ffixed with gcc so we can make irq
> soft-disable/enable even faster and more in-line.
Actually, from gcc/config/rs6000/rs6000.h:
/* 1 for registers that have pervasive standard uses
   and are not available for the register allocator.

   On RS/6000, r1 is used for the stack.  On Darwin, r2 is available
   as a local register; for all other OS's r2 is the TOC pointer.

   cr5 is not supposed to be used.

   On System V implementations, r13 is fixed and not available for use.  */

#define FIXED_REGISTERS  \
  {0, 1, FIXED_R2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, FIXED_R13, 0, 0, \
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
   0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1,          \
   /* AltiVec registers.  */                       \
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
   1, 1                                            \
   , 1, 1, 1, 1, 1, 1                              \
}
So cr5, which is number 73, is never used by gcc.
Disassembling a few kernels seems to confirm this.
This gives you 4 booleans to play with...
Gabriel