Re: 4.13-rc3: Unrecoverable exception 4100

From: Michael Ellerman <mpe@ellerman•id.au>
To: Nicholas Piggin <npiggin@gmail•com>
Cc: Andreas Schwab <schwab@linux-m68k•org>, linuxppc-dev@ozlabs•org
Subject: Re: 4.13-rc3: Unrecoverable exception 4100
Date: Mon, 07 Aug 2017 21:49:00 +1000
Message-ID: <87valzoaxf.fsf@concordia.ellerman.id.au>
In-Reply-To: <20170807202638.4239c2d4@roar.ozlabs.ibm.com>

Nicholas Piggin <npiggin@gmail•com> writes:
> On Mon, 07 Aug 2017 19:56:28 +1000
> Michael Ellerman <mpe@ellerman•id.au> wrote:
>> Nicholas Piggin <npiggin@gmail•com> writes:
>> > On Fri, 04 Aug 2017 21:54:57 +0200
>> > Andreas Schwab <schwab@linux-m68k•org> wrote:
>> >  
>> >> No, this is really a 4.13-rc1 regression.
>> >
>> > SLB miss with MSR[RI]=0 on
>> >
>> > lbz     r0,THREAD+THREAD_LOAD_FP(r7)
>> >
>> > Caused by bc4f65e4cf9d6cc43e0e9ba0b8648cf9201cd55f  
>> 
>> > Hmm, I'll see if something can be done, but that MSR_RI stuff in syscall
>> > exit makes things fairly difficult (and will reduce performance improvement
>> > of this patch anyway).
>> >
>> > I'm trying to work to a point where we have a soft-RI bit for these kinds of
>> > uses that would avoid all this complexity. Until then it may be best to
>> > just revert this patch.  
>> 
>> OK. Let me know in the next day or two what you want to do.
>> 
>> One option would be to load THREAD_LOAD_FP/THREAD_LOAD_VEC before we
>> turn off RI.
>
> Yeah, although that's a couple of unnecessary loads when we haven't
> used the fp regs.
>
> This path hits often on return from context switch, but for general
> syscalls it's less clear. And considering it's fairly tricky code at
> this point I'm thinking maybe just revert it for now?

Yeah OK.
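
To illustrate the ordering issue discussed above, here is a toy userspace model in C, not kernel code. Every name in it (toy_cpu, toy_load, the two exit functions) is hypothetical; it only sketches why a load that can take an SLB miss is safe before RI is cleared and fatal after.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_cpu {
    bool ri;            /* models MSR[RI]: true means interrupts are recoverable */
    bool unrecoverable; /* set when a miss is taken while ri is false */
};

/* A memory access that may or may not take an SLB miss. */
static int toy_load(struct toy_cpu *cpu, const int *addr, bool slb_miss)
{
    assert(addr != NULL);
    if (slb_miss && !cpu->ri)
        cpu->unrecoverable = true; /* the "Unrecoverable exception" case */
    return *addr;
}

/* Ordering reported in this thread: the load happens with RI already off. */
static bool exit_load_after_ri_off(struct toy_cpu *cpu, const int *load_fp,
                                   bool slb_miss)
{
    cpu->ri = false;
    (void)toy_load(cpu, load_fp, slb_miss);
    return !cpu->unrecoverable;
}

/* Suggested alternative: load first, then clear RI and use registers only. */
static bool exit_load_before_ri_off(struct toy_cpu *cpu, const int *load_fp,
                                    bool slb_miss)
{
    int fp = toy_load(cpu, load_fp, slb_miss);
    cpu->ri = false;
    (void)fp; /* the value now lives in a register; no further memory access */
    return !cpu->unrecoverable;
}
```

The cost Nick points out is visible in the second function: the load is done unconditionally, even when the value turns out not to be needed.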

Related thought: why the hell do we use 0x4100 for unrecoverable SLB?
That is really confusing now that we have AIL.

... lots of git blaming ...

Looks like it first appeared in the commit below, a classic :)

We should really change it to some other value.

cheers

  37b9416e7d6efb2168119ef12ce0b093da28ea19
  Author:     Andrew Morton <akpm@osdl•org>
  AuthorDate: Thu Mar 18 14:58:53 2004 -0800
  Commit:     Linus Torvalds <torvalds@ppc970•osdl.org>
  CommitDate: Thu Mar 18 14:58:53 2004 -0800

  [PATCH] ppc64: Fix SLB reload bug

  From: Paul Mackerras <paulus@samba•org>

  Recently we found a particularly nasty bug in the segment handling in the
  ppc64 kernel.  It would only happen rarely under heavy load, but when it
  did the machine would lock up with the whole of memory filled with
  exception stack frames.

  The primary cause was that we were losing the translation for the kernel
  stack from the SLB, but we still had it in the ERAT for a while longer.
  Now, there is a critical region in various exception exit paths where we
  have loaded the SRR0 and SRR1 registers from GPRs and we are loading those
  GPRs and the stack pointer from the exception frame on the kernel stack.
  If we lose the ERAT entry for the kernel stack in that region, we take an
  SLB miss on the next access to the kernel stack.  Taking the exception
  overwrites the values we have put into SRR0 and SRR1, which means we lose
  state.  In fact we ended up repeating that last section of the exception
  exit path, but using the user stack pointer this time.  That caused another
  exception (or if it didn't, we loaded a new value from the user stack and
  then went around and tried to use that).  And it spiralled downwards from
  there.

  The patch below fixes the primary problem by making sure that we really
  never cast out the SLB entry for the kernel stack.  It also improves
  debuggability in case anything like this happens again by:

  - In our exception exit paths, we now check whether the RI bit in the
    SRR1 value is 0.  We already set the RI bit to 0 before starting the
    critical region, but we never checked it.  Now, if we do ever get an
    exception in one of the critical regions, we will detect it before
    returning to the critical region, and instead we will print a nasty
    message and oops.

  - In the exception entry code, we now check that the kernel stack pointer
    value we're about to use isn't a userspace address.  If it is, we print a
    nasty message and oops.

  This has been tested on G5 and pSeries (both with and without hypervisor)
  and compile-tested on iSeries.
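
The two debugging checks described in the commit message above could be modelled roughly as follows. This is a userspace sketch in C, not the patch itself: the TOY_MSR_RI value, the function names, and the "top bit set means kernel address" test are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MSR_RI 0x2ULL /* assumed bit position of MSR[RI] */

/* Exit path check: RI clear in the saved SRR1 means the exception was
 * taken inside a critical region and cannot be returned from safely. */
static bool srr1_is_unrecoverable(uint64_t srr1)
{
    return (srr1 & TOY_MSR_RI) == 0;
}

/* Entry path check. Assumption: kernel addresses have the top bit set,
 * userspace addresses do not. */
static bool sp_looks_like_userspace(uint64_t sp)
{
    return (sp >> 63) == 0;
}

/* Stand-in for "print a nasty message and oops": here we simply abort. */
static void toy_exception_exit(uint64_t srr1)
{
    assert(!srr1_is_unrecoverable(srr1));
}
```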

...

+unrecov_stab:
+	EXCEPTION_PROLOG_COMMON
+	li	r6,0x4100			<- ends up in regs->trap
+	li	r20,0
+	bl	.save_remaining_regs
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	.unrecoverable_exception
+	b	1b
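
A small C sketch of why 0x4100 in regs->trap can mislead, assuming that with AIL enabled the relocated vectors sit at 0x4000 plus the real-mode vector offset. The TOY_ names and the helper are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_AIL_OFFSET  0x4000u /* assumed offset of the relocated vectors */
#define TOY_UNRECOV_SLB 0x4100u /* the value loaded into r6 above */

/* Real-mode vector that a relocated vector address would correspond to. */
static uint32_t toy_real_vector(uint32_t trap)
{
    return trap & ~TOY_AIL_OFFSET;
}

/* Under that assumption, 0x4100 reads like a relocated 0x100 vector
 * rather than a distinct "unrecoverable SLB" marker. */
static_assert((TOY_UNRECOV_SLB & ~TOY_AIL_OFFSET) == 0x100u,
              "0x4100 aliases the relocated form of vector 0x100");
```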