public inbox for linuxppc-dev@ozlabs.org 
 help / color / mirror / Atom feed
From: Michael Ellerman <mpe@ellerman•id.au>
To: Greg Kurz <groug@kaod•org>
Cc: Nathan Lynch <nathanl@linux•ibm.com>,
	linuxppc-dev@lists•ozlabs.org,
	Michael Roth <mdroth@linux•vnet.ibm.com>,
	Thiago Jung Bauermann <bauerman@linux•ibm.com>,
	Cedric Le Goater <clg@kaod•org>
Subject: Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
Date: Wed, 05 Aug 2020 13:07:08 +1000	[thread overview]
Message-ID: <87zh79yen7.fsf@mpe.ellerman.id.au> (raw)
In-Reply-To: <20200804161609.6cb2cb71@bahia.lan>

Greg Kurz <groug@kaod•org> writes:
> On Tue, 04 Aug 2020 23:35:10 +1000
> Michael Ellerman <mpe@ellerman•id.au> wrote:
>> There is a bit of history to this code, but not in a good way :)
>> 
>> Michael Roth <mdroth@linux•vnet.ibm.com> writes:
>> > For a power9 KVM guest with XIVE enabled, running a test loop
>> > where we hotplug 384 vcpus and then unplug them, the following traces
>> > can be seen (generally within a few loops) either from the unplugged
>> > vcpu:
>> >
>> >   [ 1767.353447] cpu 65 (hwid 65) Ready to die...
>> >   [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
>> >   [ 1767.952311] list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
>> ...
>> >
>> > At that point the worker thread assumes the unplugged CPU is in some
>> > unknown/dead state and procedes with the cleanup, causing the race with
>> > the XIVE cleanup code executed by the unplugged CPU.
>> >
>> > Fix this by inserting an msleep() after each RTAS call to avoid
>> 
>> We previously had an msleep(), but it was removed:
>> 
>>   b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")
>
> Ah, I hadn't seen that one...
>
>> > pseries_cpu_die() returning prematurely, and double the number of
>> > attempts so we wait at least a total of 5 seconds. While this isn't an
>> > ideal solution, it is similar to how we dealt with a similar issue for
>> > cede_offline mode in the past (940ce422a3).
>> 
>> Thiago tried to fix this previously but there was a bit of discussion
>> that didn't quite resolve:
>> 
>>   https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauerman@linux.ibm.com/
>
> Yeah it appears that the motivation at the time was to make the "Querying DEAD?"
> messages to disappear and to avoid potentially concurrent calls to rtas-stop-self
> which is prohibited by PAPR... not fixing actual crashes.

I'm pretty sure at one point we were triggering crashes *in* RTAS via
this path, I think that got resolved.

>> Spinning forever seems like a bad idea, but as has been demonstrated at
>> least twice now, continuing when we don't know the state of the other
>> CPU can lead to straight up crashes.
>> 
>> So I think I'm persuaded that it's preferable to have the kernel stuck
>> spinning rather than oopsing.
>> 
>
> +1
>
>> I'm 50/50 on whether we should have a cond_resched() in the loop. My
>> first instinct is no, if we're stuck here for 20s a stack trace would be
>> good. But then we will probably hit that on some big and/or heavily
>> loaded machine.
>> 
>> So possibly we should call cond_resched() but have some custom logic in
>> the loop to print a warning if we are stuck for more than some
>> sufficiently long amount of time.
>
> How long should that be ?

Yeah good question.

I guess step one would be seeing how long it can take on the 384 vcpu
machine. And we can probably test on some other big machines.

Hopefully Nathan can give us some idea of how long he's seen it take on
large systems? I know he was concerned about the 20s timeout of the
softlockup detector.

Maybe a minute or two?

>> > Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
>> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
>> 
>> This is not public.
>
> I'll have a look at changing that.

Thanks.

cheers

  reply	other threads:[~2020-08-05  3:16 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-04  3:29 [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death Michael Roth
2020-08-04 13:35 ` Michael Ellerman
2020-08-04 14:16   ` Greg Kurz
2020-08-05  3:07     ` Michael Ellerman [this message]
2020-08-05  4:01       ` Thiago Jung Bauermann
2020-08-05  4:37       ` Michael Roth
2020-08-05 22:29         ` Michael Roth
2020-08-05 22:31           ` Michael Roth
2020-08-06 12:51           ` Michael Ellerman
2020-08-07  7:05       ` Nathan Lynch
2020-08-11  5:39         ` Michael Roth
2020-08-11 11:56           ` Michael Ellerman
2020-08-11 14:46             ` Nathan Lynch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zh79yen7.fsf@mpe.ellerman.id.au \
    --to=mpe@ellerman$(echo .)id.au \
    --cc=bauerman@linux$(echo .)ibm.com \
    --cc=clg@kaod$(echo .)org \
    --cc=groug@kaod$(echo .)org \
    --cc=linuxppc-dev@lists$(echo .)ozlabs.org \
    --cc=mdroth@linux$(echo .)vnet.ibm.com \
    --cc=nathanl@linux$(echo .)ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox