public inbox for linuxppc-dev@ozlabs.org 
 help / color / mirror / Atom feed
From: Thiago Jung Bauermann <bauerman@linux•ibm.com>
To: Michael Ellerman <mpe@ellerman•id.au>
Cc: Nathan Lynch <nathanl@linux•ibm.com>,
	Cedric Le Goater <clg@kaod•org>,
	linuxppc-dev@lists•ozlabs.org, Greg Kurz <groug@kaod•org>,
	Michael Roth <mdroth@linux•vnet.ibm.com>
Subject: Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
Date: Wed, 05 Aug 2020 01:01:33 -0300	[thread overview]
Message-ID: <878setaggy.fsf@morokweng.localdomain> (raw)
In-Reply-To: <87zh79yen7.fsf@mpe.ellerman.id.au>


Michael Ellerman <mpe@ellerman•id.au> writes:

> Greg Kurz <groug@kaod•org> writes:
>> On Tue, 04 Aug 2020 23:35:10 +1000
>> Michael Ellerman <mpe@ellerman•id.au> wrote:
>>> There is a bit of history to this code, but not in a good way :)
>>>
>>> Michael Roth <mdroth@linux•vnet.ibm.com> writes:
>>> > For a power9 KVM guest with XIVE enabled, running a test loop
>>> > where we hotplug 384 vcpus and then unplug them, the following traces
>>> > can be seen (generally within a few loops) either from the unplugged
>>> > vcpu:
>>> >
>>> >   [ 1767.353447] cpu 65 (hwid 65) Ready to die...
>>> >   [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
>>> >   [ 1767.952311] list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
>>> ...
>>> >
>>> > At that point the worker thread assumes the unplugged CPU is in some
>>> > unknown/dead state and procedes with the cleanup, causing the race with
>>> > the XIVE cleanup code executed by the unplugged CPU.
>>> >
>>> > Fix this by inserting an msleep() after each RTAS call to avoid
>>>
>>> We previously had an msleep(), but it was removed:
>>>
>>>   b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")
>>
>> Ah, I hadn't seen that one...
>>
>>> > pseries_cpu_die() returning prematurely, and double the number of
>>> > attempts so we wait at least a total of 5 seconds. While this isn't an
>>> > ideal solution, it is similar to how we dealt with a similar issue for
>>> > cede_offline mode in the past (940ce422a3).
>>>
>>> Thiago tried to fix this previously but there was a bit of discussion
>>> that didn't quite resolve:
>>>
>>>   https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauerman@linux.ibm.com/
>>
>> Yeah it appears that the motivation at the time was to make the "Querying DEAD?"
>> messages to disappear and to avoid potentially concurrent calls to rtas-stop-self
>> which is prohibited by PAPR... not fixing actual crashes.
>
> I'm pretty sure at one point we were triggering crashes *in* RTAS via
> this path, I think that got resolved.

Yes, pHyp's RTAS now tolerates concurrent calls to stop-self. The
original bug that was reported when I worked on this ended in an RTAS
crash because of this timeout. The crash was fixed then.

--
Thiago Jung Bauermann
IBM Linux Technology Center

  reply	other threads:[~2020-08-05  4:03 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-04  3:29 [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death Michael Roth
2020-08-04 13:35 ` Michael Ellerman
2020-08-04 14:16   ` Greg Kurz
2020-08-05  3:07     ` Michael Ellerman
2020-08-05  4:01       ` Thiago Jung Bauermann [this message]
2020-08-05  4:37       ` Michael Roth
2020-08-05 22:29         ` Michael Roth
2020-08-05 22:31           ` Michael Roth
2020-08-06 12:51           ` Michael Ellerman
2020-08-07  7:05       ` Nathan Lynch
2020-08-11  5:39         ` Michael Roth
2020-08-11 11:56           ` Michael Ellerman
2020-08-11 14:46             ` Nathan Lynch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878setaggy.fsf@morokweng.localdomain \
    --to=bauerman@linux$(echo .)ibm.com \
    --cc=clg@kaod$(echo .)org \
    --cc=groug@kaod$(echo .)org \
    --cc=linuxppc-dev@lists$(echo .)ozlabs.org \
    --cc=mdroth@linux$(echo .)vnet.ibm.com \
    --cc=mpe@ellerman$(echo .)id.au \
    --cc=nathanl@linux$(echo .)ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox