public inbox for linuxppc-dev@ozlabs.org 
From: Michael Ellerman <mpe@ellerman•id.au>
To: Zhouyi Zhou <zhouzhouyi@gmail•com>,
	"Paul E. McKenney" <paulmck@kernel•org>
Cc: rcu <rcu@vger•kernel.org>,
	Miguel Ojeda <miguel.ojeda.sandonis@gmail•com>,
	linuxppc-dev <linuxppc-dev@lists•ozlabs.org>,
	Nicholas Piggin <npiggin@gmail•com>
Subject: Re: rcu_sched self-detected stall on CPU
Date: Sun, 10 Apr 2022 21:33:43 +1000
Message-ID: <871qy56ulk.fsf@mpe.ellerman.id.au>
In-Reply-To: <CAABZP2zEU8eULq30ZLcUeqxjXuLTKO4b3wm_Jo458Nq_JJ7pEw@mail.gmail.com>

Zhouyi Zhou <zhouzhouyi@gmail•com> writes:
> On Fri, Apr 8, 2022 at 10:07 PM Paul E. McKenney <paulmck@kernel•org> wrote:
>> On Fri, Apr 08, 2022 at 06:02:19PM +0800, Zhouyi Zhou wrote:
>> > On Fri, Apr 8, 2022 at 3:23 PM Michael Ellerman <mpe@ellerman•id.au> wrote:
...
>> > > I haven't seen it in my testing. But using Miguel's config I can
>> > > reproduce it seemingly on every boot.
>> > >
>> > > For me it bisects to:
>> > >
>> > >   35de589cb879 ("powerpc/time: improve decrementer clockevent processing")
>> > >
>> > > Which seems plausible.
>> > I also bisected to 35de589cb879 ("powerpc/time: improve decrementer
>> > clockevent processing")
...
>>
>> > > Reverting that on mainline makes the bug go away.

>> > I also reverted that on mainline, and am currently running a stress
>> > test (by repeatedly invoking qemu and checking the console.log) on a
>> > PPC VM at Oregon State University.

> After 306 rounds of stress testing on mainline without triggering the
> bug (lasting 4 hours and 27 minutes), I think the bug is indeed caused
> by 35de589cb879 ("powerpc/time: improve decrementer clockevent
> processing") and have stopped the test for now.
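
The repeated-boot test described in the quoted text could be sketched roughly as below. This is only an illustration under stated assumptions: `boot_once` is a hypothetical callable standing in for "invoke qemu and collect the console log", and the marker string is taken from the subject line of this thread rather than from the exact kernel log format.

```python
from typing import Callable

# Stall message as it appears in the subject of this thread (assumed to
# also appear verbatim in the console log when the bug triggers).
STALL_MARKER = "rcu_sched self-detected stall on CPU"


def log_has_stall(console_log: str) -> bool:
    """Return True if the console output contains the RCU stall report."""
    return STALL_MARKER in console_log


def stress_test(boot_once: Callable[[], str], rounds: int) -> int:
    """Boot up to `rounds` times and look for the stall in each console log.

    `boot_once` is a hypothetical callable that boots the guest once
    (for example by running qemu through subprocess) and returns the
    console log as text. Returns the 1-based round in which the stall
    was first seen, or 0 if no round showed it.
    """
    for round_no in range(1, rounds + 1):
        if log_has_stall(boot_once()):
            return round_no
    return 0
```

A return value of 0 after many rounds does not prove the bug is gone, it only makes that increasingly likely, which is why the number of rounds matters.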

Thanks for testing, that's pretty conclusive.

I'm not inclined to actually revert it yet.

We need to understand if there's actually a bug in the patch, or if it's
just exposing some existing bug/bad behavior we have. The fact that it
only appears with CONFIG_HIGH_RES_TIMERS=n is suspicious.

Do we have some code that inadvertently relies on something enabled by
HIGH_RES_TIMERS=y, or do we have a bug that is hidden by HIGH_RES_TIMERS=y?
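
When comparing a config that reproduces the stall against one that does not, it can help to confirm how each build actually sets the option. A minimal sketch, assuming the usual kernel `.config` text format (`CONFIG_FOO=y`, or `# CONFIG_FOO is not set` for a disabled option); the helper name `config_value` is made up for illustration:

```python
from typing import Optional


def config_value(config_text: str, option: str) -> Optional[str]:
    """Look up `option` in the text of a kernel .config file.

    Returns the value after '=' if the option is set, "n" if the file
    contains the "# <option> is not set" marker, and None if the option
    is not mentioned at all.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None
```

For example, `config_value(open(".config").read(), "CONFIG_HIGH_RES_TIMERS")` would be expected to give "n" for a config that triggers the stall, per the observation above.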

cheers

Thread overview: 30+ messages
2022-04-05 21:41 rcu_sched self-detected stall on CPU Miguel Ojeda
2022-04-06  9:31 ` Zhouyi Zhou
2022-04-06 17:00   ` Paul E. McKenney
2022-04-06 18:25     ` Zhouyi Zhou
2022-04-06 19:50       ` Paul E. McKenney
2022-04-07  2:26         ` Zhouyi Zhou
2022-04-07 10:07           ` Miguel Ojeda
2022-04-07 15:15             ` Paul E. McKenney
2022-04-07 17:05               ` Miguel Ojeda
2022-04-07 17:55                 ` Paul E. McKenney
2022-04-07 23:14                   ` Zhouyi Zhou
2022-04-08  1:43                     ` Paul E. McKenney
2022-04-08  7:23     ` Michael Ellerman
2022-04-08 10:02       ` Zhouyi Zhou
2022-04-08 14:07         ` Paul E. McKenney
2022-04-08 14:25           ` Zhouyi Zhou
2022-04-10 11:33             ` Michael Ellerman [this message]
2022-04-11  3:05               ` Paul E. McKenney
2022-04-12  6:53                 ` Michael Ellerman
2022-04-12 13:36                   ` Paul E. McKenney
2022-04-08 13:52       ` Miguel Ojeda
2022-04-08 14:06       ` Paul E. McKenney
2022-04-08 14:42       ` Michael Ellerman
2022-04-08 15:52         ` Paul E. McKenney
2022-04-08 17:02         ` Miguel Ojeda
2022-04-13  5:11         ` Nicholas Piggin
2022-04-13  6:10           ` Low-res tick handler device not going to ONESHOT_STOPPED when tick is stopped (was: rcu_sched self-detected stall on CPU) Nicholas Piggin
2022-04-14 17:15             ` Paul E. McKenney
2022-04-22 15:53           ` Thomas Gleixner
2022-04-23  2:29             ` Re: Nicholas Piggin
