public inbox for linux-next@vger.kernel.org 
From: "Paul E. McKenney" <paulmck@kernel•org>
To: Valentin Schneider <vschneid@redhat•com>
Cc: Chen Yu <yu.c.chen@intel•com>,
	Peter Zijlstra <peterz@infradead•org>,
	linux-kernel@vger•kernel.org, sfr@canb•auug.org.au,
	linux-next@vger•kernel.org, kernel-team@meta•com
Subject: Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error
Date: Thu, 29 Aug 2024 07:13:07 -0700
Message-ID: <cc537207-68a3-4dda-a8ec-6dda2fc1985d@paulmck-laptop>
In-Reply-To: <xhsmh1q27o2us.mognet@vschneid-thinkpadt14sgen2i.remote.csb>

On Thu, Aug 29, 2024 at 03:50:03PM +0200, Valentin Schneider wrote:
> On 29/08/24 03:28, Paul E. McKenney wrote:
> > On Wed, Aug 28, 2024 at 11:39:19AM -0700, Paul E. McKenney wrote:
> >>
> >> The 500*TREE03 run had exactly one failure that was the dreaded
> >> enqueue_dl_entity() failure, followed by RCU CPU stall warnings.
> >>
> >> But a huge improvement over the prior state!
> >>
> >> Plus, this failure is likely unrelated (see earlier discussions with
> >> Peter).  I just started a 5000*TREE03 run, just in case we can now
> >> reproduce this thing.
> >
> > And we can now reproduce it!  Again, this might be an unrelated bug that
> > was previously a one-off (OK, OK, a two-off!).  Or this series might
> > have made it more probable.  Who knows?
> >
> > Eight of those 5000 runs got us this splat in enqueue_dl_entity():
> >
> >       WARN_ON_ONCE(on_dl_rq(dl_se));
> >
> > Immediately followed by this splat in __enqueue_dl_entity():
> >
> >       WARN_ON_ONCE(!RB_EMPTY_NODE(&dl_se->rb_node));
> >
> > These two splats always happened during rcutorture's testing of
> > RCU priority boosting.  This testing involves spawning a CPU-bound
> > low-priority real-time kthread for each CPU, which is intended to starve
> > the non-realtime RCU readers, which are in turn to be rescued by RCU
> > priority boosting.
> >
> 
> Thanks!
> 
> > I do not entirely trust the following rcutorture diagnostic, but just
> > in case it helps...
> >
> > Many of them also printed something like this:
> >
> > [  111.279575] Boost inversion persisted: No QS from CPU 3
> >
> > This message means that rcutorture has decided that RCU priority boosting
> > has failed, not because a low-priority preempted task was blocking
> > the grace period, but rather because some CPU managed to be running
> > the same task in-kernel the whole time without doing a context switch.
> > In some cases (but not this one), this was simply a side-effect of
> > RCU's grace-period kthread being starved of CPU time.  Such starvation
> > is a surprise in this case because this kthread is running at higher
> > real-time priority than the kthreads that are intended to force RCU
> > priority boosting to happen.
> >
> > Again, I do not entirely trust this rcutorture diagnostic, but just in
> > case it helps.
> >
> >                                                       Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > [  287.536845] rcu-torture: rcu_torture_boost is stopping
> > [  287.536867] ------------[ cut here ]------------
> > [  287.540661] WARNING: CPU: 4 PID: 132 at kernel/sched/deadline.c:2003 enqueue_dl_entity+0x50d/0x5c0
> > [  287.542299] Modules linked in:
> > [  287.542868] CPU: 4 UID: 0 PID: 132 Comm: kcompactd0 Not tainted 6.11.0-rc1-00051-gb32d207e39de #1701
> > [  287.544335] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> > [  287.546337] RIP: 0010:enqueue_dl_entity+0x50d/0x5c0
> > [  287.603245]  ? __warn+0x7e/0x120
> > [  287.603752]  ? enqueue_dl_entity+0x54b/0x5c0
> > [  287.604405]  ? report_bug+0x18e/0x1a0
> > [  287.604978]  ? handle_bug+0x3d/0x70
> > [  287.605523]  ? exc_invalid_op+0x18/0x70
> > [  287.606116]  ? asm_exc_invalid_op+0x1a/0x20
> > [  287.606765]  ? enqueue_dl_entity+0x54b/0x5c0
> > [  287.607420]  dl_server_start+0x31/0xe0
> > [  287.608013]  enqueue_task_fair+0x218/0x680
> > [  287.608643]  activate_task+0x21/0x50
> > [  287.609197]  attach_task+0x30/0x50
> > [  287.609736]  sched_balance_rq+0x65d/0xe20
> > [  287.610351]  sched_balance_newidle.constprop.0+0x1a0/0x360
> > [  287.611205]  pick_next_task_fair+0x2a/0x2e0
> > [  287.611849]  __schedule+0x106/0x8b0
> 
> 
> Assuming this is still related to switched_from_fair(): since this is hit
> during priority boosting, rt_mutex_setprio() would be involved, but that
> uses the same set of DQ/EQ flags as __sched_setscheduler().
> 
> I don't see any obvious path in
> 
> dequeue_task_fair()
> `\
>   dequeue_entities()
> 
> that would prevent dl_server_stop() from happening when doing the
> class-switch dequeue_task()... I don't see it in the TREE03 config, but can
> you confirm CONFIG_CFS_BANDWIDTH isn't set in that scenario?
> 
> I'm going to keep digging but I'm not entirely sure yet whether this is
> related to the switched_from_fair() hackery or not, I'll send the patch I
> have as-is and continue digging for a bit.

Makes sense to me, thank you, and glad that the diagnostics helped.

Looking forward to further fixes.  ;-)

							Thanx, Paul
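
For anyone following along, the two splats quoted above are the sort of thing one would see if an entity were enqueued a second time with no dequeue in between, for example if a stop were somehow skipped before the next start.  Whether that is what actually happens here is still an open question.  The following is only a userspace toy sketch of that invariant, not kernel code: the toy_* names, the struct, and both flags are hypothetical stand-ins for on_dl_rq(dl_se) and !RB_EMPTY_NODE(&dl_se->rb_node).

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Toy stand-in for a deadline-server entity.  This is NOT the kernel's
 * struct sched_dl_entity; every name in this sketch is hypothetical.
 */
struct toy_dl_se {
	bool on_rq;	/* models on_dl_rq(dl_se) */
	bool in_tree;	/* models !RB_EMPTY_NODE(&dl_se->rb_node) */
};

static int toy_warn_count;

/* Report a violated invariant (unlike WARN_ON_ONCE(), not once-only). */
static bool toy_warn_on(bool cond, const char *what)
{
	if (cond) {
		toy_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Enqueue: both checks must pass if starts and stops are paired. */
static void toy_enqueue(struct toy_dl_se *se)
{
	toy_warn_on(se->on_rq, "entity already on the runqueue");
	toy_warn_on(se->in_tree, "tree node already linked");
	se->on_rq = true;
	se->in_tree = true;
}

/* Dequeue: clears both pieces of state. */
static void toy_dequeue(struct toy_dl_se *se)
{
	se->on_rq = false;
	se->in_tree = false;
}

/* Balanced start/stop/start: returns the number of warnings seen. */
static int toy_balanced(void)
{
	struct toy_dl_se se = { false, false };

	toy_warn_count = 0;
	toy_enqueue(&se);
	toy_dequeue(&se);
	toy_enqueue(&se);
	return toy_warn_count;
}

/* A stop goes missing, so the second start finds stale state. */
static int toy_missed_stop(void)
{
	struct toy_dl_se se = { false, false };

	toy_warn_count = 0;
	toy_enqueue(&se);
	/* toy_dequeue(&se) intentionally omitted */
	toy_enqueue(&se);
	return toy_warn_count;
}
```

Calling toy_balanced() from a main() produces no warnings, while toy_missed_stop() trips both checks back to back, which mirrors the order of the two splats but says nothing about which kernel path, if any, loses the stop.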
