* Shutdown-time hangs in -next in locktorture
@ 2025-12-20 0:29 Paul E. McKenney
2025-12-20 5:01 ` Paul E. McKenney
2025-12-20 12:52 ` Peter Zijlstra
0 siblings, 2 replies; 5+ messages in thread
From: Paul E. McKenney @ 2025-12-20 0:29 UTC
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, linux-next, Stephen Rothwell,
Mark Brown, kernel-team
Hello, Peter,
I started hitting shutdown-time hangs in next-20251217 which persist
in next-20251219. This hang happens on both x86 and arm64. Once I
figured out that the failure is high probability, but not deterministic,
bisection converged here:
5d1f0b2f278e ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
This commit reverts cleanly, and doing so restores hang-free operation.
The reproducer is shown below.
Thoughts?
Thanx, Paul
------------------------------------------------------------------------
for i in 1 2 3 4 5
do
tools/testing/selftests/rcutorture/bin/torture.sh --duration 20 --do-none --do-normal --do-locktorture --do-kasan --configs-locktorture "LOCK09"
ret=$?
if test "$ret" -ne 0
then
exit "$ret"
fi
echo Test $i succeeded
done
exit 0
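The hang is high-probability but not deterministic, so a single clean run says little. That is why the loop above repeats the test five times. Below is a minimal sketch of the same loop as a reusable POSIX shell function. The name run_n_times is hypothetical and is not part of the rcutorture scripts.

```shell
#!/bin/sh
# run_n_times N CMD [ARGS...]
# Hypothetical helper (not part of the rcutorture scripts): run CMD up to
# N times, stopping at the first failure and returning its exit status.
# POSIX sh has no "local", so n, i and ret are ordinary globals here.
run_n_times() {
	n="$1"
	shift
	i=1
	while [ "$i" -le "$n" ]
	do
		"$@"
		ret=$?
		if [ "$ret" -ne 0 ]
		then
			# Report which run failed, then propagate its status.
			echo "Run $i failed with status $ret" >&2
			return "$ret"
		fi
		echo "Test $i succeeded"
		i=$((i + 1))
	done
	return 0
}
```

For example, `run_n_times 5 tools/testing/selftests/rcutorture/bin/torture.sh --duration 20 --do-none --do-normal --do-locktorture --do-kasan --configs-locktorture "LOCK09"` should behave like the loop above. It also prints a message on stderr identifying the failing run.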
* Re: Shutdown-time hangs in -next in locktorture
From: Paul E. McKenney @ 2025-12-20 5:01 UTC
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, linux-next, Stephen Rothwell,
Mark Brown, kernel-team
On Fri, Dec 19, 2025 at 04:29:27PM -0800, Paul E. McKenney wrote:
> Hello, Peter,
>
> I started hitting shutdown-time hangs in next-20251217 which persist
> in next-20251219. This hang happens on both x86 and arm64. Once I
> figured out that the failure is high probability, but not deterministic,
> bisection converged here:
>
> 5d1f0b2f278e ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
>
> This commit reverts cleanly, and doing so restores hang-free operation.
>
> The reproducer is shown below.
>
> Thoughts?
With Chris Mason's help, I checked with a friendly local LLM, which
noted that a call to rq_modified_above() remains in kernel/sched/ext.c
in function do_pick_task_scx(). Of course, that does not explain a
locktorture hang, especially given that locktorture does not build
that file. But I mention it in case it is helpful.
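One way to double-check an observation like this is to search the tree for remaining whole-word references to the helper. In a git tree, `git grep -nw rq_modified_above -- kernel/sched` is the usual spelling. The sketch below assumes a plain recursive grep instead, and the function name find_leftover_refs is hypothetical.

```shell
#!/bin/sh
# find_leftover_refs SYMBOL DIR
# Hypothetical helper: print the files under DIR that still contain SYMBOL
# as a whole word (-w), so e.g. "foo_bar" does not match a search for "foo".
# grep prints nothing and exits 1 when no file matches.
find_leftover_refs() {
	grep -rlw -- "$1" "$2"
}
```

If the observation above is right, `find_leftover_refs rq_modified_above kernel/sched` run from the top of the tree should list kernel/sched/ext.c.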
Thanx, Paul
> ------------------------------------------------------------------------
>
> for i in 1 2 3 4 5
> do
> tools/testing/selftests/rcutorture/bin/torture.sh --duration 20 --do-none --do-normal --do-locktorture --do-kasan --configs-locktorture "LOCK09"
> ret=$?
> if test "$ret" -ne 0
> then
> exit "$ret"
> fi
> echo Test $i succeeded
> done
> exit 0
* Re: Shutdown-time hangs in -next in locktorture
From: Peter Zijlstra @ 2025-12-20 12:52 UTC
To: Paul E. McKenney
Cc: Ingo Molnar, linux-kernel, linux-next, Stephen Rothwell,
Mark Brown, kernel-team
On Fri, Dec 19, 2025 at 04:29:26PM -0800, Paul E. McKenney wrote:
> Hello, Peter,
>
> I started hitting shutdown-time hangs in next-20251217 which persist
> in next-20251219. This hang happens on both x86 and arm64. Once I
> figured out that the failure is high probability, but not deterministic,
> bisection converged here:
>
> 5d1f0b2f278e ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
That commit no longer exists in tip/sched/core; it was fixed a few days
ago, except other problems made -next use an old tip branch which has
caused this fix to have delayed visibility :-(
* Re: Shutdown-time hangs in -next in locktorture
From: Paul E. McKenney @ 2025-12-20 16:49 UTC
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, linux-next, Stephen Rothwell,
Mark Brown, kernel-team
On Sat, Dec 20, 2025 at 01:52:01PM +0100, Peter Zijlstra wrote:
> On Fri, Dec 19, 2025 at 04:29:26PM -0800, Paul E. McKenney wrote:
> > Hello, Peter,
> >
> > I started hitting shutdown-time hangs in next-20251217 which persist
> > in next-20251219. This hang happens on both x86 and arm64. Once I
> > figured out that the failure is high probability, but not deterministic,
> > bisection converged here:
> >
> > 5d1f0b2f278e ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
>
> That commit no longer exists in tip/sched/core; it was fixed a few days
> ago, except other problems made -next use an old tip branch which has
> caused this fix to have delayed visibility :-(
Very good, I will retry later.
Thanx, Paul
* Re: Shutdown-time hangs in -next in locktorture
From: Paul E. McKenney @ 2026-01-15 0:16 UTC
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, linux-next, Stephen Rothwell,
Mark Brown, kernel-team
On Sat, Dec 20, 2025 at 08:49:01AM -0800, Paul E. McKenney wrote:
> On Sat, Dec 20, 2025 at 01:52:01PM +0100, Peter Zijlstra wrote:
> > On Fri, Dec 19, 2025 at 04:29:26PM -0800, Paul E. McKenney wrote:
> > > Hello, Peter,
> > >
> > > I started hitting shutdown-time hangs in next-20251217 which persist
> > > in next-20251219. This hang happens on both x86 and arm64. Once I
> > > figured out that the failure is high probability, but not deterministic,
> > > bisection converged here:
> > >
> > > 5d1f0b2f278e ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
> >
> > That commit no longer exists in tip/sched/core; it was fixed a few days
> > ago, except other problems made -next use an old tip branch which has
> > caused this fix to have delayed visibility :-(
>
> Very good, I will retry later.
A bit later than I was planning, but here we are on next-20260113. This
has a very similar failure on arm64 with that same repeat-by as before:
tools/testing/selftests/rcutorture/bin/torture.sh --duration 20 --do-none --do-normal --do-locktorture --do-kasan --configs-locktorture "LOCK09"
Bisection converges here:
704069649b5b ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
This commit does not revert cleanly, but retesting on both this commit
and the previous commit confirms the bisection result.
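Confirming a bisection against a probabilistic failure is asymmetric. One failure is enough to call a commit bad, but calling its parent good takes many clean runs. Below is a minimal sketch of that check. It assumes a hypothetical wrapper TESTCMD that takes a revision as its last argument, for example by checking that revision out and running the torture.sh command above. Neither confirm_bisection nor the wrapper comes from this thread.

```shell
#!/bin/sh
# confirm_bisection N GOOD BAD TESTCMD [ARGS...]
# Hypothetical helper: TESTCMD is invoked as "TESTCMD ARGS... REV" and must
# exit 0 on a clean run. Succeeds only if BAD fails at least once in N runs
# and GOOD then passes all N runs.
confirm_bisection() {
	n="$1"; good="$2"; bad="$3"
	shift 3
	# One failure is enough to show that BAD really is bad.
	bad_failed=0
	i=1
	while [ "$i" -le "$n" ]
	do
		if ! "$@" "$bad"
		then
			bad_failed=1
			break
		fi
		i=$((i + 1))
	done
	if [ "$bad_failed" -eq 0 ]
	then
		echo "$bad passed all $n runs; not confirmed" >&2
		return 1
	fi
	# GOOD must survive every run, since a single pass proves little.
	i=1
	while [ "$i" -le "$n" ]
	do
		if ! "$@" "$good"
		then
			echo "$good failed on run $i; not confirmed" >&2
			return 1
		fi
		i=$((i + 1))
	done
	echo "confirmed: $bad is bad, $good is good"
}
```

For example, `confirm_bisection 5 704069649b5b^ 704069649b5b test_rev`, where test_rev is the hypothetical wrapper. The run count is a judgment call. The more reliably the bad commit fails, the fewer clean runs are needed before a pass on its parent is believable.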
I have not yet checked this carefully on x86.
Or is this another case of stale commits in -next? If not, please let
me know if there are debug options/patches that would be helpful.
Thanx, Paul