public inbox for linux-next@vger.kernel.org 
* Shutdown-time hangs in -next in locktorture
@ 2025-12-20  0:29 Paul E. McKenney
  2025-12-20  5:01 ` Paul E. McKenney
  2025-12-20 12:52 ` Peter Zijlstra
  0 siblings, 2 replies; 5+ messages in thread
From: Paul E. McKenney @ 2025-12-20  0:29 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, linux-next, Stephen Rothwell,
	Mark Brown, kernel-team

Hello, Peter,

I started hitting shutdown-time hangs in next-20251217 which persist
in next-20251219.  This hang happens on both x86 and arm64.  Once I
figured out that the failure is high probability, but not deterministic,
bisection converged here:

5d1f0b2f278e ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")

This commit reverts cleanly, and doing so restores hang-free operation.

The reproducer is shown below.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

for i in 1 2 3 4 5
do
	tools/testing/selftests/rcutorture/bin/torture.sh --duration 20 --do-none --do-normal --do-locktorture --do-kasan --configs-locktorture "LOCK09"
	ret=$?
	if test "$ret" -ne 0
	then
		exit "$ret"
	fi
	echo Test $i succeeded
done
exit 0
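
For a failure that is likely but not deterministic, a loop like the one
above can also act as the predicate for "git bisect run": a commit is
called good only if every repetition passes.  What follows is a minimal
sketch under stated assumptions, not the procedure actually used for the
bisection reported here.  The name run_repeated is hypothetical and is
not part of the rcutorture scripts, and the repetition count is a guess
that would need tuning to the observed failure rate.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times and stop at the
# first failure.  Any nonzero status is normalized to 1, because
# "git bisect run" treats 125 as "skip" and statuses above 127 as a
# reason to abort the bisection.
run_repeated() {
	n=$1
	shift
	i=1
	while test "$i" -le "$n"
	do
		"$@" || return 1
		echo "Run $i succeeded"
		i=$((i + 1))
	done
	return 0
}

# Possible use, assuming this file is saved as ./run_repeated.sh:
#   git bisect run sh -c '. ./run_repeated.sh; run_repeated 5 \
#       tools/testing/selftests/rcutorture/bin/torture.sh --duration 20 \
#       --do-none --do-normal --do-locktorture --do-kasan \
#       --configs-locktorture "LOCK09"'
```

More repetitions make it less likely that a bad commit is marked good by
chance, at the cost of a longer bisection.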


* Re: Shutdown-time hangs in -next in locktorture
  2025-12-20  0:29 Shutdown-time hangs in -next in locktorture Paul E. McKenney
@ 2025-12-20  5:01 ` Paul E. McKenney
  2025-12-20 12:52 ` Peter Zijlstra
  1 sibling, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2025-12-20  5:01 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, linux-next, Stephen Rothwell,
	Mark Brown, kernel-team

On Fri, Dec 19, 2025 at 04:29:27PM -0800, Paul E. McKenney wrote:
> Hello, Peter,
> 
> I started hitting shutdown-time hangs in next-20251217 which persist
> in next-20251219.  This hang happens on both x86 and arm64.  Once I
> figured out that the failure is high probability, but not deterministic,
> bisection converged here:
> 
> 5d1f0b2f278e ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
> 
> This commit reverts cleanly, and doing so restores hang-free operation.
> 
> The reproducer is shown below.
> 
> Thoughts?

With Chris Mason's help, I checked with a friendly local LLM, which
noted that a call to rq_modified_above() remains in kernel/sched/ext.c
in function do_pick_task_scx().  Of course, that does not explain a
locktorture hang, especially given that locktorture does not build
that file.  But in case it is helpful.

							Thanx, Paul

> ------------------------------------------------------------------------
> 
> for i in 1 2 3 4 5
> do
> 	tools/testing/selftests/rcutorture/bin/torture.sh --duration 20 --do-none --do-normal --do-locktorture --do-kasan --configs-locktorture "LOCK09"
> 	ret=$?
> 	if test "$ret" -ne 0
> 	then
> 		exit "$ret"
> 	fi
> 	echo Test $i succeeded
> done
> exit 0


* Re: Shutdown-time hangs in -next in locktorture
  2025-12-20  0:29 Shutdown-time hangs in -next in locktorture Paul E. McKenney
  2025-12-20  5:01 ` Paul E. McKenney
@ 2025-12-20 12:52 ` Peter Zijlstra
  2025-12-20 16:49   ` Paul E. McKenney
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2025-12-20 12:52 UTC
  To: Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, linux-next, Stephen Rothwell,
	Mark Brown, kernel-team

On Fri, Dec 19, 2025 at 04:29:26PM -0800, Paul E. McKenney wrote:
> Hello, Peter,
> 
> I started hitting shutdown-time hangs in next-20251217 which persist
> in next-20251219.  This hang happens on both x86 and arm64.  Once I
> figured out that the failure is high probability, but not deterministic,
> bisection converged here:
> 
> 5d1f0b2f278e ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")

That commit no longer exists in tip/sched/core; it was fixed a few days
ago, except other problems made -next use an old tip branch which has
caused this fix to have delayed visibility :-(
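
Whether a given commit is still reachable from a branch can be checked
locally with "git merge-base --is-ancestor".  A minimal sketch follows;
commit_in_ref is a hypothetical helper name, and the remote name "tip"
in the usage comment is an assumption about the local repository setup.

```shell
#!/bin/sh
# Hypothetical helper: succeed if commit $1 is reachable from ref $2.
# git merge-base --is-ancestor exits 0 when it is, 1 when it is not,
# and with some other nonzero status on error (for example, when the
# commit object is not present in the local repository).
commit_in_ref() {
	git merge-base --is-ancestor "$1" "$2"
}

# Possible use, assuming a remote named "tip" has been fetched:
#   if commit_in_ref 5d1f0b2f278e tip/sched/core
#   then
#       echo "commit is still in tip/sched/core"
#   else
#       echo "commit is not reachable from tip/sched/core"
#   fi
```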


* Re: Shutdown-time hangs in -next in locktorture
  2025-12-20 12:52 ` Peter Zijlstra
@ 2025-12-20 16:49   ` Paul E. McKenney
  2026-01-15  0:16     ` Paul E. McKenney
  0 siblings, 1 reply; 5+ messages in thread
From: Paul E. McKenney @ 2025-12-20 16:49 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, linux-next, Stephen Rothwell,
	Mark Brown, kernel-team

On Sat, Dec 20, 2025 at 01:52:01PM +0100, Peter Zijlstra wrote:
> On Fri, Dec 19, 2025 at 04:29:26PM -0800, Paul E. McKenney wrote:
> > Hello, Peter,
> > 
> > I started hitting shutdown-time hangs in next-20251217 which persist
> > in next-20251219.  This hang happens on both x86 and arm64.  Once I
> > figured out that the failure is high probability, but not deterministic,
> > bisection converged here:
> > 
> > 5d1f0b2f278e ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
> 
> That commit no longer exists in tip/sched/core; it was fixed a few days
> ago, except other problems made -next use an old tip branch which has
> caused this fix to have delayed visibility :-(

Very good, I will retry later.

							Thanx, Paul


* Re: Shutdown-time hangs in -next in locktorture
  2025-12-20 16:49   ` Paul E. McKenney
@ 2026-01-15  0:16     ` Paul E. McKenney
  0 siblings, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2026-01-15  0:16 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, linux-next, Stephen Rothwell,
	Mark Brown, kernel-team

On Sat, Dec 20, 2025 at 08:49:01AM -0800, Paul E. McKenney wrote:
> On Sat, Dec 20, 2025 at 01:52:01PM +0100, Peter Zijlstra wrote:
> > On Fri, Dec 19, 2025 at 04:29:26PM -0800, Paul E. McKenney wrote:
> > > Hello, Peter,
> > > 
> > > I started hitting shutdown-time hangs in next-20251217 which persist
> > > in next-20251219.  This hang happens on both x86 and arm64.  Once I
> > > figured out that the failure is high probability, but not deterministic,
> > > bisection converged here:
> > > 
> > > 5d1f0b2f278e ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
> > 
> > That commit no longer exists in tip/sched/core; it was fixed a few days
> > ago, except other problems made -next use an old tip branch which has
> > caused this fix to have delayed visibility :-(
> 
> Very good, I will retry later.

A bit later than I was planning, but here we are on next-20260113.  This
has a very similar failure on arm64 with that same repeat-by as before:

tools/testing/selftests/rcutorture/bin/torture.sh --duration 20 --do-none --do-normal --do-locktorture --do-kasan --configs-locktorture "LOCK09"

Bisection converges here:

704069649b5b ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")

This commit does not revert cleanly, but retesting on both this commit
and the previous commit confirms the bisection result.

I have not yet checked this carefully on x86.

Or is this another case of stale commits in -next?  If not, please let
me know if there are debug options/patches that would be helpful.

							Thanx, Paul

