public inbox for linuxppc-dev@ozlabs.org 
From: Michael Ellerman <mpe@ellerman•id.au>
To: Sukadev Bhattiprolu <sukadev@linux•vnet.ibm.com>
Cc: "bruno@wolff•to" <bruno@wolff•to>,
	Michael Ellerman <michaele@au1•ibm.com>,
	"jwboyer@redhat•com" <jwboyer@redhat•com>,
	"linux-kernel@vger•kernel.org" <linux-kernel@vger•kernel.org>,
	"peterz@infrdead•org" <peterz@infrdead•org>,
	"linuxppc-dev@lists•ozlabs.org" <linuxppc-dev@lists•ozlabs.org>,
	Dietmar Eggemann <dietmar.eggemann@arm•com>
Subject: Re: scheduler crash on Power
Date: Mon, 04 Aug 2014 13:20:32 +1000
Message-ID: <1407122432.2286.0.camel@concordia>
In-Reply-To: <20140801212447.GA25435@us.ibm.com>

On Fri, 2014-08-01 at 14:24 -0700, Sukadev Bhattiprolu wrote:
> Dietmar Eggemann [dietmar.eggemann@arm•com] wrote:
> | > ltcbrazos2-lp07 login: [  181.915974] ------------[ cut here ]------------
> | > [  181.915991] WARNING: at ../kernel/sched/core.c:5881
> | 
> | This warning indicates the problem. One of the struct sched_domains does
> | not have its groups member set.
> | 
> | And it's happening during a rebuild of the sched domain hierarchy, not
> | during the initial build.
> | 
> | You could run your system with the following patch-let (on top of
> | https://lkml.org/lkml/2014/7/17/288)  w/ and w/o the perf related
> | patches (w/ CONFIG_SCHED_DEBUG enabled).
> | 
> | @@ -5882,6 +5882,9 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> |  {
> |         struct sched_group *sg = sd->groups;
> | 
> | +#ifdef CONFIG_SCHED_DEBUG
> | +       printk("sd name: %s span: %pc\n", sd->name, sd->span);
> | +#endif
> |         WARN_ON(!sg);
> | 
> |         do {
> | 
> | This will show if the rebuild of the sched domain hierarchy happens on
> | both systems and hopefully indicate for which sched_domain the
> | sd->groups is not set.
> 
> Thanks for the patch. It appears that the NUMA sched domain does not
> have sd->groups set. Snippet of the error (with your patch and
> Peter's patch):
> 
> [  181.914494] build_sched_groups: got group c000000006da0000 with cpus: 
> [  181.914498] build_sched_groups: got group c0000000dd830000 with cpus: 
> [  181.915234] sd name: SMT span: 8-15
> [  181.915239] sd name: DIE span: 0-7
> [  181.915242] sd name: NUMA span: 0-15
> [  181.915250] ------------[ cut here ]------------
> [  181.915253] WARNING: at ../kernel/sched/core.c:5891
> 
> Patched code:
> 
> 	5884 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> 	5885 {
> 	5886         struct sched_group *sg = sd->groups;
> 	5887 
> 	5888 #ifdef CONFIG_SCHED_DEBUG
> 	5889         printk("sd name: %s span: %pc\n", sd->name, sd->span);
> 	5890 #endif
> 	5891         WARN_ON(!sg);
> 
> Complete log below.
> 
> I was able to bisect it down to this patch in the 24x7 patchset
> 
> 	https://lkml.org/lkml/2014/5/27/804
> 
> I replaced the kfree(page) calls in the patch with
> kmem_cache_free(hv_page_cache, page).
> 
> The problem seems to disappear if the call to create_events_from_catalog()
> in hv_24x7_init() is skipped. I am continuing to debug the 24x7 patch.

Is that patch just clobbering memory it doesn't own and corrupting the
scheduler data structures?

cheers
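As a rough user-space model of what the debug patch quoted above is checking: each CPU's sched domains form a chain from the lowest level (SMT) up through DIE and NUMA, and after a rebuild every level is expected to have its groups pointer set. The sketch below walks such a chain, prints each level in the same "sd name: ... span: ..." shape as the debug printk, and reports the first level whose groups pointer is NULL, the condition WARN_ON(!sg) in init_sched_groups_capacity() fires on. The struct layouts, the string spans, and the helper find_missing_groups() are hypothetical simplifications, not the kernel's definitions; the real structures use cpumasks and carry many more fields.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's
 * struct sched_group and struct sched_domain. The real structures
 * carry many more fields and describe spans with cpumasks. */
struct sched_group {
	struct sched_group *next;	/* groups form a circular list */
};

struct sched_domain {
	struct sched_domain *parent;	/* next level up the hierarchy */
	struct sched_group *groups;	/* expected to be set after a (re)build */
	const char *name;		/* e.g. "SMT", "DIE", "NUMA" */
	const char *span;		/* CPU list as a string, for printing only */
};

/*
 * Walk from the lowest domain upward, printing each level the way the
 * debug printk does, and return the first domain whose groups pointer
 * is NULL (the condition WARN_ON(!sg) trips on). Returns NULL if every
 * level has its groups set.
 */
static const struct sched_domain *
find_missing_groups(const struct sched_domain *sd)
{
	for (; sd; sd = sd->parent) {
		printf("sd name: %s span: %s\n", sd->name, sd->span);
		if (!sd->groups)
			return sd;
	}
	return NULL;
}
```

For example, building an SMT -> DIE -> NUMA chain (with illustrative spans) in which only the NUMA level has groups left NULL makes find_missing_groups() print all three levels and return the NUMA domain, which matches the pattern in the log. A model like this cannot tell a missed assignment apart from a pointer that was overwritten by unrelated code; that distinction is what the question about memory being clobbered is getting at.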

Thread overview: 6+ messages
2014-07-30  7:22 scheduler crash on Power Sukadev Bhattiprolu
2014-07-31 11:57 ` Dietmar Eggemann
2014-08-01 21:24   ` Sukadev Bhattiprolu
2014-08-04  3:20     ` Michael Ellerman [this message]
2014-08-04 11:31       ` Dietmar Eggemann
2014-08-01  1:53 ` Michael Ellerman
