From: Michael Ellerman <mpe@ellerman•id.au>
To: Sukadev Bhattiprolu <sukadev@linux•vnet.ibm.com>
Cc: "bruno@wolff•to" <bruno@wolff•to>,
Michael Ellerman <michaele@au1•ibm.com>,
"jwboyer@redhat•com" <jwboyer@redhat•com>,
"linux-kernel@vger•kernel.org" <linux-kernel@vger•kernel.org>,
"peterz@infrdead•org" <peterz@infrdead•org>,
"linuxppc-dev@lists•ozlabs.org" <linuxppc-dev@lists•ozlabs.org>,
Dietmar Eggemann <dietmar.eggemann@arm•com>
Subject: Re: scheduler crash on Power
Date: Mon, 04 Aug 2014 13:20:32 +1000
Message-ID: <1407122432.2286.0.camel@concordia>
In-Reply-To: <20140801212447.GA25435@us.ibm.com>

On Fri, 2014-08-01 at 14:24 -0700, Sukadev Bhattiprolu wrote:
> Dietmar Eggemann [dietmar.eggemann@arm•com] wrote:
> | > ltcbrazos2-lp07 login: [ 181.915974] ------------[ cut here ]------------
> | > [ 181.915991] WARNING: at ../kernel/sched/core.c:5881
> |
> | This warning indicates the problem. One of the struct sched_domains does
> | not have its groups member set.
> |
> | And it's happening during a rebuild of the sched domain hierarchy, not
> | during the initial build.
> |
> | You could run your system with the following patch-let (on top of
> | https://lkml.org/lkml/2014/7/17/288) w/ and w/o the perf related
> | patches (w/ CONFIG_SCHED_DEBUG enabled).
> |
> | @@ -5882,6 +5882,9 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> |  {
> |         struct sched_group *sg = sd->groups;
> |
> | +#ifdef CONFIG_SCHED_DEBUG
> | +       printk("sd name: %s span: %pc\n", sd->name, sd->span);
> | +#endif
> |         WARN_ON(!sg);
> |
> |         do {
> |
> | This will show if the rebuild of the sched domain hierarchy happens on
> | both systems and hopefully indicate for which sched_domain the
> | sd->groups is not set.
>
> Thanks for the patch. It appears that the NUMA sched domain does not
> have the sd->groups set - snippet of the error (with your patch and
> Peter's patch)
>
> [ 181.914494] build_sched_groups: got group c000000006da0000 with cpus:
> [ 181.914498] build_sched_groups: got group c0000000dd830000 with cpus:
> [ 181.915234] sd name: SMT span: 8-15
> [ 181.915239] sd name: DIE span: 0-7
> [ 181.915242] sd name: NUMA span: 0-15
> [ 181.915250] ------------[ cut here ]------------
> [ 181.915253] WARNING: at ../kernel/sched/core.c:5891
>
> Patched code:
>
> 5884 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
> 5885 {
> 5886         struct sched_group *sg = sd->groups;
> 5887
> 5888 #ifdef CONFIG_SCHED_DEBUG
> 5889         printk("sd name: %s span: %pc\n", sd->name, sd->span);
> 5890 #endif
> 5891         WARN_ON(!sg);
>
> Complete log below.
>
> I was able to bisect it down to this patch in the 24x7 patchset
>
> https://lkml.org/lkml/2014/5/27/804
>
> I replaced the kfree(page) calls in the patch with
> kmem_cache_free(hv_page_cache, page).
>
> The problem seems to disappear if the call to create_events_from_catalog()
> in hv_24x7_init() is skipped. I am continuing to debug the 24x7 patch.

Is that patch just clobbering memory it doesn't own and corrupting the
scheduler data structures?
cheers