From: Nathan Lynch <nathanl@linux•ibm.com>
To: Srikar Dronamraju <srikar@linux•vnet.ibm.com>
Cc: Satheesh Rajendran <sathnaga@linux•vnet.ibm.com>,
	linuxppc-dev <linuxppc-dev@lists•ozlabs.org>,
	Nicholas Piggin <npiggin@gmail•com>
Subject: Re: [PATCH v2 3/4] powerpc/numa: Early request for home node associativity
Date: Thu, 05 Sep 2019 15:04:00 -0500
Message-ID: <87tv9qqzm7.fsf@linux.ibm.com>
In-Reply-To: <20190829055023.6171-4-srikar@linux.vnet.ibm.com>

Hi Srikar,

Srikar Dronamraju <srikar@linux•vnet.ibm.com> writes:
> Currently the kernel detects if it's running on a shared lpar platform
> and requests home node associativity before the scheduler sched_domains
> are set up. However between the time NUMA setup is initialized and the
> request for home node associativity, workqueue initializes its per node
> cpumask. The per node workqueue possible cpumask may turn invalid
> after home node associativity resulting in weird situations like
> workqueue possible cpumask being a subset of workqueue online cpumask.
>
> This can be fixed by requesting home node associativity earlier just
> before NUMA setup. However at the NUMA setup time, kernel may not be in
> a position to detect if it's running on a shared lpar platform. So
> request for home node associativity and if the request fails, fall back
> on the device tree property.
>
> While here, fix a problem where of_node_put could be called even when
> of_get_cpu_node was not successful.

of_node_put() handles NULL arguments, so this should not be necessary.
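
To illustrate the point, here is a minimal userspace sketch of a
NULL-tolerant release function. It is not kernel code: struct node,
node_get(), node_put() and lookup_and_release() are hypothetical
stand-ins, and the only property borrowed from of_node_put() is that a
NULL argument is a no-op.

```c
#include <stddef.h>

/* Hypothetical stand-in for a refcounted node; only the count is modelled. */
struct node {
	int refcount;
};

/* NULL-tolerant release: a NULL argument is simply ignored. */
static void node_put(struct node *np)
{
	if (np)
		np->refcount--;
}

/* Hypothetical lookup that returns NULL for out-of-range ids. */
static struct node *node_get(struct node *table, size_t n, size_t id)
{
	if (id >= n)
		return NULL;
	table[id].refcount++;
	return &table[id];
}

/* Single exit path: no NULL check is needed before the put. */
static int lookup_and_release(struct node *table, size_t n, size_t id)
{
	struct node *np = node_get(table, n, id);
	int found = (np != NULL);

	node_put(np);	/* safe even when the lookup failed */
	return found;
}
```

With a release function of this shape, the caller can keep one
unconditional put on the exit path whether or not the lookup succeeded.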

> +static int vphn_get_nid(unsigned long cpu, bool get_hwid)

[...]

> +static int numa_setup_cpu(unsigned long lcpu, bool get_hwid)

[...]

> @@ -528,7 +561,7 @@ static int ppc_numa_cpu_prepare(unsigned int cpu)
>  {
>  	int nid;
>  
> -	nid = numa_setup_cpu(cpu);
> +	nid = numa_setup_cpu(cpu, true);
>  	verify_cpu_node_mapping(cpu, nid);
>  	return 0;
>  }
> @@ -875,7 +908,7 @@ void __init mem_topology_setup(void)
>  	reset_numa_cpu_lookup_table();
>  
>  	for_each_present_cpu(cpu)
> -		numa_setup_cpu(cpu);
> +		numa_setup_cpu(cpu, false);
>  }

I'm open to other points of view here, but I would prefer two separate
functions, something like vphn_get_nid() for runtime and
vphn_get_nid_early() (which could be __init) for boot-time
initialization. Propagating a somewhat unexpressive boolean flag through
two levels of function calls in this code is unappealing...
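
A rough sketch of the shape I have in mind, as standalone C rather than
kernel code. The quoted hunks do not show what the flag selects, so the
sketch assumes, purely for illustration, that the two paths differ in
how a hardware cpu id is obtained. hwid_runtime(), hwid_early(),
early_hwid_table and nid_from_hwid() are all hypothetical.

```c
/* Hypothetical boot-time table of hardware ids, indexed by logical cpu. */
static const long early_hwid_table[] = { 0, 8, 16, 24 };

/* Hypothetical runtime source of the hardware id. */
static long hwid_runtime(unsigned long cpu)
{
	return (long)cpu * 2;
}

/* Hypothetical boot-time source of the hardware id. */
static long hwid_early(unsigned long cpu)
{
	return early_hwid_table[cpu];
}

/* Shared worker: takes the already-resolved id, so no flag is needed. */
static int nid_from_hwid(long hwid)
{
	return (int)(hwid / 8);
}

/* Runtime entry point. */
static int vphn_get_nid(unsigned long cpu)
{
	return nid_from_hwid(hwid_runtime(cpu));
}

/* Boot-time entry point; in the kernel this one could be marked __init. */
static int vphn_get_nid_early(unsigned long cpu)
{
	return nid_from_hwid(hwid_early(cpu));
}
```

Each call site then names the variant it wants, and the boolean
disappears from both levels of the call chain.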

Regardless, I have an annoying question :-) Isn't it possible that,
while Linux is calling vphn_get_nid() for each logical cpu in sequence,
the platform could change a virtual processor's node assignment,
potentially causing sibling threads to get different node assignments
and producing an incoherent topology (which then leads to sched domain
assertions etc)?

If so, I think more care is needed. Shouldn't the algorithm make the
vphn call only once per cpu node?
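
As a sketch of the once-per-cpu-node idea, in standalone C with a mock
platform: mock_platform_nid() deliberately returns a different answer on
every call, to model a reassignment racing with the loop. The fixed
threads_per_core and the contiguous numbering of sibling threads are
assumptions of the sketch, and all names here are hypothetical.

```c
/* Counts how many times the mock platform was queried. */
static int platform_calls;

/* Mock platform query: the answer changes on every call. */
static int mock_platform_nid(int core)
{
	(void)core;
	return platform_calls++ % 2;
}

/* Query once per core and give every sibling thread the same node id. */
static void assign_nids_per_core(int nid_of_cpu[], int nr_cpus,
				 int threads_per_core)
{
	for (int first = 0; first < nr_cpus; first += threads_per_core) {
		int nid = mock_platform_nid(first / threads_per_core);

		for (int t = 0; t < threads_per_core && first + t < nr_cpus; t++)
			nid_of_cpu[first + t] = nid;
	}
}

/* Returns 1 if every thread has the same node id as its core's first thread. */
static int siblings_agree(const int nid_of_cpu[], int nr_cpus,
			  int threads_per_core)
{
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (nid_of_cpu[cpu] != nid_of_cpu[cpu - cpu % threads_per_core])
			return 0;
	return 1;
}
```

Even though the mock platform changes its answer between calls, sibling
threads cannot disagree, because each core's answer is read exactly once
and then copied to all of its threads.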