From: Nathan Fontenot <nfont@austin•ibm.com>
To: Grant Likely <grant.likely@linaro•org>,
Pantelis Antoniou <pantelis.antoniou@konsulko•com>
Cc: devicetree@vger•kernel.org, Steven Rostedt <rostedt@goodmis•org>,
linuxppc-dev <linuxppc-dev@ozlabs•org>,
Tyrel Datwyler <tyreld@linux•vnet.ibm.com>,
Thomas Gleixner <tglx@linutronix•de>
Subject: Re: OF_DYNAMIC node lifecycle
Date: Tue, 24 Jun 2014 15:07:05 -0500 [thread overview]
Message-ID: <53A9DA69.1040101@austin.ibm.com> (raw)
In-Reply-To: <20140623145844.DA6A3C40AE5@trevor.secretlab.ca>
On 06/23/2014 09:58 AM, Grant Likely wrote:
> On Thu, 19 Jun 2014 11:33:20 +0300, Pantelis Antoniou <pantelis.antoniou@konsulko•com> wrote:
>> Hi Grant,
>>
>> CCing Thomas Gleixner & Steven Rostedt, since they might have a few
>> ideas...
>>
>> On Jun 18, 2014, at 11:07 PM, Grant Likely wrote:
>>
>>> Hi Nathan and Tyrel,
>>>
>>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
>>> I'm hoping you can help me. Right now, pseries seems to be the only
>>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
>>> the entire kernel because it requires all DT code to manage reference
>>> counting with iterating over nodes. Most users simply get it wrong.
>>> Pantelis did some investigation and found that the reference counts on
>>> a running kernel are all over the place. I have my doubts that any
>>> code really gets it right.
>>>
>>> The problem is that users need to know when it is appropriate to call
>>> of_node_get()/of_node_put(). All list traversals that exit early need
>>> an extra call to of_node_put(), and code that is searching for a node
>>> in the tree and holding a reference to it needs to call of_node_get().
>>>
>>
>> In hindsight it appears that drivers just can't get the lifecycle right.
>> So we need to simplify things.
>>
>>> I've got a few pseries questions:
>>> - What are the changes being requested by pseries firmware? Is it only
>>> CPUs and memory nodes, or does it manipulate things all over the tree?
>>> - How frequent are the changes? How many changes would be likely over
>>> the runtime of the system?
>>> - Are you able to verify that removed nodes are actually able to be
>>> freed correctly? Do you have any testcases for node removal?
>>>
>>> I'm thinking very seriously about changing the locking semantics of DT
>>> code entirely so that most users never have to worry about
>>> of_node_get/put at all. If the DT code is switched to use rcu
>>> primitives for tree iteration (which also means making DT code use
>>> list_head, something I'm already investigating), then instead of
>>> trying to figure out of_node_get/put rules, callers could use
>>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
>>> searching over nodes, and only call of_node_get() if the node pointer
>>> is needed outside the rcu read-side lock.
>>>
>>> I'd really like to be rid of the node reference counting entirely, but
>>> I can't figure out a way of doing that safely, so I'd settle for
>>> making it a lot easier to get correct.
>>>
>>
>> Since we're going about changing things, how about that devtree_lock?
>
> I believe rcu would pretty much eliminate the devtree_lock entirely. All
> modifiers would need to grab a mutex to ensure there is only one writer
> at any given time, but readers would have free reign to parse the tree
> however they like.
>
> DT writers would have to follow some strict rules about how to handle
> nodes that are removed (ie. don't modify or of_node_put() them until
> after rcu is syncronized), but the number of writers is very small and
> we have control of all of them.
>
>> We're using a raw_spinlock and we're always taking the lock with
>> interrupts disabled.
>>
>> If we're going to make DT changes frequently during normal runtime
>> and not only during boot time, those are bad for any kind of real-time
>> performance.
>>
>> So the question is, do we really have code that access the live tree
>> during atomic sections? Is that something we want? Enforcing this
>> will make our lives easier, and we'll get the change to replace
>> that spinlock with a mutex.
>
> Yes, I believe the powerpc CPU hotplug code accesses the DT in atomic
> sections. I cannot put my finger on the exact code however. Nathan might
> know better. But, if I'm right, the whole problem goes away with RCU.
I went back through the cpu hotplug code. we do update the DT during cpu
hotplug but I don't see it happening during atomic sections.
The code is in arch/powerpc/platforms/pseries/dlpar.c
-Nathan
>
> The design with RCU is to switch struct device_node and struct property
> to use list_head and/or hlist_head with the _rcu accessors. They allow
> items to be removed from a list without syncronizing with readers. Right
> now we have two lists that need to be modified; the allnodes list and
> the sibling list. I *think* it will be fine for the two list removals to
> be non-atomic (there will be a brief period where the node can be found
> on one list, but not the other) because it is a transient state already
> accounted for in rcu read-side critical region.
>
> That said, I've also got a design to remove the allnodes list entirely
> and only work with the sibling list. I need to prototype this.
>
> We'll also need a transition plan to move to RCU. I think the existing
> iterators can be modified to do the rcu locking in-line, but still require
> the of_node_get/put stuff (basically, so existing code continue to works
> unchanged). Then we can add _rcu versions that drop the need for
> of_node_get/put(). When everything is converted, the old iterators can
> be dropped.
>
> g.
>
next prev parent reply other threads:[~2014-06-24 20:07 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-18 20:07 OF_DYNAMIC node lifecycle Grant Likely
2014-06-19 8:33 ` Pantelis Antoniou
2014-06-23 14:58 ` Grant Likely
2014-06-23 15:26 ` Pantelis Antoniou
2014-06-23 20:21 ` Grant Likely
2014-06-24 20:07 ` Nathan Fontenot [this message]
2014-06-25 20:22 ` Grant Likely
2014-06-26 19:59 ` Nathan Fontenot
2014-06-27 12:32 ` Grant Likely
2014-06-27 12:40 ` Pantelis Antoniou
2014-06-27 14:41 ` Nathan Fontenot
2014-06-19 15:26 ` Nathan Fontenot
2014-06-23 14:48 ` Grant Likely
2014-06-24 20:10 ` Nathan Fontenot
2014-06-25 20:24 ` Grant Likely
2014-06-26 20:01 ` Nathan Fontenot
2014-06-27 12:41 ` Grant Likely
2014-06-27 14:41 ` Nathan Fontenot
2014-07-16 5:33 ` Grant Likely
2014-07-16 18:30 ` Tyrel Datwyler
2014-07-16 20:57 ` Grant Likely
2014-07-16 22:26 ` Grant Likely
2014-07-16 23:12 ` Nathan Fontenot
2014-07-17 0:44 ` Grant Likely
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53A9DA69.1040101@austin.ibm.com \
--to=nfont@austin$(echo .)ibm.com \
--cc=devicetree@vger$(echo .)kernel.org \
--cc=grant.likely@linaro$(echo .)org \
--cc=linuxppc-dev@ozlabs$(echo .)org \
--cc=pantelis.antoniou@konsulko$(echo .)com \
--cc=rostedt@goodmis$(echo .)org \
--cc=tglx@linutronix$(echo .)de \
--cc=tyreld@linux$(echo .)vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox