CCF locking problems

public inbox for linux-arm-kernel@lists.infradead.org 
 help / color / mirror / Atom feed

* CCF locking problems
@ 2014-12-12 17:37 Russell King - ARM Linux
  2014-12-12 18:50 ` Stephen Boyd
  0 siblings, 1 reply; 3+ messages in thread
From: Russell King - ARM Linux @ 2014-12-12 17:37 UTC (permalink / raw)
  To: linux-arm-kernel

CCF is causing lockdep problems elsewhere in the kernel...

During kernel boot, the following lock chain is established via CCF when
debugfs is enabled:

	clk_register()
	`-> __clk_init() (takes prepare_lock)
	    `-> clk_debug_register() (takes clk_debug_lock)
	        `-> debugfs_create_dir()
	            `-> __create_file() (takes i_mutex)

So, prepare_lock ends up being a parent of the i_mutex class.

Generic kernel code creates a dependency between i_mutex and mmap_sem:

	iterate_dir() (takes i_mutex)
	`-> dcache_readdir()
	    `-> filldir64()
	        `-> might_fault() (marks mmap_sem as potentially taken by
				   a fault)

This means prepare_lock is also a parent lock of mmap_sem.

The kernel mmap() implementation of takes the mmap_sem, before calling
into drivers.  DRM's drm_gem_mmap() function which will be called as a
result of a mmap() takes the dev->struct_mutex mutex - which ends up as
a child of mmap_sem.

Now, if a DRM driver _then_ wants to use runtime PM, it may need to
call runtime PM functions beneath DRM's dev->struct_mutex (since, eg,
it may be protecting other DRM data while deciding which GPU to forward
to.)  This ends up creating a circular dependency between dev->struct_mutex
and prepare_lock, involving all the above mentioned locks.

I believe it is totally unreasonable for CCF to allow the prepare lock
to depend on something as fundamental as core kernel locks - in fact,
looking at __clk_init(), it looks like this fails in a very basic aspect
of kernel programming: do setup first, then publish.  If it did follow
that principle, it probably would not need to take the prepare lock
while calling clk_debug_register(), which would mean that prepare_lock
would not end up being a parent of potentially a lot of core kernel locks.

When you consider what prepare_lock is supposed to be doing, it's quite
clear that it should not be a parent to those.

Another interesting point is that clk_debug_create_one() has a comment
above it which is untrue:

/* caller must hold prepare_lock */

... except when it's called by clk_debug_init().

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* CCF locking problems
  2014-12-12 17:37 CCF locking problems Russell King - ARM Linux
@ 2014-12-12 18:50 ` Stephen Boyd
  2014-12-12 22:37   ` Russell King - ARM Linux
  0 siblings, 1 reply; 3+ messages in thread
From: Stephen Boyd @ 2014-12-12 18:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/12/2014 09:37 AM, Russell King - ARM Linux wrote:
> CCF is causing lockdep problems elsewhere in the kernel...
>
> During kernel boot, the following lock chain is established via CCF when
> debugfs is enabled:
>
> 	clk_register()
> 	`-> __clk_init() (takes prepare_lock)
> 	    `-> clk_debug_register() (takes clk_debug_lock)
> 	        `-> debugfs_create_dir()
> 	            `-> __create_file() (takes i_mutex)
>
> So, prepare_lock ends up being a parent of the i_mutex class.
>
> Generic kernel code creates a dependency between i_mutex and mmap_sem:
>
> 	iterate_dir() (takes i_mutex)
> 	`-> dcache_readdir()
> 	    `-> filldir64()
> 	        `-> might_fault() (marks mmap_sem as potentially taken by
> 				   a fault)
>
> This means prepare_lock is also a parent lock of mmap_sem.
>
> The kernel mmap() implementation of takes the mmap_sem, before calling
> into drivers.  DRM's drm_gem_mmap() function which will be called as a
> result of a mmap() takes the dev->struct_mutex mutex - which ends up as
> a child of mmap_sem.
>
> Now, if a DRM driver _then_ wants to use runtime PM, it may need to
> call runtime PM functions beneath DRM's dev->struct_mutex (since, eg,
> it may be protecting other DRM data while deciding which GPU to forward
> to.)  This ends up creating a circular dependency between dev->struct_mutex
> and prepare_lock, involving all the above mentioned locks.
>
> I believe it is totally unreasonable for CCF to allow the prepare lock
> to depend on something as fundamental as core kernel locks - in fact,
> looking at __clk_init(), it looks like this fails in a very basic aspect
> of kernel programming: do setup first, then publish.  If it did follow
> that principle, it probably would not need to take the prepare lock
> while calling clk_debug_register(), which would mean that prepare_lock
> would not end up being a parent of potentially a lot of core kernel locks.
>
> When you consider what prepare_lock is supposed to be doing, it's quite
> clear that it should not be a parent to those.
>
> Another interesting point is that clk_debug_create_one() has a comment
> above it which is untrue:
>
> /* caller must hold prepare_lock */
>
> ... except when it's called by clk_debug_init().
>

Do you have a lockdep splat? What kernel version are you running? There
was an earlier report of this that I tried to fix with this commit:

commit 6314b6796e3c070d4c8086b08dfd453a0aeac4cf
Author: Stephen Boyd <sboyd@codeaurora•org>
Date:   Thu Sep 4 23:37:49 2014 -0700

    clk: Don't hold prepare_lock across debugfs creation


but I'm not sure if you have that patch or if more recent changes to the
framework have caused this problem to reoccur. I think I missed the part
where clk_debug_register() is called after clk_debug_init() though, so
perhaps you have clocks that are getting registered after late init?
Here's an untested patch.

----8<-----

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 44cdc47a6cc5..c9430653ddc9 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -240,12 +240,13 @@ static const struct file_operations clk_dump_fops = {
 	.release	= single_release,
 };
 
-/* caller must hold prepare_lock */
 static int clk_debug_create_one(struct clk *clk, struct dentry *pdentry)
 {
 	struct dentry *d;
 	int ret = -ENOMEM;
 
+	lockdep_assert_held(clk_debug_lock);
+
 	if (!clk || !pdentry) {
 		ret = -EINVAL;
 		goto out;
@@ -1944,7 +1945,6 @@ int __clk_init(struct device *dev, struct clk *clk)
 	else
 		clk->rate = 0;
 
-	clk_debug_register(clk);
 	/*
 	 * walk the list of orphan clocks and reparent any that are children of
 	 * this clock
@@ -1979,6 +1979,9 @@ int __clk_init(struct device *dev, struct clk *clk)
 out:
 	clk_prepare_unlock();
 
+	if (!ret)
+		clk_debug_register(clk);
+
 	return ret;
 }
 


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* CCF locking problems
  2014-12-12 18:50 ` Stephen Boyd
@ 2014-12-12 22:37   ` Russell King - ARM Linux
  0 siblings, 0 replies; 3+ messages in thread
From: Russell King - ARM Linux @ 2014-12-12 22:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Dec 12, 2014 at 10:50:35AM -0800, Stephen Boyd wrote:
> Do you have a lockdep splat? What kernel version are you running?

I could if I dug it out of the syslog files... but your patch would also
fix it - it's pretty close to what I'm now running.  Mine was found on
a v3.18 based kernel.

> but I'm not sure if you have that patch or if more recent changes to the
> framework have caused this problem to reoccur. I think I missed the part
> where clk_debug_register() is called after clk_debug_init() though, so
> perhaps you have clocks that are getting registered after late init?
> Here's an untested patch.
> 
> ----8<-----
> 
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 44cdc47a6cc5..c9430653ddc9 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -240,12 +240,13 @@ static const struct file_operations clk_dump_fops = {
>  	.release	= single_release,
>  };
>  
> -/* caller must hold prepare_lock */
>  static int clk_debug_create_one(struct clk *clk, struct dentry *pdentry)
>  {
>  	struct dentry *d;
>  	int ret = -ENOMEM;
>  
> +	lockdep_assert_held(clk_debug_lock);
> +
>  	if (!clk || !pdentry) {
>  		ret = -EINVAL;
>  		goto out;
> @@ -1944,7 +1945,6 @@ int __clk_init(struct device *dev, struct clk *clk)
>  	else
>  		clk->rate = 0;
>  
> -	clk_debug_register(clk);
>  	/*
>  	 * walk the list of orphan clocks and reparent any that are children of
>  	 * this clock
> @@ -1979,6 +1979,9 @@ int __clk_init(struct device *dev, struct clk *clk)
>  out:
>  	clk_prepare_unlock();
>  
> +	if (!ret)
> +		clk_debug_register(clk);
> +
>  	return ret;
>  }

Apart from the lockdep_assert_held(), this is basically what I'm
currently running, which does fix the problem.  I was intending to send
my patch, but pub with good beer, gliding syndicate, and Christmas food
intervened tonight.

Thanks.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2014-12-12 22:37 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-12 17:37 CCF locking problems Russell King - ARM Linux
2014-12-12 18:50 ` Stephen Boyd
2014-12-12 22:37   ` Russell King - ARM Linux

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox