public inbox for linux-arm-kernel@lists.infradead.org 
From: sudeep.holla@arm•com (Sudeep Holla)
To: linux-arm-kernel@lists•infradead.org
Subject: Versatile Express randomly fails to boot - Versatile Express to be removed from nightly testing
Date: Thu, 02 Apr 2015 18:38:51 +0100
Message-ID: <551D7EAB.1000200@arm.com>
In-Reply-To: <20150402141336.GI24899@n2100.arm.linux.org.uk>



On 02/04/15 15:13, Russell King - ARM Linux wrote:
> On Tue, Mar 31, 2015 at 06:27:30PM +0100, Sudeep Holla wrote:
>> Not sure on that as v3.18 with DT seems to be working fine and passed
>> overnight reboot testing.
>
> Okay, that suggests there's something post v3.18 which is causing this,
> rather than it being a DT vs non-DT thing.
>

Correct. Just to be 100% sure, I reverted the non-DT removal commit on
both v3.19-rc1 and v4.0-rc6 and was able to reproduce the issue without
DT.

> An extra data point which I've just found (by enabling attempts to do
> hibernation on various test platforms) is that the Versatile Express
> appears to be incapable of taking a CPU offline.
>
> This crashes the entire system with sometimes random results.  Sometimes
> it'll appear that a spinlock has been left owned by CPU#1 which is
> offline.  Sometimes it'll silently hang.  Sometimes it'll start slowly
> dumping kernel messages from the start of the kernel's ring buffer (!),
> eg:
>
> PM: freeze of devices complete after 29.342 msecs
> PM: late freeze of devices complete after 6.398 msecs
> PM: noirq freeze of devices complete after 5.493 msecs
> Disabling non-boot CPUs ...
> __cpu_disable(1)
> __cpu_die(1)
> handle_IPI(0)
> Booting Linux on physical CPU 0x0
>
> So far, it's not managed to take a CPU successfully offline and know that
> it has.  If I disable the calls to cpu_enter_lowpower() and
> cpu_leave_lowpower(), then it appears to work.
>
> This leads me to wonder whether flush_cache_louis() works... which led me
> in turn to ARM_ERRATA_643719, which is disabled in my builds.  However,
> the CA9 tile has a r0p1 CA9, which allegedly suffers from this errata.
>

Yes, I noticed that and tested with the erratum workaround enabled. It
makes no difference; I still hit the issue.
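
When comparing builds, it can help to confirm mechanically whether an
option such as ARM_ERRATA_643719 is actually set in a given .config.
The following is only a minimal sketch: the helper name
kconfig_enabled() is hypothetical, and it assumes the usual .config
syntax ("CONFIG_FOO=y", or "# CONFIG_FOO is not set" when disabled).

```python
import re


def kconfig_enabled(config_text, option):
    """Return True if `option` is set to y or m in .config-style text.

    Hypothetical helper for illustration; `option` may be given with or
    without the CONFIG_ prefix.
    """
    name = option if option.startswith("CONFIG_") else "CONFIG_" + option
    # Match a whole line of the form CONFIG_FOO=y or CONFIG_FOO=m.
    # Disabled options appear as "# CONFIG_FOO is not set" and do not match.
    pattern = re.compile(r"^%s=(y|m)$" % re.escape(name), re.MULTILINE)
    return pattern.search(config_text) is not None
```

For example, kconfig_enabled(open(".config").read(), "ARM_ERRATA_643719")
reports whether the workaround is built into the kernel being tested.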

[...]
>
> I haven't tested going back to a tag latency of 1 1 1 yet.  Can you
> confirm whether you have this errata enabled for your tests?
>
I have now gone back to <1 1 1> latency to debug the issue, as it's
easier to reproduce with that latency.

I failed terribly to bisect between v3.18..v3.19-rc1, as the result
depends a lot on the config you choose (a lot of changes were introduced
since it's the merge window). So I started looking at the code where we
hit this issue, which is always in __radix_tree_lookup in
lib/radix-tree.c while accessing the slots, to see if it provides any
more details.

Regards,
Sudeep

