From: sudeep.holla@arm.com (Sudeep Holla)
To: linux-arm-kernel@lists.infradead.org
Subject: Versatile Express randomly fails to boot - Versatile Express to be removed from nightly testing
Date: Mon, 16 Mar 2015 17:47:46 +0000
Message-ID: <55071742.6000405@arm.com>
In-Reply-To: <20150316130419.GI8656@n2100.arm.linux.org.uk>
Hi Russell,
On 16/03/15 13:04, Russell King - ARM Linux wrote:
> On Mon, Mar 16, 2015 at 09:35:53AM +0000, Russell King - ARM Linux wrote:
>> On Mon, Mar 16, 2015 at 12:42:39AM +0000, Russell King - ARM Linux wrote:
>>> On Mon, Mar 16, 2015 at 12:04:38AM +0000, Russell King - ARM Linux wrote:
>>>> On Sun, Mar 15, 2015 at 09:33:30PM +0000, Russell King - ARM Linux wrote:
>>>>> I'm going to try a few other kernels to try and track down what's going
>>>>> on - whether something from arm-soc or my tree is responsible for this
>>>>> really weird behaviour.
>>>>
>>>> Okay, this is weird - it seems that it's caused by the FIQ oops
>>>> dumping code/FIQ changes which I've carried for many months
>>>> unchanged in my tree.
>>>
>>> More weirdness. Progressing forwards through my development code
>>> showed that when I merged the patch I mentioned in the previous mail,
>>> things started to fail.
>>>
>>> As I also mentioned, I'd drop that branch (two patches, one adding
>>> the IPI backtrace stuff and the second one updating the GIC to allow
>>> it to raise FIQs on suitably equipped platforms.) I would have
>>> expected that to have worked, but it just failed after four boot
>>> iterations. So either it's not the FIQ, or it is the FIQ code _and_
>>> also something else. Or it has something to do with the placement
>>> of functions in the kernel.
>>>
>>> I'll try more stuff tomorrow, working from where I presently am
>>> (which is basically last night's code minus the FIQ changes) by
>>> removing other changes to see what brings us back to a working
>>> system.
>>>
>>> As I've already said - this is really weird because all of these
>>> changes were also tested against -rc1... those which weren't are:
>>>
>>> mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
>>> mm: split ET_DYN ASLR from mmap ASLR
>>> mm: move randomize_et_dyn into ELF_ET_DYN_BASE
>>> mm: expose arch_mmap_rnd when available
>>> arm: factor out mmap ASLR into mmap_rnd
>>>
>>> and a number of clkdev rework patches (to make it use clk_hw
>>> internally.) Neither of these should be affecting it, but that's
>>> something I will be testing tomorrow.
>>
>> Okay, reverting the ASLR changes and the clkdev changes annoyingly still
>> results in random failure.
>
> After ruling out ASLR and clkdev, I started progressively reverting other
> stuff in the build tree. Eventually, I got down to reverting the L2C
> change I've been carrying since the L2C cleanups.
>
> With that lot reverted, which is slightly more than the previously known
> good test, it booted five times without issue.
>
> So, I thought I'd add that L2C change to the list of bad commits, and try
> omitting _just_ the L2C and FIQ changes... and it still fails - on the
> first test boot iteration.
>
> I think I'm going to declare that this is all down to some obscure
> hardware problem with Versatile Express, which is tickled by the layout
> of the kernel against the cache, and take it out of the nightly system
> (it's pointless having unstable hardware being tested; random failures
> are completely meaningless.)
>
I was able to see exactly the same behaviour on my VExpress setup with a
CA9X4 core-tile. A few observations from my side:
1. This issue can be reproduced even on v3.19
2. As you suspected L2C, I tried disabling L2C and it seems to solve
the issue
3. Since it's very random and enabling LL_DEBUG made it difficult to
reproduce the issue, I tried to dump the stack using the DS5 debugger
4. The stack is always exactly the same, on both v4.0-rc* and v3.19
kernels and across multiple runs
5. Connecting the h/w debugger, then stopping and restarting the CPUs,
solves the issue. It somehow helps the CPUs get out of
__radix_tree_lookup
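Since the failure is random, a handful of clean boots does not say much
about whether a given kernel is good. A rough sketch of the arithmetic,
assuming each boot fails independently with a fixed probability (the
0.25 below is purely a made-up figure for illustration, not something
measured on this board, and the function names are mine):

```python
import math

def p_all_clean(p_fail, boots):
    """Chance that `boots` consecutive boots all succeed on a bad kernel."""
    return (1.0 - p_fail) ** boots

def boots_needed(p_fail, confidence):
    """Smallest number of clean boots needed so that a kernel with failure
    rate `p_fail` would have shown at least one failure with the given
    confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))

# ASSUMPTION: hypothetical per-boot failure rate, for illustration only.
P_FAIL = 0.25

if __name__ == "__main__":
    print("P(5 clean boots on a bad kernel):", p_all_clean(P_FAIL, 5))
    print("clean boots needed for 95% confidence:", boots_needed(P_FAIL, 0.95))
```

With that assumed rate, five clean boots still happen about 24% of the
time on a bad kernel, and it takes 11 clean boots in a row to be 95%
sure. A lower real failure rate makes this worse.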
Stacktrace
==========
(sorry, it looks different from a standard Linux backtrace as this one is
a dump from DS5)
CPU0
----
#0 __radix_tree_lookup( root = <Value currently has no location>, index
= 16, nodep = (struct radix_tree_node**) 0x0, slotp = (void***) 0x0 ) at
radix-tree.c:517
#1 generic_handle_irq( irq = 16 ) at irqdesc.c:349
#2 __handle_domain_irq( domain = (struct irq_domain*) 0xBF004400, hwirq
= 16, lookup = <Value currently has no location>, regs = <Value
currently has no location> ) at irqdesc.c:391
#3 __raw_readl( addr = <Value optimised away by compiler> ) at io.h:121
#4 gic_handle_irq( regs = (struct pt_regs*) 0x805F1F40 ) at irq-gic.c:277
#5 [__irq_svc+0x40]
CPU1
----
#0 __radix_tree_lookup( root = <Value currently has no location>, index
= 16, nodep = (struct radix_tree_node**) 0x0, slotp = (void***) 0x0 ) at
radix-tree.c:517
#1 __irq_get_desc_lock( irq = <Value currently has no location>, flags =
(long unsigned int*) 0xBF08BF94, bus = false, check = 3 ) at irqdesc.c:544
#2 enable_percpu_irq( irq = 16, type = 0 ) at manage.c:1583
#3 twd_timer_cpu_notify( self = <Value not available : Undefined value
in stack frame for register R0>, action = <Value currently has no
location>, hcpu = <Value not available : Undefined value in stack frame
for register R2> ) at smp_twd.c:322
#4 notifier_call_chain( nl = <Value currently has no location>, val =
<Value not available : Undefined value in stack frame for register R1>,
v = <Value not available : Undefined value in stack frame for register
R2>, nr_to_call = <Value not available : Undefined value in stack frame
for register R3>, nr_calls = (int*) 0x0 ) at notifier.c:95
#5 notifier_to_errno( ret = <Value currently has no location> ) at
notifier.h:179
#6 cpu_notify( val = <Value currently has no location>, v = <Value
currently has no location> ) at cpu.c:234
#7 secondary_start_kernel() at smp.c:367
CPU2 & CPU3
-----------
Not booted yet, still waiting in bootloader
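
For reference, frame #0 on both CPUs is the lookup of index 16 (the
interrupt number seen in the other frames) in a radix tree. Below is a
toy Python model of such a walk, only to show its shape. It is not the
kernel code: the Node class, the 6-bit fan-out and the step counter are
my assumptions for the sketch.

```python
MAP_SHIFT = 6                  # ASSUMPTION: 64 slots per node
MAP_SIZE = 1 << MAP_SHIFT
MAP_MASK = MAP_SIZE - 1

class Node:
    """Toy radix-tree node; `height` counts levels from here down."""
    def __init__(self, height):
        self.height = height
        self.slots = [None] * MAP_SIZE

def insert(root, index, item):
    """Store `item` at `index`, creating interior nodes as needed."""
    node = root
    shift = (root.height - 1) * MAP_SHIFT
    while node.height > 1:
        i = (index >> shift) & MAP_MASK
        if node.slots[i] is None:
            node.slots[i] = Node(node.height - 1)
        node = node.slots[i]
        shift -= MAP_SHIFT
    node.slots[index & MAP_MASK] = item

def lookup(root, index):
    """Return (item or None, number of levels visited)."""
    if index >= 1 << (root.height * MAP_SHIFT):
        return None, 0         # index is beyond what this tree can hold
    node = root
    height = root.height
    shift = (height - 1) * MAP_SHIFT
    steps = 0
    while height > 0:
        steps += 1
        node = node.slots[(index >> shift) & MAP_MASK]
        if node is None:
            return None, steps
        shift -= MAP_SHIFT
        height -= 1
    return node, steps

if __name__ == "__main__":
    root = Node(1)
    insert(root, 16, "desc-for-irq-16")   # hypothetical payload
    print(lookup(root, 16))
```

In this model the walk visits at most `height` levels and then returns,
so on a consistent tree there is nothing in the lookup itself to spin
on. That suggests the CPUs are either reading something inconsistent or
re-entering the lookup repeatedly, but the model cannot tell which.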