From: linas@austin•ibm.com (Linas Vepstas)
To: Benjamin Herrenschmidt <benh@kernel•crashing.org>
Cc: linuxppc-dev list <linuxppc-dev@ozlabs•org>
Subject: Re: eeh bug
Date: Thu, 17 May 2007 11:44:38 -0500
Message-ID: <20070517164438.GD4325@austin.ibm.com>
In-Reply-To: <1179377946.32247.281.camel@localhost.localdomain>

On Thu, May 17, 2007 at 02:59:06PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2007-05-17 at 14:46 +1000, Benjamin Herrenschmidt wrote:
> > 
> > When an RTAS PCI config space call returns all f's, we do an eeh error
> > check by calling eeh_dn_check_failure(pdn->node, NULL);
> > 
> > The problem is that second argument... NULL for the pci_dev *. It looks
> > like the EEH code will try to printk pci_name of that and later on
> > dereference it within eehd, thus causing an oops.
> 
> Ok, so I just added a
> 
> 	if (dev == NULL)
> 		dev = pdn->pcidev;
> 
> To eeh_dn_check_failure(), and that fixes one of the NULL (name
> printing), but I get another one a bit later, in pci_find_capability
> called from eeh_slot_error_detail called from handle_eeh_events.
> (Probably in gather_pci_data).
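
The fallback quoted above can be sketched in isolation. This is only a
userspace illustration, not the EEH code: struct mock_pci_dev, struct
mock_pci_dn and eeh_safe_name() are hypothetical stand-ins. It also covers
the case where the fallback itself yields NULL, which the two-line fix
does not.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct pci_dev and struct pci_dn. */
struct mock_pci_dev { const char *name; };
struct mock_pci_dn  { struct mock_pci_dev *pcidev; };

/*
 * Return a printable device name without ever dereferencing NULL.
 * If the caller passed no device, fall back to the one recorded in the
 * device-node data; if that is missing too (e.g. an empty slot), return
 * a placeholder string instead.
 */
const char *eeh_safe_name(const struct mock_pci_dn *pdn,
                          const struct mock_pci_dev *dev)
{
    if (dev == NULL && pdn != NULL)
        dev = pdn->pcidev;

    return dev ? dev->name : "<no pci_dev>";
}
```

Any later code that wants a pci_dev would need the same kind of check,
since the fallback alone cannot guarantee a non-NULL pointer.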

OK, clearly I have been sloppy. The initial eeh design used pci_dev
for everything; and as time went on, I realized that the device node
made a better fit for what needed to be manipulated. So the code
migrated in that direction, but not unambiguously; it tried to
keep allegiance to both ways of identifying a slot.

> One thing that looks suspicious is that just before that I see:
> 
> EEH: of node=/pci/@8000000200000d3/pci@2,4
> 
> Which is not a device but the bridge above it... 

That's the "partition endpoint", which is what the firmware wants. 
There's some ambiguity, as older systems with EADS and newer
direct-attached P5IOC slots have different relationships between
the "partition endpoint", the device, the slot, the bridge and 
PHB; which of these are equivalent and which are subordinate
can be confusing.

> we should probably not use
> pci_find_capability in that code anyway and implement our own version
> using RTAS in case we don't have a pci_dev around, don't you think ?

I'll take a look. Usually, there's no pci_dev only when it's a slot
with no device plugged into it; these can still receive EEH errors
during config space i/o to the bridge (I presume that the justification
is when aluminum scrap shorts out a pci connector or something like
that). In all other cases, there's a pci_dev, which is why the 
bug slipped by.
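
For reference, a capability lookup that needs no pci_dev might look
roughly like the sketch below. It is a userspace illustration under
stated assumptions, not a proposed patch: cfg_read8_fn is a hypothetical
accessor standing in for whatever RTAS-based config read would be used,
find_capability_by_read() and array_read8() are made-up names, and the
offsets assume a standard (non-CardBus) PCI header.

```c
#include <stdint.h>

#define CFG_STATUS          0x06  /* PCI status register (low byte) */
#define CFG_STATUS_CAP_LIST 0x10  /* status bit: capability list present */
#define CFG_CAP_PTR         0x34  /* offset of the first-capability pointer */
#define CAP_WALK_LIMIT      48    /* bound the walk so a corrupt list cannot loop */

/* Hypothetical accessor: read one byte of config space at offset 'off'. */
typedef uint8_t (*cfg_read8_fn)(void *ctx, unsigned int off);

/* Return the config-space offset of capability 'cap_id', or 0 if absent. */
int find_capability_by_read(cfg_read8_fn rd, void *ctx, uint8_t cap_id)
{
    unsigned int pos;
    int ttl = CAP_WALK_LIMIT;

    if (!(rd(ctx, CFG_STATUS) & CFG_STATUS_CAP_LIST))
        return 0;

    /* Capability pointers are dword aligned; mask off the low two bits. */
    pos = rd(ctx, CFG_CAP_PTR) & 0xFC;

    /* Capabilities live above the 64-byte standard header. */
    while (pos >= 0x40 && ttl-- > 0) {
        uint8_t id = rd(ctx, pos);

        if (id == 0xFF)          /* all f's: the read failed or nothing is there */
            break;
        if (id == cap_id)
            return (int)pos;
        pos = rd(ctx, pos + 1) & 0xFC;   /* next-capability pointer */
    }
    return 0;
}

/* Demo accessor backed by a 256-byte array, for trying out the walk. */
uint8_t array_read8(void *ctx, unsigned int off)
{
    const uint8_t *cfg = ctx;

    return off < 256 ? cfg[off] : 0xFF;
}
```

The accessor takes an opaque context pointer, so the walk can be keyed on
a device node instead of a pci_dev.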

--linas
