From: linas@austin•ibm.com (Linas Vepstas)
To: Benjamin Herrenschmidt <benh@kernel•crashing.org>
Cc: linuxppc-dev list <linuxppc-dev@ozlabs•org>
Subject: Re: eeh bug
Date: Thu, 17 May 2007 11:44:38 -0500
Message-ID: <20070517164438.GD4325@austin.ibm.com>
In-Reply-To: <1179377946.32247.281.camel@localhost.localdomain>
On Thu, May 17, 2007 at 02:59:06PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2007-05-17 at 14:46 +1000, Benjamin Herrenschmidt wrote:
> >
> > When an RTAS PCI config space call returns all f's, we do an eeh error
> > check by calling eeh_dn_check_failure(pdn->node, NULL);
> >
> > The problem is that second argument... NULL for the pci_dev *. It looks
> > like the EEH code will try to printk pci_name of that and later on
> > dereference it within eehd, thus causing an oops.
>
> Ok, so I just added a
>
> if (dev == NULL)
> dev = pdn->pcidev;
>
> To eeh_dn_check_failure(), and that fixes one of the NULL (name
> printing), but I get another one a bit later, in pci_find_capability
> called from eeh_slot_error_detail called from handle_eeh_events.
> (Probably in gather_pci_data).
OK, clearly I have been sloppy. The initial eeh design used pci_dev
for everything; and as time went on, I realized that the device node
made a better fit for what needed to be manipulated. So the code
migrated in that direction, but not unambiguously; it tried to
keep allegiance to both ways of identifying a slot.
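
For reference, the fallback quoted above boils down to something like
the following userspace sketch. The struct definitions are mock
stand-ins (not the real kernel struct pci_dn / struct pci_dev), and
eeh_pick_dev() and eeh_safe_name() are hypothetical names made up for
illustration, not functions that exist in the EEH code.

```c
#include <stddef.h>

/* Mock stand-ins for the kernel types; only the fields used here. */
struct pci_dev {
	const char *name;
};

struct pci_dn {
	struct pci_dev *pcidev;	/* NULL for a slot with nothing plugged in */
};

/*
 * Hypothetical helper: choose the pci_dev to report against.
 * If the caller passed NULL, fall back to whatever the device
 * node knows about. The result may still be NULL.
 */
static struct pci_dev *eeh_pick_dev(struct pci_dn *pdn, struct pci_dev *dev)
{
	if (dev == NULL && pdn != NULL)
		dev = pdn->pcidev;
	return dev;
}

/* Hypothetical helper: a name that is safe to print without a pci_dev. */
static const char *eeh_safe_name(const struct pci_dev *dev)
{
	return dev ? dev->name : "<null>";
}
```

The point of the second helper is that the fallback alone is not
enough: an empty slot leaves the pointer NULL even after the fallback,
so every later user has to tolerate that.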
> One thing that looks suspicious is that just before that I see:
>
> EEH: of node=/pci/@8000000200000d3/pci@2,4
>
> Which is not a device but the bridge above it...
That's the "partition endpoint", which is what the firmware wants.
There's some ambiguity, as older systems with EADS and newer
direct-attached P5IOC slots have different relationships between
the "partition endpoint", the device, the slot, the bridge and
PHB; which of these are equivalent and which are subordinate
can be confusing.
> we should probably not use
> pci_find_capability in that code anyway and implement our own version
> using RTAS in case we don't have a pci_dev around, don't you think ?
I'll take a look. Usually, there's no pci_dev only when it's a slot
with no device plugged into it; these can still receive EEH errors
during config space I/O to the bridge (I presume that the justification
is when aluminum scrap shorts out a pci connector or something like
that). In all other cases, there's a pci_dev, which is why the
bug slipped by.
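
A capability walk that does not need a pci_dev could look roughly like
the sketch below. It follows the standard PCI capability list layout
(status bit 4, list pointer at 0x34, ID byte followed by next-pointer
byte) but takes a config-read callback instead of a pci_dev. On
pSeries that callback would presumably be backed by the RTAS config
space read; here it is mocked with a byte array. The names
cfg_find_capability(), cfg_read8_fn and mock_read8() are hypothetical,
and this is a userspace illustration rather than kernel code.

```c
#include <stdint.h>

#define CFG_STATUS          0x06  /* low byte of the status register */
#define CFG_STATUS_CAP_LIST 0x10  /* "capabilities list present" bit */
#define CFG_CAP_PTR         0x34  /* offset of the first capability */
#define CFG_CAP_TTL         48    /* bound the walk against loops */

/* Hypothetical accessor: read one byte of config space at 'offset'. */
typedef uint8_t (*cfg_read8_fn)(void *ctx, int offset);

/*
 * Walk the capability list using only the accessor.
 * Returns the config space offset of capability 'cap', or 0 if absent.
 */
static int cfg_find_capability(cfg_read8_fn read8, void *ctx, int cap)
{
	int ttl = CFG_CAP_TTL;
	int pos;

	if (!(read8(ctx, CFG_STATUS) & CFG_STATUS_CAP_LIST))
		return 0;

	pos = read8(ctx, CFG_CAP_PTR);
	while (ttl-- > 0 && pos >= 0x40) {
		int id;

		pos &= ~3;		/* capabilities are dword aligned */
		id = read8(ctx, pos);
		if (id == 0xff)		/* all f's: device not responding */
			break;
		if (id == cap)
			return pos;
		pos = read8(ctx, pos + 1);	/* next-pointer byte */
	}
	return 0;
}

/* Mock accessor: 'ctx' is a 256-byte array standing in for config space. */
static uint8_t mock_read8(void *ctx, int offset)
{
	const uint8_t *space = ctx;

	return space[offset];
}
```

The bounded loop and the 0xff check matter in the EEH case in
particular, since a frozen slot returns all f's for every read.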
--linas