public inbox for linuxppc-dev@ozlabs.org 
From: Gavin Shan <gwshan@linux•vnet.ibm.com>
To: Daniel Axtens <dja@axtens•net>
Cc: Gavin Shan <gwshan@linux•vnet.ibm.com>, linuxppc-dev@ozlabs•org
Subject: Re: [PATCH v2 2/8] powerpc/eeh: More relexed hotplug criterion
Date: Tue, 13 Oct 2015 15:28:28 +1100
Message-ID: <20151013042828.GA28681@gwshan>
In-Reply-To: <87y4f732ll.fsf@gamma.ozlabs.ibm.com>

On Tue, Oct 13, 2015 at 01:48:54PM +1100, Daniel Axtens wrote:
>Gavin Shan <gwshan@linux•vnet.ibm.com> writes:
>
>> Daniel, the issue is tracked by IBM's Bugzilla 127612, reported against
>> NVIDIA's private GPU drivers. I tried to find the source code in the
>> upstream kernel, but failed.
>
>OK. So I've read the internal bug, and I'm going to do my best to summarise
>without including confidential info.
>
> 1) A PHB with 2 devices is fenced via error injection.
>
> 2) The error_detected() callback is run on both devices. One returns
>    CAN_RECOVER, the other returns NONE.
>
>We then fall through to partial-hotplug handling. (BTW this isn't
>documented in Documentation/PCI/pci-error-recovery.txt, so at some point
>this should be fixed!)
>
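To make the "one CAN_RECOVER, one NONE" case concrete, here is a rough
userspace sketch of how per-device error_detected() results could be folded
into a single result for the PE. The enum and the merge rule (NEED_RESET
overrides everything; otherwise the first result to replace NONE is kept)
are my approximation of eeh_report_error() in
arch/powerpc/kernel/eeh_driver.c, not a copy of it, and all names here are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's PCI_ERS_RESULT_* values. */
enum ers_result {
	ERS_NONE,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
};

/*
 * Fold one device's error_detected() result into the running result for
 * the whole PE: NEED_RESET always wins, otherwise the first result that
 * replaces NONE is kept.
 */
static void merge_result(enum ers_result *res, enum ers_result rc)
{
	assert(res != NULL);
	if (rc == ERS_NEED_RESET)
		*res = rc;
	if (*res == ERS_NONE)
		*res = rc;
}

/* Merge the results reported by n devices sitting on the same PE. */
static enum ers_result merge_all(const enum ers_result *rcs, size_t n)
{
	enum ers_result res = ERS_NONE;
	size_t i;

	assert(rcs != NULL || n == 0);
	for (i = 0; i < n; i++)
		merge_result(&res, rcs[i]);
	return res;
}
```

In this model the two devices from the bug merge to CAN_RECOVER in either
order. Whether an individual device is then hot-removed is a separate,
per-device decision made in eeh_rmv_device(), the function the quoted patch
touches.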

No hotplug is triggered when the EEH core receives CAN_RECOVER. It seems the
bug report caused confusion instead of helping to explain the situation as I
intended. What I meant to say is: there is a driver which implements only
part of the EEH callbacks, in order to collect diag-data.

>Partial hotplug is detected by the presence of an err_handler, not by
>storing the result of error_detected. Would it be better to store the
>result from eeh_report_error in the eeh_dev structure, rather than by
>looking at more elements of the err_handler structure?
>

I don't see the benefit of doing that. In eeh_report_error(), the specific
error handlers still need to be checked, and the result of that check is
temporary, so it isn't worth storing in eeh_dev. The current code looks
good.

>More generally, drivers using error_detected and then returning NONE as a
>way to get data and then not participate in EEH is a hack, and it's not
>surprising it's breaking in odd ways, especially with partial hotplug.
>

I think you're talking about the situation reported in the bug. There,
error_detected() returns CAN_RECOVER, not NONE. By returning CAN_RECOVER,
the driver expects the EEH core to enable the IO path so that it can
collect diag-data from IO space at a later point.
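A minimal sketch of that driver pattern, assuming the driver collects its
diag-data from an mmio_enabled() callback (the bug summary doesn't say which
callback is used). struct err_handlers is a trimmed, hypothetical stand-in
for struct pci_error_handlers, and mock_core_report() only imitates the
CAN_RECOVER path of the EEH core; none of this is the real kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ers_result { ERS_NONE, ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_RECOVERED };

/* Trimmed, hypothetical stand-in for struct pci_error_handlers. */
struct err_handlers {
	enum ers_result (*error_detected)(void *drvdata);
	enum ers_result (*mmio_enabled)(void *drvdata);
	enum ers_result (*slot_reset)(void *drvdata);
	void (*resume)(void *drvdata);
};

/* Hypothetical driver state. */
struct diag_drv {
	bool io_enabled;      /* set by the mock core once IO is back */
	bool diag_collected;  /* set once diag-data has been read out */
};

static enum ers_result diag_error_detected(void *drvdata)
{
	(void)drvdata;
	/* Ask the core to re-enable MMIO instead of resetting the slot. */
	return ERS_CAN_RECOVER;
}

static enum ers_result diag_mmio_enabled(void *drvdata)
{
	struct diag_drv *d = drvdata;

	/* IO space is reachable again: read out diagnostic registers. */
	if (d->io_enabled)
		d->diag_collected = true;
	/* What a real driver reports here is up to it; an assumption. */
	return ERS_RECOVERED;
}

/* Only two of the four callbacks are provided. */
static const struct err_handlers diag_handlers = {
	.error_detected = diag_error_detected,
	.mmio_enabled   = diag_mmio_enabled,
};

/* Mock core: on CAN_RECOVER, re-enable IO and call mmio_enabled(). */
static enum ers_result mock_core_report(const struct err_handlers *h,
					struct diag_drv *d)
{
	enum ers_result rc;

	assert(h != NULL && d != NULL);
	if (!h->error_detected)
		return ERS_NONE;
	rc = h->error_detected(d);
	if (rc == ERS_CAN_RECOVER && h->mmio_enabled) {
		d->io_enabled = true;
		rc = h->mmio_enabled(d);
	}
	return rc;
}
```

Because slot_reset() and resume() are left NULL, a driver like this has an
err_handler without the full set of recovery callbacks, which is the case
the quoted patch stops exempting from hotplug in eeh_rmv_device().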

>Partial hotplug is pretty hacky to begin with, and a driver being able
>to opt out of EEH selectively is a useful feature, so we probably want
>to redesign the state machine to handle them both better. That would be
>a long term project.
>

Thanks,
Gavin

>>>> Signed-off-by: Gavin Shan <gwshan@linux•vnet.ibm.com>
>>>> ---
>>>>  arch/powerpc/kernel/eeh_driver.c | 5 ++++-
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>>>> index 3a626ed..32178a4 100644
>>>> --- a/arch/powerpc/kernel/eeh_driver.c
>>>> +++ b/arch/powerpc/kernel/eeh_driver.c
>>>> @@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>>>  	driver = eeh_pcid_get(dev);
>>>>  	if (driver) {
>>>>  		eeh_pcid_put(dev);
>>>> -		if (driver->err_handler)
>>>> +		if (driver->err_handler &&
>>>> +		    driver->err_handler->error_detected &&
>>>> +		    driver->err_handler->slot_reset &&
>>>> +		    driver->err_handler->resume)
>>>>  			return NULL;
>>>>  	}
>>>>  
>>>> -- 
>>>> 2.1.0
>>>>
>>>> _______________________________________________
>>>> Linuxppc-dev mailing list
>>>> Linuxppc-dev@lists•ozlabs.org
>>>> https://lists.ozlabs.org/listinfo/linuxppc-dev
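The effect of the quoted diff can be modelled as two predicates answering
"should eeh_rmv_device() leave this device alone?". This is a simplified
userspace sketch: struct err_handlers is a trimmed, hypothetical stand-in
for struct pci_error_handlers, the callback signatures are simplified, and
devices with no bound driver are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Trimmed, hypothetical stand-in for struct pci_error_handlers. */
struct err_handlers {
	int (*error_detected)(void);
	int (*slot_reset)(void);
	void (*resume)(void);
};

/* Old rule: having any err_handler keeps the device out of hotplug. */
static bool skip_removal_old(const struct err_handlers *eh)
{
	return eh != NULL;
}

/* Patched rule: only a complete set of the three callbacks does. */
static bool skip_removal_new(const struct err_handlers *eh)
{
	return eh != NULL && eh->error_detected != NULL &&
	       eh->slot_reset != NULL && eh->resume != NULL;
}

/* Dummy callbacks used to build example handler sets. */
static int cb_int(void) { return 0; }
static void cb_void(void) { }

/* A driver providing every callback the patched rule looks for. */
static const struct err_handlers full = { cb_int, cb_int, cb_void };
/* A driver providing only error_detected(), e.g. to grab diag-data. */
static const struct err_handlers partial = { cb_int, NULL, NULL };
```

With the patch, a partial handler set like the one above no longer keeps the
device out of the hotplug path.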


Thread overview: 22+ messages
2015-10-08  3:58 [PATCH v2 0/8] EEH Improvement and Cleanup Gavin Shan
2015-10-08  3:58 ` [PATCH v2 1/8] powerpc/eeh: Don't unfreeze PHB PE after reset Gavin Shan
2015-10-21 11:41   ` [v2,1/8] " Michael Ellerman
2015-10-08  3:58 ` [PATCH v2 2/8] powerpc/eeh: More relexed hotplug criterion Gavin Shan
2015-10-12 22:55   ` Daniel Axtens
2015-10-12 23:25     ` Gavin Shan
2015-10-13  2:48       ` Daniel Axtens
2015-10-13  4:28         ` Gavin Shan [this message]
2015-10-13 23:48           ` Daniel Axtens
2015-10-14  1:33             ` Gavin Shan
2015-10-08  3:58 ` [PATCH v2 3/8] powerpc/eeh: Force reset on fenced PHB Gavin Shan
2015-10-13  1:43   ` Daniel Axtens
2015-10-13  5:01     ` Gavin Shan
2015-10-13  5:18       ` Daniel Axtens
2015-10-08  3:58 ` [PATCH v2 4/8] powerpc/eeh: More relxed condition for enabled IO path Gavin Shan
2015-10-08  3:58 ` [PATCH v2 5/8] powerpc/pseries: Cleanup on pseries_eeh_get_state() Gavin Shan
2015-10-08  4:15   ` Andrew Donnellan
2015-10-08  3:58 ` [PATCH v2 6/8] powerpc/powernv: Cleanup on EEH comments Gavin Shan
2015-10-08  3:58 ` [PATCH v2 7/8] powerpc/powernv: Remove pnv_eeh_cap_start() Gavin Shan
2015-10-08  4:18   ` Andrew Donnellan
2015-10-08  3:58 ` [PATCH v2 8/8] powerpc/powernv: Simplify pnv_eeh_set_option() Gavin Shan
2015-10-08  4:33   ` Andrew Donnellan
