public inbox for linuxppc-dev@ozlabs.org 
From: Lukas Wunner <lukas@wunner•de>
To: "Yury M." <yurypm@arista•com>
Cc: bhelgaas@google•com, mahesh@linux•ibm.com, oohall@gmail•com,
	linux-pci@vger•kernel.org, linux-kernel@vger•kernel.org,
	linuxppc-dev@lists•ozlabs.org
Subject: Re: [PATCH] PCI/AER: Clear non-fatal errors on AER recovery failure
Date: Wed, 20 May 2026 10:43:36 +0200
Message-ID: <ag10ON5pZ4ymPT9U@wunner.de>
In-Reply-To: <3633b587-5782-4cfa-b967-997de86866bb@arista.com>

On Tue, May 19, 2026 at 05:05:20PM +0100, Yury M. wrote:
> The root port can detect an AER error whose source is 0000:00:00.0.
> 
> In this case, we call find_source_device() -> find_device_iter(). The
> 'multi-error' flag is not set, so we look for the first error (not all of
> them). This means that for any error with source 0000:00:00.0 on the root
> port, we will report the error for the first device on the bus.

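No, is_error_source() treats bus number 0 in the Error Source ID as bogus.
Instead of matching the ID, it walks the devices on the bus and checks
each one's AER status registers, so a device is only picked if it
actually has error bits set.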
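A simplified model of that lookup may help. The sketch below is for illustration only and loosely mirrors the structure of is_error_source() and find_device_iter() in drivers/pci/pcie/aer.c: if the bus number of the Error Source ID is non-zero, the ID is compared directly; if it is 0, each device's unmasked Uncorrectable Error Status bits are checked instead, and in the single-error case the walk stops at the first hit. The struct, the function names, the restriction to uncorrectable errors and the omission of the PCI_BUS_FLAGS_NO_AERSID and "AER enabled" checks are all assumptions of this sketch; it is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bus number is bits 15:8 of a Requester ID (same as the kernel's PCI_BUS_NUM()). */
#define SKETCH_BUS_NUM(id) (((id) >> 8) & 0xff)

/* Hypothetical, simplified stand-in for one device's AER state. */
struct sketch_dev {
	uint16_t id;           /* bus << 8 | devfn, i.e. a Requester ID */
	uint32_t uncor_status; /* Uncorrectable Error Status Register */
	uint32_t uncor_mask;   /* Uncorrectable Error Mask Register */
};

/* Simplified model of is_error_source() for an uncorrectable error. */
static bool sketch_is_error_source(const struct sketch_dev *dev,
				   uint16_t source_id, bool multi_error)
{
	if (SKETCH_BUS_NUM(source_id) != 0) {
		/* Plausible source ID: trust it and compare directly. */
		if (source_id == dev->id)
			return true;
		if (!multi_error)
			return false;
	}
	/* Bus 0 (possibly bogus ID) or multiple errors: look at the registers. */
	return (dev->uncor_status & ~dev->uncor_mask) != 0;
}

/*
 * Simplified model of the find_device_iter() walk in the single-error case:
 * returns the index of the first matching device, or -1 if none matches.
 */
static int sketch_find_first_source(const struct sketch_dev *devs, size_t n,
				    uint16_t source_id, bool multi_error)
{
	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (sketch_is_error_source(&devs[i], source_id, multi_error))
			return (int)i;
	return -1;
}
```

In this model, a device is only blamed for an error with source 0000:00:00.0 if its own status bits are set. A stale bit left in 06:07.0 (ID 0x0638) would, however, be enough for such an error to be attributed to it rather than to 06:08.0 (ID 0x0640), simply because 06:07.0 is visited first.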

> In my case, an AER error reported by 0000:06:08.0 is logged as an error
> reported by 0000:06:07.0 if AER recovery keeps failing.

The problem is that 0000:06:08.0 reports an Advisory Non-Fatal Error,
i.e. it sets the ANFE bit in the Correctable Error Status Register
and signals (only) a Correctable Error, even though it also sets bits
in the Uncorrectable Error Status Register.

The kernel lacks support for ANFE handling and will only clear the bits
in the Correctable Error Status Register.  It neglects to also clear
(and report) the bits in the Uncorrectable Error Status Register.

There was an effort two years back to bring up ANFE support but it
fizzled out.  I talked to the submitter and he's now busy with other
things:

https://lore.kernel.org/r/20240620025857.206647-1-zhenzhong.duan@intel.com/

It's on my todo list to respin his series but I can't promise when
I'll get to it.

Thanks,

Lukas



Thread overview: 7+ messages
2026-05-18 13:23 [PATCH] PCI/AER: Clear non-fatal errors on AER recovery failure Yury Murashka
2026-05-18 20:29 ` Bjorn Helgaas
2026-05-18 20:49   ` Yury M.
2026-05-19  9:53 ` Lukas Wunner
     [not found]   ` <3633b587-5782-4cfa-b967-997de86866bb@arista.com>
2026-05-20  8:43     ` Lukas Wunner [this message]
2026-05-20 10:00       ` Yury M.
2026-05-20  9:02     ` Lukas Wunner
