From: Sourabh Jain <sourabhjain@linux•ibm.com>
To: Narayana Murty N <nnmlinux@linux•ibm.com>,
mahesh@linux•ibm.com, maddy@linux•ibm.com, mpe@ellerman•id.au,
christophe.leroy@csgroup•eu, gregkh@linuxfoundation•org,
oohall@gmail•com, npiggin@gmail•com
Cc: linuxppc-dev@lists•ozlabs.org, linux-kernel@vger•kernel.org,
tyreld@linux•ibm.com, vaibhav@linux•ibm.com, sbhat@linux•ibm.com,
ganeshgr@linux•ibm.com, haren@linux•ibm.com, thuth@redhat•com
Subject: Re: [PATCH v2 4/5] powerpc/pseries: Implement RTAS error injection via pseries_eeh_err_inject
Date: Sun, 7 Jun 2026 19:05:11 +0530 [thread overview]
Message-ID: <5388c50b-e43b-4a3a-bd59-28568f5d84cb@linux.ibm.com> (raw)
In-Reply-To: <20260527072433.94510-5-nnmlinux@linux.ibm.com>
On 27/05/26 12:54, Narayana Murty N wrote:
> Replace legacy MMIO error injection with full PAPR-compliant RTAS error
> injection supporting 14+ error types via
> - ibm,open-errinjct
> - ibm,errinjct
> - ibm,close-errinjct.
>
> Key features:
> - Complete open-session-inject-close cycle management
> - Special handling for ibm,open-errinjct output format (token,status)
> - Comprehensive buffer preparation per PAPR layouts
> - All pr_* logging uses pr_fmt("EEH: ") prefix
>
> Tested with corresponding QEMU patches:
> https://lore.kernel.org/all/20251029150618.186803-1-nnmlinux@linux.ibm.com/
>
> Signed-off-by: Narayana Murty N <nnmlinux@linux•ibm.com>
> ---
> arch/powerpc/platforms/pseries/eeh_pseries.c | 168 ++++++++++++++++---
> 1 file changed, 147 insertions(+), 21 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> index d6f2e0d43b89..6af2a153ec25 100644
> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> @@ -902,8 +902,7 @@ static int validate_special_event(unsigned long addr, unsigned long mask)
> * Return: 0 if valid, RTAS_INVALID_PARAMETER otherwise.
> */
>
> -static int validate_corrupted_page(struct eeh_pe *pe __maybe_unused,
> - unsigned long addr, unsigned long mask)
> +static int validate_corrupted_page(unsigned long addr, unsigned long mask)
> {
> if (!addr) {
> pr_err("corrupted-page requires non-zero addr\n");
> @@ -978,7 +977,7 @@ static int prepare_errinjct_buffer(struct eeh_pe *pe, int type, int func,
> if (addr == 0)
> return RTAS_INVALID_PARAMETER;
>
> - if (validate_corrupted_page(pe, addr, mask))
> + if (validate_corrupted_page(addr, mask))
> return RTAS_INVALID_PARAMETER;
>
> buf32[0] = cpu_to_be32(upper_32_bits(addr));
> @@ -1047,6 +1046,97 @@ static int prepare_errinjct_buffer(struct eeh_pe *pe, int type, int func,
> return 0;
> }
>
> +/**
> + * rtas_open_errinjct_session - Open an RTAS error injection session
> + *
> + * Opens a session with the RTAS ibm,open-errinjct service.
> + *
> + * Return: Positive session token on success, negative error code on failure.
session token can't be 0, is it?
> + */
> +static int rtas_open_errinjct_session(void)
> +{
> + int open_token, args[2] = {0};
> + int rc, status, session_token = -1;
> +
> + open_token = rtas_function_token(RTAS_FN_IBM_OPEN_ERRINJCT);
> + if (open_token == RTAS_UNKNOWN_SERVICE) {
> + pr_err("RTAS: ibm,open-errinjct not available\n");
> + return RTAS_UNKNOWN_SERVICE;
> + }
> +
> + /* Call open; original code treated rtas_call return as session token */
> + rc = rtas_call(open_token, 0, 2, args);
> + status = args[1];
rc and status is same, isn't it? That makes the status variable redundant.
> + if (status != 0) {
> + pr_err("RTAS: open-errinjct failed: status=%d args[1]=%d rc=%d\n",
> + status, args[1], rc);
> + return status ? status : -EIO;
> + }
Not planning to handle extend delay by RTAS, return code 9900...9905?
> +
> + session_token = args[0];
> + pr_info("Opened injection session: token=%d\n", session_token);
> + return session_token;
> +}
> +
> +/**
> + * rtas_close_errinjct_session - Close an RTAS error injection session
> + * @session_token: Session token returned from open
> + *
> + * Attempts to close a previously opened error injection session. Best-effort;
> + * logs warnings if close fails or if service is unavailable.
> + */
> +
> +static void rtas_close_errinjct_session(int session_token)
> +{
> + int close_token, args[2] = {0};
> +
> + if (session_token <= 0)
> + return;
I didn't find a section in the PAPR which says token can't be 0.
> +
> + close_token = rtas_function_token(RTAS_FN_IBM_CLOSE_ERRINJCT);
> + if (close_token == RTAS_UNKNOWN_SERVICE) {
> + pr_warn("close-errinjct not available\n");
> + return;
> + }
> +
> + args[0] = session_token;
> + rtas_call(close_token, 1, 1, args);
> + if (args[0])
> + pr_warn("close-errinjct args[0]=%d\n", args[0]);
IIUC rtas_call do not copy status to output buffer. Let's consider
return value
from rtas_call function as status.
Since status is not copied, int arg is enough.
I think we must handle rtas busy delay for errinjct close rtas call?
> +}
> +
> +/**
> + * do_errinjct_call - Invoke the RTAS error injection service
> + * @errinjct_token: RTAS token for ibm,errinjct
> + * @type: RTAS error type
> + * @session_token: RTAS error injection session token
> + *
> + * Issues the RTAS ibm,errinjct call with the prepared work buffer. Logs errors
> + * on failure.
> + *
> + * Return: 0 on success, negative error code otherwise.
> + */
> +
> +static int do_errinjct_call(int errinjct_token, int type, int session_token)
> +{
> + int rc, status;
> +
> + if (errinjct_token == RTAS_UNKNOWN_SERVICE)
> + return -ENODEV;
> +
> + /* errinjct takes: type, session_token, workbuf pointer (3 in), returns status */
> + rc = rtas_call(errinjct_token, 3, 1, &status, type, session_token,
> + rtas_errinjct_buf);
> +
> + if (rc || status != 0) {
> + pr_err("RTAS: errinjct failed: rc=%d, status=%d\n", rc, status);
> + return status ? status : -EIO;
> + }
> +
> + pr_info("RTAS: errinjct ok: rc=%d, status=%d\n", rc, status);
> + return 0;
> +}
> +
> /**
> * pseries_eeh_err_inject - Inject specified error to the indicated PE
> * @pe: the indicated PE
> @@ -1060,30 +1150,66 @@ static int prepare_errinjct_buffer(struct eeh_pe *pe, int type, int func,
> static int pseries_eeh_err_inject(struct eeh_pe *pe, int type, int func,
> unsigned long addr, unsigned long mask)
> {
> - struct eeh_dev *pdev;
> + int rc = 0;
> + int session_token = -1;
> + int errinjct_token;
>
> - /* Check on PCI error type */
> - if (type != EEH_ERR_TYPE_32 && type != EEH_ERR_TYPE_64)
> - return -EINVAL;
> + /* Validate type */
> + if (!validate_err_type(type)) {
> + pr_err("RTAS: invalid error type 0x%x\n", type);
> + return RTAS_INVALID_PARAMETER;
> + }
> + pr_debug("RTAS: error type 0x%x\n", type);
>
> - switch (func) {
> - case EEH_ERR_FUNC_LD_MEM_ADDR:
> - case EEH_ERR_FUNC_LD_MEM_DATA:
> - case EEH_ERR_FUNC_ST_MEM_ADDR:
> - case EEH_ERR_FUNC_ST_MEM_DATA:
> - /* injects a MMIO error for all pdev's belonging to PE */
> - pci_lock_rescan_remove();
> - list_for_each_entry(pdev, &pe->edevs, entry)
> - eeh_pe_inject_mmio_error(pdev->pdev);
> - pci_unlock_rescan_remove();
> - break;
> - default:
> - return -ERANGE;
> + /* For IOA bus errors we must validate err_func and addr/mask in PE.
> + * For other types: if addr/mask present we'll still validate BAR range;
> + * otherwise skip function checks.
> + */
> + if (type == RTAS_ERR_TYPE_IOA_BUS_ERROR ||
> + type == RTAS_ERR_TYPE_IOA_BUS_ERROR_64) {
> + /* Validate that addr/mask fall in the PE's BAR ranges */
> + rc = validate_addr_mask_in_pe(pe, addr, mask);
> + if (rc)
> + return rc;
> + } else if (addr || mask) {
> + /* If caller provided addr/mask for a non-IOA type, do a BAR check too */
> + rc = validate_addr_mask_in_pe(pe, addr, mask);
> + if (rc)
> + return rc;
> }
The above if and else if case has identical code. Why don't we merge them?
>
> - return 0;
> + /* Open RTAS session */
> + session_token = rtas_open_errinjct_session();
> + if (session_token < 0)
session_token 0 is considered valid here. Where as it was considered
invalid in other
function above.
> + return session_token;
> +
> + /* get errinjct token */
> + errinjct_token = rtas_function_token(RTAS_FN_IBM_ERRINJCT);
> + if (errinjct_token == RTAS_UNKNOWN_SERVICE) {
How about checking this before getting the session token?
> + pr_err("RTAS: ibm,errinjct not available\n");
> + rc = -ENODEV;
> + goto out_close;
> + }
> +
> + /* prepare shared buffer while holding lock */
> + spin_lock(&rtas_errinjct_buf_lock);
> + rc = prepare_errinjct_buffer(pe, type, func, addr, mask);
> + if (rc) {
> + spin_unlock(&rtas_errinjct_buf_lock);
> + goto out_close;
> + }
> +
> + /* perform the errinjct RTAS call */
> + rc = do_errinjct_call(errinjct_token, type, session_token);
> + spin_unlock(&rtas_errinjct_buf_lock);
> +
> +out_close:
> + /* always attempt close if we opened a session */
> + rtas_close_errinjct_session(session_token);
> + return rc;
> }
>
> +
This new line seems unnecessary.
> static struct eeh_ops pseries_eeh_ops = {
> .name = "pseries",
> .probe = pseries_eeh_probe,
next prev parent reply other threads:[~2026-06-07 13:35 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-27 7:24 [PATCH v2 0/5] powerpc/pseries: Add full RTAS-based error injection support Narayana Murty N
2026-05-27 7:24 ` [PATCH v2 1/5] powerpc/rtas: Handle special return format for RTAS_FN_IBM_OPEN_ERRINJCT Narayana Murty N
2026-06-07 11:19 ` Sourabh Jain
2026-05-27 7:24 ` [PATCH v2 2/5] powerpc/pseries: Add RTAS error injection buffer infrastructure Narayana Murty N
2026-05-27 7:24 ` [PATCH v2 3/5] powerpc/pseries: Add RTAS error injection validation helpers Narayana Murty N
2026-06-07 12:17 ` Sourabh Jain
2026-05-27 7:24 ` [PATCH v2 4/5] powerpc/pseries: Implement RTAS error injection via pseries_eeh_err_inject Narayana Murty N
2026-06-07 13:35 ` Sourabh Jain [this message]
2026-05-27 7:24 ` [PATCH v2 5/5] powerpc/powernv: Map EEH error types to OPAL error injection types Narayana Murty N
2026-06-07 13:46 ` Sourabh Jain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5388c50b-e43b-4a3a-bd59-28568f5d84cb@linux.ibm.com \
--to=sourabhjain@linux$(echo .)ibm.com \
--cc=christophe.leroy@csgroup$(echo .)eu \
--cc=ganeshgr@linux$(echo .)ibm.com \
--cc=gregkh@linuxfoundation$(echo .)org \
--cc=haren@linux$(echo .)ibm.com \
--cc=linux-kernel@vger$(echo .)kernel.org \
--cc=linuxppc-dev@lists$(echo .)ozlabs.org \
--cc=maddy@linux$(echo .)ibm.com \
--cc=mahesh@linux$(echo .)ibm.com \
--cc=mpe@ellerman$(echo .)id.au \
--cc=nnmlinux@linux$(echo .)ibm.com \
--cc=npiggin@gmail$(echo .)com \
--cc=oohall@gmail$(echo .)com \
--cc=sbhat@linux$(echo .)ibm.com \
--cc=thuth@redhat$(echo .)com \
--cc=tyreld@linux$(echo .)ibm.com \
--cc=vaibhav@linux$(echo .)ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox