Re: [PATCH v2 4/5] powerpc/pseries: Implement RTAS error injection via pseries_eeh_err_inject

public inbox for linuxppc-dev@ozlabs.org 
 help / color / mirror / Atom feed

From: Sourabh Jain <sourabhjain@linux•ibm.com>
To: Narayana Murty N <nnmlinux@linux•ibm.com>,
	mahesh@linux•ibm.com, maddy@linux•ibm.com, mpe@ellerman•id.au,
	christophe.leroy@csgroup•eu, gregkh@linuxfoundation•org,
	oohall@gmail•com, npiggin@gmail•com
Cc: linuxppc-dev@lists•ozlabs.org, linux-kernel@vger•kernel.org,
	tyreld@linux•ibm.com, vaibhav@linux•ibm.com, sbhat@linux•ibm.com,
	ganeshgr@linux•ibm.com, haren@linux•ibm.com, thuth@redhat•com
Subject: Re: [PATCH v2 4/5] powerpc/pseries: Implement RTAS error injection via pseries_eeh_err_inject
Date: Sun, 7 Jun 2026 19:05:11 +0530	[thread overview]
Message-ID: <5388c50b-e43b-4a3a-bd59-28568f5d84cb@linux.ibm.com> (raw)
In-Reply-To: <20260527072433.94510-5-nnmlinux@linux.ibm.com>



On 27/05/26 12:54, Narayana Murty N wrote:
> Replace legacy MMIO error injection with full PAPR-compliant RTAS error
> injection supporting 14+ error types via
>   - ibm,open-errinjct
>   - ibm,errinjct
>   - ibm,close-errinjct.
>
> Key features:
> - Complete open-session-inject-close cycle management
> - Special handling for ibm,open-errinjct output format (token,status)
> - Comprehensive buffer preparation per PAPR layouts
> - All pr_* logging uses pr_fmt("EEH: ") prefix
>
> Tested with corresponding QEMU patches:
> https://lore.kernel.org/all/20251029150618.186803-1-nnmlinux@linux.ibm.com/
>
> Signed-off-by: Narayana Murty N <nnmlinux@linux•ibm.com>
> ---
>   arch/powerpc/platforms/pseries/eeh_pseries.c | 168 ++++++++++++++++---
>   1 file changed, 147 insertions(+), 21 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> index d6f2e0d43b89..6af2a153ec25 100644
> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> @@ -902,8 +902,7 @@ static int validate_special_event(unsigned long addr, unsigned long mask)
>    * Return: 0 if valid, RTAS_INVALID_PARAMETER otherwise.
>    */
>   
> -static int validate_corrupted_page(struct eeh_pe *pe __maybe_unused,
> -				   unsigned long addr, unsigned long mask)
> +static int validate_corrupted_page(unsigned long addr, unsigned long mask)
>   {
>   	if (!addr) {
>   		pr_err("corrupted-page requires non-zero addr\n");
> @@ -978,7 +977,7 @@ static int prepare_errinjct_buffer(struct eeh_pe *pe, int type, int func,
>   		if (addr == 0)
>   			return RTAS_INVALID_PARAMETER;
>   
> -		if (validate_corrupted_page(pe, addr, mask))
> +		if (validate_corrupted_page(addr, mask))
>   			return RTAS_INVALID_PARAMETER;
>   
>   		buf32[0] = cpu_to_be32(upper_32_bits(addr));
> @@ -1047,6 +1046,97 @@ static int prepare_errinjct_buffer(struct eeh_pe *pe, int type, int func,
>   	return 0;
>   }
>   
> +/**
> + * rtas_open_errinjct_session - Open an RTAS error injection session
> + *
> + * Opens a session with the RTAS ibm,open-errinjct service.
> + *
> + * Return: Positive session token on success, negative error code on failure.

session token can't be 0, is it?

> + */
> +static int rtas_open_errinjct_session(void)
> +{
> +	int open_token, args[2] = {0};
> +	int rc, status, session_token = -1;
> +
> +	open_token = rtas_function_token(RTAS_FN_IBM_OPEN_ERRINJCT);
> +	if (open_token == RTAS_UNKNOWN_SERVICE) {
> +		pr_err("RTAS: ibm,open-errinjct not available\n");
> +		return RTAS_UNKNOWN_SERVICE;
> +	}
> +
> +	/* Call open; original code treated rtas_call return as session token */
> +	rc = rtas_call(open_token, 0, 2, args);
> +	status = args[1];
rc and status is same, isn't it? That makes the status variable redundant.

> +	if (status != 0) {
> +		pr_err("RTAS: open-errinjct failed: status=%d args[1]=%d rc=%d\n",
> +		       status, args[1], rc);
> +		return status ? status : -EIO;
> +	}

Not planning to handle extend delay by RTAS, return code 9900...9905?

> +
> +	session_token = args[0];
> +	pr_info("Opened injection session: token=%d\n", session_token);
> +	return session_token;
> +}
> +
> +/**
> + * rtas_close_errinjct_session - Close an RTAS error injection session
> + * @session_token: Session token returned from open
> + *
> + * Attempts to close a previously opened error injection session. Best-effort;
> + * logs warnings if close fails or if service is unavailable.
> + */
> +
> +static void rtas_close_errinjct_session(int session_token)
> +{
> +	int close_token, args[2] = {0};
> +
> +	if (session_token <= 0)
> +		return;

I didn't find a section in the PAPR which says token can't be 0.

> +
> +	close_token = rtas_function_token(RTAS_FN_IBM_CLOSE_ERRINJCT);
> +	if (close_token == RTAS_UNKNOWN_SERVICE) {
> +		pr_warn("close-errinjct not available\n");
> +		return;
> +	}
> +
> +	args[0] = session_token;
> +	rtas_call(close_token, 1, 1, args);
> +	if (args[0])
> +		pr_warn("close-errinjct  args[0]=%d\n", args[0]);

IIUC rtas_call do not copy status to output buffer. Let's consider 
return value
from rtas_call function as status.

Since status is not copied, int arg is enough.

I think we must handle rtas busy delay for errinjct close rtas call?

> +}
> +
> +/**
> + * do_errinjct_call - Invoke the RTAS error injection service
> + * @errinjct_token: RTAS token for ibm,errinjct
> + * @type:           RTAS error type
> + * @session_token:  RTAS error injection session token
> + *
> + * Issues the RTAS ibm,errinjct call with the prepared work buffer. Logs errors
> + * on failure.
> + *
> + * Return: 0 on success, negative error code otherwise.
> + */
> +
> +static int do_errinjct_call(int errinjct_token, int type, int session_token)
> +{
> +	int rc, status;
> +
> +	if (errinjct_token == RTAS_UNKNOWN_SERVICE)
> +		return -ENODEV;
> +
> +	/* errinjct takes: type, session_token, workbuf pointer (3 in), returns status */
> +	rc = rtas_call(errinjct_token, 3, 1, &status, type, session_token,
> +		       rtas_errinjct_buf);
> +
> +	if (rc || status != 0) {
> +		pr_err("RTAS: errinjct failed: rc=%d, status=%d\n", rc, status);
> +		return status ? status : -EIO;
> +	}
> +
> +	pr_info("RTAS: errinjct ok: rc=%d, status=%d\n", rc, status);
> +	return 0;
> +}
> +
>   /**
>    * pseries_eeh_err_inject - Inject specified error to the indicated PE
>    * @pe: the indicated PE
> @@ -1060,30 +1150,66 @@ static int prepare_errinjct_buffer(struct eeh_pe *pe, int type, int func,
>   static int pseries_eeh_err_inject(struct eeh_pe *pe, int type, int func,
>   				  unsigned long addr, unsigned long mask)
>   {
> -	struct	eeh_dev	*pdev;
> +	int rc = 0;
> +	int session_token = -1;
> +	int errinjct_token;
>   
> -	/* Check on PCI error type */
> -	if (type != EEH_ERR_TYPE_32 && type != EEH_ERR_TYPE_64)
> -		return -EINVAL;
> +	/* Validate type */
> +	if (!validate_err_type(type)) {
> +		pr_err("RTAS: invalid error type 0x%x\n", type);
> +		return RTAS_INVALID_PARAMETER;
> +	}
> +	pr_debug("RTAS: error type 0x%x\n", type);
>   
> -	switch (func) {
> -	case EEH_ERR_FUNC_LD_MEM_ADDR:
> -	case EEH_ERR_FUNC_LD_MEM_DATA:
> -	case EEH_ERR_FUNC_ST_MEM_ADDR:
> -	case EEH_ERR_FUNC_ST_MEM_DATA:
> -		/* injects a MMIO error for all pdev's belonging to PE */
> -		pci_lock_rescan_remove();
> -		list_for_each_entry(pdev, &pe->edevs, entry)
> -			eeh_pe_inject_mmio_error(pdev->pdev);
> -		pci_unlock_rescan_remove();
> -		break;
> -	default:
> -		return -ERANGE;
> +	/* For IOA bus errors we must validate err_func and addr/mask in PE.
> +	 * For other types: if addr/mask present we'll still validate BAR range;
> +	 * otherwise skip function checks.
> +	 */
> +	if (type == RTAS_ERR_TYPE_IOA_BUS_ERROR ||
> +	    type == RTAS_ERR_TYPE_IOA_BUS_ERROR_64) {
> +		/* Validate that addr/mask fall in the PE's BAR ranges */
> +		rc = validate_addr_mask_in_pe(pe, addr, mask);
> +		if (rc)
> +			return rc;
> +	} else if (addr || mask) {
> +		/* If caller provided addr/mask for a non-IOA type, do a BAR check too */
> +		rc = validate_addr_mask_in_pe(pe, addr, mask);
> +		if (rc)
> +			return rc;
>   	}

The above if and else if case has identical code. Why don't we merge them?

>   
> -	return 0;
> +	/* Open RTAS session */
> +	session_token = rtas_open_errinjct_session();
> +	if (session_token < 0)

session_token 0 is considered valid here. Where as it was considered 
invalid in other
function above.

> +		return session_token;
> +
> +	/* get errinjct token */
> +	errinjct_token = rtas_function_token(RTAS_FN_IBM_ERRINJCT);
> +	if (errinjct_token == RTAS_UNKNOWN_SERVICE) {
How about checking this before getting the session token?

> +		pr_err("RTAS: ibm,errinjct not available\n");
> +		rc = -ENODEV;
> +		goto out_close;
> +	}
> +
> +	/* prepare shared buffer while holding lock */
> +	spin_lock(&rtas_errinjct_buf_lock);
> +	rc = prepare_errinjct_buffer(pe, type, func, addr, mask);
> +	if (rc) {
> +		spin_unlock(&rtas_errinjct_buf_lock);
> +		goto out_close;
> +	}
> +
> +	/* perform the errinjct RTAS call */
> +	rc = do_errinjct_call(errinjct_token, type, session_token);
> +	spin_unlock(&rtas_errinjct_buf_lock);
> +
> +out_close:
> +	/* always attempt close if we opened a session */
> +	rtas_close_errinjct_session(session_token);
> +	return rc;
>   }
>   
> +

This new line seems unnecessary.


>   static struct eeh_ops pseries_eeh_ops = {
>   	.name			= "pseries",
>   	.probe			= pseries_eeh_probe,

next prev parent reply	other threads:[~2026-06-07 13:35 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-27  7:24 [PATCH v2 0/5] powerpc/pseries: Add full RTAS-based error injection support Narayana Murty N
2026-05-27  7:24 ` [PATCH v2 1/5] powerpc/rtas: Handle special return format for RTAS_FN_IBM_OPEN_ERRINJCT Narayana Murty N
2026-06-07 11:19   ` Sourabh Jain
2026-05-27  7:24 ` [PATCH v2 2/5] powerpc/pseries: Add RTAS error injection buffer infrastructure Narayana Murty N
2026-05-27  7:24 ` [PATCH v2 3/5] powerpc/pseries: Add RTAS error injection validation helpers Narayana Murty N
2026-06-07 12:17   ` Sourabh Jain
2026-05-27  7:24 ` [PATCH v2 4/5] powerpc/pseries: Implement RTAS error injection via pseries_eeh_err_inject Narayana Murty N
2026-06-07 13:35   ` Sourabh Jain [this message]
2026-05-27  7:24 ` [PATCH v2 5/5] powerpc/powernv: Map EEH error types to OPAL error injection types Narayana Murty N
2026-06-07 13:46   ` Sourabh Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5388c50b-e43b-4a3a-bd59-28568f5d84cb@linux.ibm.com \
    --to=sourabhjain@linux$(echo .)ibm.com \
    --cc=christophe.leroy@csgroup$(echo .)eu \
    --cc=ganeshgr@linux$(echo .)ibm.com \
    --cc=gregkh@linuxfoundation$(echo .)org \
    --cc=haren@linux$(echo .)ibm.com \
    --cc=linux-kernel@vger$(echo .)kernel.org \
    --cc=linuxppc-dev@lists$(echo .)ozlabs.org \
    --cc=maddy@linux$(echo .)ibm.com \
    --cc=mahesh@linux$(echo .)ibm.com \
    --cc=mpe@ellerman$(echo .)id.au \
    --cc=nnmlinux@linux$(echo .)ibm.com \
    --cc=npiggin@gmail$(echo .)com \
    --cc=oohall@gmail$(echo .)com \
    --cc=sbhat@linux$(echo .)ibm.com \
    --cc=thuth@redhat$(echo .)com \
    --cc=tyreld@linux$(echo .)ibm.com \
    --cc=vaibhav@linux$(echo .)ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox