Re: [PATCH v14 2/8] ACPI: APEI: GHES: use exception context to gate SIGBUS on poison consumption

public inbox for linux-arm-kernel@lists.infradead.org 
 help / color / mirror / Atom feed

From: Shuai Xue <xueshuai@linux•alibaba.com>
To: Ruidong Tian <tianruidong@linux•alibaba.com>,
	catalin.marinas@arm•com, will@kernel•org, rafael@kernel•org,
	tony.luck@intel•com, guohanjun@huawei•com, mchehab@kernel•org,
	tongtiangen@huawei•com, james.morse@arm•com,
	robin.murphy@arm•com, andreyknvl@gmail•com, dvyukov@google•com,
	vincenzo.frascino@arm•com, mpe@ellerman•id.au, npiggin@gmail•com,
	ryabinin.a.a@gmail•com, glider@google•com,
	christophe.leroy@csgroup•eu, aneesh.kumar@kernel•org,
	naveen.n.rao@linux•ibm.com, tglx@linutronix•de, mingo@redhat•com
Cc: linux-arm-kernel@lists•infradead.org, linux-mm@kvack•org,
	linuxppc-dev@lists•ozlabs.org, linux-kernel@vger•kernel.org,
	kasan-dev@googlegroups•com
Subject: Re: [PATCH v14 2/8] ACPI: APEI: GHES: use exception context to gate SIGBUS on poison consumption
Date: Wed, 27 May 2026 17:34:14 +0800	[thread overview]
Message-ID: <c6ca8a87-c341-4368-9773-39c48ab9297d@linux.alibaba.com> (raw)
In-Reply-To: <20260518084956.2538442-3-tianruidong@linux.alibaba.com>



On 5/18/26 4:49 PM, Ruidong Tian wrote:
> When a GHES SEA (Synchronous External Abort) fires while the CPU
> was executing in kernel mode, it typically means that kernel code
> itself consumed a poisoned memory location -- e.g. copy_from_user()
> / copy_to_user() invoked from a ioctl() or write() syscall touched
> a poisoned user page or page-cache page on behalf of the task.
> 
> The expected behaviour in that case is that the faulting kernel
> helper returns via its extable fixup and the syscall returns an
> error (e.g. -EFAULT) to user space. It is NOT appropriate to deliver
> SIGBUS to the current task: the task did not directly dereference
> the poisoned address, the kernel did on its behalf, and the kernel
> is able to recover.
> 
> Up to now ghes_handle_memory_failure() unconditionally promoted any
> synchronous recoverable memory error to MF_ACTION_REQUIRED, which
> ends up SIGBUS on current -- regardless of whether the poison was
> consumed from user space or from inside the kernel on the task's
> behalf. That kills tasks that should instead have seen a plain
> syscall error.
> 
> To fix this, the execution mode in which the exception was taken
> must be captured at the arch-level entry point, where pt_regs (and
> hence user_mode(regs)) are still available. The estatus node that
> later drains the error in IRQ / process context no longer has
> access to the original regs.
> 
> Introduce:
> 
>      enum context { NO_USE = -1, IN_KERNEL = 0, IN_USER = 1 };
> 
> and plumb the value all the way down to the queued estatus node:
> 
>   * Add an 'enum context context' field to struct ghes_estatus_node
>     and record it in ghes_in_nmi_queue_one_entry().
>   * Extend ghes_notify_sea() and the internal
>     ghes_in_nmi_spool_from_list() with an enum context parameter.
> 
> Then consume the recorded context in ghes_handle_memory_failure()
> for the GHES_SEV_RECOVERABLE / sync path:
> 
>      flags = sync && context == IN_USER ? MF_ACTION_REQUIRED : 0;
> 
> i.e. MF_ACTION_REQUIRED (and thus SIGBUS via the task_work path) is
> only raised for user-mode poison consumption. Synchronous errors
> taken in kernel mode fall back to memory_failure_queue() with
> flags=0, asynchronously isolating the poisoned page while letting
> the faulting kernel helper's extable fixup return -EFAULT
> to user space.
> 
> Paths that pass NO_USE are unaffected:
> sync is false for them, so flags stays 0 as before.
> 
> Signed-off-by: Ruidong Tian  <tianruidong@linux•alibaba.com>
> ---
>   arch/arm64/kernel/acpi.c |  2 +-
>   drivers/acpi/apei/ghes.c | 36 ++++++++++++++++++++----------------
>   include/acpi/ghes.h      |  6 ++++--
>   3 files changed, 25 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
> index 5891f92c2035..40d4a2913d51 100644
> --- a/arch/arm64/kernel/acpi.c
> +++ b/arch/arm64/kernel/acpi.c
> @@ -409,7 +409,7 @@ int apei_claim_sea(struct pt_regs *regs)
>   	 */
>   	local_daif_restore(DAIF_ERRCTX);
>   	nmi_enter();
> -	err = ghes_notify_sea();
> +	err = ghes_notify_sea(user_mode(regs));

apei_claim_sea() explicitly documents that @regs may be NULL when
called from process context, and arch/arm64/kvm/mmu.c does exactly
that:

     if (apei_claim_sea(NULL) == 0)
             return 1;

user_mode(regs) dereferences regs->pstate, so any SEA taken on a
KVM-enabled arm64 host with this patch applied will oops immediately.

This also relies on bool 1 == IN_USER == 1 and bool 0 == IN_KERNEL == 0
purely by coincidence. The day someone reorders the enumerators or
adds a new value before IN_USER, the policy silently flips (kernel-
mode poison would start raising SIGBUS again) with no warning from
the compiler. Please convert explicitly with a ternary as shown
above; problem #1 then falls out for free.

Please handle the NULL case explicitly, e.g.:

     err = ghes_notify_sea(regs ? (user_mode(regs) ? GHES_CTX_USER
                                                   : GHES_CTX_KERNEL)
                               : GHES_CTX_NA);



>   	nmi_exit();
>   
>   	/*
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 3236a3ce79d6..6f265893cddf 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -529,7 +529,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
>   }
>   
>   static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
> -				       int sev, bool sync)
> +				       int sev, bool sync, enum context context)
>   {
>   	int flags = -1;
>   	int sec_sev = ghes_severity(gdata->error_severity);
> @@ -543,7 +543,7 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>   	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
>   		flags = MF_SOFT_OFFLINE;
>   	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
> -		flags = sync ? MF_ACTION_REQUIRED : 0;
> +		flags = sync && context == IN_USER ? MF_ACTION_REQUIRED : 0;
>   
>   	if (flags != -1)
>   		return ghes_do_memory_failure(mem_err->physical_addr, flags);
> @@ -552,10 +552,10 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>   }
>   
>   static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
> -				     int sev, bool sync)
> +				     int sev, bool sync, enum context context)
>   {
>   	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
> -	int flags = sync ? MF_ACTION_REQUIRED : 0;
> +	int flags = sync && context == IN_USER ? MF_ACTION_REQUIRED : 0;
>   	int length = gdata->error_data_length;
>   	char error_type[120];
>   	bool queued = false;
> @@ -910,7 +910,8 @@ static void ghes_log_hwerr(int sev, guid_t *sec_type)
>   }
>   
>   static void ghes_do_proc(struct ghes *ghes,
> -			 const struct acpi_hest_generic_status *estatus)
> +			 const struct acpi_hest_generic_status *estatus,
> +			 enum context context)
>   {
>   	int sev, sec_sev;
>   	struct acpi_hest_generic_data *gdata;
> @@ -937,11 +938,11 @@ static void ghes_do_proc(struct ghes *ghes,
>   			atomic_notifier_call_chain(&ghes_report_chain, sev, mem_err);
>   
>   			arch_apei_report_mem_error(sev, mem_err);
> -			queued = ghes_handle_memory_failure(gdata, sev, sync);
> +			queued = ghes_handle_memory_failure(gdata, sev, sync, context);
>   		} else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
>   			ghes_handle_aer(gdata);
>   		} else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
> -			queued = ghes_handle_arm_hw_error(gdata, sev, sync);
> +			queued = ghes_handle_arm_hw_error(gdata, sev, sync, context);
>   		} else if (guid_equal(sec_type, &CPER_SEC_CXL_PROT_ERR)) {
>   			struct cxl_cper_sec_prot_err *prot_err = acpi_hest_get_payload(gdata);
>   
> @@ -1190,7 +1191,7 @@ static int ghes_proc(struct ghes *ghes)
>   		if (ghes_print_estatus(NULL, ghes->generic, estatus))
>   			ghes_estatus_cache_add(ghes->generic, estatus);
>   	}
> -	ghes_do_proc(ghes, estatus);
> +	ghes_do_proc(ghes, estatus, NO_USE);
>   
>   out:
>   	ghes_clear_estatus(ghes, estatus, buf_paddr, FIX_APEI_GHES_IRQ);
> @@ -1297,7 +1298,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
>   		len = cper_estatus_len(estatus);
>   		node_len = GHES_ESTATUS_NODE_LEN(len);
>   
> -		ghes_do_proc(estatus_node->ghes, estatus);
> +		ghes_do_proc(estatus_node->ghes, estatus, estatus_node->context);
>   
>   		if (!ghes_estatus_cached(estatus)) {
>   			generic = estatus_node->generic;
> @@ -1335,7 +1336,8 @@ static void ghes_print_queued_estatus(void)
>   }
>   
>   static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
> -				       enum fixed_addresses fixmap_idx)
> +				       enum fixed_addresses fixmap_idx,
> +				       enum context context)
>   {
>   	struct acpi_hest_generic_status *estatus, tmp_header;
>   	struct ghes_estatus_node *estatus_node;
> @@ -1364,6 +1366,7 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
>   	if (!estatus_node)
>   		return -ENOMEM;
>   
> +	estatus_node->context = context;
>   	estatus_node->ghes = ghes;
>   	estatus_node->generic = ghes->generic;
>   	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
> @@ -1398,14 +1401,15 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
>   }
>   
>   static int ghes_in_nmi_spool_from_list(struct list_head *rcu_list,
> -				       enum fixed_addresses fixmap_idx)
> +				       enum fixed_addresses fixmap_idx,
> +				       enum context context)
>   {
>   	int ret = -ENOENT;
>   	struct ghes *ghes;
>   
>   	rcu_read_lock();
>   	list_for_each_entry_rcu(ghes, rcu_list, list) {
> -		if (!ghes_in_nmi_queue_one_entry(ghes, fixmap_idx))
> +		if (!ghes_in_nmi_queue_one_entry(ghes, fixmap_idx, context))
>   			ret = 0;
>   	}
>   	rcu_read_unlock();
> @@ -1488,7 +1492,7 @@ static LIST_HEAD(ghes_sea);
>    * Return 0 only if one of the SEA error sources successfully reported an error
>    * record sent from the firmware.
>    */
> -int ghes_notify_sea(void)
> +int ghes_notify_sea(enum context context)
>   {
>   	static DEFINE_RAW_SPINLOCK(ghes_notify_lock_sea);
>   	int rv;
> @@ -1497,7 +1501,7 @@ int ghes_notify_sea(void)
>   		return -ENOENT;
>   
>   	raw_spin_lock(&ghes_notify_lock_sea);
> -	rv = ghes_in_nmi_spool_from_list(&ghes_sea, FIX_APEI_GHES_SEA);
> +	rv = ghes_in_nmi_spool_from_list(&ghes_sea, FIX_APEI_GHES_SEA, context);
>   	raw_spin_unlock(&ghes_notify_lock_sea);
>   
>   	return rv;
> @@ -1552,7 +1556,7 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>   		return ret;
>   
>   	raw_spin_lock(&ghes_notify_lock_nmi);
> -	if (!ghes_in_nmi_spool_from_list(&ghes_nmi, FIX_APEI_GHES_NMI))
> +	if (!ghes_in_nmi_spool_from_list(&ghes_nmi, FIX_APEI_GHES_NMI, NO_USE))
>   		ret = NMI_HANDLED;
>   	raw_spin_unlock(&ghes_notify_lock_nmi);
>   
> @@ -1606,7 +1610,7 @@ static void ghes_nmi_init_cxt(void)
>   static int __ghes_sdei_callback(struct ghes *ghes,
>   				enum fixed_addresses fixmap_idx)
>   {
> -	if (!ghes_in_nmi_queue_one_entry(ghes, fixmap_idx)) {
> +	if (!ghes_in_nmi_queue_one_entry(ghes, fixmap_idx, NO_USE)) {
>   		irq_work_queue(&ghes_proc_irq_work);
>   
>   		return 0;
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 8d7e5caef3f1..646cd5c3c0ca 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -33,10 +33,12 @@ struct ghes {
>   	void __iomem *error_status_vaddr;
>   };
>   
> +enum context {NO_USE = -1, IN_KERNEL = 0, IN_USER = 1};

This goes into include/acpi/ghes.h, which is included widely. The
type name and all three enumerators are far too generic; "IN_USER"
and "IN_KERNEL" in particular are likely to collide with driver-local
enums in the future, and once this is merged the names are hard to
change.

Please prefix:

     enum ghes_exec_ctx {
             GHES_CTX_NA     = -1,
             GHES_CTX_KERNEL = 0,
             GHES_CTX_USER   = 1,
     };


Thanks.
Shuai

next prev parent reply	other threads:[~2026-05-27  9:34 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-18  8:49 [PATCH v14 0/8] arm64: add ARCH_HAS_COPY_MC support Ruidong Tian
2026-05-18  8:49 ` [PATCH v14 1/8] uaccess: add generic fallback version of copy_mc_to_user() Ruidong Tian
2026-05-27  9:16   ` Shuai Xue
2026-05-18  8:49 ` [PATCH v14 2/8] ACPI: APEI: GHES: use exception context to gate SIGBUS on poison consumption Ruidong Tian
2026-05-27  9:34   ` Shuai Xue [this message]
2026-05-18  8:49 ` [PATCH v14 3/8] arm64: add support for ARCH_HAS_COPY_MC Ruidong Tian
2026-05-27 11:35   ` Shuai Xue
2026-06-04  8:10     ` Ruidong Tian
2026-05-18  8:49 ` [PATCH v14 4/8] mm/hwpoison: return -EFAULT when copy fail in copy_mc_[user]_highpage() Ruidong Tian
2026-05-27 11:44   ` Shuai Xue
2026-05-18  8:49 ` [PATCH v14 5/8] arm64: support copy_mc_[user]_highpage() Ruidong Tian
2026-05-27 12:11   ` Shuai Xue
2026-05-18  8:49 ` [PATCH v14 6/8] lib/test: memcpy_kunit: add copy_page() and copy_mc_page() tests Ruidong Tian
2026-05-27 13:43   ` Shuai Xue
2026-05-18  8:49 ` [PATCH v14 7/8] arm64: introduce copy_mc_to_kernel() implementation Ruidong Tian
2026-05-28  3:10   ` Shuai Xue
2026-05-18  8:49 ` [PATCH v14 8/8] lib/tests: memcpy_kunit: add memcpy_mc() and memcpy_mc_large() test Ruidong Tian
2026-05-28  3:17   ` Shuai Xue
2026-05-18 15:05 ` [PATCH v14 0/8] arm64: add ARCH_HAS_COPY_MC support Kefeng Wang
2026-06-05  7:33   ` Ruidong Tian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c6ca8a87-c341-4368-9773-39c48ab9297d@linux.alibaba.com \
    --to=xueshuai@linux$(echo .)alibaba.com \
    --cc=andreyknvl@gmail$(echo .)com \
    --cc=aneesh.kumar@kernel$(echo .)org \
    --cc=catalin.marinas@arm$(echo .)com \
    --cc=christophe.leroy@csgroup$(echo .)eu \
    --cc=dvyukov@google$(echo .)com \
    --cc=glider@google$(echo .)com \
    --cc=guohanjun@huawei$(echo .)com \
    --cc=james.morse@arm$(echo .)com \
    --cc=kasan-dev@googlegroups$(echo .)com \
    --cc=linux-arm-kernel@lists$(echo .)infradead.org \
    --cc=linux-kernel@vger$(echo .)kernel.org \
    --cc=linux-mm@kvack$(echo .)org \
    --cc=linuxppc-dev@lists$(echo .)ozlabs.org \
    --cc=mchehab@kernel$(echo .)org \
    --cc=mingo@redhat$(echo .)com \
    --cc=mpe@ellerman$(echo .)id.au \
    --cc=naveen.n.rao@linux$(echo .)ibm.com \
    --cc=npiggin@gmail$(echo .)com \
    --cc=rafael@kernel$(echo .)org \
    --cc=robin.murphy@arm$(echo .)com \
    --cc=ryabinin.a.a@gmail$(echo .)com \
    --cc=tglx@linutronix$(echo .)de \
    --cc=tianruidong@linux$(echo .)alibaba.com \
    --cc=tongtiangen@huawei$(echo .)com \
    --cc=tony.luck@intel$(echo .)com \
    --cc=vincenzo.frascino@arm$(echo .)com \
    --cc=will@kernel$(echo .)org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox