From: Junhao He <hejunhao3@h-partners•com>
To: <rafael@kernel•org>, <tony.luck@intel•com>,
<guohanjun@huawei•com>, <mchehab@kernel•org>,
<xueshuai@linux•alibaba.com>, <jarkko@kernel•org>,
<yazen.ghannam@amd•com>, <jane.chu@oracle•com>, <lenb@kernel•org>,
<linmiaohe@huawei•com>
Cc: <bp@alien8•de>, <linux-acpi@vger•kernel.org>,
<linux-arm-kernel@lists•infradead.org>,
<linux-kernel@vger•kernel.org>, <linux-edac@vger•kernel.org>,
<tanxiaofei@huawei•com>, <linuxarm@huawei•com>,
<liuyonglong@huawei•com>, <mawupeng1@huawei•com>,
<hejunhao3@h-partners•com>
Subject: [PATCH v2] ACPI: APEI: Handle repeated SEA error storms
Date: Wed, 27 May 2026 16:27:07 +0800
Message-ID: <20260527082707.2013499-1-hejunhao3@h-partners.com>
When hardware memory corruption occurs and a user process accesses the
corrupted page, the CPU triggers a Synchronous External Abort (SEA).
The kernel handles the exception in do_sea(), which eventually calls
memory_failure() to handle the faulty page.
Scenario 1: Memory error interrupt first, then SEA
The page has already been poisoned by the memory error interrupt path.
The subsequent SEA handler sends SIGBUS to the task that accessed the
poisoned page. This flow is correct.
Scenario 2: SEA first, then memory error interrupt (problematic scenario)
If a user task directly accesses corrupted memory through a PFNMAP-style
mapping (e.g., devmem), the page may still be in the free-buddy state when
the SEA is handled. In this case, memory_failure() poisons the page and
takes the free-buddy recovery path without invoking
kill_accessing_process(), so the task is not killed.
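To make the difference between the two scenarios concrete, here is a minimal userspace model of the memory_failure() branches involved. It is a sketch, not kernel code: `struct model_page`, `model_memory_failure()` and the result enum are hypothetical names, only the free-buddy case is modeled, and `action_required` stands in for MF_ACTION_REQUIRED under the assumption that only the SEA path requests it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified page state; not the kernel's struct page. */
struct model_page {
	bool hwpoison; /* stands in for the page's HWPoison flag */
};

enum model_mf_result {
	MODEL_MF_RECOVERED,   /* "recovery action for free buddy page" */
	MODEL_MF_SIGBUS_SENT, /* kill_accessing_process() signals the task */
	MODEL_MF_NO_ACTION,   /* already poisoned, no action required */
};

/*
 * Simplified model of the memory_failure() branches discussed above.
 * Assumes the page is still free in the buddy allocator.
 * action_required stands in for MF_ACTION_REQUIRED (assumed to be set
 * on the SEA path only, not on the memory error interrupt path).
 */
static enum model_mf_result model_memory_failure(struct model_page *p,
						 bool action_required)
{
	if (p->hwpoison) {
		/* "already hardware poisoned" */
		return action_required ? MODEL_MF_SIGBUS_SENT
				       : MODEL_MF_NO_ACTION;
	}

	/* First report: mark the page poisoned and report the free-buddy
	 * page as recovered; no task is signalled. */
	p->hwpoison = true;
	return MODEL_MF_RECOVERED;
}

/* Usage: replays both scenarios against the model. */
void model_mf_selftest(void)
{
	/* Scenario 1: interrupt path first, then SEA -> SIGBUS. */
	struct model_page s1 = { .hwpoison = false };
	assert(model_memory_failure(&s1, false) == MODEL_MF_RECOVERED);
	assert(model_memory_failure(&s1, true) == MODEL_MF_SIGBUS_SENT);

	/* Scenario 2: SEA first -> page poisoned, but nobody is signalled. */
	struct model_page s2 = { .hwpoison = false };
	assert(model_memory_failure(&s2, true) == MODEL_MF_RECOVERED);
	assert(s2.hwpoison);
}
```

In Scenario 2 the only path that would deliver SIGBUS is a second memory_failure() call with action required, which is exactly the call that the duplicate-record suppression prevents.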
After the CPU returns to the task context, the task repeats the same
access and re-enters the SEA handler. However, ghes_estatus_cached()
suppresses every identical error record within a 10-second window, so
ghes_do_proc() is not called for them. This suppression blocks the
MF_ACTION_REQUIRED-based SIGBUS delivery, so the kernel does not kill the
task. The process therefore keeps re-entering the SEA handler, leading to
an SEA storm. The memory error interrupt path, which runs later, cannot
kill the task either (it only reports that the page is already hardware
poisoned), leaving the system stuck in this repeated loop.
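The suppression window can be modeled in a few lines of userspace C. This is a hypothetical sketch: the single-slot cache, `model_estatus_cached()` and `model_sea_storm()` are illustrative names, the check `now - time_in < window` is an assumption modeled on how the cache is described above, and the retry interval is an arbitrary input rather than a measured value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The 10-second suppression window, in nanoseconds. */
#define MODEL_WINDOW_NS 10000000000ULL

/*
 * Hypothetical single-slot cache. The real cache holds several entries and
 * compares whole estatus records; a 64-bit value stands in for one here.
 */
struct model_cache {
	bool valid;
	uint64_t record;  /* stands in for the cached estatus record */
	uint64_t time_in; /* time (ns) at which the record was cached */
};

/* True if an identical record was cached less than MODEL_WINDOW_NS ago. */
static bool model_estatus_cached(const struct model_cache *c,
				 uint64_t record, uint64_t now)
{
	return c->valid && c->record == record &&
	       now - c->time_in < MODEL_WINDOW_NS;
}

/*
 * A task re-triggers the same SEA every interval_ns. The first SEA (at
 * time 0) was processed and cached without killing the task. Each later
 * SEA inside the window is suppressed and the task retries. Returns how
 * many SEAs were suppressed before one is processed again.
 */
static uint64_t model_sea_storm(uint64_t record, uint64_t interval_ns)
{
	const struct model_cache cache = { true, record, 0 };
	uint64_t suppressed = 0;

	assert(interval_ns > 0); /* otherwise time never advances */
	for (uint64_t now = interval_ns;; now += interval_ns) {
		if (!model_estatus_cached(&cache, record, now))
			return suppressed; /* processed again */
		suppressed++;
	}
}

/* Usage: fault address from the log, one retry per millisecond. */
void model_storm_selftest(void)
{
	assert(model_sea_storm(0x1000093c00ULL, 1000000ULL) == 9999);
}
```

Note that a cache hit does not refresh `time_in` in this model, so the window is measured from the first report; with a 1 ms retry interval the model suppresses 9999 re-entries before the record is processed again.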
The following log, captured with the devmem process, shows this sequence:
NOTICE: SEA Handle
[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
[Hardware Error]: event severity: recoverable
[Hardware Error]: section_type: ARM processor error
[Hardware Error]: physical fault address: 0x0000001000093c00
[T54990] Memory failure: 0x1000093: recovery action for free buddy page: Recovered
[ T9955] EDAC MC0: 1 UE Multi-bit ECC on unknown memory
(page:0x1000093 offset:0xc00 grain:1 - APEI location: ...)
NOTICE: SEA Handle
NOTICE: SEA Handle
...
... ---> SEA storm
...
NOTICE: SEA Handle
[ T9955] Memory failure: 0x1000093: already hardware poisoned
ghes_print_estatus: 1 callbacks suppressed
[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
[Hardware Error]: event severity: recoverable
[Hardware Error]: section_type: ARM processor error
[Hardware Error]: physical fault address: 0x0000001000093c00
[T54990] Memory failure: 0x1000093: already hardware poisoned
[T54990] 0x1000093: Sending SIGBUS to devmem:54990 due to hardware memory corruption
To resolve this, make ghes_in_nmi_queue_one_entry() return an error when
it encounters a cached duplicate of an SEA record. The SEA is then not
claimed by APEI, so do_sea() falls back to arm64_notify_die(), which sends
SIGBUS to the task. This terminates the process and prevents it from
re-entering the handler loop.
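The resulting control flow can be sketched as a userspace model. All names are hypothetical; the rule "claimed if any source queued work, otherwise -ENOENT" is an assumption about how per-source results are combined for ghes_notify_sea(), and do_sea() is reduced to its claimed-by-APEI versus arm64_notify_die() decision for a user-mode fault.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for two HEST notification types. */
enum model_notify { MODEL_NOTIFY_SEA, MODEL_NOTIFY_NMI };

struct model_ghes {
	enum model_notify notify;
	bool duplicate; /* outcome of the estatus cache lookup */
};

/* Models is_hest_sync_notify(): only SEA counts as synchronous here. */
static bool model_is_sync_notify(const struct model_ghes *g)
{
	return g->notify == MODEL_NOTIFY_SEA;
}

/* Models the patched duplicate check in ghes_in_nmi_queue_one_entry(). */
static int model_queue_one_entry(const struct model_ghes *g)
{
	int rc = 0; /* assumed to be 0 when the cache check is reached */

	if (g->duplicate) {
		if (model_is_sync_notify(g))
			rc = -ECANCELED;
		/* "goto no_work": the record is dropped, nothing is queued */
	}
	/* otherwise the record is queued for later processing */
	return rc;
}

/* Assumed rule: claimed (0) if any source queued work, else -ENOENT. */
static int model_notify_sea(const struct model_ghes *srcs, int n)
{
	int ret = -ENOENT;

	for (int i = 0; i < n; i++)
		if (!model_queue_one_entry(&srcs[i]))
			ret = 0;
	return ret;
}

enum model_outcome { MODEL_CLAIMED_BY_APEI, MODEL_SIGBUS_NOTIFY_DIE };

/* Models do_sea() for a user-mode fault: unclaimed SEA -> SIGBUS. */
static enum model_outcome model_do_sea(const struct model_ghes *srcs, int n)
{
	return model_notify_sea(srcs, n) == 0 ? MODEL_CLAIMED_BY_APEI
					      : MODEL_SIGBUS_NOTIFY_DIE;
}

/* Usage: first SEA is claimed, the repeated one falls back to SIGBUS. */
void model_fix_selftest(void)
{
	const struct model_ghes first = { MODEL_NOTIFY_SEA, false };
	const struct model_ghes again = { MODEL_NOTIFY_SEA, true };
	const struct model_ghes nmi_dup = { MODEL_NOTIFY_NMI, true };

	assert(model_do_sea(&first, 1) == MODEL_CLAIMED_BY_APEI);
	assert(model_do_sea(&again, 1) == MODEL_SIGBUS_NOTIFY_DIE);
	/* Other notification types keep the old result (rc stays 0). */
	assert(model_queue_one_entry(&nmi_dup) == 0);
}
```

The notification-type gate keeps duplicates on non-SEA sources returning 0, as before the patch; only a duplicate SEA record changes the outcome.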
Signed-off-by: Junhao He <hejunhao3@h-partners•com>
---
drivers/acpi/apei/ghes.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
Changes in V2:
1. Update the commit message per Xueshuai's suggestion.
2. Only return failure on the ghes_notify_sea() path, to avoid affecting
   other NMI-type GHES handlers.
Link to V1 - https://lore.kernel.org/all/20251030071321.2763224-1-hejunhao3@h-partners.com/
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 3236a3ce79d6..787664740150 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1383,8 +1383,16 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
 	ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
 
 	/* This error has been reported before, don't process it again. */
-	if (ghes_estatus_cached(estatus))
+	if (ghes_estatus_cached(estatus)) {
+		/*
+		 * Return failure on duplicate SEA entries so that the
+		 * subsequent SEA handler invocation sends a SIGBUS signal to
+		 * the task to prevent it from re-entering the handler loop.
+		 */
+		if (is_hest_sync_notify(ghes))
+			rc = -ECANCELED;
 		goto no_work;
+	}
 
 	llist_add(&estatus_node->llnode, &ghes_estatus_llist);
 
--
2.33.0
Thread overview: 2+ messages
2026-05-27 8:27 Junhao He [this message]
2026-05-28 1:48 ` [PATCH v2] ACPI: APEI: Handle repeated SEA error storms mawupeng