public inbox for linux-next@vger.kernel.org 
From: Bert Karwatzki <spasswolf@web•de>
To: Borislav Petkov <bp@alien8•de>, Yazen Ghannam <yazen.ghannam@amd•com>
Cc: Mario Limonciello <mario.limonciello@amd•com>,
	spasswolf@web•de, Nikolay Borisov <nik.borisov@suse•com>,
	Tony Luck <tony.luck@intel•com>,
	linux-kernel@vger•kernel.org, 	linux-next@vger•kernel.org,
	linux-edac@vger•kernel.org, 	linux-acpi@vger•kernel.org,
	x86@kernel•org, rafael@kernel•org, 	qiuxu.zhuo@intel•com,
	Smita.KoralahalliChannabasappa@amd•com
Subject: Re: spurious (?) mce Hardware Error messages in v6.19
Date: Sun, 05 Apr 2026 10:47:02 +0200
Message-ID: <f5f4251da81752ccae59b6ffa158fd8587458431.camel@web.de>
In-Reply-To: <20260403140501.GCac_JDVCaX0eCIDUj@fat_crate.local>

On Friday, 03.04.2026 at 16:05 +0200, Borislav Petkov wrote:
> On Mon, Feb 23, 2026 at 04:53:16PM -0500, Yazen Ghannam wrote:
> > Thanks Bert for confirming.
> > 
> > I'll send a patch to filter this signature.
> 
> Bert, pls try this:
> 
> From: Yazen Ghannam <yazen.ghannam@amd•com>
> Date: Sat, 28 Feb 2026 09:08:14 -0500
> Subject: [PATCH] x86/mce/amd: Filter bogus hardware errors on Zen3 clients
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Users have been observing multiple L3 cache deferred errors after recent
> kernel rework of deferred error handling.¹ ⁴
> 
> The errors are bogus due to inconsistent status values. Also, user verified
> that bogus MCA_DESTAT values are present on the system even with an older
> kernel.²
> 
> The errors seem to be garbage values present in the MCA_DESTAT of some L3
> cache banks. These were implicitly ignored before the recent kernel rework
> because these do not generate a deferred error interrupt.
> 
> A later revision of the rework patch was merged for v6.19. This naturally
> filtered out most of the bogus error logs. However, a few signatures still
> remain.³
> 
> Minimize the scope of the filter to the reported CPU
> family/model/stepping and only for errors which don't have the Enabled
> bit in the MCi status MSR.
> 
> ¹ https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de
> ² https://lore.kernel.org/6e1eda7dd55f6fa30405edf7b0f75695cf55b237.camel@web.de
> ³ https://lore.kernel.org/21ba47fa8893b33b94370c2a42e5084cf0d2e975.camel@web.de
> ⁴ https://lore.kernel.org/r/CAKFB093B2k3sKsGJ_QNX1jVQsaXVFyy=wNwpzCGLOXa_vSDwXw@mail.gmail.com
> 
>   [ bp: Generalize the condition according to which errors are bogus. ]
> 
> Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
> Closes: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de
> Reported-by: Bert Karwatzki <spasswolf@web•de>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd•com>
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8•de>
> Reviewed-by: Mario Limonciello <mario.limonciello@amd•com>
> Cc: stable@vger•kernel.org
> Link: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de
> ---
>  arch/x86/kernel/cpu/mce/amd.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 146f4207a863..7fc78759cd4e 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -606,6 +606,14 @@ bool amd_filter_mce(struct mce *m)
>  	enum smca_bank_types bank_type = smca_get_bank_type(m->extcpu, m->bank);
>  	struct cpuinfo_x86 *c = &boot_cpu_data;
>  
> +	/* Bogus hw errors on Cezanne A0. */
> +	if (c->x86 == 0x19 &&
> +	    c->x86_model == 0x50 &&
> +	    c->x86_stepping == 0x0) {
> +		if (!(m->status & MCI_STATUS_EN))
> +			return true;
> +	}
> +
>  	/* See Family 17h Models 10h-2Fh Erratum #1114. */
>  	if (c->x86 == 0x17 &&
>  	    c->x86_model >= 0x10 && c->x86_model <= 0x2F &&
> -- 
> 2.51.0
> 

I tested this patch on v6.19.11 and, since these bogus messages are pretty
rare, added a monitoring printk():

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 159f0becf8cc..54fa3863ea0b 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -608,8 +608,10 @@ bool amd_filter_mce(struct mce *m)
        if (c->x86 == 0x19 &&
            c->x86_model == 0x50 &&
            c->x86_stepping == 0x0) {
-               if (!(m->status & MCI_STATUS_EN))
+               if (!(m->status & MCI_STATUS_EN)) {
+                       printk(KERN_INFO "%s: filtering bogus hw error on Cezanne A0\n", __func__);
                        return true;
+               }
        }
 
        /* See Family 17h Models 10h-2Fh Erratum #1114. */

After ~12h of uptime I got a message that a bogus error was filtered:
[42603.594231] [      C0] amd_filter_mce: filtering bogus hw error on Cezanne A0
So the patch seems to work fine.

Tested-by: Bert Karwatzki <spasswolf@web•de>

Bert Karwatzki

Thread overview: 30+ messages
2025-09-15  1:00 spurious mce Hardware Error messages in next-20250912 Bert Karwatzki
2025-09-15 17:55 ` Yazen Ghannam
2025-09-15 21:03   ` Bert Karwatzki
2025-09-15 21:43     ` Bert Karwatzki
2025-09-16  9:10       ` Borislav Petkov
2025-09-16 14:07         ` Yazen Ghannam
2025-09-16 20:27           ` Bert Karwatzki
2025-09-17  7:13             ` Bert Karwatzki
2025-09-17 14:41               ` Yazen Ghannam
2025-09-17 15:33                 ` Bert Karwatzki
2025-09-17 19:26                   ` Yazen Ghannam
2025-09-17 21:15                     ` Yazen Ghannam
2025-09-17 22:01                       ` Bert Karwatzki
2025-09-18 10:20                     ` Nikolay Borisov
2025-09-18 21:00                       ` Yazen Ghannam
2025-09-18 21:04                         ` Luck, Tony
2025-09-18 21:14                           ` Yazen Ghannam
2025-09-18 22:07                         ` Bert Karwatzki
2025-10-09 13:20                           ` Yazen Ghannam
2026-02-12 12:50                             ` spurious (?) mce Hardware Error messages in v6.19 Bert Karwatzki
2026-02-13 12:45                               ` Bert Karwatzki
2026-02-16 20:25                               ` Yazen Ghannam
2026-02-19 14:33                                 ` Yazen Ghannam
2026-02-19 15:43                                   ` Bert Karwatzki
2026-02-20 16:49                                     ` Mario Limonciello
2026-02-20 18:24                                       ` Bert Karwatzki
2026-02-23 21:53                                         ` Yazen Ghannam
2026-04-03 14:05                                           ` Borislav Petkov
2026-04-05  8:47                                             ` Bert Karwatzki [this message]
2026-04-05 10:46                                               ` Borislav Petkov
