public inbox for linux-arm-kernel@lists.infradead.org 
 help / color / mirror / Atom feed
* [PATCH v1] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum
@ 2026-06-04 23:12 Shanker Donthineni
  2026-06-05  8:01 ` Catalin Marinas
  2026-06-05  9:26 ` Vladimir Murzin
  0 siblings, 2 replies; 4+ messages in thread
From: Shanker Donthineni @ 2026-06-04 23:12 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel
  Cc: Mark Rutland, linux-kernel, linux-doc, Shanker Donthineni,
	Vikram Sethi, Jason Sequeira

On systems with NVIDIA Olympus cores, a Device-nGnR* load can be
observed by a peripheral before an older, non-overlapping Device-nGnR*
store to the same peripheral. This breaks the program-order guarantee
that software expects for Device-nGnR* accesses and can leave a
peripheral in an incorrect state, as a load is observed before an
earlier store takes effect.

The erratum can occur only when all of the following apply:

  - A PE executes a Device-nGnR* store followed by a younger
    Device-nGnR* load.
  - The store is not a store-release.
  - The accesses target the same peripheral and do not overlap in bytes.
  - There is at most one intervening Device-nGnR* store in program
    order, and there are no intervening Device-nGnR* loads.
  - There is no DSB, and no DMB that orders loads, between the store and
    the load.
  - Specific micro-architectural and timing conditions occur.

Two ways to restore ordering: insert a barrier (any DSB, or a DMB that
orders loads) between the store and the load, or make the store a
store-release. A load-acquire on the load side would not help, because
acquire semantics do not prevent a load from being observed ahead of an
older store; only the store side (release or a barrier) closes the
window.

Promote the raw MMIO store helpers (__raw_writeb/w/l/q) from plain str*
to stlr* (Store-Release), which removes the "store is not a
store-release" condition for every device write the kernel issues.
Because writel() and writel_relaxed() are both built on __raw_writel()
in asm-generic/io.h, patching the raw variants covers both the
non-relaxed and relaxed APIs without touching the higher layers. Note
that writel()'s own barrier sits before the store, so it does not order
the store against a subsequent readl(); the store-release promotion is
what provides that ordering.

Like ARM64_ERRATUM_832075 on the load side, the change is gated on a new
ARM64_WORKAROUND_DEVICE_STORE_RELEASE capability and only activated on
parts that match MIDR_NVIDIA_OLYMPUS, so unaffected CPUs continue to use
the plain str* sequence.

Co-developed-by: Vikram Sethi <vsethi@nvidia•com>
Signed-off-by: Vikram Sethi <vsethi@nvidia•com>
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia•com>
---
 Documentation/arch/arm64/silicon-errata.rst |  2 ++
 arch/arm64/Kconfig                          | 23 ++++++++++++++++++++
 arch/arm64/include/asm/io.h                 | 24 ++++++++++++++-------
 arch/arm64/kernel/cpu_errata.c              |  8 +++++++
 arch/arm64/tools/cpucaps                    |  1 +
 5 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index 211119ce7adc..899bed3908bb 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -256,6 +256,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | NVIDIA         | Carmel Core     | N/A             | NVIDIA_CARMEL_CNP_ERRATUM   |
 +----------------+-----------------+-----------------+-----------------------------+
+| NVIDIA         | Olympus core    | T410-OLY-1027   | NVIDIA_OLYMPUS_1027_ERRATUM |
++----------------+-----------------+-----------------+-----------------------------+
 | NVIDIA         | T241 GICv3/4.x  | T241-FABRIC-4   | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
 | NVIDIA         | T241 MPAM       | T241-MPAM-1     | N/A                         |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fe60738e5943..a6bac84b05a1 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -564,6 +564,29 @@ config ARM64_ERRATUM_832075
 
 	  If unsure, say Y.
 
+config NVIDIA_OLYMPUS_1027_ERRATUM
+	bool "NVIDIA Olympus: device store/load ordering erratum"
+	default y
+	help
+	  This option adds an alternative code sequence to work around an
+	  NVIDIA Olympus core erratum where a Device-nGnR* store can be
+	  observed by a peripheral after a younger Device-nGnR* load to the
+	  same peripheral. This breaks the program order that drivers rely
+	  on for MMIO and can leave a device in an incorrect state.
+
+	  The workaround promotes the raw MMIO store helpers
+	  (__raw_writeb/w/l/q) to Store-Release (STLR), which restores the
+	  required ordering. Because writel() and writel_relaxed() are built
+	  on __raw_writel(), both are covered without changes to the higher
+	  layers.
+
+	  The fix is applied through the alternatives framework, so enabling
+	  this option does not by itself activate the workaround: it is
+	  patched in only when an affected CPU is detected, and is a no-op on
+	  unaffected CPUs.
+
+	  If unsure, say Y.
+
 config ARM64_ERRATUM_834220
 	bool "Cortex-A57: 834220: Stage 2 translation fault might be incorrectly reported in presence of a Stage 1 fault (rare)"
 	depends on KVM
diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 8cbd1e96fd50..b6d7966e9c19 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -25,29 +25,37 @@
 #define __raw_writeb __raw_writeb
 static __always_inline void __raw_writeb(u8 val, volatile void __iomem *addr)
 {
-	volatile u8 __iomem *ptr = addr;
-	asm volatile("strb %w0, %1" : : "rZ" (val), "Qo" (*ptr));
+	asm volatile(ALTERNATIVE("strb %w0, [%1]",
+				 "stlrb %w0, [%1]",
+				 ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
+		     : : "rZ" (val), "r" (addr));
 }
 
 #define __raw_writew __raw_writew
 static __always_inline void __raw_writew(u16 val, volatile void __iomem *addr)
 {
-	volatile u16 __iomem *ptr = addr;
-	asm volatile("strh %w0, %1" : : "rZ" (val), "Qo" (*ptr));
+	asm volatile(ALTERNATIVE("strh %w0, [%1]",
+				 "stlrh %w0, [%1]",
+				 ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
+		     : : "rZ" (val), "r" (addr));
 }
 
 #define __raw_writel __raw_writel
 static __always_inline void __raw_writel(u32 val, volatile void __iomem *addr)
 {
-	volatile u32 __iomem *ptr = addr;
-	asm volatile("str %w0, %1" : : "rZ" (val), "Qo" (*ptr));
+	asm volatile(ALTERNATIVE("str %w0, [%1]",
+				 "stlr %w0, [%1]",
+				 ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
+		     : : "rZ" (val), "r" (addr));
 }
 
 #define __raw_writeq __raw_writeq
 static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
 {
-	volatile u64 __iomem *ptr = addr;
-	asm volatile("str %x0, %1" : : "rZ" (val), "Qo" (*ptr));
+	asm volatile(ALTERNATIVE("str %x0, [%1]",
+				 "stlr %x0, [%1]",
+				 ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
+		     : : "rZ" (val), "r" (addr));
 }
 
 #define __raw_readb __raw_readb
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 5377e4c2eba2..958d7f16bfeb 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -809,6 +809,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_CARMEL),
 	},
 #endif
+#ifdef CONFIG_NVIDIA_OLYMPUS_1027_ERRATUM
+	{
+		/* NVIDIA Olympus core */
+		.desc = "NVIDIA Olympus device load/store ordering erratum",
+		.capability = ARM64_WORKAROUND_DEVICE_STORE_RELEASE,
+		ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_OLYMPUS),
+	},
+#endif
 #ifdef CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
 	{
 		/*
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 811c2479e82d..d367257bf770 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -120,6 +120,7 @@ WORKAROUND_CAVIUM_TX2_219_PRFM
 WORKAROUND_CAVIUM_TX2_219_TVM
 WORKAROUND_CLEAN_CACHE
 WORKAROUND_DEVICE_LOAD_ACQUIRE
+WORKAROUND_DEVICE_STORE_RELEASE
 WORKAROUND_NVIDIA_CARMEL_CNP
 WORKAROUND_PMUV3_IMPDEF_TRAPS
 WORKAROUND_QCOM_FALKOR_E1003
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v1] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum
  2026-06-04 23:12 [PATCH v1] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum Shanker Donthineni
@ 2026-06-05  8:01 ` Catalin Marinas
  2026-06-05  9:26 ` Vladimir Murzin
  1 sibling, 0 replies; 4+ messages in thread
From: Catalin Marinas @ 2026-06-05  8:01 UTC (permalink / raw)
  To: Shanker Donthineni
  Cc: Will Deacon, linux-arm-kernel, Mark Rutland, linux-kernel,
	linux-doc, Vikram Sethi, Jason Sequeira

On Thu, Jun 04, 2026 at 06:12:54PM -0500, Shanker Donthineni wrote:
> On systems with NVIDIA Olympus cores, a Device-nGnR* load can be
> observed by a peripheral before an older, non-overlapping Device-nGnR*
> store to the same peripheral. This breaks the program-order guarantee
> that software expects for Device-nGnR* accesses and can leave a
> peripheral in an incorrect state, as a load is observed before an
> earlier store takes effect.
> 
> The erratum can occur only when all of the following apply:
> 
>   - A PE executes a Device-nGnR* store followed by a younger
>     Device-nGnR* load.
>   - The store is not a store-release.
>   - The accesses target the same peripheral and do not overlap in bytes.
>   - There is at most one intervening Device-nGnR* store in program
>     order, and there are no intervening Device-nGnR* loads.
>   - There is no DSB, and no DMB that orders loads, between the store and
>     the load.
>   - Specific micro-architectural and timing conditions occur.
> 
> Two ways to restore ordering: insert a barrier (any DSB, or a DMB that
> orders loads) between the store and the load, or make the store a
> store-release. A load-acquire on the load side would not help, because
> acquire semantics do not prevent a load from being observed ahead of an
> older store; only the store side (release or a barrier) closes the
> window.

Ignoring Device-nGnR*, a store-release followed by a load (not
load-acquire) would not guarantee any ordering. I assume the
store-release behaviour is specific to this erratum - part of the
preconditions.

The patch looks fine to me.

Reviewed-by: Catalin Marinas <catalin.marinas@arm•com>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v1] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum
  2026-06-04 23:12 [PATCH v1] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum Shanker Donthineni
  2026-06-05  8:01 ` Catalin Marinas
@ 2026-06-05  9:26 ` Vladimir Murzin
  2026-06-05 14:34   ` Shanker Donthineni
  1 sibling, 1 reply; 4+ messages in thread
From: Vladimir Murzin @ 2026-06-05  9:26 UTC (permalink / raw)
  To: Shanker Donthineni, Catalin Marinas, Will Deacon,
	linux-arm-kernel
  Cc: Mark Rutland, linux-kernel, linux-doc, Vikram Sethi,
	Jason Sequeira

On 6/5/26 00:12, Shanker Donthineni wrote:
> On systems with NVIDIA Olympus cores, a Device-nGnR* load can be
> observed by a peripheral before an older, non-overlapping Device-nGnR*
> store to the same peripheral. This breaks the program-order guarantee
> that software expects for Device-nGnR* accesses and can leave a
> peripheral in an incorrect state, as a load is observed before an
> earlier store takes effect.
> 
> The erratum can occur only when all of the following apply:
> 
>   - A PE executes a Device-nGnR* store followed by a younger
>     Device-nGnR* load.
>   - The store is not a store-release.
>   - The accesses target the same peripheral and do not overlap in bytes.
>   - There is at most one intervening Device-nGnR* store in program
>     order, and there are no intervening Device-nGnR* loads.
>   - There is no DSB, and no DMB that orders loads, between the store and
>     the load.
>   - Specific micro-architectural and timing conditions occur.
> 
> Two ways to restore ordering: insert a barrier (any DSB, or a DMB that
> orders loads) between the store and the load, or make the store a
> store-release. A load-acquire on the load side would not help, because
> acquire semantics do not prevent a load from being observed ahead of an
> older store; only the store side (release or a barrier) closes the
> window.
> 
> Promote the raw MMIO store helpers (__raw_writeb/w/l/q) from plain str*
> to stlr* (Store-Release), which removes the "store is not a
> store-release" condition for every device write the kernel issues.
> Because writel() and writel_relaxed() are both built on __raw_writel()
> in asm-generic/io.h, patching the raw variants covers both the
> non-relaxed and relaxed APIs without touching the higher layers. Note
> that writel()'s own barrier sits before the store, so it does not order
> the store against a subsequent readl(); the store-release promotion is
> what provides that ordering.
> 
> Like ARM64_ERRATUM_832075 on the load side, the change is gated on a new
> ARM64_WORKAROUND_DEVICE_STORE_RELEASE capability and only activated on
> parts that match MIDR_NVIDIA_OLYMPUS, so unaffected CPUs continue to use
> the plain str* sequence.
> 
> Co-developed-by: Vikram Sethi <vsethi@nvidia•com>
> Signed-off-by: Vikram Sethi <vsethi@nvidia•com>
> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia•com>
> ---
>  Documentation/arch/arm64/silicon-errata.rst |  2 ++
>  arch/arm64/Kconfig                          | 23 ++++++++++++++++++++
>  arch/arm64/include/asm/io.h                 | 24 ++++++++++++++-------
>  arch/arm64/kernel/cpu_errata.c              |  8 +++++++
>  arch/arm64/tools/cpucaps                    |  1 +
>  5 files changed, 50 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
> index 211119ce7adc..899bed3908bb 100644
> --- a/Documentation/arch/arm64/silicon-errata.rst
> +++ b/Documentation/arch/arm64/silicon-errata.rst
> @@ -256,6 +256,8 @@ stable kernels.
>  +----------------+-----------------+-----------------+-----------------------------+
>  | NVIDIA         | Carmel Core     | N/A             | NVIDIA_CARMEL_CNP_ERRATUM   |
>  +----------------+-----------------+-----------------+-----------------------------+
> +| NVIDIA         | Olympus core    | T410-OLY-1027   | NVIDIA_OLYMPUS_1027_ERRATUM |
> ++----------------+-----------------+-----------------+-----------------------------+
>  | NVIDIA         | T241 GICv3/4.x  | T241-FABRIC-4   | N/A                         |
>  +----------------+-----------------+-----------------+-----------------------------+
>  | NVIDIA         | T241 MPAM       | T241-MPAM-1     | N/A                         |
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index fe60738e5943..a6bac84b05a1 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -564,6 +564,29 @@ config ARM64_ERRATUM_832075
>  
>  	  If unsure, say Y.
>  
> +config NVIDIA_OLYMPUS_1027_ERRATUM
> +	bool "NVIDIA Olympus: device store/load ordering erratum"
> +	default y
> +	help
> +	  This option adds an alternative code sequence to work around an
> +	  NVIDIA Olympus core erratum where a Device-nGnR* store can be
> +	  observed by a peripheral after a younger Device-nGnR* load to the
> +	  same peripheral. This breaks the program order that drivers rely
> +	  on for MMIO and can leave a device in an incorrect state.
> +
> +	  The workaround promotes the raw MMIO store helpers
> +	  (__raw_writeb/w/l/q) to Store-Release (STLR), which restores the
> +	  required ordering. Because writel() and writel_relaxed() are built
> +	  on __raw_writel(), both are covered without changes to the higher
> +	  layers.
> +
> +	  The fix is applied through the alternatives framework, so enabling
> +	  this option does not by itself activate the workaround: it is
> +	  patched in only when an affected CPU is detected, and is a no-op on
> +	  unaffected CPUs.
> +
> +	  If unsure, say Y.
> +
>  config ARM64_ERRATUM_834220
>  	bool "Cortex-A57: 834220: Stage 2 translation fault might be incorrectly reported in presence of a Stage 1 fault (rare)"
>  	depends on KVM
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> index 8cbd1e96fd50..b6d7966e9c19 100644
> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -25,29 +25,37 @@
>  #define __raw_writeb __raw_writeb
>  static __always_inline void __raw_writeb(u8 val, volatile void __iomem *addr)
>  {
> -	volatile u8 __iomem *ptr = addr;
> -	asm volatile("strb %w0, %1" : : "rZ" (val), "Qo" (*ptr));
> +	asm volatile(ALTERNATIVE("strb %w0, [%1]",
> +				 "stlrb %w0, [%1]",
> +				 ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
> +		     : : "rZ" (val), "r" (addr));
>  }
>  

Nitpick:

The change has the side effect of undoing d044d6ba6f02 ("arm64:
io: permit offset addressing"), since stlr* do not support
offset addressing. Unaffected CPUs would continue to use str*,
but would lose the benefit of offset addressing :(

Not sure if this needs to be mentioned in the commit message...

Cheers
Vladimir

>  #define __raw_writew __raw_writew
>  static __always_inline void __raw_writew(u16 val, volatile void __iomem *addr)
>  {
> -	volatile u16 __iomem *ptr = addr;
> -	asm volatile("strh %w0, %1" : : "rZ" (val), "Qo" (*ptr));
> +	asm volatile(ALTERNATIVE("strh %w0, [%1]",
> +				 "stlrh %w0, [%1]",
> +				 ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
> +		     : : "rZ" (val), "r" (addr));
>  }
>  
>  #define __raw_writel __raw_writel
>  static __always_inline void __raw_writel(u32 val, volatile void __iomem *addr)
>  {
> -	volatile u32 __iomem *ptr = addr;
> -	asm volatile("str %w0, %1" : : "rZ" (val), "Qo" (*ptr));
> +	asm volatile(ALTERNATIVE("str %w0, [%1]",
> +				 "stlr %w0, [%1]",
> +				 ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
> +		     : : "rZ" (val), "r" (addr));
>  }
>  
>  #define __raw_writeq __raw_writeq
>  static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
>  {
> -	volatile u64 __iomem *ptr = addr;
> -	asm volatile("str %x0, %1" : : "rZ" (val), "Qo" (*ptr));
> +	asm volatile(ALTERNATIVE("str %x0, [%1]",
> +				 "stlr %x0, [%1]",
> +				 ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
> +		     : : "rZ" (val), "r" (addr));
>  }
>  
>  #define __raw_readb __raw_readb
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 5377e4c2eba2..958d7f16bfeb 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -809,6 +809,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>  		ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_CARMEL),
>  	},
>  #endif
> +#ifdef CONFIG_NVIDIA_OLYMPUS_1027_ERRATUM
> +	{
> +		/* NVIDIA Olympus core */
> +		.desc = "NVIDIA Olympus device load/store ordering erratum",
> +		.capability = ARM64_WORKAROUND_DEVICE_STORE_RELEASE,
> +		ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_OLYMPUS),
> +	},
> +#endif
>  #ifdef CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>  	{
>  		/*
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index 811c2479e82d..d367257bf770 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -120,6 +120,7 @@ WORKAROUND_CAVIUM_TX2_219_PRFM
>  WORKAROUND_CAVIUM_TX2_219_TVM
>  WORKAROUND_CLEAN_CACHE
>  WORKAROUND_DEVICE_LOAD_ACQUIRE
> +WORKAROUND_DEVICE_STORE_RELEASE
>  WORKAROUND_NVIDIA_CARMEL_CNP
>  WORKAROUND_PMUV3_IMPDEF_TRAPS
>  WORKAROUND_QCOM_FALKOR_E1003
> -- 2.43.0
> 



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v1] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum
  2026-06-05  9:26 ` Vladimir Murzin
@ 2026-06-05 14:34   ` Shanker Donthineni
  0 siblings, 0 replies; 4+ messages in thread
From: Shanker Donthineni @ 2026-06-05 14:34 UTC (permalink / raw)
  To: Vladimir Murzin, Catalin Marinas, Will Deacon, linux-arm-kernel
  Cc: Mark Rutland, linux-kernel, linux-doc, Vikram Sethi,
	Jason Sequeira

Hi Vladimir Murzin,

On 6/5/2026 4:26 AM, Vladimir Murzin wrote:
> External email: Use caution opening links or attachments
>
>
> On 6/5/26 00:12, Shanker Donthineni wrote:
>> On systems with NVIDIA Olympus cores, a Device-nGnR* load can be
>> observed by a peripheral before an older, non-overlapping Device-nGnR*
>> store to the same peripheral. This breaks the program-order guarantee
>> that software expects for Device-nGnR* accesses and can leave a
>> peripheral in an incorrect state, as a load is observed before an
>> earlier store takes effect.
>>
>> The erratum can occur only when all of the following apply:
>>
>>    - A PE executes a Device-nGnR* store followed by a younger
>>      Device-nGnR* load.
>>    - The store is not a store-release.
>>    - The accesses target the same peripheral and do not overlap in bytes.
>>    - There is at most one intervening Device-nGnR* store in program
>>      order, and there are no intervening Device-nGnR* loads.
>>    - There is no DSB, and no DMB that orders loads, between the store and
>>      the load.
>>    - Specific micro-architectural and timing conditions occur.
>>
>> Two ways to restore ordering: insert a barrier (any DSB, or a DMB that
>> orders loads) between the store and the load, or make the store a
>> store-release. A load-acquire on the load side would not help, because
>> acquire semantics do not prevent a load from being observed ahead of an
>> older store; only the store side (release or a barrier) closes the
>> window.
>>
>> Promote the raw MMIO store helpers (__raw_writeb/w/l/q) from plain str*
>> to stlr* (Store-Release), which removes the "store is not a
>> store-release" condition for every device write the kernel issues.
>> Because writel() and writel_relaxed() are both built on __raw_writel()
>> in asm-generic/io.h, patching the raw variants covers both the
>> non-relaxed and relaxed APIs without touching the higher layers. Note
>> that writel()'s own barrier sits before the store, so it does not order
>> the store against a subsequent readl(); the store-release promotion is
>> what provides that ordering.
>>
>> Like ARM64_ERRATUM_832075 on the load side, the change is gated on a new
>> ARM64_WORKAROUND_DEVICE_STORE_RELEASE capability and only activated on
>> parts that match MIDR_NVIDIA_OLYMPUS, so unaffected CPUs continue to use
>> the plain str* sequence.
>>
>> Co-developed-by: Vikram Sethi <vsethi@nvidia•com>
>> Signed-off-by: Vikram Sethi <vsethi@nvidia•com>
>> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia•com>
>> ---
>>   Documentation/arch/arm64/silicon-errata.rst |  2 ++
>>   arch/arm64/Kconfig                          | 23 ++++++++++++++++++++
>>   arch/arm64/include/asm/io.h                 | 24 ++++++++++++++-------
>>   arch/arm64/kernel/cpu_errata.c              |  8 +++++++
>>   arch/arm64/tools/cpucaps                    |  1 +
>>   5 files changed, 50 insertions(+), 8 deletions(-)
>>
>> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
>> index 211119ce7adc..899bed3908bb 100644
>> --- a/Documentation/arch/arm64/silicon-errata.rst
>> +++ b/Documentation/arch/arm64/silicon-errata.rst
>> @@ -256,6 +256,8 @@ stable kernels.
>>   +----------------+-----------------+-----------------+-----------------------------+
>>   | NVIDIA         | Carmel Core     | N/A             | NVIDIA_CARMEL_CNP_ERRATUM   |
>>   +----------------+-----------------+-----------------+-----------------------------+
>> +| NVIDIA         | Olympus core    | T410-OLY-1027   | NVIDIA_OLYMPUS_1027_ERRATUM |
>> ++----------------+-----------------+-----------------+-----------------------------+
>>   | NVIDIA         | T241 GICv3/4.x  | T241-FABRIC-4   | N/A                         |
>>   +----------------+-----------------+-----------------+-----------------------------+
>>   | NVIDIA         | T241 MPAM       | T241-MPAM-1     | N/A                         |
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index fe60738e5943..a6bac84b05a1 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -564,6 +564,29 @@ config ARM64_ERRATUM_832075
>>
>>          If unsure, say Y.
>>
>> +config NVIDIA_OLYMPUS_1027_ERRATUM
>> +     bool "NVIDIA Olympus: device store/load ordering erratum"
>> +     default y
>> +     help
>> +       This option adds an alternative code sequence to work around an
>> +       NVIDIA Olympus core erratum where a Device-nGnR* store can be
>> +       observed by a peripheral after a younger Device-nGnR* load to the
>> +       same peripheral. This breaks the program order that drivers rely
>> +       on for MMIO and can leave a device in an incorrect state.
>> +
>> +       The workaround promotes the raw MMIO store helpers
>> +       (__raw_writeb/w/l/q) to Store-Release (STLR), which restores the
>> +       required ordering. Because writel() and writel_relaxed() are built
>> +       on __raw_writel(), both are covered without changes to the higher
>> +       layers.
>> +
>> +       The fix is applied through the alternatives framework, so enabling
>> +       this option does not by itself activate the workaround: it is
>> +       patched in only when an affected CPU is detected, and is a no-op on
>> +       unaffected CPUs.
>> +
>> +       If unsure, say Y.
>> +
>>   config ARM64_ERRATUM_834220
>>        bool "Cortex-A57: 834220: Stage 2 translation fault might be incorrectly reported in presence of a Stage 1 fault (rare)"
>>        depends on KVM
>> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
>> index 8cbd1e96fd50..b6d7966e9c19 100644
>> --- a/arch/arm64/include/asm/io.h
>> +++ b/arch/arm64/include/asm/io.h
>> @@ -25,29 +25,37 @@
>>   #define __raw_writeb __raw_writeb
>>   static __always_inline void __raw_writeb(u8 val, volatile void __iomem *addr)
>>   {
>> -     volatile u8 __iomem *ptr = addr;
>> -     asm volatile("strb %w0, %1" : : "rZ" (val), "Qo" (*ptr));
>> +     asm volatile(ALTERNATIVE("strb %w0, [%1]",
>> +                              "stlrb %w0, [%1]",
>> +                              ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
>> +                  : : "rZ" (val), "r" (addr));
>>   }
>>
> Nitpick:
>
> The change has the side effect of undoing d044d6ba6f02 ("arm64:
> io: permit offset addressing"), since stlr* do not support
> offset addressing. Unaffected CPUs would continue to use str*,
> but would lose the benefit of offset addressing :(
>
> Not sure if this needs to be mentioned in the commit message...
>
Thanks for your feedback, You're right that this reverts the 
offset-addressing benefit of d044d6ba6f02 for the str* path too, because 
stlr* has no offset form and both alternates must share one compile-time 
operand form (alternatives are patched at boot). Keeping offset 
addressing only for the unaffected str* path would need a runtime branch 
per str operation, which isn't worth it for this optimization. I'll call 
this out explicitly in the commit message in the v2 patch. -Shanker



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-05 14:34 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-04 23:12 [PATCH v1] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum Shanker Donthineni
2026-06-05  8:01 ` Catalin Marinas
2026-06-05  9:26 ` Vladimir Murzin
2026-06-05 14:34   ` Shanker Donthineni

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox