[PATCH 1/2] arm64: atomics: fix use of acquire + release for full barrier semantics

public inbox for linux-arm-kernel@lists.infradead.org 
 help / color / mirror / Atom feed

From: catalin.marinas@arm•com (Catalin Marinas)
To: linux-arm-kernel@lists•infradead.org
Subject: [PATCH 1/2] arm64: atomics: fix use of acquire + release for full barrier semantics
Date: Wed, 19 Feb 2014 17:05:15 +0000	[thread overview]
Message-ID: <20140219170515.GE22252@arm.com> (raw)
In-Reply-To: <1391516953-14541-1-git-send-email-will.deacon@arm.com>

On Tue, Feb 04, 2014 at 12:29:12PM +0000, Will Deacon wrote:
> Linux requires a number of atomic operations to provide full barrier
> semantics, that is no memory accesses after the operation can be
> observed before any accesses up to and including the operation in
> program order.
> 
> On arm64, these operations have been incorrectly implemented as follows:
> 
> 	// A, B, C are independent memory locations
> 
> 	<Access [A]>
> 
> 	// atomic_op (B)
> 1:	ldaxr	x0, [B]		// Exclusive load with acquire
> 	<op(B)>
> 	stlxr	w1, x0, [B]	// Exclusive store with release
> 	cbnz	w1, 1b
> 
> 	<Access [C]>
> 
> The assumption here being that two half barriers are equivalent to a
> full barrier, so the only permitted ordering would be A -> B -> C
> (where B is the atomic operation involving both a load and a store).
> 
> Unfortunately, this is not the case by the letter of the architecture
> and, in fact, the accesses to A and C are permitted to pass their
> nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
> or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
> store-release on B). This is a clear violation of the full barrier
> requirement.
> 
> The simple way to fix this is to implement the same algorithm as ARMv7
> using explicit barriers:
> 
> 	<Access [A]>
> 
> 	// atomic_op (B)
> 	dmb	ish		// Full barrier
> 1:	ldxr	x0, [B]		// Exclusive load
> 	<op(B)>
> 	stxr	w1, x0, [B]	// Exclusive store
> 	cbnz	w1, 1b
> 	dmb	ish		// Full barrier
> 
> 	<Access [C]>
> 
> but this has the undesirable effect of introducing *two* full barrier
> instructions. A better approach is actually the following, non-intuitive
> sequence:
> 
> 	<Access [A]>
> 
> 	// atomic_op (B)
> 1:	ldxr	x0, [B]		// Exclusive load
> 	<op(B)>
> 	stlxr	w1, x0, [B]	// Exclusive store with release
> 	cbnz	w1, 1b
> 	dmb	ish		// Full barrier
> 
> 	<Access [C]>
> 
> The simple observations here are:
> 
>   - The dmb ensures that no subsequent accesses (e.g. the access to C)
>     can enter or pass the atomic sequence.
> 
>   - The dmb also ensures that no prior accesses (e.g. the access to A)
>     can pass the atomic sequence.
> 
>   - Therefore, no prior access can pass a subsequent access, or
>     vice-versa (i.e. A is strictly ordered before C).
> 
>   - The stlxr ensures that no prior access can pass the store component
>     of the atomic operation.
> 
> The only tricky part remaining is the ordering between the ldxr and the
> access to A, since the absence of the first dmb means that we're now
> permitting re-ordering between the ldxr and any prior accesses.
> 
> From an (arbitrary) observer's point of view, there are two scenarios:
> 
>   1. We have observed the ldxr. This means that if we perform a store to
>      [B], the ldxr will still return older data. If we can observe the
>      ldxr, then we can potentially observe the permitted re-ordering
>      with the access to A, which is clearly an issue when compared to
>      the dmb variant of the code. Thankfully, the exclusive monitor will
>      save us here since it will be cleared as a result of the store and
>      the ldxr will retry. Notice that any use of a later memory
>      observation to imply observation of the ldxr will also imply
>      observation of the access to A, since the stlxr/dmb ensure strict
>      ordering.
> 
>   2. We have not observed the ldxr. This means we can perform a store
>      and influence the later ldxr. However, that doesn't actually tell
>      us anything about the access to [A], so we've not lost anything
>      here either when compared to the dmb variant.
> 
> This patch implements this solution for our barriered atomic operations,
> ensuring that we satisfy the full barrier requirements where they are
> needed.
> 
> Cc: <stable@vger•kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm•com>
> Cc: Peter Zijlstra <peterz@infradead•org>
> Signed-off-by: Will Deacon <will.deacon@arm•com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm•com>

     prev parent reply	other threads:[~2014-02-19 17:05 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-04 12:29 [PATCH 1/2] arm64: atomics: fix use of acquire + release for full barrier semantics Will Deacon
2014-02-04 12:29 ` [PATCH 2/2] arm64: asm: remove redundant "cc" clobbers Will Deacon
2014-02-19 17:05   ` Catalin Marinas
2014-02-04 16:43 ` [PATCH 1/2] arm64: atomics: fix use of acquire + release for full barrier semantics Peter Zijlstra
2014-02-04 17:16   ` Will Deacon
2014-02-19 17:05 ` Catalin Marinas [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140219170515.GE22252@arm.com \
    --to=catalin.marinas@arm$(echo .)com \
    --cc=linux-arm-kernel@lists$(echo .)infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox