> Date: Fri, 11 Feb 2022 06:52:48 -0800
> From: Jason Thorpe <thor...@me.com>
>
> I would prefer we adopt the Solaris description about a generic
> barrier that provides "lock-is-visible-before-load/store" without
> explicitly stating "load-before-load/store", and provide a new
> membar_acquire() that means "load-before-load/store".

What does `lock-is-visible' mean?  How would you ever use it
correctly, or audit correct use?  What operations do we have that have
`lock-is-visible' semantics, which don't also imply an acquire
operation and thus don't need an explicit barrier anyway?

It turns out that _even on x86_, our membar_enter fails to implement
the semantics we documented for it -- but does implement the semantics
that (a) all ports (except riscv) already implement, and (b) all users
(except one deletable one) already assume.

In particular, from the AMD64 Architecture Programmer's Manual,
vol. 2, Sec. 7.2 `Multiprocessor Memory Access Ordering', p. 187:

> Non-overlapping Loads may pass stores.
>
>       Processor 0             Processor 1
>       Store A <- 1            Store B <- 1
>       Load B                  Load A
>
> All combinations of values (00, 01, 10, and 11) may be observed by
> Processors 0 and 1.
>
> Where sequential consistency is needed (for example in Dekker's
> algorithm for mutual exclusion), an MFENCE instruction should be
> used between the store and the subsequent load, or a locked access,
> such as an XCHG, should be used for the store.
>
>       Processor 0             Processor 1
>       Store A <- 1            Store B <- 1
>       MFENCE                  MFENCE
>       Load B                  Load A
>
> Load A and Load B cannot both read 0.

https://www.amd.com/system/files/TechDocs/24593.pdf#page=247

lfence -- as x86 membar_enter currently issues, on essentially all
machines of the past two decades -- is not enough here; mfence is
required.

Fortunately, we have nothing that relies on using membar_enter this
way!  Which is probably part of why we never noticed an issue with the
bad current semantics.
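
To make the store-before-load case concrete, here is a minimal sketch
of the Dekker-style pattern from the manual excerpt, written with C11
<stdatomic.h> rather than our membar_ops(3).  The names flag,
dekker_try_enter, and dekker_exit are hypothetical, made up for this
illustration; nothing in the tree is assumed to look like this.  The
point is only that the fence between the store and the load has to be
a full (sequentially consistent) one; a load-before-load/store barrier
in that position would still allow both sides to read 0.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread intent flags; illustration only. */
static atomic_int flag[2];

/*
 * Dekker-style entry attempt for thread `self' (0 or 1).
 * Returns true if the caller may enter the critical section,
 * false if the other thread has announced intent (caller backs off).
 */
static bool
dekker_try_enter(int self)
{
	int other = 1 - self;

	/* Store A <- 1: announce intent. */
	atomic_store_explicit(&flag[self], 1, memory_order_relaxed);

	/*
	 * Store-before-load barrier.  On x86, compilers typically
	 * emit MFENCE or a locked instruction for this.  An acquire
	 * fence would not order the store above against the load
	 * below.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	/* Load B: has the other thread announced intent? */
	if (atomic_load_explicit(&flag[other], memory_order_acquire) != 0) {
		/* Contended: withdraw our intent and report failure. */
		atomic_store_explicit(&flag[self], 0, memory_order_release);
		return false;
	}
	return true;
}

/* Leave the critical section, publishing our stores to the next owner. */
static void
dekker_exit(int self)
{
	atomic_store_explicit(&flag[self], 0, memory_order_release);
}
```

With the seq_cst fence in both threads, the two loads cannot both
observe 0, so at most one caller gets true at a time.  The sketch does
not guarantee progress: if both threads collide, both back off and
must retry.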