> On Feb 11, 2022, at 10:27 AM, Taylor R Campbell <riastr...@netbsd.org> wrote:
> 
> So x86, powerpc, and sparc64 all implement what I suggest membar_enter
> _should_ be (and what all current users use it for!), and _fail_ to
> implement what membar_enter is documented to be in the man page.
> 
> And of all the membar_enter definitions, only the riscv one fails to
> guarantee the load-before-load/store ordering I suggest it should be
> documented to have.

My beef with the membar_enter definitional change is with the word "acquire".  
I.e. you want to give it what is called today "acquire" semantics.  My beef is 
with how "acquire" is defined, i.e. as load-before-load/store.  I think it's a 
non-intuitive word for that kind of barrier, especially in the context of the 
Solaris description of membar_enter(), which you quoted earlier:

    The membar_enter() function is a generic memory barrier used
    during lock entry.  It is placed after the memory operation that
    acquires the lock to guarantee that the lock protects its data.
    No stores from after the memory barrier will reach visibility and
    no loads from after the barrier will be resolved before the lock
    acquisition reaches global visibility.
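
In other words, the intended usage pattern is roughly the following sketch 
(the lock cell and the spin loop are made up for illustration; 
atomic_swap_uint() is the NetBSD atomic_swap(3) primitive):

        volatile unsigned int lock;             /* hypothetical lock cell */

        while (atomic_swap_uint(&lock, 1) != 0)
                continue;               /* spin until we own the lock */
        membar_enter();                 /* now the lock protects its data */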

So, let's again take a look at the rules for PSO from the SPARC v9 architecture 
manual:

"""
The rules for PSO are:

* Loads are blocking and ordered with respect to earlier loads.
* Atomic load-stores are ordered with respect to loads.

Thus, PSO ensures that:
* Each load and atomic load-store instruction behaves as if it were followed by 
a MEMBAR with a mask value of 05.
* Explicit MEMBAR instructions are required to order store and atomic 
load-store instructions with respect to each other.
"""

My gripe hinges on what "lock acquisition" means.  In SPARC-land, the above 
Solaris description can be broken down into three cases:

v7 -- Locks are acquired with "ldstub", and memory is always strongly-ordered; 
membar_enter() is a nop.

v8 -- Locks are acquired with "ldstub", and the CPU can run in Partial Store 
Order; membar_enter() is "stbar" (equivalent to "membar #StoreStore" in 
v9-speak).

v9 -- Locks are acquired with either "ldstub" or "casx", and the CPU can run in 
PSO or RMO.  Both "ldstub" and "casx" are the same with respect to how they 
interact with barriers.  But PSO and RMO have two different requirements:

v9-PSO -- Because Atomic load-stores ("ldstub" and "casx") are not ordered with 
respect to stores, you would need "membar #StoreStore" (in PSO mode, Atomic 
load-stores are already strongly ordered with respect to other loads).

v9-RMO -- In **this case**, because an Atomic load-store is a load, you can use 
"membar #LoadLoad | #LoadStore" (your proposed semantics: 
load-before-load/store).  But because an Atomic load-store is also a store, you 
can also use "membar #StoreLoad | #StoreStore" (the current definition in our 
man page: store-before-load/store).
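
In code, the v9-RMO acquire path could legally use either barrier after the 
atomic.  A sketch (the lock cell is made up, and you would pick one of the 
two membars, not both):

        volatile unsigned int lock;

        while (atomic_swap_uint(&lock, 1) != 0)
                continue;
        /* Option A: treat the atomic load-store as a load. */
        __asm volatile("membar #LoadLoad | #LoadStore" ::: "memory");
        /* Option B: treat the atomic load-store as a store. */
        __asm volatile("membar #StoreLoad | #StoreStore" ::: "memory");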


Now, because in PSO mode, Atomic load-stores are not strongly ordered with 
respect to stores, in order for the following code to work:

        mutex_enter(&lock);
        *foo = 0;
        result = *bar;
        mutex_exit(&lock);

...then you need to issue a "membar #StoreStore", because without it the 
ordering between the lock acquisition and the store to *foo is not guaranteed.  
But you can also issue a "membar #StoreLoad | #StoreStore", which additionally 
works in RMO mode.
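
Spelled out (a sketch; the spin loop is elided and the raw membar is written 
out instead of membar_enter() for clarity), the PSO-safe sequence is:

        atomic_swap_uint(&lock, 1);     /* the "ldstub" -- the lock-cell store */
        __asm volatile("membar #StoreStore" ::: "memory");
        *foo = 0;                       /* now ordered after the lock store */
        result = *bar;                  /* loads are already ordered under PSO */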

In other words, it's the **store into the lock cell** that actually performs 
the acquisition of the lock.  This is true not only on platforms that have 
Atomic load-store instructions (like SPARC), but also on platforms that have 
LL/SC semantics (the load in that case doesn't mean jack-squat, and the 
ordering guarantees that the LL has are specifically with respect to the paired 
SC).
Until that store is globally visible, the lock does not protect its data, on 
all platforms that NetBSD supports.  And this is why I described membar_enter() 
the way I did.
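
A sketch of the LL/SC case, with ll() and sc() standing in for the real 
load-linked / store-conditional instructions (sc() returning non-zero on 
success):

        unsigned int old;

        for (;;) {
                old = ll(&lock);        /* the LL alone orders nothing */
                if (old != 0)
                        continue;       /* lock already held; retry */
                if (sc(&lock, 1))       /* the SC is the actual acquisition */
                        break;
        }
        membar_enter();                 /* critical section ordered after the SC */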

Now, if we want to loosen the definition to "acquisition of the lock", which 
would allow a load-before-load/store barrier after an Atomic load-store, then 
that's fine.  But I would object to defining membar_enter() as 
"load-before-load/store" ... rather, I would word it in a fuzzier way, such 
that load-before-load/store is allowed if such a barrier works for that 
architecture, but store-before-load/store is also allowed if some other 
architecture needs those semantics for whatever reason.

And that's why I'd also prefer to have a new name for an additional operation 
if you want to have one that is explicitly "load-before-load/store".

> (sparc64 membar_sync also fails to order store-before-load, and that's
> clearly just an implementation bug because membar_sync has always been
> unequivocally intended and used as a full load/store-before-load/store
> sequential consistency barrier.  Easy fix: change membar_sync to do
> `membar #StoreLoad' for kernel running in TSO.  If userland is really
> running in RMO, then all the membars need to be fixed on sparc64.  It
> looks like sparc (i.e., `sparc32'/`sparcv8') should get the same
> treatment.)

Sync is also a little funny, because "membar #Sync" in SPARC-land means 
"load/store-before-load/store-and-also-resolve-any-faults", but I don't recall 
whether membar_sync() on Solaris does "membar #Sync" or "membar #0xf".
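
The two aren't the same encoding, either: mask 0xf is all four mmask ordering 
bits, while #Sync is a bit in MEMBAR's separate cmask field.  Roughly:

        /* mmask 0xf: full ordering, and nothing more */
        __asm volatile(
            "membar #LoadLoad | #StoreLoad | #LoadStore | #StoreStore"
            ::: "memory");

        /* cmask #Sync: additionally completes all prior instructions and
           resolves any outstanding faults */
        __asm volatile("membar #Sync" ::: "memory");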

-- thorpej
