Background:

The kernel mutex code has a pretty generic implementation that can be used on any platform that provides a pointer-sized atomic compare-and-swap primitive. Each platform provides a definition of MUTEX_CAS() that expands to the right thing for that platform.
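
To make that contract concrete, here is a minimal userland sketch of a pointer-sized CAS that reports success or failure, written with C11 <stdatomic.h>. It is not the kernel's code: the name model_mutex_cas is made up for illustration, and the relaxed memory ordering is an assumption that models a CAS which leaves any memory barriers to its caller.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Userland model only, NOT kernel code.  model_mutex_cas is a
 * hypothetical name.  Atomically replaces *p with n if *p == o.
 * Returns nonzero if the swap happened, zero otherwise.
 */
static inline int
model_mutex_cas(_Atomic uintptr_t *p, uintptr_t o, uintptr_t n)
{
	/*
	 * On failure the C11 call writes the current value of *p into
	 * the local copy of o; this model simply discards it.
	 * Relaxed ordering is an assumption: barriers are left to the
	 * caller.
	 */
	return atomic_compare_exchange_strong_explicit(p, &o, n,
	    memory_order_relaxed, memory_order_relaxed);
}
```

A caller would treat a nonzero return as "the swap happened" and would retry or take a slow path otherwise.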

For the most part, architectures define it one of two ways:

<type 1>

    int _lock_cas(volatile uintptr_t *, uintptr_t, uintptr_t);
    #define MUTEX_CAS(p, o, n)    _lock_cas((p), (o), (n))

<type 2>

    #define MUTEX_CAS(p, o, n)                                      \
        (atomic_cas_ulong((volatile unsigned long *)(p), (o), (n)) == (o))

For the <type 1> cases, we have:

-> alpha (_lock_cas() is basically like atomic_cas_ulong() but has memory barrier insns and different return value semantics)

-> powerpc (like alpha, but has an IBM405 errata as well)

-> sh3 (_lock_cas() is a restartable atomic sequence that the interrupt handler groks - it is aliased to the normal atomic_cas_*() functions)

-> sparc64 (same situation as alpha)

Now, as for the 2 uses of MUTEX_CAS() in kern_mutex.c:

<MUTEX_ACQUIRE()>

    rv = MUTEX_CAS(&mtx->mtx_owner, oldown, newown);
    MUTEX_MEMBAR_ENTER();

<MUTEX_SET_WAITERS()>

    rv = MUTEX_CAS(&mtx->mtx_owner, owner, owner | MUTEX_BIT_WAITERS);
    MUTEX_MEMBAR_ENTER();

…and for the platforms that need it, MUTEX_MEMBAR_ENTER() expands to membar_enter(). So, for all platforms that require memory barriers, a memory barrier is already issued after the MUTEX_CAS().

So, there are a couple of takeaways here:

1. Some platforms have redundant memory barriers in their mutex implementations (one in _lock_cas() and another at the _lock_cas() call site).

2. kern_mutex.c issues memory barriers *even if the CAS failed*. This is probably not a big deal, but it still rubs me the wrong way :-)

Anyway, I’m much more concerned with (1). I think at the very least, alpha and sparc64 don’t need to define their own _lock_cas() and can just use atomic_cas_ulong()… furthermore, I think we can just let that be the default definition unless a platform has a REALLY good reason to override it (I mean, not even sh3 has to do so, because it aliases _lock_cas() to atomic_cas_ulong()).

Thoughts?

-- thorpej