Some time ago I wrote about performance problems when doing high -j build.sh and made few remarks about mutex implementation.
TL;DR for that one was mostly that cas returns the found value, so explicit re-reads can be avoided. There are also avoidable explicit barriers (yes, I know about the old Opteron errata). I had another look and have a remark definitely worth looking at (and easy to fix). Unfortunately I lost my test jig so can't benchmark. That said, here it is: After it is is concluded that lock owner sleeps: ts = turnstile_lookup(mtx); /* * Once we have the turnstile chain interlock, mark the * mutex has having waiters. If that fails, spin again: * chances are that the mutex has been released. */ if (!MUTEX_SET_WAITERS(mtx, owner)) { turnstile_exit(mtx); owner = mtx->mtx_owner; continue; } Note that the lock apart from being free, can be: 1. owned by the same owner, which is now running In this case the bit is set spuriously and triggers slow path unlock. 2. owned by a different owner, which is NOT running Assuming the owner still has the lock and is not running after turnstile_exit, this causes extra trip. That said, proposed change is trivial and follows what FreeBSD is doing: re-read and abort sleep unless the lock is owned by an off-cpu thread. Note there is a minor optimisation of not setting the bit if already seen. That said, the minimal fix would look like this (untested): ts = turnstile_lookup(mtx); owner = mtx->mtx_owner; if (mutex_oncpu(owner) || !MUTEX_SET_WAITERS(mtx, owner)) turnstile_exit(mtx); owner = mtx->mtx_owner; continue; } If value from cas was to be preserved it would definitely make sense to re-test failed MUTEX_SET_WAITERS while still holding the turnstile lock. It looks like rw locks can use a similar treatment. Fun fact is that implementation on Illumos behaves worse in this regard: it sets the waiters bit regardless *who* owns the lock (or whether they are running), but then only goes to sleep if the *original* owner has the lock. -- Mateusz Guzik <mjguzik gmail.com>