I think _spinlock is suboptimal, although that's not the real problem
as far as my code is concerned. Spinlock is a loop:
    while (_atomic_lock(&lock->ticket))
        _sched_yield();
This causes a system call every time the lock is held by another thread.
In many cases the spinlock protects a simple operation, e.g. setting a
bit or inserting an item into a list. These operations will complete
in less than 100 cycles (two atomics and a few memory references).
Performance might be improved by polling the lock for a while before
giving up and waiting:
    int i = 0;

    /*
     * If the lock is held, wait a little for it to become free before
     * doing expensive scheduling operations.
     */
    while (i++ < 50 && lock->ticket)
        asm("pause"); /* x86-specific instruction for use in polling loops */
    while (_atomic_lock(&lock->ticket))
        _sched_yield();
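The pause instruction hints to modern x86 processors that this is a
spin-wait loop, so they should not speculate too aggressively on the
loads inside it. Without the hint, the speculated loads trigger a
memory-order violation and a pipeline flush when the polled location
finally changes, which slows the exit from the loop.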
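
For anyone who wants to experiment with the idea outside libc, here is a
minimal, portable sketch of the same poll-then-yield pattern using C11
atomics. Everything named here is hypothetical and for illustration only:
struct spin_sketch, SPIN_LIMIT, cpu_relax() and the spin_sketch_*
functions are not part of the real _spinlock code, and the lock word
stands in for lock->ticket. The limit of 50 is just the number from the
loop above, not a measured value.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Assumed poll count, copied from the loop above; tune by measurement. */
#define SPIN_LIMIT 50

/* Hypothetical lock type; NOT the layout of the real _spinlock. */
struct spin_sketch {
    atomic_int held; /* 0 = free, 1 = held */
};

/* Emit the x86 "pause" hint where available; a no-op elsewhere. */
static inline void cpu_relax(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ __volatile__("pause");
#endif
}

/* Returns 1 if the lock was acquired, 0 if it was already held. */
static int spin_sketch_trylock(struct spin_sketch *l)
{
    return atomic_exchange_explicit(&l->held, 1,
        memory_order_acquire) == 0;
}

static void spin_sketch_lock(struct spin_sketch *l)
{
    /*
     * Phase 1: poll with plain loads for a bounded number of
     * iterations. The atomic load keeps the compiler from hoisting
     * the read out of the loop.
     */
    for (int i = 0; i < SPIN_LIMIT &&
        atomic_load_explicit(&l->held, memory_order_relaxed); i++)
        cpu_relax();

    /* Phase 2: try the atomic exchange, yielding on each failure. */
    while (!spin_sketch_trylock(l))
        sched_yield();
}

static void spin_sketch_unlock(struct spin_sketch *l)
{
    /* Debug check: unlocking a free lock is a caller bug. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

Typical use would be spin_sketch_lock(&l), the short critical section,
then spin_sketch_unlock(&l). Whether the polling phase actually helps
depends on how long the lock is usually held and on how many threads are
runnable per CPU, so it is worth benchmarking before changing anything.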