I think _spinlock is suboptimal, although that's not the real problem as far as my code is concerned. Spinlock is a loop:
    while (_atomic_lock(&lock->ticket))
        _sched_yield();

This causes a system call every time the lock is held by another thread. In many cases the spinlock protects a simple operation, e.g. setting a bit or inserting an item into a list. These operations will complete in less than 100 cycles (two atomics and a few memory references). Performance might be improved by polling the lock for a while before giving up and waiting:

    int i = 0;

    /*
     * If the lock is held, wait a little for it to become free
     * before doing expensive scheduling operations.
     */
    while (i++ < 50 && lock->ticket)
        asm("pause"); /* x86-specific instruction for use in polling loops */
    while (_atomic_lock(&lock->ticket))
        _sched_yield();

The pause instruction tells modern x86 processors that this is a spin-wait loop, so they should not speculate too aggressively across iterations. The normal speculative behavior hurts performance when the loop is polling memory waiting for a change.
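For anyone who wants to try the idea outside libc, here is a minimal self-contained sketch of the same spin-then-yield pattern using C11 atomics and sched_yield(). This is not the actual _spinlock code: spin_t, cpu_relax, spin_trylock and SPIN_LIMIT are names made up for illustration, atomic_exchange stands in for _atomic_lock, and the limit of 50 is just the untuned figure from above.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

#define SPIN_LIMIT 50   /* polling budget from the post; an untuned guess */

/* Hypothetical stand-in for the libc spinlock type. */
typedef struct {
    atomic_int ticket;  /* 0 = free, 1 = held */
} spin_t;

/* Spin-wait hint; expands to nothing on non-x86 targets. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile("pause");
#endif
}

/* One attempt to take the lock; returns nonzero on success. */
static inline int spin_trylock(spin_t *l)
{
    return atomic_exchange_explicit(&l->ticket, 1,
        memory_order_acquire) == 0;
}

static inline void spin_lock(spin_t *l)
{
    int i = 0;

    /* Cheap read-only poll first, in case the holder is about to finish. */
    while (i++ < SPIN_LIMIT &&
        atomic_load_explicit(&l->ticket, memory_order_relaxed) != 0)
        cpu_relax();

    /* Then fall back to the original behavior: try, yield, retry. */
    while (!spin_trylock(l))
        sched_yield();
}

static inline void spin_unlock(spin_t *l)
{
    int was = atomic_exchange_explicit(&l->ticket, 0,
        memory_order_release);

    assert(was == 1);   /* unlocking a free lock is a caller bug */
    (void)was;          /* silence unused warning under NDEBUG */
}
```

The polling loop only reads the lock word, so waiting threads do not fight over the cache line with atomic writes until the lock looks free. Whether 50 iterations is the right budget would have to be measured on real workloads.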