the 9 scheduler's guts break down to the following loop.  this
is the improved version, abstracted a bit (by hand).

        spllo();
        for(;;){
                for(i = Npri-1; i >= 0; i--){
->a                     for(p = runqueue[i]; p != nil; p = p->rproc){
                                if(softaffinity(p, m) ||
                                   hardaffinity(p, m) && scheddelay(p) >= Delay)
->b                                     goto found;
                        }
                }
->b             while(monmwait(&runvec, 0) == 0)
                        ;
        }

it occurred to me that having multiple maches fighting over the
runqueue causes contention on a number of cache lines.
what we'd like is for the maches to queue up and try one at a time.
but this is exactly what mcs locks do!  so adding a private lock at
(a) and a private iunlock at (b) is all that is required.
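
for anyone who hasn't looked at mcs locks, here is a minimal sketch of the queueing behavior, written in portable c11 rather than kernel c.  the names (Qnode, Mcslock, mcsinit, mcslock, mcsunlock) are made up for illustration and are not the kernel's lock api; the real lock also deals with interrupt priority, which this ignores.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* hypothetical names throughout; this is an illustration, not the kernel's code */
typedef struct Qnode Qnode;
struct Qnode {
	_Atomic(Qnode*) next;	/* the waiter queued behind us, if any */
	atomic_bool locked;	/* true while we must keep waiting */
};

typedef struct Mcslock Mcslock;
struct Mcslock {
	_Atomic(Qnode*) tail;	/* last waiter in the queue; NULL when free */
};

void
mcsinit(Mcslock *l)
{
	atomic_init(&l->tail, (Qnode*)NULL);
}

/* each caller passes its own node, so each waiter spins on its own cache line */
void
mcslock(Mcslock *l, Qnode *me)
{
	Qnode *pred;

	atomic_store(&me->next, (Qnode*)NULL);
	atomic_store(&me->locked, true);
	pred = atomic_exchange(&l->tail, me);	/* join the end of the queue */
	if(pred == NULL)
		return;				/* queue was empty: lock is ours */
	atomic_store(&pred->next, me);		/* link in behind our predecessor */
	while(atomic_load(&me->locked))
		;				/* spin on our own node only */
}

void
mcsunlock(Mcslock *l, Qnode *me)
{
	Qnode *succ, *expect;

	succ = atomic_load(&me->next);
	if(succ == NULL){
		/* nobody visible behind us: try to mark the lock free */
		expect = me;
		if(atomic_compare_exchange_strong(&l->tail, &expect, (Qnode*)NULL))
			return;
		/* someone swapped the tail but hasn't linked in yet; wait for them */
		while((succ = atomic_load(&me->next)) == NULL)
			;
	}
	atomic_store(&succ->locked, false);	/* hand the lock to the next waiter */
}
```

in terms of the loop above, each mach would own one Qnode, call something like mcslock at (a), and call mcsunlock at both (b) sites.  waiters are served in arrival order and each spins on its own node, so the runqueue's cache lines are touched by one mach at a time.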

this nearly eliminates the scheduling penalty one saw with the
original version of the revised scheduler, and increases performance
marginally, maybe 5%.

again, the caveat here is that i haven't tested with a very large multiprocessor.

- erik
