[9fans] A little ado about taslock

2010-06-21 Thread Venkatesh Srinivas
Hi, Erik's thread about a 16-processor x86 machine convinced me to try something related to spinlocks. The current 9 spinlocks are portable code, calling an arch-provided tas() in a loop to do their thing. On i386, Intel recommends 'PAUSE' in the core of a spin-lock loop; I modified tas to PAUSE
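A minimal sketch of the loop described above: a portable lock routine that calls tas() until the lock is free, with PAUSE in the wait step. It is written with C11 atomics rather than Plan 9's assembly tas(). `tas` here is a stand-in built on `atomic_exchange_explicit`, and `Spin`, `spinlock`, `spinunlock` and `cpu_relax` are hypothetical names. The message says PAUSE was added inside tas itself; putting it in the caller's retry loop instead is an assumption of this sketch. On x86 `cpu_relax` issues PAUSE through the `_mm_pause` intrinsic, and elsewhere it does nothing.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: PAUSE on x86, a no-op on other architectures. */
#if defined(__i386__) || defined(__x86_64__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)
#endif

typedef struct {
	atomic_int key;	/* 0 = free, 1 = held */
} Spin;

/* Stand-in for tas(): atomically set key to 1 and return its old value. */
static int
tas(atomic_int *key)
{
	return atomic_exchange_explicit(key, 1, memory_order_acquire);
}

static void
spinlock(Spin *l)
{
	/* Retry tas() until it reports the lock was free; PAUSE between tries. */
	while(tas(&l->key) != 0)
		cpu_relax();
}

static void
spinunlock(Spin *l)
{
	/* Releasing a lock that is not held is a caller bug. */
	assert(atomic_load_explicit(&l->key, memory_order_relaxed) == 1);
	atomic_store_explicit(&l->key, 0, memory_order_release);
}
```

Every failed tas() here is still a write to the shared word, which is where the memory-contention question later in the thread comes in.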

Re: [9fans] A little ado about taslock

2010-06-21 Thread erik quanstrom
In a crude test on a 1.5GHz p4 willamette with a local fossil/venti and 256mb of ram, 'time mk 'CONF=pcf' > /dev/null' in /sys/src/9/pc, on a fully-built source tree, adding the PAUSE reduced times from an average of 18.97s to 18.84s (across ten runs). we tried this at coraid years ago. it's

Re: [9fans] interesting timing tests

2010-06-21 Thread erik quanstrom
    void
    lock(ulong *l)
    {
        ulong old;
        ushort next, owner;

        old = _xadd(l, 1);
        for(;;){
            next = old;
            owner = old>>16;
            old = *l;
            if(next == owner)
                break;
        }
    }

    void
    unlock(ulong *l)

Re: [9fans] A little ado about taslock

2010-06-21 Thread Lyndon Nerenberg
2. if today 16 machs are possible (and 128 on an intel xeon mp 7500? 8 sockets * 8 core * 2t = 128), what do we expect in 5 years? 128? www.seamicro.com

Re: [9fans] A little ado about taslock

2010-06-21 Thread David Leimbach
On Mon, Jun 21, 2010 at 9:28 AM, Lyndon Nerenberg lyn...@orthanc.ca wrote: 2. if today 16 machs are possible (and 128 on an intel xeon mp 7500? 8 sockets * 8 core * 2t = 128), what do we expect in 5 years? 128? www.seamicro.com There's a 100 core MIPS-like board available now too.

Re: [9fans] interesting timing tests

2010-06-21 Thread erik quanstrom
On Mon Jun 21 10:51:30 EDT 2010, quans...@quanstro.net wrote: void lock(ulong *l) somehow lost was an observation that since lock is only testing that next == owner, and that both are based on the current state of *l, i don't see how this is robust in the face of more than one mach spinning.
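As the message points out, the quoted lock() recomputes next from the current *l on every pass, so a waiter does not remember the ticket it drew. Here is a minimal sketch of a ticket lock that keeps its drawn ticket fixed while it waits. It assumes C11 atomics in place of Plan 9's _xadd, and `Ticket`, `ticketlock` and `ticketunlock` are hypothetical names. Unlike the single-word layout in the thread (next in the low 16 bits, owner in the high 16), it keeps the two counters in separate words. That way, wraparound in one cannot carry into the other.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: PAUSE on x86, a no-op on other architectures. */
#if defined(__i386__) || defined(__x86_64__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)
#endif

typedef struct {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently allowed to hold the lock */
} Ticket;

static unsigned
ticketlock(Ticket *t)
{
	/* Draw a ticket exactly once; this value must not change while waiting. */
	unsigned my = atomic_fetch_add_explicit(&t->next, 1, memory_order_relaxed);

	/* Wait until the owner counter reaches our ticket.  The acquire load
	 * pairs with the previous holder's release store in ticketunlock. */
	while(atomic_load_explicit(&t->owner, memory_order_acquire) != my)
		cpu_relax();
	return my;
}

static void
ticketunlock(Ticket *t)
{
	/* Only the holder writes owner, so a load followed by a store is safe. */
	unsigned cur = atomic_load_explicit(&t->owner, memory_order_relaxed);
	atomic_store_explicit(&t->owner, cur + 1, memory_order_release);
}
```

ticketlock returns the drawn ticket only to make the FIFO hand-off easy to check. Unsigned wraparound is harmless because waiters only compare for equality.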

[9fans] lock question?

2010-06-21 Thread Venkatesh Srinivas
Hi, I've asked about this before, but I still don't see the reason: in 9/port/sysproc.c:sysrfork(), after a new process has been created, dupseg is called on each of the donor process's segments; why is the new process's p->seglock locked for this duration? The new process hasn't been published

Re: [9fans] interesting timing tests

2010-06-21 Thread Venkatesh Srinivas
On Mon, Jun 21, 2010 at 10:40 AM, erik quanstrom quans...@quanstro.net wrote: void lock(ulong *l) { ulong old; ushort next, owner; old = _xadd(l, 1); for(;;){ next = old; owner = old>>16; old = *l;

Re: [9fans] interesting timing tests

2010-06-21 Thread Bakul Shah
On Fri, 18 Jun 2010 19:26:25 EDT erik quanstrom quans...@labs.coraid.com wrote: note the extreme system time on the 16 processor machine Could this be due to memory contention caused by spinlocks? While locks are spinning they eat up memory bandwidth which slows down everyone's memory accesses
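One common way to reduce the traffic Bakul describes is test-and-test-and-set. This technique is not something proposed in the thread. A waiter spins on ordinary loads, which can be served from its own cache while the line is unchanged. It retries the writing exchange only after it sees the lock released. A minimal sketch assuming C11 atomics; `TTas`, `ttaslock`, `ttastrylock` and `ttasunlock` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: PAUSE on x86, a no-op on other architectures. */
#if defined(__i386__) || defined(__x86_64__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)
#endif

typedef struct {
	atomic_int key;	/* 0 = free, 1 = held */
} TTas;

static void
ttaslock(TTas *l)
{
	for(;;){
		/* Try the exchange, which writes the shared word. */
		if(atomic_exchange_explicit(&l->key, 1, memory_order_acquire) == 0)
			return;
		/* On failure, wait with loads only until the lock looks free. */
		while(atomic_load_explicit(&l->key, memory_order_relaxed) != 0)
			cpu_relax();
	}
}

/* Returns nonzero if the lock was taken; never spins. */
static int
ttastrylock(TTas *l)
{
	return atomic_load_explicit(&l->key, memory_order_relaxed) == 0
	    && atomic_exchange_explicit(&l->key, 1, memory_order_acquire) == 0;
}

static void
ttasunlock(TTas *l)
{
	assert(atomic_load_explicit(&l->key, memory_order_relaxed) == 1);
	atomic_store_explicit(&l->key, 0, memory_order_release);
}
```

Contention still spikes at release, because every waiter sees the lock free at about the same time and tries the exchange. Ticket and queue locks handle that hand-off differently.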

Re: [9fans] interesting timing tests

2010-06-21 Thread erik quanstrom
note the extreme system time on the 16 processor machine Could this be due to memory contention caused by spinlocks? While locks are spinning they eat up memory bandwidth which slows down everyone's memory accesses (including the one who is trying to finish its work while holding the

Re: [9fans] interesting timing tests

2010-06-21 Thread Bakul Shah
On Mon, 21 Jun 2010 17:21:36 EDT erik quanstrom quans...@quanstro.net wrote: note the extreme system time on the 16 processor machine Could this be due to memory contention caused by spinlocks? While locks are spinning they eat up memory bandwidth which slows down everyone's memory

Re: [9fans] interesting timing tests

2010-06-21 Thread erik quanstrom
Is there a way to check this? Is there a way to completely shut off N processors and measure benchmark speed slow down as function of processor? there hasn't been any performance impact measured. however, the extreme system time still seems weird. richard miller suggested that kprof might

Re: [9fans] interesting timing tests

2010-06-21 Thread Lawrence E. Bakst
Do you have a way to turn off one of the sockets on c (2 x E5540) and get the numbers with HT (8 processors) and without HT (4 processors)? It would also be interesting to see c with HT turned off. Certainly it seems to me that idlehands needs to be fixed, your bit array active.schedwait is