Re: [OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST

2012-05-30 Thread Jeff Squyres
Ok, so I'm viewing this as a hardware/BIOS/something else failure, and it doesn't indicate one way or the other whether the new OMPI 1.6 affinity code is working. I would still very much like to see other people's testing results. On May 30, 2012, at 2:02 PM, Brice Goglin wrote: > Something is

Re: [OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST

2012-05-30 Thread Brice Goglin
Something is preventing all cores from appearing. The BIOS? My E5-2650 processors definitely have 8 cores (without counting hyperthreads) as advertised by Intel. Brice Le 30/05/2012 19:58, Mike Dubman a écrit : > no cgroups or cpusets. > > On Wed, May 30, 2012 at 4:59 PM, Jeff Squyres

Re: [OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST

2012-05-30 Thread Mike Dubman
no cgroups or cpusets. On Wed, May 30, 2012 at 4:59 PM, Jeff Squyres wrote: > On May 30, 2012, at 9:47 AM, Mike Dubman wrote: > > > ohh.. you are right, false alarm :) sorry siblings != cores - so it is HT > > OMPI 1.6.soon-to-be-1 should handle HT properly, meaning that it

Re: [OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST

2012-05-30 Thread Mike Dubman
ohh.. you are right, false alarm :) sorry siblings != cores - so it is HT On Wed, May 30, 2012 at 4:36 PM, Brice Goglin wrote: > Your /proc/cpuinfo output (filtered below) looks like only two sockets > (physical ids 0 and 1), with one core each (cpu cores=1, core id=0),

Re: [OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST

2012-05-30 Thread Brice Goglin
Your /proc/cpuinfo output (filtered below) looks like only two sockets (physical ids 0 and 1), with one core each (cpu cores=1, core id=0), with hyperthreading (siblings=2). So lstopo looks good. E5-2650 is supposed to have 8 cores. I assume you use Linux cgroups/cpusets to restrict the available
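The reading of /proc/cpuinfo described above (distinct "physical id" values are sockets, distinct "core id" values within a socket are cores, and every "processor" entry is a hardware thread) can be sketched in a few lines of Python. This is only a minimal illustration: the helper name `summarize_cpuinfo` and the abbreviated `SAMPLE` text are hypothetical, modeled on the values quoted in this message, not on the actual attached output.

```python
from collections import defaultdict

def summarize_cpuinfo(text):
    """Return (sockets, cores, hardware threads) from /proc/cpuinfo-style text."""
    cores_per_socket = defaultdict(set)  # physical id -> set of core ids seen
    threads = 0
    for block in text.strip().split("\n\n"):  # one block per logical processor
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        threads += 1
        socket_id = fields.get("physical id", "0")
        cores_per_socket[socket_id].add(fields.get("core id", "0"))
    sockets = len(cores_per_socket)
    cores = sum(len(ids) for ids in cores_per_socket.values())
    return sockets, cores, threads

# Hypothetical, abbreviated sample: two sockets, one core each, two threads per core.
SAMPLE = """\
processor : 0
physical id : 0
siblings : 2
core id : 0
cpu cores : 1

processor : 1
physical id : 1
siblings : 2
core id : 0
cpu cores : 1

processor : 2
physical id : 0
siblings : 2
core id : 0
cpu cores : 1

processor : 3
physical id : 1
siblings : 2
core id : 0
cpu cores : 1
"""

sockets, cores, threads = summarize_cpuinfo(SAMPLE)
print(f"{sockets} sockets, {cores} cores, {threads} hardware threads")
```

On a live Linux machine the same function could be fed `open("/proc/cpuinfo").read()`. With the sample above, four logical processors collapse to two cores, which is the "siblings != cores" distinction discussed in this thread.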

Re: [OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST

2012-05-30 Thread Mike Dubman
Or lstopo lies (I'm not using the latest hwloc, but the one that comes with the distro). The machine has two dual-core sockets, 4 physical cores in total:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz

Re: [OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST

2012-05-30 Thread Ralph Castain
Hmmm...well, from what I see, mpirun was actually giving you the right answer! I only see TWO cores on each node, yet you told it to bind FOUR processes on each node, each proc to be bound to a unique core. The error message was correct - there are not enough cores on those nodes to do what
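The arithmetic behind that error can be illustrated with a small check: count how many processes the host list places on each node and compare that with the number of cores the node exposes. This is a sketch only, not how mpirun implements the check; the helper `overcommitted_nodes` is hypothetical, the host list is taken from the command quoted in this thread, and the figure of two cores per node comes from the observation above.

```python
from collections import Counter

def overcommitted_nodes(hosts, cores_per_node):
    """Return {node: (requested, available)} for nodes with more procs than cores."""
    requested = Counter(hosts)  # number of processes placed on each node
    return {
        node: (count, cores_per_node.get(node, 0))
        for node, count in requested.items()
        if count > cores_per_node.get(node, 0)
    }

# Host list as passed to --host in the failing command; -np 8 uses all entries.
HOSTS = ("h-qa-017,h-qa-017,h-qa-017,h-qa-017,"
         "h-qa-018,h-qa-018,h-qa-018,h-qa-018").split(",")
# Assumption: two visible cores per node, as reported in this message.
CORES = {"h-qa-017": 2, "h-qa-018": 2}

for node, (wanted, available) in sorted(overcommitted_nodes(HOSTS, CORES).items()):
    print(f"{node}: {wanted} processes requested, {available} cores available")
```

Both nodes are flagged (4 requested against 2 available), so binding each process to a unique core cannot succeed on this topology.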

Re: [OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST

2012-05-30 Thread Mike Dubman
Attached.
On Wed, May 30, 2012 at 2:32 PM, Jeff Squyres wrote:
> On May 30, 2012, at 7:20 AM, Jeff Squyres wrote:
> >> $hwloc-ls --of console
> >> Machine (32GB)
> >> NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (20MB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
> >>

Re: [OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST

2012-05-30 Thread Jeff Squyres
On May 30, 2012, at 7:20 AM, Jeff Squyres wrote:
>> $hwloc-ls --of console
>> Machine (32GB)
>> NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (20MB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>>   PU L#0 (P#0)
>>   PU L#1 (P#2)
>> NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (20MB) + L2

Re: [OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST

2012-05-30 Thread Jeff Squyres
On May 30, 2012, at 5:05 AM, Mike Dubman wrote:
> Not good:
@#$%@#%@#!! But I guess this is why we test. :-(
> /labhome/alexm/workspace/openmpi-1.6.1a1hge06c2f2a0859/inst/bin/mpirun --host
> h-qa-017,h-qa-017,h-qa-017,h-qa-017,h-qa-018,h-qa-018,h-qa-018,h-qa-018 -np 8
> --bind-to-core

Re: [OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST

2012-05-30 Thread Mike Dubman
Not good: /labhome/alexm/workspace/openmpi-1.6.1a1hge06c2f2a0859/inst/bin/mpirun --host h-qa-017,h-qa-017,h-qa-017,h-qa-017,h-qa-018,h-qa-018,h-qa-018,h-qa-018 -np 8 --bind-to-core -bynode -display-map /usr/mpi/gcc/mlnx-openmpi-1.6rc4/tests/osu_benchmarks-3.1.1/osu_alltoall

[OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST

2012-05-29 Thread Jeff Squyres
Per ticket #3108, there were still some unfortunate bugs in the affinity code in 1.6. :-( These have now been fixed. ...but since this is the 2nd or 3rd time we have "fixed" the 1.5/1.6 series w.r.t. processor affinity, I'd really like people to test this stuff before it's committed and we ship