On 28.02.2013 at 06:55, Ralph Castain wrote:

> I don't off-hand see a problem, though I do note that your "working" version 
> incorrectly reports the universe size as 2!

Yes, it was 2 in the working case, where the hostfile listed only the two 
hostnames without any explicit slot count. What should it be in this case: 
"unknown", "infinity"?

-- Reuti
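
One possible reading of cases b) and b1) quoted below, as a rough sketch only. It assumes (this is not confirmed anywhere in this thread) that ranks are placed by slot, i.e. a node is filled up to its slot count before the next node is used, and that each rank then needs -cpus-per-proc cores on its node. The function names are made up for illustration and are not Open MPI API. Case a), the hostfile without slot counts, is not modelled.

```python
def procs_per_node_byslot(slots, nprocs):
    """Place ranks node by node, filling each node's slots first.

    This by-slot placement is an assumption for illustration, not a
    statement about what Open MPI 1.6.4 actually does.
    """
    placement = []
    remaining = nprocs
    for slot_count in slots:
        take = min(slot_count, remaining)
        placement.append(take)
        remaining -= take
    if remaining:
        raise ValueError("more processes requested than slots available")
    return placement


def overcommitted_nodes(slots, cores_per_node, nprocs, cpus_per_proc):
    """Return indices of nodes whose ranks would need more cores than exist."""
    placement = procs_per_node_byslot(slots, nprocs)
    return [
        index
        for index, ranks in enumerate(placement)
        if ranks * cpus_per_proc > cores_per_node
    ]


# Case b): two nodes with slots=64 and 64 cores each, -np 64, -cpus-per-proc 2
print(overcommitted_nodes([64, 64], 64, 64, 2))
# Case b1): same hostfile, -np 32
print(overcommitted_nodes([64, 64], 64, 32, 2))
```

Under that assumption, case b) would put all 64 ranks on node006 and ask for 128 cores there, while case b1) needs exactly 64 cores on node006 and fits. Whether this is what actually happens is for Ralph to confirm.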


> 
> I'll have to take a look at this and get back to you on it.
> 
> On Feb 27, 2013, at 3:15 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> 
>> Hi,
>> 
>> I have an issue using the option -cpus-per-proc 2. As I have Bulldozer 
>> machines and I want only one process per FP core, I thought using 
>> -cpus-per-proc 2 would be the way to go. Initially I had this issue inside 
>> GridEngine, but then I tried it outside any queuing system and saw exactly 
>> the same behavior.
>> 
>> @) Each machine has 4 CPUs with 16 integer cores each, hence 64 integer 
>> cores per machine in total. The Open MPI version used is 1.6.4.
>> 
>> 
>> a) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 64 
>> ./mpihello
>> 
>> and a hostfile containing only the two lines listing the machines:
>> 
>> node006
>> node007
>> 
>> This works the way I want (see working.txt) when launched from node006.
>> 
>> 
>> b) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 64 
>> ./mpihello
>> 
>> But now the hostfile is changed so that it has a slot count, which might 
>> mimic a machinefile generated by a queuing system:
>> 
>> node006 slots=64
>> node007 slots=64
>> 
>> This fails with:
>> 
>> --------------------------------------------------------------------------
>> An invalid physical processor ID was returned when attempting to bind
>> an MPI process to a unique processor on node:
>> 
>> Node: node006
>> 
>> This usually means that you requested binding to more processors than
>> exist (e.g., trying to bind N MPI processes to M processors, where N >
>> M), or that the node has an unexpectedly different topology.
>> 
>> Double check that you have enough unique processors for all the
>> MPI processes that you are launching on this host, and that all nodes
>> have identical topologies.
>> 
>> You job will now abort.
>> --------------------------------------------------------------------------
>> 
>> (see failed.txt)
>> 
>> 
>> b1) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 32 
>> ./mpihello
>> 
>> This works and the reported universe size is 128 as expected (see only32.txt).
>> 
>> 
>> c) Maybe the machinefile is not parsed correctly, so I checked:
>> 
>> c1) mpiexec -hostfile machines -np 64 ./mpihello => works
>> 
>> c2) mpiexec -hostfile machines -np 128 ./mpihello => works
>> 
>> c3) mpiexec -hostfile machines -np 129 ./mpihello => fails as expected
>> 
>> So the slot counts are picked up correctly.
>> 
>> What am I missing?
>> 
>> -- Reuti
>> 
>> <failed.txt><only32.txt><working.txt>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 

