On Feb 28, 2013, at 6:17 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> On 28.02.2013 at 08:58, Reuti wrote:
>
>> On 28.02.2013 at 06:55, Ralph Castain wrote:
>>
>>> I don't off-hand see a problem, though I do note that your "working"
>>> version incorrectly reports the universe size as 2!
>>
>> Yes, it was 2 in the case when it was working, by giving only two hostnames
>> without any dedicated slot count. What should it be in this case -
>> "unknown", "infinity"?
>
> As an add-on:
>
> a) I tried it again on the command line and still get:
>
> Total: 64
> Universe: 2
>
> with a hostfile
>
> node006
> node007

My bad - since no slots were given, we default to a value of 1 for each node, so this is correct.

> b) In a job script under SGE, with Open MPI compiled --with-sge, I get the
> following after mangling the hostfile:
>
> #!/bin/sh
> #$ -pe openmpi* 128
> #$ -l exclusive
> cut -f 1 -d" " $PE_HOSTFILE > $TMPDIR/machines
> mpiexec -cpus-per-proc 2 -report-bindings -hostfile $TMPDIR/machines -np 64 ./mpihello
>
> Here:
>
> Total: 64
> Universe: 128

This would be correct, as SGE is allocating a total of 128 slots (or PEs).

> Maybe the allocation found by SGE and the one from the command-line
> argument are getting mixed up here.
>
> -- Reuti
>
>> -- Reuti
>>
>>> I'll have to take a look at this and get back to you on it.
>>>
>>> On Feb 27, 2013, at 3:15 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have an issue using the option -cpus-per-proc 2. As I have Bulldozer
>>>> machines and I want only one process per FP core, I thought using
>>>> -cpus-per-proc 2 would be the way to go. Initially I had this issue
>>>> inside GridEngine, but then I tried it outside any queuing system and
>>>> faced exactly the same behavior.
>>>>
>>>> @) Each machine has 4 CPUs, each having 16 integer cores, hence 64
>>>> integer cores per machine in total. The Open MPI version used is 1.6.4.
>>>>
>>>> a) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 64 ./mpihello
>>>>
>>>> and a hostfile containing only the two lines listing the machines:
>>>>
>>>> node006
>>>> node007
>>>>
>>>> This works as I would like (see working.txt) when initiated on node006.
>>>>
>>>> b) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 64 ./mpihello
>>>>
>>>> But when I change the hostfile so that it has a slot count, which might
>>>> mimic a machinefile parsed out of a queuing system:
>>>>
>>>> node006 slots=64
>>>> node007 slots=64
>>>>
>>>> This fails with:
>>>>
>>>> --------------------------------------------------------------------------
>>>> An invalid physical processor ID was returned when attempting to bind
>>>> an MPI process to a unique processor on node:
>>>>
>>>> Node: node006
>>>>
>>>> This usually means that you requested binding to more processors than
>>>> exist (e.g., trying to bind N MPI processes to M processors, where N >
>>>> M), or that the node has an unexpectedly different topology.
>>>>
>>>> Double check that you have enough unique processors for all the
>>>> MPI processes that you are launching on this host, and that all nodes
>>>> have identical topologies.
>>>>
>>>> Your job will now abort.
>>>> --------------------------------------------------------------------------
>>>>
>>>> (see failed.txt)
>>>>
>>>> b1) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 32 ./mpihello
>>>>
>>>> This works, and the found universe is 128 as expected (see only32.txt).
>>>>
>>>> c) Maybe the machinefile used is not parsed in the correct way, so I
>>>> checked:
>>>>
>>>> c1) mpiexec -hostfile machines -np 64 ./mpihello => works
>>>>
>>>> c2) mpiexec -hostfile machines -np 128 ./mpihello => works
>>>>
>>>> c3) mpiexec -hostfile machines -np 129 ./mpihello => fails as expected
>>>>
>>>> So, it got the slot counts in the correct way.
>>>>
>>>> What am I missing?
>>>>
>>>> -- Reuti
>>>>
>>>> <failed.txt><only32.txt><working.txt>
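
P.S. A short sketch of what the "mangling the hostfile" step in b) does. The
$PE_HOSTFILE contents below are illustrative assumptions, not taken from the
thread (SGE writes one line per host: hostname, slot count, queue instance,
processor range):

# Assumed $PE_HOSTFILE contents for the 128-slot parallel environment above:
#
#   node006 64 all.q@node006 UNDEFINED
#   node007 64 all.q@node007 UNDEFINED
#
# The job script keeps only the first whitespace-delimited field, i.e. the
# hostnames, so the resulting machines file carries no slot counts:
cut -f 1 -d" " $PE_HOSTFILE > $TMPDIR/machines
#
# $TMPDIR/machines now contains:
#
#   node006
#   node007
#
# Without slot counts, mpiexec defaults to 1 slot per listed host - which is
# why case a) reports "Universe: 2" - whereas under SGE with --with-sge
# support the universe of 128 comes from the SGE allocation itself.

One hedged reading of the failure in b), not confirmed anywhere in the
thread: with "slots=64" on node006, the default by-slot mapping may place all
64 ranks on that single node, and binding 2 cores per rank would then require
128 cores where only 64 exist.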