On 23.08.2014 at 02:37, Reuti wrote:
> Hi,
>
> On 23.08.2014 at 00:43, Noah Knowles wrote:
>
>> Hi, I am using OGS/GE 2011.11p1 on ROCKS. We have a small cluster with a
>> combination of 12- and 16-core blades. We are running an application where
>> the specific assignment of ranks to nodes has a big effect on run time. Is
>> it possible, for example, with NP=64 to specify that
>>
>> ranks 0-15 go to a 16-core blade,
>> ranks 16-27 go to a 12-core blade,
>> ranks 28-39 go to a 12-core blade,
>> ranks 40-55 go to a 16-core blade, and
>> ranks 56-63 go to a 12-core blade?
>>
>> I tried, for this example,
>> qsub -binding linear:64 -l
>> h="compute-0-4|compute-0-0|compute-0-1|compute-0-5|compute-0-2"
>
> The binding will only be honored (it's a soft request) if there is a node
> with 64 cores. It must also be activated in "execd_params" in SGE's
> configuration.
>
>
>> (where compute nodes 4-5 are 16 core and the others are 12-core), but that
>> gave me no control over the order in which the nodes were assigned.
>>
>> We are experimenting with Intel MPI and OpenMPI-- I couldn't figure out how
>> to do this with the Intel mpirun options, and rankfiles were causing errors,
>> so I was hoping to accomplish it with qsub.
>
> - Do you have a tight integration of Open MPI into SGE (i.e. compiled with
> "--with-sge")?
> - All 64 are MPI processes, no OpenMP threads?
> - What PE did you use?
> - You always want complete machines, i.e. you could also request 68 cores?
> - The rank0 (i.e. where also the jobscript runs) can be selected with:
>
> `qsub -masterq foobar@compute-0-4 ...`
>
> - Additional machines with:
>
> "... -q
> foobar@compute-0-4,foobar@compute-0-0,foobar@compute-0-1,foobar@compute-0-5,foobar@compute-0-2"
>
>
> (foobar@compute-0-4 needs to be listed in both options, no order of hosts
> guaranteed)
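
A small sketch of how the two options above could be assembled from one host list. The PE name "orte" and the script name "job.sh" are placeholders of mine, not taken from your setup, and the command is only printed so it can be checked before submitting:

```shell
#!/bin/sh
# Hosts from the original question; the first one is where rank 0 should run.
hosts="compute-0-4 compute-0-0 compute-0-1 compute-0-5 compute-0-2"
queue="foobar"      # queue name as used in the examples above
pe="orte"           # hypothetical PE name, substitute your own
script="job.sh"     # hypothetical job script

# The master queue instance is the queue on the first host of the list.
master="$queue@${hosts%% *}"

# Build the comma-separated queue instance list for -q.
qlist=""
for h in $hosts; do
    qlist="${qlist:+$qlist,}$queue@$h"
done

# Only print the command; nothing is submitted here.
echo "qsub -pe $pe 64 -masterq $master -q $qlist $script"
```

The order of the list given to -q still does not determine the order of the hosts in the granted allocation; only the master host is fixed this way.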
>
> Creating a rankfile out of the granted machinefile should work (i.e. keeping
> the allocation). As long as you are alone on these machines, it's better to
> let Open MPI do the final binding to cores.
>
> Jobscript:
>
> # Reorder the hosts in the way you need them ("sort" is only a placeholder)
> sort "$PE_HOSTFILE" > RESORTED_HOSTFILE
> export PE_HOSTFILE=RESORTED_HOSTFILE
>
> # Print one "rank N=host slot=M" line per slot, stopping after $1 ranks
> PeHostfile2RankFile()
> {
>     rank=0
>     while read line; do
>         # Short host name and slot count from columns 1 and 2
>         host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
>         nslots=`echo $line|cut -f2 -d" "`
>         i=0
>         while [ $i -lt $nslots ]; do
>             echo "rank $rank=$host slot=$i"
>             rank=`expr $rank + 1`
>             i=`expr $i + 1`
>             if [ $rank -eq "$1" ]; then
>                 # Leave both loops, otherwise the next host adds more ranks
>                 break 2
>             fi
>         done
>     done < RESORTED_HOSTFILE
> }
>
> PeHostfile2RankFile 64 > RANKFILE
>
> mpiexec -np 64 --rankfile RANKFILE ./mpihello
>
> (I don't have such machines, so for testing I assigned every rank the same
> core [slot=0] just to get the list of locations, which seems to work.)
One additional thought: Open MPI fills the machines according to the given
machinefile. Maybe you don't need to provide a rankfile at all when the
machinefile has already been rearranged.
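
As a rough sketch of that idea (not tested on a mixed 12/16-core cluster): the two helper functions below are names I made up. The first prints the lines of the granted $PE_HOSTFILE in an explicit host order, the second turns the result into an Open MPI style machinefile. I assume the usual hostfile layout with the host name in column 1 and the slot count in column 2.

```shell
#!/bin/sh
# reorder_hostfile FILE HOST...
# Print the lines of a PE hostfile in the order of the given short host names.
reorder_hostfile() {
    file=$1
    shift
    for h in "$@"; do
        # Compare on the short name so fully qualified entries match too.
        awk -v want="$h" '{ split($1, p, "."); if (p[1] == want) print }' "$file"
    done
}

# to_machinefile FILE
# Convert "host slots queue range" lines into "host slots=N" lines.
to_machinefile() {
    awk '{ print $1 " slots=" $2 }' "$1"
}

# Inside a job: write the reordered files (skipped when PE_HOSTFILE is unset).
if [ -n "${PE_HOSTFILE:-}" ]; then
    reorder_hostfile "$PE_HOSTFILE" \
        compute-0-4 compute-0-0 compute-0-1 compute-0-5 compute-0-2 \
        > RESORTED_HOSTFILE
    to_machinefile RESORTED_HOSTFILE > MACHINEFILE
fi
```

With 16+12+12+16 slots on the first four hosts, "mpiexec -np 64 --machinefile MACHINEFILE ./mpihello" should then leave ranks 56-63 for the last blade, provided Open MPI really fills the hosts in file order under the tight integration. That is worth checking with a small test job first.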
-- Reuti
> -- Reuti
>
>
>> I hope I'm asking this in the right place-- sorry if not.
>> Thanks for any help!
>> Noah
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users