Hi,
On 23.08.2014 at 16:46, Noah Knowles wrote:
> Hi Reuti,
>
> On 08/23/2014 01:38 AM, Reuti wrote:
>> On 23.08.2014 at 02:37, Reuti wrote:
>>
>>> Hi,
>>>
>>> On 23.08.2014 at 00:43, Noah Knowles wrote:
>>>
>>>> Hi, I am using OGS/GE 2011.11p1 on ROCKS. We have a small cluster with a
>>>> combination of 12- and 16-core blades. We are running an application where
>>>> the specific assignment of ranks to nodes has a big effect on run time. Is
>>>> it possible, for example, with NP=64 to specify that
>>>>
>>>> ranks 0-15 go to a 16-core blade,
>>>> ranks 16-27 go to a 12-core blade,
>>>> ranks 28-39 go to a 12-core blade,
>>>> ranks 40-55 go to a 16-core blade, and
>>>> ranks 56-63 go to a 12-core blade?
>>>>
>>>> I tried, for this example,
>>>> qsub -binding linear:64 -l
>>>> h="compute-0-4|compute-0-0|compute-0-1|compute-0-5|compute-0-2"
>>> The binding would only be honored (as it's a soft request) if there were
>>> a node with 64 cores. It must also be activated in "execd_params" in
>>> SGE's configuration.
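>>>
>>> For example, something along these lines activates it (a sketch, assuming
>>> admin rights; ENABLE_BINDING is the relevant entry in "execd_params"):
>>>
>>> # open the global cluster configuration in an editor
>>> qconf -mconf
>>> # and add/extend the line:
>>> # execd_params    ENABLE_BINDING=true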
> OK I see. I misunderstood the way that binding works.
>>>
>>>
>>>> (where compute nodes 4-5 are 16 core and the others are 12-core), but that
>>>> gave me no control over the order in which the nodes were assigned.
>>>>
>>>> We are experimenting with Intel MPI and OpenMPI-- I couldn't figure out
>>>> how to do this with the Intel mpirun options, and rankfiles were causing
>>>> errors, so I was hoping to accomplish it with qsub.
>>> - Do you have a tight integration of Open MPI into SGE (i.e. compiled with
>>> "--with-sge")?
> yes
>>> - All 64 are MPI processes, no OpenMP threads?
> correct
>>> - What PE did you use?
> orte
OK, my question was unclear. What matters is the "allocation_rule" of the PE.
But both "$fill_up" and "$round_robin" will do here, as you request all 68
slots of the machines you list.
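You can check which rule your PE uses with `qconf -sp orte`; a shortened
output might look like this (a sketch, your values may differ):

$ qconf -sp orte
pe_name            orte
slots              9999
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE

(control_slaves TRUE is what makes the tight integration work.)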
Requesting only 64 slots might lead to the effect that foobar@compute-0-4
gets the master slot for sure, but the node won't be filled completely with
either allocation rule; instead the last node compute-0-2 is filled
completely and the first node still has 4 slots free (even attaching an
exclusive complex to the `qsub` request won't prevent this).
-- Reuti
>>> - You always want complete machines, i.e. you could also request 68 cores?
> yes that would be smarter!
>>> - Rank 0 (i.e. the node where the jobscript also runs) can be selected with:
>>>
>>> `qsub -masterq foobar@compute-0-4 ...`
>>>
>>> - Additional machines with:
>>>
>>> "... -q
>>> foobar@compute-0-4,foobar@compute-0-0,foobar@compute-0-1,foobar@compute-0-5,foobar@compute-0-2"
>>>
>>> (foobar@compute-0-4 needs to be listed in both options; no particular
>>> order of hosts is guaranteed)
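>>>
>>> Putting both options together, a complete submission could look like this
>>> (a sketch: "job.sh" stands for your jobscript, and the 68 slots match the
>>> five machines listed above):
>>>
>>> qsub -pe orte 68 -masterq foobar@compute-0-4 \
>>> -q foobar@compute-0-4,foobar@compute-0-0,foobar@compute-0-1,foobar@compute-0-5,foobar@compute-0-2 \
>>> job.sh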
>>>
>>> Creating a rankfile out of the granted machinefile should work (i.e. it
>>> keeps the allocation). As long as you are alone on these machines, it's
>>> better to let Open MPI do the final binding to cores.
>>>
>>> Jobscript:
>>>
>>> # Reorder the hostfile in the way you need the nodes
>>> sort $PE_HOSTFILE > RESORTED_HOSTFILE
>>> export PE_HOSTFILE=RESORTED_HOSTFILE
>>>
>>> # Turn the resorted PE hostfile into an Open MPI rankfile,
>>> # emitting at most $1 ranks
>>> PeHostfile2RankFile()
>>> {
>>>     rank=0
>>>     while read line; do
>>>         host=`echo $line | cut -f1 -d" " | cut -f1 -d"."`
>>>         nslots=`echo $line | cut -f2 -d" "`
>>>         i=0
>>>         while [ $i -lt $nslots ]; do
>>>             echo "rank $rank=$host slot=$i"
>>>             rank=`expr $rank + 1`
>>>             i=`expr $i + 1`
>>>             # stop for good once the requested number of ranks is written
>>>             # (a plain "break" would only leave the inner loop and
>>>             # continue with the next host)
>>>             if [ $rank -eq "$1" ]; then
>>>                 return
>>>             fi
>>>         done
>>>     done < RESORTED_HOSTFILE
>>> }
>>>
>>> PeHostfile2RankFile 64 > RANKFILE
>>>
>>> mpiexec -np 64 --rankfile RANKFILE ./mpihello
>>>
>>> (I don't have such machines to test on, so I assigned all ranks the same
>>> core to get just the list of locations [slot=0], which seems to work.)
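>>>
>>> With the plain alphabetical `sort` above, the generated RANKFILE would
>>> start like this (a sketch; swap the sort for whatever ordering you
>>> actually need):
>>>
>>> rank 0=compute-0-0 slot=0
>>> rank 1=compute-0-0 slot=1
>>> ...
>>> rank 11=compute-0-0 slot=11
>>> rank 12=compute-0-1 slot=0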
>> One additional thought: Open MPI fills the machines according to the given
>> machinefile. Maybe you don't need to provide a rankfile at all when the
>> machinefile has already been rearranged.
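>>
>> In that case the jobscript might shrink to something like this (a sketch;
>> `--bind-to-core` is the Open MPI 1.6 spelling, newer 1.8 releases use
>> `--bind-to core` instead):
>>
>> sort $PE_HOSTFILE > RESORTED_HOSTFILE
>> export PE_HOSTFILE=RESORTED_HOSTFILE
>> mpiexec -np 64 --bind-to-core ./mpihello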
> OK thanks, I'll try that Monday or when the kids are sleeping. Even if I
> don't need it, it's helpful to see the script too.
> Thanks so much for your very helpful (and quick) replies Reuti!
> Noah
>>
>> -- Reuti
>>
>>
>>> -- Reuti
>>>
>>>
>>>> I hope I'm asking this in the right place-- sorry if not.
>>>> Thanks for any help!
>>>> Noah
>>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users