Hi Reuti,

Thanks for the direction.

I have never worked with grid submission scripts, so it will take me a while to 
learn and then try this out. 

I will get back as I am able to make progress.

Regards,
Vipul
 

-----Original Message-----
From: Reuti [mailto:re...@staff.uni-marburg.de] 
Sent: Thursday, February 6, 2020 4:35 PM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Kulshrestha, Vipul <vipul_kulshres...@mentor.com>
Subject: Re: [OMPI users] running mpirun with grid

Hi,

> Am 06.02.2020 um 21:47 schrieb Kulshrestha, Vipul via users 
> <users@lists.open-mpi.org>:
> 
> Hi,
>  
> I need to launch my openmpi application on grid.
>  
> My application is designed to run N processes, where each process would have 
> M threads.
>  
> To run it without grid, I run it as (say N = 7, M = 2):
> % mpirun -np 7 <application name with arguments>
>  
> The above works well and runs N processes. I am also able to submit it on 
> grid using below command and it works.
>  
> % qsub -pe orte 7 -l os-redhat6.7* -V -j y -b y -shell no mpirun -np 7 
> <application name with arguments>
>  
> However, the above job allocates only N slots on the grid, when it really is 
> consuming N*M slots. How do I submit the qsub command so that it reserves 
> N*M slots, while starting up N processes? I tried the below but I get some 
> weird error from ORTE as pasted below.
>  
> % qsub -pe orte 14 -l os-redhat6.7* -V -j y -b y -shell no mpirun -np 
> 7 <application name with arguments>

a) You will first have to ask the admin to provide a fixed allocation rule 
on all involved nodes, e.g. "allocation_rule 2", and name this PE "orte2". 
This way you can be sure to always get exactly 2 slots on each node.
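For reference, such a PE definition might look like the following in `qconf -sp orte2` (a sketch only; besides the allocation rule, the remaining values are typical defaults the admin may set differently):

```
pe_name            orte2
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    2
control_slaves     TRUE
job_is_first_task  FALSE
```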

b) Instead of submitting a binary, you will need a job script in which you 
mangle the provided PE_HOSTFILE so that it lists each node with a slot count 
of only 1, i.e. Open MPI should believe it is to start only one process per 
node. You can then use the remaining core for an additional thread. As the 
original file can't be changed, it has to be copied and adjusted, and 
PE_HOSTFILE then reset to point to this new file.
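The copy-and-mangle step might be sketched as below (assumptions: the standard SGE PE_HOSTFILE line format "host slots queue processor-range"; a sample file stands in for the real $PE_HOSTFILE, and file names are illustrative):

```shell
#!/bin/sh
# Sample of what SGE might hand the job for -pe orte2 14
# (7 nodes, 2 slots each; only 2 nodes shown here).
cat > pe_hostfile.orig <<'EOF'
node01 2 all.q@node01 UNDEFINED
node02 2 all.q@node02 UNDEFINED
EOF

# Copy and mangle: force the slot count (field 2) to 1 per node, then
# point PE_HOSTFILE at the copy so Open MPI starts one process per node.
awk '{ $2 = 1; print }' pe_hostfile.orig > pe_hostfile.mangled
export PE_HOSTFILE="$PWD/pe_hostfile.mangled"
cat "$PE_HOSTFILE"
```

In the real job script you would read from "$PE_HOSTFILE" instead of the sample file, and call `mpirun` after the `export`.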

c) It would be nice if the admin could already prepare a mangled PE_HOSTFILE 
(maybe by dividing the slot count by the last digit in the PE name) in a 
parallel prolog and put it in the $TMPDIR of the job. As the environment 
variables won't be inherited by the job, you will have to point the environment 
variable PE_HOSTFILE to the mangled one in your job script in this case too.
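The divide-by-PE-suffix idea could look roughly like this (assumptions: the PE name ends in the per-node slot count, e.g. "orte2"; a sample file again stands in for the real hostfile):

```shell
#!/bin/sh
# Extract the trailing digits of the PE name, e.g. "orte2" -> "2".
PE_NAME=orte2
DIV=$(printf '%s' "$PE_NAME" | sed 's/.*[^0-9]//')

cat > pe_hostfile.orig <<'EOF'
node01 2 all.q@node01 UNDEFINED
node02 2 all.q@node02 UNDEFINED
EOF

# Divide each node's slot count (field 2) by that number.
awk -v d="$DIV" '{ $2 = $2 / d; print }' pe_hostfile.orig
```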

d) SGE should be given the real number of slots needed by your job at 
submission time, i.e. 14 in your case.

This way you will get an allocation of 14 slots; due to the fixed allocation 
rule of "orte2" they are distributed equally. The mangled PE_HOSTFILE will 
include only one slot per node, so Open MPI will start only one process per 
node for a total of 7. Then you can use OMP_NUM_THREADS=2 or similar to tell 
your application to start an additional thread per node. The environment 
variable OMP_NUM_THREADS should also be distributed to the nodes via the 
option "-x" to `mpirun` (or use MPI itself to distribute this information).
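Putting a) to d) together, the submission might look like this (a sketch only; "orte2" and "job.sh" are assumed names from the discussion above):

```shell
# Request the real slot total (N*M = 14) under the fixed-rule PE:
qsub -pe orte2 14 -l os-redhat6.7* -V -j y job.sh

# Inside job.sh, after mangling PE_HOSTFILE as described in b) or c):
mpirun -np 7 -x OMP_NUM_THREADS=2 <application name with arguments>
```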

Note that, in contrast to Torque, here you will get each node only once for 
sure. AFAIR there was a setting in Torque to allow or disallow multiple 
selections of the same node under a fixed allocation rule.

HTH -- Reuti
