Thanks for the quick response.

I will try building OMPI as suggested.

On the integration with unsupported distribution systems, we cannot use a 
script-based approach, because these machines often do not have ssh permission 
in the customer environment. I will explore the path of writing an ORTE 
component, though at this stage I don't have a sense of the effort involved.

I guess my question 2 was not understood correctly. I used the command below as 
an example for SGE and want to understand the expected behavior for such a 
command. With the command below, I expect 8 copies of a.out to be launched, 
each with access to 40 GB of memory. Is that correct? I am doubtful, because I 
don't understand how mpirun gets access to information about the RAM 
requirement.

qsub -pe orte 8 -b y -V -l m_mem_free=40G -cwd mpirun -np 8 a.out


Regards,
Vipul



From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of 
r...@open-mpi.org
Sent: Tuesday, July 25, 2017 8:16 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Questions about integration with resource 
distribution systems


On Jul 25, 2017, at 3:48 PM, Kulshrestha, Vipul 
<vipul_kulshres...@mentor.com<mailto:vipul_kulshres...@mentor.com>> wrote:

I have several questions about integration of openmpi with resource queuing 
systems.

1.
I understand that openmpi supports integration with various resource 
distribution systems such as SGE, LSF, Torque, etc.

I need to build an openmpi application that can interact with a variety of 
different resource distribution systems, since different customers have 
different systems. Based on my research, it seems that I need to build a 
different openmpi installation for each one, e.g. create an installation of 
openmpi with grid support and a separate installation of openmpi with LSF 
support. Is there a way to build a generic installation of openmpi that can be 
used with more than one distribution system?

Just to be clear: your application doesn’t depend on the environment in any 
way. Only mpirun does - so if you are distributing an _application_, then your 
question is irrelevant.

If you are distributing OMPI itself, and therefore mpirun, then you can build 
the various components if you first install the headers for each environment on 
your system. That means you need one machine where all those resource managers 
at least have their headers installed. Then configure OMPI --with-xxx pointing 
to each RM's headers so all the components get built. When the binary hits your 
customer's machine, only those components that have active libraries present 
will execute.
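For illustration, a build along those lines might look like the sketch below. 
The install prefixes are hypothetical; point each --with-xxx flag at wherever 
that RM's headers actually live on the build host (configure --help lists the 
exact options for your OMPI version):

```shell
# Build one Open MPI tree with launch components for several resource managers.
#   --with-sge          Grid Engine support (no external headers required)
#   --with-lsf=DIR      LSF install root (hypothetical path below)
#   --with-tm=DIR       Torque/PBS, provides tm.h (hypothetical path below)
./configure --prefix=/opt/openmpi \
    --with-sge \
    --with-lsf=/opt/lsf \
    --with-tm=/opt/pbs
make -j8 && make install
```

At run time, components whose RM libraries are absent simply stay dormant, 
which is what makes the single binary portable across customer sites.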


2.
For integration with LSF/grid, how would I specify the memory (RAM) requirement 
(or some other parameter) to bsub/qsub when launching the mpirun command? Will 
something like the command below work to ensure that each of the 8 copies of 
a.out has 40 GB of memory reserved for it by the grid engine?

qsub -pe orte 8 -b y -V -l m_mem_free=40G -cwd mpirun -np 8 a.out

You’ll have to provide something that is environment dependent, I’m afraid - 
there is no standard out there.
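As a hedged sketch of what "environment dependent" means in practice, the same 
per-process memory request might look like this under two different schedulers 
(flag semantics vary by product and site configuration; check each scheduler's 
documentation):

```shell
# SGE/UGE: m_mem_free is typically enforced per slot, so each of the
# 8 slots gets 40G (the command from the question, with ASCII dashes):
qsub -pe orte 8 -b y -V -l m_mem_free=40G -cwd mpirun -np 8 a.out

# LSF: rusage[mem=...] is also commonly interpreted per slot, in MB by
# default (the unit is site-configurable):
bsub -n 8 -R "rusage[mem=40960]" mpirun -np 8 a.out
```

In both cases the reservation is done by the scheduler, not by mpirun, which is 
why mpirun never needs to know the RAM requirement.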



3.
Some of our customers use a custom distribution engine (a non-industry-standard 
one). How can I integrate my openmpi application with such a system? I would 
think it should be possible if openmpi launched/managed interaction with the 
distribution engine through some kind of generic mechanism (say, a configurable 
command to launch, monitor, and kill a job, plus a plugin to define these 
operations with commands specific to the distribution engine in use). Does such 
integration exist in openmpi?

Easiest solution is to write a script that reads the allocation and dumps it 
into a file, and then provide that file as your hostfile on the mpirun cmd line 
(or in the environment). We will then use ssh to perform the launch. Otherwise, 
you’ll need to write at least an orte/mca/ras component to get the allocation, 
and possibly an orte/mca/plm component if you want to use the native launch 
mechanism in place of ssh.
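A minimal sketch of that script-based route, assuming (hypothetically) that the 
custom engine can emit its allocation as "hostname ncores" lines; the query 
command and hostnames below are made up for illustration:

```shell
# Hypothetical: the engine writes its allocation to alloc.txt, e.g.
#   myengine-query-alloc > alloc.txt
# Here we use a stand-in allocation so the sketch is self-contained:
printf 'node01 8\nnode02 8\n' > alloc.txt

# Convert it to Open MPI hostfile syntax: "hostname slots=N".
awk '{ print $1 " slots=" $2 }' alloc.txt > my_hostfile
cat my_hostfile
# node01 slots=8
# node02 slots=8

# mpirun then launches over ssh using that hostfile:
# mpirun --hostfile my_hostfile -np 16 a.out
```

This only works where passwordless ssh between the allocated nodes is allowed; 
otherwise the orte/mca/ras (and possibly plm) component route mentioned above 
is the alternative.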




Thanks,
Vipul


_______________________________________________
users mailing list
users@lists.open-mpi.org<mailto:users@lists.open-mpi.org>
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
