Hi all.

I'm trying to create a tight integration between Torque and Open MPI for
cases where the tm ras and plm components aren't compiled into Open MPI.
This scenario is common for Linux distros that ship Open MPI.  Of course the
ideal solution is to recompile Open MPI with Torque support, but this isn't
always feasible, since I do not want to support my own build of Open MPI for
the software I'm distributing to others.

We also see some proprietary applications shipping their own embedded Open MPI
libraries where the tm plm/ras is either missing or does not work with the
Torque installation on our system.

So far I've created a pbsdshwrapper.py that mimics ssh behaviour closely
enough that starting the orteds on all the hosts works as expected, and the
application starts correctly when I use

setenv OMPI_MCA_plm_rsh_agent "pbsdshwrapper.py"
mpirun --hostfile $PBS_NODEFILE ........
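
For readers unfamiliar with the approach, here is a minimal sketch of what an
ssh-like wrapper around pbsdsh could look like.  It is not the actual
pbsdshwrapper.py; the function names and the list of ssh options are
hypothetical, and it assumes the installed pbsdsh accepts "-h <hostname>" to
pick the target node.

```python
import os

# Assumption: a small subset of ssh options that take a following argument.
OPTS_WITH_ARG = {"-l", "-p", "-o", "-i", "-F"}


def split_ssh_args(argv):
    """Split ssh-style arguments into (host, remote_command_words)."""
    i = 0
    # Skip leading ssh options, which pbsdsh has no use for.
    while i < len(argv) and argv[i].startswith("-"):
        i += 2 if argv[i] in OPTS_WITH_ARG else 1
    if i >= len(argv):
        raise ValueError("no host given")
    return argv[i], argv[i + 1:]


def build_pbsdsh_command(argv):
    """Translate 'agent [opts] host cmd...' into a pbsdsh invocation."""
    host, remote = split_ssh_args(argv)
    # ssh hands the remote command to a shell as one string, so do the same.
    return ["pbsdsh", "-h", host, "/bin/sh", "-c", " ".join(remote)]


def main(argv):
    """Replace this process with pbsdsh; argv excludes the program name."""
    os.execvp("pbsdsh", build_pbsdsh_command(argv))
```

A real script would call main(sys.argv[1:]) at the bottom and would probably
also need to forward the environment that the orted expects.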

What I want now is a way to get rid of the --hostfile $PBS_NODEFILE in the 
mpirun command.  Is there an environment variable that I can set so that 
mpirun grabs the right nodelist?

Spelunking through the code, I find that the rsh plm has support for SGE: it
automatically picks up the PE_HOSTFILE if it detects that it was launched
within an SGE job.  Would it be possible to have the same functionality for
Torque?  At first sight the code looks a bit too complex for me to fix this
myself.
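
To illustrate the behaviour being asked for, the same kind of detection can be
sketched outside Open MPI as a small launcher.  This is only an illustration
under assumptions, not Open MPI code: the helper names are hypothetical, and it
relies on Torque setting PBS_JOBID and PBS_NODEFILE inside a job, with the
nodefile listing each host once per allocated slot.

```python
import os
from collections import OrderedDict


def torque_nodefile(env=None):
    """Return the nodefile path if we appear to run inside a Torque job."""
    env = os.environ if env is None else env
    path = env.get("PBS_NODEFILE")
    if env.get("PBS_JOBID") and path and os.path.isfile(path):
        return path
    return None


def slots_per_host(lines):
    """Count how often each host occurs; one line means one slot."""
    counts = OrderedDict()
    for line in lines:
        host = line.strip()
        if host:
            counts[host] = counts.get(host, 0) + 1
    return counts


def mpirun_command(user_args, env=None):
    """Build an mpirun command, adding --hostfile only inside a Torque job."""
    path = torque_nodefile(env)
    cmd = ["mpirun"]
    if path is not None and "--hostfile" not in user_args:
        cmd += ["--hostfile", path]
    return cmd + list(user_args)
```

Outside a job the user's arguments pass through unchanged, so such a launcher
could be used in both settings.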

Best regards,
Roy.

-- 
  The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
              phone:+47 77 64 41 07, fax:+47 77 64 41 00
     Roy Dragseth, Team Leader, High Performance Computing
         Direct call: +47 77 64 62 56. email: roy.drags...@uit.no
