[OMPI users] ORTE_ERROR_LOG: Timeout in file

Hugh Dickinson Tue, 28 Apr 2009 06:56:17 -0400

Hi all,

First of all, let me make it perfectly clear that I'm a complete beginner as far as MPI is concerned, so this may well be a trivial problem!

I've tried to set up Open MPI to use SSH to communicate between nodes on a heterogeneous cluster. I've set up passwordless SSH and it seems to be working fine. For example, by hand I can do:


ssh nodename uptime

and it returns the appropriate information for each node.
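
To repeat that check for every node in one go, a loop along the following lines could be used. This is only a sketch under assumptions: it expects a plain-text file with one node name at the start of each line, the function names list_hosts and check_hosts are made up for illustration, and it relies on ssh's BatchMode option so that a node still asking for a password fails instead of prompting.

```shell
#!/bin/sh
# list_hosts FILE: print the first field of every non-blank,
# non-comment line (hypothetical helper, not part of Open MPI).
list_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# check_hosts FILE: run "uptime" on each listed node over ssh.
# BatchMode=yes disables password prompts, so a node without working
# key-based login reports FAILED rather than hanging on a prompt.
check_hosts() {
    for host in $(list_hosts "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" uptime; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}

# Only contact the nodes when a file name is given on the command line.
if [ -n "${1:-}" ]; then
    check_hosts "$1"
fi
```

Saved under a name of your choosing and run with the list of nodes as its only argument, it prints one ok or FAILED line per node.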

I then tried running a non-MPI program on all the nodes at the same time:


mpirun -np 10 --hostfile hostfile uptime

where hostfile is a list of the 10 cluster node names with slots=1 after each one, i.e.:


nodename1 slots=1
nodename2 slots=2
etc...

Nothing happens! The process just seems to hang. If I interrupt the process with Ctrl-C I get:


"

mpirun: killing job...

[gamma2.phyastcl.dur.ac.uk:18124] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[gamma2.phyastcl.dur.ac.uk:18124] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
--------------------------------------------------------------------------

WARNING: mpirun has exited before it received notification that all
started processes had terminated.  You should double check and ensure
that there are no runaway processes still executing.

--------------------------------------------------------------------------

If, instead of using the hostfile, I specify on the command line the host from which I'm running mpirun, e.g.:


mpirun -np 1 --host nodename uptime

then it works (i.e. if it doesn't need to communicate with other nodes). Do I need to tell Open MPI it should be using SSH to communicate? If so, how do I do this? To be honest, I think it's trying to do so, because before I set up passwordless SSH it challenged me for lots of passwords.

I'm running Open MPI 1.2.5 installed with Scientific Linux 5.2. Let me reiterate, it's very likely that I've done something stupid, so all suggestions are welcome.


Cheers,

Hugh
