I followed the steps given here to set up an Open MPI cluster:
http://www.ps3cluster.umassd.edu/step3mpi.html
My cluster consists of two nodes, master (192.168.67.18) and
slave (192.168.45.65), connected directly through a crossover cable.

After setting up the cluster and configuring the master node, I mounted the
/tmp folder of the master node on the slave node (I had some problems with
NFS at first, but I worked my way out of them).

Then I copied the 'pi.c' program into the /tmp folder
and successfully compiled it, giving me a binary file 'pi'.

Now when I try to run the binary file using the following command:

#mpirun -np 2 ./Pi

root@192.168.45.65's password:
<it asks for the password>

After entering the password, it gives the following error:

bash: orted: command not found
[ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
[ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[ccomp.cluster:18963] ERROR: A daemon on node 192.168.45.65 failed to start as expected.
[ccomp.cluster:18963] ERROR: There may be more information available from
[ccomp.cluster:18963] ERROR: the remote shell (see above).
[ccomp.cluster:18963] ERROR: The daemon exited unexpectedly with status 127.
[ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1198
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned
value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
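
In case it helps narrow this down: as far as I understand, exit status 127 is
what a shell returns when a command cannot be found, which matches the
"orted: command not found" line. Below is a minimal sketch of a check I can
run and report back on. The helper name check_cmd is my own (hypothetical),
and I have not confirmed that a missing PATH entry on the slave is the cause.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): report whether a command is
# visible on the PATH of the current shell.
check_cmd() {
    # $1 = command name. Prints its location, or returns 127, the same
    # status a shell uses for "command not found".
    if location=$(command -v "$1" 2>/dev/null); then
        echo "found: $location"
    else
        echo "missing: $1"
        return 127
    fi
}

# Check on the master node.
check_cmd orted || true

# Assumed equivalent check on the slave, run non-interactively over ssh
# (left commented out because it needs the second node to be reachable):
# ssh root@192.168.45.65 'command -v orted || echo "orted not on PATH: $PATH"'
```

If the check reports "missing" on the slave but "found" on the master, that
would suggest the two nodes see different PATH settings for ssh logins.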
I am totally lost now, as this is the first time I am working on a cluster
project, and I need some help.

Thank you
Ankush
