Thank you, Sir. The problem was with the paths of the 'bin' and 'lib' folders, so I ran mpirun with the *--prefix* option. I now want to run a program 'pi' on the cluster, so where do I place the file on the master and the compute nodes?
Also, how can I tell that the program is using the resources of both nodes?

On Sat, Apr 4, 2009 at 7:05 PM, Jeff Squyres <jsquy...@cisco.com> wrote:

> It might be best to:
>
> 1. Set up a non-root user to run MPI applications
> 2. Set up SSH keys between the hosts for this non-root user so that you can
>    "ssh <otherhost> uptime" and not be prompted for a password/passphrase
>
> This should help.
>
> On Apr 4, 2009, at 5:51 AM, Ankush Kaul wrote:
>
>> I followed the steps given here to set up an Open MPI cluster:
>> http://www.ps3cluster.umassd.edu/step3mpi.html
>>
>> My cluster consists of two nodes, master (192.168.67.18) and
>> slave (192.168.45.65), connected directly through a cross cable.
>>
>> After setting up the cluster and configuring the master node, I mounted
>> the /tmp folder of the master node on the slave node (I had some problems
>> with NFS at first, but I worked my way out of it).
>>
>> Then I copied the 'pi.c' program into the /tmp folder and successfully
>> compiled it, giving me a binary file 'pi'.
>>
>> Now when I try to run the binary file using the following command
>>
>> #mpirun -np 2 ./Pi
>>
>> root@192.168.45.65's password:
>> <it asks for the password>
>>
>> after entering the password it gives the following error:
>>
>> bash: orted: command not found
>> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 275
>> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> pls_rsh_module.c at line 1166
>> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c
>> at line 90
>> [ccomp.cluster:18963] ERROR: A daemon on node 192.168.45.65 failed to
>> start as expected.
>> [ccomp.cluster:18963] ERROR: There may be more information available from
>> [ccomp.cluster:18963] ERROR: the remote shell (see above).
>> [ccomp.cluster:18963] ERROR: The daemon exited unexpectedly with status
>> 127.
>> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 188
>> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> pls_rsh_module.c at line 1198
>> --------------------------------------------------------------------------
>> mpirun was unable to cleanly terminate the daemons for this job. Returned
>> value Timeout instead of ORTE_SUCCESS.
>> --------------------------------------------------------------------------
>>
>> I am totally lost now, as this is the first time I am working on a cluster
>> project, and I need some help.
>>
>> Thank you
>> Ankush
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Jeff Squyres
> Cisco Systems
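On the two questions at the top of this message, a minimal sketch rather than a definitive recipe: because the master's /tmp is NFS-mounted on the slave, the 'pi' binary in /tmp should already be visible at the same path on both nodes, so no separate copy should be needed. To see whether both nodes take part, one common trick is to launch `hostname` through mpirun with a hostfile before launching the real program. The install prefix `/usr/local`, the `slots=1` values, and the temporary hostfile location are assumptions for illustration; the script only prints the mpirun commands so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# Hostfile listing both nodes from the thread. "slots=1" (one process per
# node) is an assumption; adjust it to the number of cores on each machine.
hostfile="$(mktemp)"
printf '%s\n' \
    '192.168.67.18 slots=1' \
    '192.168.45.65 slots=1' > "$hostfile"

# ASSUMPTION: Open MPI is installed under /usr/local on both nodes.
# Use whatever value already worked with --prefix.
prefix=/usr/local

# Build the commands as strings and print them for review instead of
# executing them here.
# 1) Sanity check: each launched process prints the name of its host.
check_cmd="mpirun --prefix $prefix --hostfile $hostfile -np 2 hostname"
# 2) The real job: /tmp/pi lives on the NFS-shared /tmp, same path on both nodes.
run_cmd="mpirun --prefix $prefix --hostfile $hostfile -np 2 /tmp/pi"

cat "$hostfile"
echo "$check_cmd"
echo "$run_cmd"
```

If the `hostname` check prints the names of both machines, the processes were started on both nodes. While 'pi' is running, watching `top` on each node is another simple way to confirm that both are busy.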