I'm trying to set up a small cluster of 2 nodes.

Both nodes are running Fedora 11 with kernel 2.6.29.4 and have the same
user, mpiuser, with the same password. Both have the following
environment variables set in /etc/profile:
    PATH             usr/lib/openmpi/bin
    LD_LIBRARY_PATH  usr/lib/openmpi/lib

Currently, mpirun executes successfully on either node individually.
However, when trying to run over the network, I get:

[mpiuser@c-199 ~]$ mpirun -np 3 --hostfile .mpi_hostfile ./a.out
bash: orted: command not found
--------------------------------------------------------------------------
A daemon (pid 12639) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished
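
One check that might narrow this down, sketched under the assumption that
mpirun starts orted on the other node through a non-interactive ssh
session (which is not a login shell and so does not necessarily read
/etc/profile). The function name check_orted and the host name node2 are
placeholders of mine, not anything from Open MPI:

```shell
# Command string to evaluate in the target shell.
# Single quotes keep $PATH unexpanded until that shell runs it.
CHECK='command -v orted || echo "orted not found; PATH=$PATH"'

# check_orted HOST: run the check on HOST in a non-interactive ssh session.
# HOST is a placeholder for an entry in .mpi_hostfile.
check_orted() {
    ssh "$1" "$CHECK"
}
```

Running `check_orted node2` from the first node should print either the
full path to orted or the PATH that the non-interactive shell actually
sees, which would show whether the /etc/profile settings are being picked up.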

What fixes should I try to get the cluster to work?
