Hi Souvik

I would guess you installed OpenMPI only on ict1, not on ict2.
If that is the case, you won't have the required OpenMPI libraries
on ict2:/usr/local, and the job won't run on ict2.
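
A quick way to check this guess is to look for the library on each node over ssh. The following is only a sketch: the function name check_hosts and the RSH variable are made up for illustration, and it assumes the passwordless ssh and the prefix mentioned in the original post.

```shell
# Remote shell command; ssh is what the original post uses between the nodes.
RSH=${RSH:-ssh}

# check_hosts PREFIX HOST... (hypothetical helper)
# Reports, for each host, whether a libmpi.so* file exists under PREFIX/lib.
check_hosts() {
    prefix=$1
    shift
    for host in "$@"; do
        # </dev/null keeps the remote shell from consuming our stdin.
        if $RSH "$host" "ls $prefix/lib/libmpi.so*" >/dev/null 2>&1 </dev/null; then
            echo "$host: libmpi found under $prefix"
        else
            echo "$host: libmpi NOT found under $prefix"
        fi
    done
}

# Example call (commented out so that sourcing this file has no side effects):
# check_hosts /usr/local/openmpi-1.3.3 ict1 ict2
```

If ict2 is reported as "NOT found", the installation is missing there, or sits under a different prefix.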

I am guessing this, because you used a prefix under /usr/local,
which tends to be a "per machine" directory,
not a typical name of an NFS
mounted directory.
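
Whether /usr/local is local or NFS mounted on a given node can be checked directly. This is a rough sketch that assumes GNU coreutils stat (other stat implementations take different options); the helper name fs_type is hypothetical.

```shell
# fs_type PATH (hypothetical helper)
# Prints the type of the filesystem that holds PATH, for example "nfs"
# for an NFS mount. Relies on GNU coreutils: stat -f -c %T.
fs_type() {
    stat -f -c %T "$1"
}

# Example calls (commented out): compare the answer on both nodes.
# fs_type /usr/local
# ssh ict2 'stat -f -c %T /usr/local'
```

A local filesystem type on both nodes would mean each machine needs its own installation under that prefix.
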
Using an NFS mounted directory is another way to make
OpenMPI visible to all nodes.
See this FAQ:

I hope this helps,
Gus Correa
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA

souvik bhattacherjee wrote:
Dear all,

I am quite new to Open MPI. Recently, I installed openmpi-1.3.3 separately on two of my machines, ict1 and ict2. These machines are dual-socket quad-core (Intel Xeon E5410), i.e. each has 8 processor cores, and they are connected by a Gigabit Ethernet switch. As a prerequisite, I can ssh between them without a password or passphrase (I did not supply a passphrase at all). I then built Open MPI as follows:

$ cd openmpi-1.3.3
$ mkdir build
$ cd build
$ ../configure --prefix=/usr/local/openmpi-1.3.3/

Then as a root user,

# make all install

Also .bash_profile and .bashrc had the following lines written into them:



$ cd ../examples/
$ make
$ mpirun -np 2 --host ict1 hello_c
hello_c: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory
hello_c: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1 hello_c
   Hello, world, I am 1 of 2
   Hello, world, I am 0 of 2

But the program hangs when ....

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2 hello_c
This command does not produce any output. Running top on either machine does not show any hello_c process. However, when I press Ctrl+C, the following output appears:

^Cmpirun: killing job...

mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
        ict2 - daemon did not report back when launched


The same thing happens when hello_c is run from ict2. Since the program does not report any error, it is difficult to locate where I might have gone wrong.

Has any of you encountered this problem or anything similar? Any help would be much appreciated.




