Hi Gus (and all OpenMPI users),

Thanks for your interest in my problem. However, it seems to me that I had already taken care of the points you raised in your earlier mails. I have listed them below point by point. Your comments are reproduced in *RED* and my replies in *BLACK*.

1) You wrote: "*I would guess you only installed OpenMPI only on ict1, not on ict2*". However, I had mentioned initially: "*I had installed openmpi-1.3.3 separately on two of my machines ict1 and ict2*".

2) Next you said: "*I am guessing this, because you used a prefix under /usr/local*". However, I installed it on each machine as follows:

    $ mkdir build
    $ cd build
    $ ../configure --prefix=/usr/local/openmpi-1.3.3/
    # make all install

3) Next you pointed out: "*...not a typical name of an NFS mounted directory. Using an NFS mounted directory is another way to make OpenMPI visible to all nodes*". Let me say once again that I am not using an NFS installation, as the first point in this list makes clear.

4) In your next mail: "*If you can ssh passwordless from ict1 to ict2 *and* vice versa*". Again, as I had mentioned earlier: "*As a prerequisite, I can ssh between them without a password or passphrase ( I did not supply the passphrase at all ).*"

5) Further, you said: "*If your /etc/hosts file on *both* machines list ict1 and ict2 and their IP addresses*". This is already taken care of on both machines.

6) Finally, you said: "*In case you have a /home directory on each machine (i.e. /home is not NFS mounted) if your .bashrc files on *both* machines set the PATH and LD_LIBRARY_PATH to point to the OpenMPI directory.*" Again, as I had mentioned previously: "*Also .bash_profile and .bashrc had the following lines written into them:*"

    PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
    LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/

***************************************************************

As an additional bit of information (which might assist you in the investigation), I am using *Mandriva 2009.1* on all of my systems. I hope this helps.

Eagerly awaiting a response.

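Points 4 and 6 can be cross-checked by asking each node what a non-interactive ssh shell actually exports, since that shell may not read the same startup files as a login session. The following is only a sketch under stated assumptions: `path_has_dir` and `report_env` are hypothetical helper names (they are not part of Open MPI), and the host names and install prefix are the ones used in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): succeeds if the colon-separated
# list in $1 contains directory $2, ignoring a trailing slash on either side.
path_has_dir() {
    want=${2%/}
    case ":$1:" in
        *":$want:"*|*":$want/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the PATH and LD_LIBRARY_PATH that a
# non-interactive ssh shell on host $1 has in its environment.
# env lists only exported variables, i.e. those a child process
# would inherit; a variable that is set but not exported will not appear.
report_env() {
    ssh "$1" "env | grep -E '^(PATH|LD_LIBRARY_PATH)='"
}
```

Running `report_env ict1` and `report_env ict2` from each machine shows the values to inspect; a captured value can then be passed to `path_has_dir` together with `/usr/local/openmpi-1.3.3/bin` or `/usr/local/openmpi-1.3.3/lib`. If a variable is missing from the output, the startup file that sets it is probably not being read, or the variable is not exported, in that kind of shell.
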
Thanks,

On 9/18/09, Gus Correa <g...@ldeo.columbia.edu> wrote:
>
> Hi Souvik
>
> Also worth checking:
>
> 1) If you can ssh passwordless from ict1 to ict2 *and* vice versa.
> 2) If your /etc/hosts file on *both* machines list ict1 and ict2
> and their IP addresses.
> 3) In case you have a /home directory on each machine (i.e. /home is
> not NFS mounted) if your .bashrc files on *both* machines set the PATH
> and LD_LIBRARY_PATH to point to the OpenMPI directory.
>
> Gus Correa
>
> Gus Correa wrote:
>
>> Hi Souvik
>>
>> I would guess you only installed OpenMPI only on ict1, not on ict2.
>> If that is the case you won't have the required OpenMPI libraries
>> on ict:/usr/local, and the job won't run on ict2.
>>
>> I am guessing this, because you used a prefix under /usr/local,
>> which tends to be a "per machine" directory,
>> not a typical name of an NFS
>> mounted directory.
>> Using an NFS mounted directory is another way to make
>> OpenMPI visible to all nodes.
>> See this FAQ:
>> http://www.open-mpi.org/faq/?category=building#where-to-install
>>
>> I hope this helps,
>> Gus Correa
>> ---------------------------------------------------------------------
>> Gustavo Correa
>> Lamont-Doherty Earth Observatory - Columbia University
>> Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>>
>>
>> souvik bhattacherjee wrote:
>>
>>> Dear all,
>>>
>>> Myself quite new to Open MPI. Recently, I had installed openmpi-1.3.3
>>> separately on two of my machines ict1 and ict2. These machines are
>>> dual-socket quad-core (Intel Xeon E5410) i.e. each having 8 processors and
>>> are connected by Gigabit ethernet switch. As a prerequisite, I can ssh
>>> between them without a password or passphrase ( I did not supply the
>>> passphrase at all ).
>>> Thereafter,
>>>
>>> $ cd openmpi-1.3.3
>>> $ mkdir build
>>> $ cd build
>>> $ ../configure --prefix=/usr/local/openmpi-1.3.3/
>>>
>>> Then as a root user,
>>>
>>> # make all install
>>>
>>> Also .bash_profile and .bashrc had the following lines written into them:
>>>
>>> PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
>>> LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/
>>>
>>> --------------------------------------------------------------------------
>>>
>>> $ cd ../examples/
>>> $ make
>>> $ mpirun -np 2 --host ict1 hello_c
>>> hello_c: error while loading shared libraries: libmpi.so.0: cannot open
>>> shared object file: No such file or directory
>>> hello_c: error while loading shared libraries: libmpi.so.0: cannot open
>>> shared object file: No such file or directory
>>>
>>> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1 hello_c
>>> Hello, world, I am 1 of 2
>>> Hello, world, I am 0 of 2
>>>
>>> But the program hangs when ....
>>>
>>> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2 hello_c
>>>
>>> This statement does not produce any output. Doing top on either machines
>>> does not show any hello_c running. However, when I press Ctrl+C the
>>> following output appears
>>>
>>> ^Cmpirun: killing job...
>>>
>>> --------------------------------------------------------------------------
>>>
>>> mpirun noticed that the job aborted, but has no info as to the process
>>> that caused that situation.
>>> --------------------------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>>
>>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>>> below. Additional manual cleanup may be required - please refer to
>>> the "orte-clean" tool for assistance.
>>> --------------------------------------------------------------------------
>>>
>>> ict2 - daemon did not report back when launched
>>>
>>> $
>>>
>>> The same thing repeats itself when hello_c is run from ict2. Since, the
>>> program does not produce any error, it becomes difficult to locate where I
>>> might have gone wrong.
>>>
>>> Did anyone of you encounter this problem or anything similar ? Any help
>>> would be much appreciated.
>>>
>>> Thanks,
>>>
>>> --
>>>
>>> Souvik
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Souvik