Finally, it seems I am able to run my program on a remote host. The
problem was due to firewall settings: resetting the built-in chains to
the default ACCEPT policy, as shown below, did the trick.

# /etc/init.d/ip6tables stop
Resetting built-in chains to the default ACCEPT policy:         [ OK ]
# /etc/init.d/iptables stop
Resetting built-in chains to the default ACCEPT policy:         [ OK ]
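Stopping the firewall outright is the bluntest fix. A narrower
alternative is to accept all traffic coming from the cluster's own
subnet (a sketch; the 192.168.1.0/24 range below is an assumption,
substitute whatever your hosts actually use):

# iptables -I INPUT -s 192.168.1.0/24 -j ACCEPT    # trust the cluster subnet only

Open MPI's daemons use dynamically assigned TCP ports, so opening a
single fixed port is not sufficient; how to persist such a rule across
reboots is distribution-specific.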
Another related query: let me mention once again that I had installed
openmpi-1.3.3 separately on two of my machines, ict1 and ict2. Now,
when I issue the following command:

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 4 --host ict2,ict1 hello_c
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not
find an executable:

Executable: hello_c
Node: ict1

while attempting to start process rank 1.
--------------------------------------------------------------------------

So, I did a *make* in the examples directory on ict1 to generate the
executable (one could also copy the executable from ict2 to ict1 into
the same directory). Now it seems to run fine:

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 4 --host ict2,ict1 hello_c
Hello, world, I am 0 of 8
Hello, world, I am 2 of 8
Hello, world, I am 4 of 8
Hello, world, I am 6 of 8
Hello, world, I am 5 of 8
Hello, world, I am 3 of 8
Hello, world, I am 7 of 8
Hello, world, I am 1 of 8
$

This seems to imply that one has to copy the executable to the remote
host each time one wants to run a program different from the previous
one. Is that implication correct, or is there a way around it?

Thanks,
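One common way around the hand-copying is to keep the executable on an
NFS share visible to all hosts; short of that, an scp before each run
does the job. A minimal sketch, assuming the same directory layout on
both machines:

$ scp hello_c ict1:/home/souvik/software/openmpi-1.3.3/examples/

Some Open MPI versions also offer a --preload-binary option to mpirun
that pushes the executable out to the remote nodes for you; check
mpirun --help to see whether your build supports it.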
On Mon, Sep 21, 2009 at 1:54 PM, souvik bhattacherjee
<souvi...@gmail.com> wrote:

> As Ralph suggested, I *reversed the order of my PATH settings*.
>
> This is what it shows:
>
> $ echo $PATH
> /usr/local/openmpi-1.3.3/bin/:/usr/bin:/bin:/usr/local/bin:/usr/X11R6/bin/:/usr/games:/usr/lib/qt4/bin:/usr/bin:/opt/kde3/bin
>
> $ echo $LD_LIBRARY_PATH
> /usr/local/openmpi-1.3.3/lib/
>
> Moreover, I checked that there were *NO* previously installed,
> system-supplied versions of OMPI. (I did install MPICH2 earlier, but I
> had removed the binaries and the related files.) This is because
>
> $ locate mpicc
> /home/souvik/software/openmpi-1.3.3/build/ompi/contrib/vt/wrappers/mpicc-vt-wrapper-data.txt
> /home/souvik/software/openmpi-1.3.3/build/ompi/tools/wrappers/mpicc-wrapper-data.txt
> /home/souvik/software/openmpi-1.3.3/build/ompi/tools/wrappers/mpicc.1
> /home/souvik/software/openmpi-1.3.3/contrib/platform/win32/ConfigFiles/mpicc-wrapper-data.txt.cmake
> /home/souvik/software/openmpi-1.3.3/ompi/contrib/vt/wrappers/mpicc-vt-wrapper-data.txt
> /home/souvik/software/openmpi-1.3.3/ompi/contrib/vt/wrappers/mpicc-vt-wrapper-data.txt.in
> /home/souvik/software/openmpi-1.3.3/ompi/tools/wrappers/mpicc-wrapper-data.txt
> /home/souvik/software/openmpi-1.3.3/ompi/tools/wrappers/mpicc-wrapper-data.txt.in
> /usr/local/openmpi-1.3.3/bin/mpicc
> /usr/local/openmpi-1.3.3/bin/mpicc-vt
> /usr/local/openmpi-1.3.3/share/man/man1/mpicc.1
> /usr/local/openmpi-1.3.3/share/openmpi/mpicc-vt-wrapper-data.txt
> /usr/local/openmpi-1.3.3/share/openmpi/mpicc-wrapper-data.txt
>
> does not show any occurrence of mpicc in a directory related to MPICH2.
>
> The results are the same with mpirun:
>
> $ locate mpirun
> /home/souvik/software/openmpi-1.3.3/build/ompi/tools/ortetools/mpirun.1
> /home/souvik/software/openmpi-1.3.3/ompi/runtime/mpiruntime.h
> /usr/local/openmpi-1.3.3/bin/mpirun
> /usr/local/openmpi-1.3.3/share/man/man1/mpirun.1
>
> *These tests were done both on ict1 and ict2*.
>
> I performed another test which probably proves that the executable
> finds the required files on the remote host. The program was run from
> ict2:
>
> $ cd /home/souvik/software/openmpi-1.3.3/examples/
> $ mpirun -np 4 --host ict2,ict1 hello_c
> bash: orted: command not found
> --------------------------------------------------------------------------
> A daemon (pid 28023) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 4 --host ict2,ict1 hello_c
>
> *This command, as usual, does not produce any output. On pressing
> Ctrl+C, the following output appears:*
>
> ^Cmpirun: killing job...
>
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> ict1 - daemon did not report back when launched
>
> $
>
> Also, running *top* does not show any *mpirun* or *hello_c* process on
> either host. However, running hello_c on a single host, say ict2, does
> show *mpirun* and *hello_c* in the process list.
>
> On Sat, Sep 19, 2009 at 8:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> One thing that flags my attention: in your PATH definition, you put
>> $PATH ahead of your OMPI 1.3.3 installation. Thus, if there are any
>> system-supplied versions of OMPI hanging around (and there often are),
>> they will be executed instead of your new installation.
>> You might try reversing that order.
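A concrete form of Ralph's suggestion, for the .bashrc on *both* hosts
(a sketch; the export keyword matters because LD_LIBRARY_PATH, unlike
PATH, is not already exported and would otherwise be invisible to the
orted daemon launched over ssh):

export PATH=/usr/local/openmpi-1.3.3/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib:$LD_LIBRARY_PATH

Invoking mpirun by its absolute path, e.g.
/usr/local/openmpi-1.3.3/bin/mpirun, is equivalent to passing --prefix
and likewise avoids the "bash: orted: command not found" failure quoted
above.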
>> On Sep 19, 2009, at 7:33 AM, souvik bhattacherjee wrote:
>>
>> Hi Gus (and all Open MPI users),
>>
>> Thanks for your interest in my problem. However, it seems to me that I
>> had already taken care of the points you raised earlier in your mails.
>> I have listed them below point by point; your comments are rewritten
>> in *RED* and my replies in *BLACK*.
>>
>> 1) You mentioned: "*I would guess you only installed OpenMPI only on
>> ict1, not on ict2*". However, I had said initially: "*I had installed
>> openmpi-1.3.3 separately on two of my machines ict1 and ict2*".
>>
>> 2) Next you said: "*I am guessing this, because you used a prefix
>> under /usr/local*". However, I had installed as follows:
>> *$ mkdir build
>> $ cd build
>> $ ../configure --prefix=/usr/local/openmpi-1.3.3/
>> # make all install*
>>
>> 3) Next you pointed out: "*...not a typical name of an NFS mounted
>> directory. Using an NFS mounted directory is another way to make
>> OpenMPI visible to all nodes*". Let me say once again that I am not
>> going for an NFS installation, as the first point in this list makes
>> clear.
>>
>> 4) In your next mail: "*If you can ssh passwordless from ict1 to ict2
>> *and* vice versa*". Again, as I mentioned earlier: "*As a
>> prerequisite, I can ssh between them without a password or passphrase
>> (I did not supply the passphrase at all).*"
>>
>> 5) Further, you said: "*If your /etc/hosts file on *both* machines
>> lists ict1 and ict2 and their IP addresses*". These things are already
>> very well taken care of.
>>
>> 6) Finally: "*In case you have a /home directory on each machine (i.e.
>> /home is not NFS mounted), check that your .bashrc files on *both*
>> machines set the PATH and LD_LIBRARY_PATH to point to the OpenMPI
>> directory.*"
>>
>> Again, as I mentioned previously, *both .bash_profile and .bashrc have
>> the following lines written into them:
>>
>> PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
>> LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/*
>>
>> As an additional bit of information (which might assist you in the
>> investigation), I am using *Mandriva 2009.1* on all of my systems.
>>
>> Hope this helps. Eagerly awaiting a response.
>>
>> Thanks,
>>
>> On 9/18/09, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>>
>>> Hi Souvik
>>>
>>> Also worth checking:
>>>
>>> 1) If you can ssh passwordless from ict1 to ict2 *and* vice versa.
>>> 2) If your /etc/hosts file on *both* machines lists ict1 and ict2
>>> and their IP addresses.
>>> 3) In case you have a /home directory on each machine (i.e. /home is
>>> not NFS mounted), if your .bashrc files on *both* machines set the
>>> PATH and LD_LIBRARY_PATH to point to the OpenMPI directory.
>>>
>>> Gus Correa
>>>
>>> Gus Correa wrote:
>>>
>>>> Hi Souvik
>>>>
>>>> I would guess you installed OpenMPI only on ict1, not on ict2.
>>>> If that is the case you won't have the required OpenMPI libraries
>>>> under ict2:/usr/local, and the job won't run on ict2.
>>>>
>>>> I am guessing this because you used a prefix under /usr/local,
>>>> which tends to be a "per machine" directory,
>>>> not a typical name of an NFS mounted directory.
>>>> Using an NFS mounted directory is another way to make
>>>> OpenMPI visible to all nodes.
>>>> See this FAQ:
>>>> http://www.open-mpi.org/faq/?category=building#where-to-install
>>>>
>>>> I hope this helps,
>>>> Gus Correa
>>>> ---------------------------------------------------------------------
>>>> Gustavo Correa
>>>> Lamont-Doherty Earth Observatory - Columbia University
>>>> Palisades, NY, 10964-8000 - USA
>>>> ---------------------------------------------------------------------
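For reference, Gus's checklist translates into something like the
following. In /etc/hosts on *both* machines (the addresses here are
assumptions; substitute the real ones):

192.168.1.101   ict1
192.168.1.102   ict2

and for passwordless ssh in both directions (a sketch, assuming
OpenSSH; run the equivalent on each host):

$ ssh-keygen -t rsa          # accept the defaults; empty passphrase
$ ssh-copy-id souvik@ict2    # on ict1; repeat from ict2 towards ict1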
>>>> souvik bhattacherjee wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I am quite new to Open MPI. Recently, I installed openmpi-1.3.3
>>>>> separately on two of my machines, ict1 and ict2. These machines are
>>>>> dual-socket quad-core (Intel Xeon E5410), i.e. each has 8 processors,
>>>>> and they are connected by a Gigabit ethernet switch. As a
>>>>> prerequisite, I can ssh between them without a password or passphrase
>>>>> (I did not supply the passphrase at all). Thereafter:
>>>>>
>>>>> $ cd openmpi-1.3.3
>>>>> $ mkdir build
>>>>> $ cd build
>>>>> $ ../configure --prefix=/usr/local/openmpi-1.3.3/
>>>>>
>>>>> Then, as the root user:
>>>>>
>>>>> # make all install
>>>>>
>>>>> Also, .bash_profile and .bashrc have the following lines written
>>>>> into them:
>>>>>
>>>>> PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
>>>>> LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> $ cd ../examples/
>>>>> $ make
>>>>> $ mpirun -np 2 --host ict1 hello_c
>>>>> hello_c: error while loading shared libraries: libmpi.so.0: cannot
>>>>> open shared object file: No such file or directory
>>>>> hello_c: error while loading shared libraries: libmpi.so.0: cannot
>>>>> open shared object file: No such file or directory
>>>>>
>>>>> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1 hello_c
>>>>> Hello, world, I am 1 of 2
>>>>> Hello, world, I am 0 of 2
>>>>>
>>>>> But the program hangs when:
>>>>>
>>>>> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2 hello_c
>>>>>
>>>>> This statement does not produce any output, and running top on
>>>>> either machine does not show any hello_c process. However, when I
>>>>> press Ctrl+C, the following output appears:
>>>>>
>>>>> ^Cmpirun: killing job...
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>> that caused that situation.
>>>>> --------------------------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>>>>> below. Additional manual cleanup may be required - please refer to
>>>>> the "orte-clean" tool for assistance.
>>>>> --------------------------------------------------------------------------
>>>>> ict2 - daemon did not report back when launched
>>>>>
>>>>> $
>>>>>
>>>>> The same thing repeats itself when hello_c is run from ict2. Since
>>>>> the program does not produce any error, it is difficult to locate
>>>>> where I might have gone wrong.
>>>>>
>>>>> Did any of you encounter this problem or anything similar? Any help
>>>>> would be much appreciated.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>> Souvik
>
> --
> Souvik Bhattacherjee

--
Souvik Bhattacherjee