The problem is likely that your path variables aren't being set properly on the
remote machine when mpirun launches the remote daemon. Check that the rc file
for your default shell on the remote machine also sets those values correctly,
since the daemon is started through a non-interactive ssh session.
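Below is a minimal sketch of what those rc-file lines might look like. It assumes bash on the remote node and the /myname prefix from this thread. Which file is read for a non-interactive ssh command depends on the shell. Bash reads ~/.bashrc in that case, but BusyBox's ash may not read any startup file at all. That is one reason the prefix option to mpirun can be the more reliable fix.

```shell
# Hypothetical lines for the shell rc file on each remote node (e.g. ~/.bashrc
# when the login shell is bash); /myname is the install prefix used in this thread.
# ${VAR:+:$VAR} appends the old value only if it was non-empty, so no stray
# empty entry (which would mean "current directory") is created.
export PATH="/myname/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="/myname/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

To see what the remote daemon's environment will actually contain, run ssh host2 'echo $PATH; echo $LD_LIBRARY_PATH' from host1. The single quotes make the variables expand on host2 rather than locally.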
Alternatively, modify your mpirun command line by adding the prefix option:

mpirun --prefix /myname ...

so that mpirun sets the prefix on the remote node, and see if that helps. If it
does, you can add --enable-orterun-prefix-by-default to your configure line so
that mpirun always adds it.
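Here is a sketch of the configure route. It assumes you rebuild from the same openmpi-1.6.4 source tree with the same /myname prefix as before. This option makes mpirun behave as if --prefix were given on every run, using the value passed to --prefix at configure time.

```shell
# Run from the top of the openmpi-1.6.4 source tree (assumption).
# Same install prefix as the original build, plus the option that makes
# mpirun act as if "--prefix /myname" were always on the command line.
./configure --prefix=/myname --enable-orterun-prefix-by-default
make all install
```

After reinstalling, copy /myname to the remote nodes again the same way as before, so every node has a matching install.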


On Apr 28, 2013, at 7:56 AM, "E.O." <ooyama.eii...@gmail.com> wrote:

> Hello
> 
> I have five Linux machines (one runs Red Hat and the others run BusyBox).
> I downloaded openmpi-1.6.4.tar.gz onto my main Red Hat machine, then configured
> and compiled it successfully with:
> ./configure --prefix=/myname
> I installed it into the /myname directory successfully, and I am able to run a
> simple hello.c program on my Red Hat machine:
> 
> [root@host1 /tmp] # mpirun -np 4 ./hello.out
> I am parent
> I am a child
> I am a child
> I am a child
> [root@host1 /tmp] #
> 
> Then I copied the entire /myname directory to another machine (host2):
> [root@host1 /] # tar zcf - myname | ssh host2 "(cd /; tar zxf -)"
> 
> and ran mpirun targeting that host (host2):
> 
> [root@host1 tmp]# mpirun -np 4 -host host2 ./hello.out
> --------------------------------------------------------------------------
> Sorry!  You were supposed to get help about:
>     opal_init:startup:internal-failure
> But I couldn't open the help file:
>     //share/openmpi/help-opal-runtime.txt: No such file or directory.  Sorry!
> --------------------------------------------------------------------------
> [host2:26294] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79
> [host2:26294] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file orted/orted_main.c at line 358
> --------------------------------------------------------------------------
> A daemon (pid 23691) died unexpectedly with status 255 while attempting
> to launch so we are aborting.
> 
> There may be more information reported by the environment (see above).
> 
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> [root@host1 tmp]#
> 
> I set these environment variables:
> 
> [root@host1 tmp]# echo $LD_LIBRARY_PATH
> /myname/lib/
> [root@host1 tmp]# echo $OPAL_PREFIX
> /myname/
> [root@host1 tmp]#
> 
> [root@host2 /] # ls -la /myname/lib/libmpi.so.1
> lrwxrwxrwx    1 root     root            15 Apr 28 10:21 /myname/lib/libmpi.so.1 -> libmpi.so.1.0.7
> [root@host2 /] #
> 
> If I run the ./hello.out binary directly on host2, it works fine:
> 
> [root@host1 tmp]# ssh host2
> [root@host2 /] # /tmp/hello.out
> I am parent
> [root@host2 /] #
> 
> Can someone help me figure out why I cannot run hello.out on host2 from host1?
> Am I missing any environment variables?
> 
> Thank you,
> 
> Eiichi
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

