What happens is that, under the hood, mpirun runs
<remote_exec> orted
and your remote_exec does not propagate LD_LIBRARY_PATH.
One option is to configure your remote_exec to do so, but I'd rather suggest
you re-configure ompi with --enable-orterun-prefix-by-default.
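
As a rough sketch of the first option, and assuming ssh starts bash on the compute nodes and that bash reads ~/.bashrc for non-interactive ssh commands (this depends on how bash was built and on what the file already does), something like the following could go near the top of ~/.bashrc on each node. It needs no root privileges. The directory /opt/intel/lib/intel64 is only a placeholder for wherever libimf.so actually lives, and prepend_ld_path is a hypothetical helper name.

```shell
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;   # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Placeholder path (assumption): replace with the directory containing libimf.so.
prepend_ld_path /opt/intel/lib/intel64
```

The duplicate check keeps the variable from growing every time the file is sourced again.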
If your remote_exec is ssh (which is the case if you are not running under a
supported batch manager), then
ssh node188 ldd $path_to_openmpi_bin/orted
should show zero unresolved libraries.
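
To make that check easier to read, here is a minimal sketch of a filter that keeps only the unresolved entries from ldd output. The function name unresolved_libs is hypothetical; the node name and $path_to_openmpi_bin are the ones used in this thread.

```shell
# Print the "not found" lines from ldd output read on stdin.
# Exit status is 0 when nothing is missing and 1 when at least one library is.
unresolved_libs() {
    ! grep 'not found'
}

# Usage, run from the node where mpirun is started:
#   ssh node188 ldd "$path_to_openmpi_bin/orted" | unresolved_libs
```

If libimf.so shows up in the output, the environment that ssh gives orted on that node still lacks the directory containing it.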

Cheers,

Gilles

On Sunday, April 9, 2017, Ilchenko Evgeniy <ilchenk...@gmail.com> wrote:

> Hi!
>
> The problem with random segfaults in Java programs was solved by adding MCA
> options:
>
> $path_to_openmpi_bin/mpirun -np 1  -mca btl self,sm,openib
>  $path_to_java_bin/java randomTest
>
> Thanks to Eshsou Hashba and Michael Kalugin!
>
>
> But now I have another problem!
>
> If I start mpirun from the manager node (without an ssh login to a compute node):
>
> $path_to_openmpi_bin/mpirun  -np 2 -host node188,node189 -mca btl
> self,sm,openib   $path_to_java_bin/java randomTest
>
> I get the following error:
>
>
> $openmpi1.10_folder/bin/orted: error while loading shared libraries:
> libimf.so: cannot open shared object file: No such file or directory
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
>
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
>
> * the inability to write startup files into /tmp
> (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to
> use.
>
> *  compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
> --------------------------------------------------------------------------
>
> If I pass LD_LIBRARY_PATH (which contains the path to libimf.so) to mpirun
> via the -x option:
>
> $path_to_openmpi_bin/mpirun  -x LD_LIBRARY_PATH -np 2 -host
> node188,node189 -mca btl self,sm,openib   $path_to_java_bin/java randomTest
>
> then I get the same error (orted: error while loading shared libraries:
> libimf.so: cannot open shared object file: No such file or directory).
>
> How can I pass the library path to the spawned MPI processes and to orted?
> I don't have root privileges on this cluster.
>
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
