Sorry – I did actually send a thank you to Gilles and John @ 8:48 local time 
but it looks like at some point in my conversation with Gilles we stopped 
CC’ing the list – which means John never saw my thank you.

So, “Thanks for the help, John!”

From: users <users-boun...@lists.open-mpi.org> On Behalf Of Jeff Squyres 
(jsquyres) via users
Sent: Wednesday, April 7, 2021 10:28 AM
To: John Hearns <hear...@gmail.com>
Cc: Jeff Squyres (jsquyres) <jsquy...@cisco.com>; Open MPI User's List 
<users@lists.open-mpi.org>
Subject: Re: [OMPI users] Building Open-MPI with Intel C

:-)

For the web archives: Mike confirmed to me off-list that the non-interactive 
login setup was, indeed, the issue, and he's now good to go.



On Apr 7, 2021, at 10:09 AM, John Hearns <hear...@gmail.com> wrote:

Jeff, you know as well as I do that EVERYTHING is in the path at Cornelis 
Networks.

On Wed, 7 Apr 2021 at 14:59, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
Check the output from ldd in a non-interactive login: your LD_LIBRARY_PATH 
probably doesn't include the location of the Intel runtime.

E.g.

    ssh othernode ldd /path/to/orted

Your shell startup files may well differentiate between interactive and 
non-interactive logins (i.e., it may set PATH / LD_LIBRARY_PATH / etc. 
differently).
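
A minimal sketch of the interactive vs. non-interactive distinction described above. Many shell startup files test the shell's option flags (`$-`) and return early when the `i` flag is absent, so any PATH / LD_LIBRARY_PATH exports placed after that test never run for commands started over ssh. The helper name `shell_mode` is hypothetical, and whether the startup files in this thread used such a guard is an assumption.

```shell
# Classify a shell as interactive or non-interactive from its option flags.
# With no argument, the flags of the current shell ($-) are inspected.
# shell_mode is a hypothetical helper name used only for illustration.
shell_mode() {
    flags="${1:-$-}"
    case "$flags" in
        *i*) echo interactive ;;      # 'i' flag present: interactive shell
        *)   echo non-interactive ;;  # typical for 'ssh host command'
    esac
}
```

To see which mode a remote command actually gets, the same test can be run inline, for example `ssh othernode 'case $- in *i*) echo interactive;; *) echo non-interactive;; esac'`, and the environment compared with `ssh othernode 'echo $LD_LIBRARY_PATH'`.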



On Apr 7, 2021, at 7:21 AM, John Hearns via users <users@lists.open-mpi.org> wrote:

Manually log into one of your nodes. Load the modules you use in a batch job. 
Run 'ldd' on your executable.
Start at the bottom and work upwards...
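
The ldd check above can be narrowed to just the unresolved entries. This is a rough sketch that assumes the usual glibc ldd output format, where a library that cannot be located is reported as `name => not found`; the function name `missing_libs` is hypothetical.

```shell
# Read ldd output on stdin and print the name of each library that the
# dynamic loader could not resolve (lines of the form "name => not found").
# missing_libs is a hypothetical helper name used only for illustration.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

For example, `ssh othernode ldd /path/to/orted | missing_libs` would list the libraries that are missing in the non-interactive environment on that node; empty output means everything resolved.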

By the way, have you looked at using Easybuild? Would be good to have your 
input there maybe.


On Wed, 7 Apr 2021 at 01:01, Heinz, Michael William via users <users@lists.open-mpi.org> wrote:
I’m having a heck of a time building OMPI with Intel C. Compilation goes fine, 
installation goes fine, compiling test apps (the OSU benchmarks) goes fine…

but when I go to actually run an MPI app I get:

[awbp025:~/work/osu-icc](N/A)$ /usr/mpi/icc/openmpi-icc/bin/mpirun -np 2 -H awbp025,awbp026,awbp027,awbp028 -x FI_PROVIDER=opa1x -x LD_LIBRARY_PATH=/usr/mpi/icc/openmpi-icc/lib64:/lib hostname
/usr/mpi/icc/openmpi-icc/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
/usr/mpi/icc/openmpi-icc/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory

Looking at orted, it does seem like the binary is linking correctly:

[awbp025:~/work/osu-icc](N/A)$ /usr/mpi/icc/openmpi-icc/bin/orted
[awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 135
[awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 107
[awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 346
[awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file base/ess_base_std_orted.c at line 264
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_session_dir failed
  --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
--------------------------------------------------------------------------

and…

[awbp025:~/work/osu-icc](N/A)$ ldd /usr/mpi/icc/openmpi-icc/bin/orted
        linux-vdso.so.1 (0x00007fffc2ebf000)
        libopen-rte.so.40 => /usr/mpi/icc/openmpi-icc/lib/libopen-rte.so.40 (0x00007fdaa6404000)
        libopen-pal.so.40 => /usr/mpi/icc/openmpi-icc/lib/libopen-pal.so.40 (0x00007fdaa60bd000)
        libopen-orted-mpir.so => /usr/mpi/icc/openmpi-icc/lib/libopen-orted-mpir.so (0x00007fdaa5ebb000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fdaa5b39000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fdaa5931000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007fdaa572d000)
        libz.so.1 => /lib64/libz.so.1 (0x00007fdaa5516000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fdaa52fe000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdaa50de000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fdaa4d1b000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fdaa4b17000)
        libimf.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libimf.so (0x00007fdaa4494000)
        libsvml.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libsvml.so (0x00007fdaa29c4000)
        libirng.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libirng.so (0x00007fdaa2659000)
        libintlc.so.5 => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00007fdaa23e1000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fdaa66d6000)

Can anyone suggest what I’m forgetting to do?

---
Michael Heinz
Fabric Software Engineer, Cornelis Networks



--
Jeff Squyres
jsquy...@cisco.com<mailto:jsquy...@cisco.com>



