If you don't have ibv_devinfo installed on your compute nodes, then you likely 
don't have the verbs (libibverbs) package installed on those nodes at all. 
That's why you're getting errors about not finding libibverbs.so.

Specifically:

- It sounds like Open MPI was able to find libibverbs.so when it was built: 
whatever node you were on when you configured/compiled/installed Open MPI had 
libibverbs.so (and friends) properly installed, Open MPI found them during 
configure/make, and therefore it built/installed support for verbs.

- But then you're running that installed Open MPI on nodes where libibverbs.so 
apparently is not available (e.g., that package was not installed), so Open MPI 
fails to load the verbs-based plugins (because they need libibverbs.so) and 
emits warnings about it. A quick check you can run on a compute node is 
sketched right after this list.
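
As a minimal sketch of that check (the install prefix below is copied from the 
ldd output further down in this thread, and mca_btl_openib.so is the plugin 
named in your error messages -- adjust the path to wherever your Open MPI is 
actually installed), ssh to a compute node and run:

$ /sbin/ldconfig -p | grep libibverbs    # is libibverbs.so.1 visible to the dynamic linker?
$ ldd /data/users/xx/openmpi/lib/openmpi/mca_btl_openib.so | grep "not found"

If the first command prints nothing and the second reports libibverbs.so.1 => 
not found, the library simply isn't installed on that node.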

The same may well be true for the crypto libraries.
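
A similarly hedged check for the crypto side (the path below assumes the stock 
CentOS 7 layout shown later in this thread):

$ /sbin/ldconfig -p | grep -E "libcrypto|libssl"   # which crypto sonames exist on this node?
$ rpm -qf /usr/lib64/libcrypto.so.10               # which package owns the libcrypto you do have?

Your tm components were built against the old libcrypto.so.0.9.8 soname, so 
you'll likely need either a matching compatibility library installed on the 
compute nodes or a rebuild of Open MPI against the libraries those nodes 
actually have.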

(This is an expanded version of what I said in 
https://www.mail-archive.com/users@lists.open-mpi.org/msg32727.html and 
https://www.mail-archive.com/users@lists.open-mpi.org/msg32720.html.)


> On Oct 10, 2018, at 5:02 PM, Castellana Michele <michele.castell...@curie.fr> 
> wrote:
> 
> Dear John, 
> I see, thank you for your reply. Unfortunately the cluster support is of poor 
> quality, and it would take a while to get this information from them. Is 
> there any way I can check this myself? Also, it looks like 
> ibv_devinfo does not exist on the cluster:
> 
> $ ibv_devinfo
> -bash: ibv_devinfo: command not found
> 
> Best,
> Michele
> 
> 
>> On Oct 9, 2018, at 5:53 PM, John Hearns <hear...@googlemail.com> wrote:
>> 
>> Michele, as others have said, libibverbs.so.1 is not in your library path.
>> Can you ask the person who manages your cluster where libibverbs is
>> located on the compute nodes?
>> Also try to run    ibv_devinfo
>> 
>> On Tue, 9 Oct 2018 at 16:03, Castellana Michele
>> <michele.castell...@curie.fr> wrote:
>>> 
>>> Dear John,
>>> Thank you for your reply. Here is the output of ldd
>>> 
>>> $ ldd ./code.io
>>> linux-vdso.so.1 =>  (0x00007ffcc759f000)
>>> liblapack.so.3 => /usr/lib64/liblapack.so.3 (0x00007fbc1c613000)
>>> libgsl.so.0 => /usr/lib64/libgsl.so.0 (0x00007fbc1c1ea000)
>>> libgslcblas.so.0 => /usr/lib64/libgslcblas.so.0 (0x00007fbc1bfad000)
>>> libmpi.so.40 => /data/users/xx/openmpi/lib/libmpi.so.40 (0x00007fbc1bcad000)
>>> libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fbc1b9a6000)
>>> libm.so.6 => /usr/lib64/libm.so.6 (0x00007fbc1b6a4000)
>>> libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007fbc1b48e000)
>>> libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007fbc1b272000)
>>> libc.so.6 => /usr/lib64/libc.so.6 (0x00007fbc1aea5000)
>>> libblas.so.3 => /usr/lib64/libblas.so.3 (0x00007fbc1ac4c000)
>>> libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007fbc1a92a000)
>>> libsatlas.so.3 => /usr/lib64/atlas/libsatlas.so.3 (0x00007fbc19cdd000)
>>> libopen-rte.so.40 => /data/users/xx/openmpi/lib/libopen-rte.so.40 
>>> (0x00007fbc19a2d000)
>>> libopen-pal.so.40 => /data/users/xx/openmpi/lib/libopen-pal.so.40 
>>> (0x00007fbc19733000)
>>> libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007fbc1952f000)
>>> librt.so.1 => /usr/lib64/librt.so.1 (0x00007fbc19327000)
>>> libutil.so.1 => /usr/lib64/libutil.so.1 (0x00007fbc19124000)
>>> libz.so.1 => /usr/lib64/libz.so.1 (0x00007fbc18f0e000)
>>> /lib64/ld-linux-x86-64.so.2 (0x00007fbc1cd70000)
>>> libquadmath.so.0 => /usr/lib64/libquadmath.so.0 (0x00007fbc18cd2000)
>>> 
>>> and the one for the PBS version
>>> 
>>> $   qstat --version
>>> Version: 6.1.2
>>> Commit: 661e092552de43a785c15d39a3634a541d86898e
>>> 
>>> After I created the symbolic links libcrypto.so.0.9.8 and libssl.so.0.9.8, I 
>>> still have one error message left from MPI:
>>> 
>>> mca_base_component_repository_open: unable to open mca_btl_openib: 
>>> libibverbs.so.1: cannot open shared object file: No such file or directory 
>>> (ignored)
>>> 
>>> Please let me know if you have any suggestions.
>>> 
>>> Best,
>>> 
>>> 
>>> On Oct 4, 2018, at 3:12 PM, John Hearns via users 
>>> <users@lists.open-mpi.org> wrote:
>>> 
>>> Michele, the command is   ldd ./code.io
>>> I just Googled - ldd  means List dynamic Dependencies
>>> 
>>> To find out the PBS batch system type - that is a good question!
>>> Try this:     qstat --version
>>> 
>>> 
>>> 
>>> On Thu, 4 Oct 2018 at 10:12, Castellana Michele
>>> <michele.castell...@curie.fr> wrote:
>>> 
>>> 
>>> Dear John,
>>> Thank you for your reply. I have tried
>>> 
>>> ldd mpirun ./code.o
>>> 
>>> but I get an error message; I do not know the proper syntax for the ldd 
>>> command. Here is the information about the Linux version
>>> 
>>> $ cat /etc/os-release
>>> NAME="CentOS Linux"
>>> VERSION="7 (Core)"
>>> ID="centos"
>>> ID_LIKE="rhel fedora"
>>> VERSION_ID="7"
>>> PRETTY_NAME="CentOS Linux 7 (Core)"
>>> ANSI_COLOR="0;31"
>>> CPE_NAME="cpe:/o:centos:centos:7"
>>> HOME_URL="https://www.centos.org/"
>>> BUG_REPORT_URL="https://bugs.centos.org/"
>>> 
>>> CENTOS_MANTISBT_PROJECT="CentOS-7"
>>> CENTOS_MANTISBT_PROJECT_VERSION="7"
>>> REDHAT_SUPPORT_PRODUCT="centos"
>>> REDHAT_SUPPORT_PRODUCT_VERSION="7"
>>> 
>>> Could you please tell me how to check whether the batch system is PBSPro or 
>>> OpenPBS?
>>> 
>>> Best,
>>> 
>>> 
>>> 
>>> 
>>> On Oct 4, 2018, at 10:30 AM, John Hearns via users 
>>> <users@lists.open-mpi.org> wrote:
>>> 
>>> Michele, one tip: log into a compute node using ssh, as your own username.
>>> If you use the Modules environment, load the modules you use in the job
>>> script, then use the ldd utility to check that all the libraries needed by
>>> the code.io executable can be found.
>>> 
>>> Actually, you are better off submitting a short batch job which does not use
>>> mpirun but runs ldd.
>>> A proper batch job will duplicate the environment you wish to run in.
>>> 
>>>  ldd ./code.io
>>> 
>>> By the way, is the batch system PBSPro or OpenPBS?  Version 6 seems a bit 
>>> old.
>>> Can you say what version of Redhat or CentOS this cluster is installed with?
>>> 
>>> 
>>> 
>>> On Thu, 4 Oct 2018 at 00:02, Castellana Michele
>>> <michele.castell...@curie.fr> wrote:
>>> 
>>> I fixed it; the correct file was in /lib64, not in /lib.
>>> 
>>> Thank you for your help.
>>> 
>>> On Oct 3, 2018, at 11:30 PM, Castellana Michele 
>>> <michele.castell...@curie.fr> wrote:
>>> 
>>> Thank you, I found some libcrypto files in /usr/lib indeed:
>>> 
>>> $ ls libcry*
>>> libcrypt-2.17.so  libcrypto.so.10  libcrypto.so.1.0.2k  libcrypt.so.1
>>> 
>>> but I could not find libcrypto.so.0.9.8. Here they suggest creating a 
>>> symbolic link, but if I do, I still get an error from MPI. Is there another 
>>> way around this?
>>> 
>>> Best,
>>> 
>>> On Oct 3, 2018, at 11:00 PM, Jeff Squyres (jsquyres) via users 
>>> <users@lists.open-mpi.org> wrote:
>>> 
>>> It's probably in your Linux distro somewhere -- I'd guess you're missing a 
>>> package (e.g., an RPM or a deb) out on your compute nodes...?
>>> 
>>> 
>>> On Oct 3, 2018, at 4:24 PM, Castellana Michele 
>>> <michele.castell...@curie.fr> wrote:
>>> 
>>> Dear Ralph,
>>> Thank you for your reply. Do you know where I could find libcrypto.so.0.9.8 
>>> ?
>>> 
>>> Best,
>>> 
>>> On Oct 3, 2018, at 9:41 PM, Ralph H Castain <r...@open-mpi.org> wrote:
>>> 
>>> Actually, I see that you do have the tm components built, but they cannot 
>>> be loaded because you are missing libcrypto from your LD_LIBRARY_PATH
>>> 
>>> 
>>> On Oct 3, 2018, at 12:33 PM, Ralph H Castain <r...@open-mpi.org> wrote:
>>> 
>>> Did you configure OMPI --with-tm=<path-to-PBS-libs>? It looks like we didn't 
>>> build PBS support and so we only see one node with a single slot allocated 
>>> to it.
>>> 
>>> 
>>> On Oct 3, 2018, at 12:02 PM, Castellana Michele 
>>> <michele.castell...@curie.fr> wrote:
>>> 
>>> Dear all,
>>> I am having trouble running an MPI code across multiple cores on a new 
>>> computer cluster, which uses PBS. Here is a minimal example, where I want 
>>> to run two MPI processes, each on  a different node. The PBS script is
>>> 
>>> #!/bin/bash
>>> #PBS -l walltime=00:01:00
>>> #PBS -l mem=1gb
>>> #PBS -l nodes=2:ppn=1
>>> #PBS -q batch
>>> #PBS -N test
>>> mpirun -np 2 ./code.o
>>> 
>>> and when I submit it with
>>> 
>>> $ qsub script.sh
>>> 
>>> I get the following message in the PBS error file
>>> 
>>> $ cat test.e1234
>>> [shbli040:08879] mca_base_component_repository_open: unable to open 
>>> mca_plm_tm: libcrypto.so.0.9.8: cannot open shared object file: No such 
>>> file or directory (ignored)
>>> [shbli040:08879] mca_base_component_repository_open: unable to open 
>>> mca_oob_ud: libibverbs.so.1: cannot open shared object file: No such file 
>>> or directory (ignored)
>>> [shbli040:08879] mca_base_component_repository_open: unable to open 
>>> mca_ras_tm: libcrypto.so.0.9.8: cannot open shared object file: No such 
>>> file or directory (ignored)
>>> --------------------------------------------------------------------------
>>> There are not enough slots available in the system to satisfy the 2 slots
>>> that were requested by the application:
>>> ./code.o
>>> 
>>> Either request fewer slots for your application, or make more slots 
>>> available
>>> for use.
>>> --------------------------------------------------------------------------
>>> 
>>> The PBS version is
>>> 
>>> $ qstat --version
>>> Version: 6.1.2
>>> 
>>> and here is some additional information on the MPI version
>>> 
>>> $ mpicc -v
>>> Using built-in specs.
>>> COLLECT_GCC=/bin/gcc
>>> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
>>> Target: x86_64-redhat-linux
>>> […]
>>> Thread model: posix
>>> gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)
>>> 
>>> Do you guys know what may be the issue here?
>>> 
>>> Thank you
>>> Best,
>>> 
>>> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
