On Thu, Apr 5, 2018 at 7:59 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> That being said, the error suggest mca_oob_ud.so is a module from a
> previous install,
> Open MPI was not built on the system it is running, or libibverbs.so.1
> has been removed after
> Open MPI was built.

yes, understood. i compiled openmpi on a node that has all the libraries installed for our various interconnects (opa/psm/mxm/ib), but i ran mpirun on a node that has none of them, so these are the resulting warnings i get:

mca_btl_openib: librdmacm.so.1
mca_btl_usnic: libfabric.so.1
mca_oob_ud: libibverbs.so.1
mca_mtl_mxm: libmxm.so.2
mca_mtl_ofi: libfabric.so.1
mca_mtl_psm: libpsm_infinipath.so.1
mca_mtl_psm2: libpsm2.so.2
mca_pml_yalla: libmxm.so.2

you referenced them as "errors" above, but mpi actually runs just fine for me even with these msgs, so i would consider them more warnings.

> So I do encourage you to take a step back, and think if you can find a
> better solution for your site.

there are two alternatives:

1. i can compile a specific version of openmpi for each of our clusters, with the interconnect libraries specific to that cluster
2. i can install all the libraries on all the machines, regardless of whether the interconnect is present

both are certainly plausible, but my effort here is to see if i can reduce the size of our software stack and/or reduce the number of compiled versions of openmpi.

it would be nice if openmpi had (or it may already have) a simple switch that lets me disable entire portions of the library chain, i.e. this host doesn't have a particular interconnect, so don't load any of its libraries. this might run counter to how openmpi discovers and loads libs, though.
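
to make the "simple switch" idea concrete, here is a rough sketch of a per-host wrapper, not a tested solution. it assumes that mpirun's `--mca <framework> ^<component,...>` exclusion syntax applies to these components, and, more importantly, that excluding a component also stops openmpi from trying to load its shared object. i have not verified that second assumption. the helper names (`can_load`, `exclusion_args`) are made up for illustration, and the component-to-library table is just the warning list from this message.

```python
import ctypes
from collections import defaultdict

# Component -> required library, copied from the warnings in this thread.
# Keys are "<framework>_<component>", i.e. the mca_ prefix is dropped.
COMPONENT_LIBS = {
    "btl_openib": "librdmacm.so.1",
    "btl_usnic": "libfabric.so.1",
    "oob_ud": "libibverbs.so.1",
    "mtl_mxm": "libmxm.so.2",
    "mtl_ofi": "libfabric.so.1",
    "mtl_psm": "libpsm_infinipath.so.1",
    "mtl_psm2": "libpsm2.so.2",
    "pml_yalla": "libmxm.so.2",
}


def can_load(soname):
    """Return True if the dynamic loader can open this library on this host."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def exclusion_args(component_libs, loadable=can_load):
    """Build '--mca <framework> ^a,b' arguments for components whose
    library is missing here. Hypothetical helper, not part of Open MPI."""
    excluded = defaultdict(list)
    for component, soname in component_libs.items():
        framework, name = component.split("_", 1)
        if not loadable(soname):
            excluded[framework].append(name)
    args = []
    for framework in sorted(excluded):
        names = ",".join(sorted(excluded[framework]))
        args += ["--mca", framework, "^" + names]
    return args


if __name__ == "__main__":
    # Print the extra arguments this host would need; nothing is executed.
    print("mpirun", " ".join(exclusion_args(COMPONENT_LIBS)))
```

on a node with none of the interconnect libraries this would print one exclusion list per framework (btl, mtl, oob, pml). on the build node it would print a bare `mpirun`. the same values could presumably go into `OMPI_MCA_<framework>` environment variables or a per-host mca params file instead of the command line, which would keep a single openmpi build for all clusters.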