Hi all - I'm trying to get Open MPI with UCX working on a new Rocky Linux 8 + 
OpenHPC machine. I'm used to running with
mpirun --mca pml ucx --mca osc ucx --mca btl ^vader,tcp,openib --bind-to core 
--map-by core --rank-by core
However, now it complains that it can't start the pml, with the message
--------------------------------------------------------------------------
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      tin2
  Framework: pml
--------------------------------------------------------------------------

I thought maybe there were InfiniBand issues ("ucx_info -d" shows no active IB 
interface), so I removed the "--mca btl" option, but I still get the following error
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them).  This is most certainly not what you wanted.  Check your
cables, subnet manager configuration, etc.  The openib BTL will be
ignored for this job.

  Local host: tin2
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      tin2
  Framework: pml
--------------------------------------------------------------------------
[tin2:924804] PML ucx cannot be selected

I would have expected that it would work with some sort of shared memory, since 
I'm just running on a single node. The ucx library is in LD_LIBRARY_PATH. 
However, I did notice that "ompi_info --all" does not show the "uct" btl, which 
does show up on an older machine where this works.
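
One way to make that comparison between the two machines systematic is to diff 
the component lists that ompi_info reports. The following is only a minimal 
sketch: it assumes the plain "ompi_info" output has been saved to a text file 
on each machine, and that components appear on lines of the form 
"MCA <framework>: <component> (...)". The function names are hypothetical 
helpers, not part of Open MPI.

```python
import re
from collections import defaultdict

# Assumed line format (not verified against every Open MPI version), e.g.
#   "    MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.1.1)"
MCA_LINE = re.compile(r"^\s*MCA\s+(\S+):\s+(\S+)\s+\(")


def parse_components(ompi_info_text):
    """Map each framework name to the set of component names listed for it."""
    components = defaultdict(set)
    for line in ompi_info_text.splitlines():
        match = MCA_LINE.match(line)
        if match:
            framework, component = match.groups()
            components[framework].add(component)
    return dict(components)


def missing_components(working, broken):
    """Return components listed on the working machine but not on the broken one."""
    missing = {}
    for framework, names in working.items():
        gone = names - broken.get(framework, set())
        if gone:
            missing[framework] = sorted(gone)
    return missing
```

Feeding the saved output from the older machine and from the new one through 
parse_components(), then passing both results to missing_components(), would 
list every component that exists only on the older machine, for example the 
"uct" btl mentioned above. It only says which components are absent, not why 
they failed to build or load.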

Is there any way to figure out where the initialization process is failing?

thanks,
Noam