Yes, that is the intended behavior: Open MPI basically only uses UCX for IB 
transports (and shared memory -- but only when also used with IB transports).

If IB can't be used, the UCX PML disqualifies itself.  This is by design, even 
though UCX can handle other transports (including TCP and shared memory).  The 
rationale is that the Open MPI community wanted direct control over non-IB 
transports (e.g., shared memory).  Otherwise, a very large portion of the Open 
MPI code base and functionality would be subsumed by the UCX code base, and we 
would be reliant on the UCX community for core Open MPI functionality across 
several different

Hence, by default, UCX is basically used for IB and nothing else.
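
As a rough illustration of the rule above, here is a small Python sketch that models the check as "is any transport UCX reports also in the allowed list?".  It is not the actual common_ucx.c logic; the function name and the sample transport lists are assumptions made for illustration, and the default string is the allowed-transport default quoted later in this message.

```python
# Simplified model of the UCX PML transport check; NOT Open MPI's actual code.
# Default allowed list, copied from the MCA param default quoted in this message.
DEFAULT_ALLOWED_TLS = "rc_verbs,ud_verbs,rc_mlx5,dc_mlx5,ud_mlx5,cuda_ipc,rocm_ipc"


def ucx_pml_usable(available_tls, allowed_tls=DEFAULT_ALLOWED_TLS):
    """Return True if at least one available transport is in the allowed list."""
    allowed = {t.strip() for t in allowed_tls.split(",") if t.strip()}
    return any(t in allowed for t in available_tls)


# Hypothetical node with IB down: UCX still reports posix and tcp,
# but neither is in the allowed list, so the PML would disqualify itself.
print(ucx_pml_usable(["posix", "tcp"]))              # False

# Hypothetical node with working IB: rc_verbs matches the allowed list.
print(ucx_pml_usable(["posix", "tcp", "rc_verbs"]))  # True
```

The first call mirrors the single-node case from the original question: transports exist, but none of them match.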

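You can override this behavior by setting the opal_common_ucx_tls MCA parameter 
(e.g., via the OMPI_MCA_opal_common_ucx_tls environment variable, or with 
"mpirun --mca opal_common_ucx_tls ...") to a comma-delimited list of UCX 
transports that the UCX PML will be allowed to use.  This MCA param defaults to: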


rc_verbs,ud_verbs,rc_mlx5,dc_mlx5,ud_mlx5,cuda_ipc,rocm_ipc

(you'll need to ask the UCX community what each of those do/are)
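
If you launch jobs from a script, one way to apply the override is to build the environment for mpirun programmatically.  The sketch below is only an illustration: the helper name is made up, the transport names posix and tcp are simply the ones mentioned in the original question, and whether they are sufficient for a given job is something to confirm with the UCX documentation.  It relies on Open MPI reading MCA parameters from environment variables named OMPI_MCA_<param>.

```python
import os


def env_with_ucx_tls(transports, base_env=None):
    """Return a copy of the environment with the allowed UCX transport list set.

    Hypothetical helper for illustration; not part of Open MPI or UCX.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Open MPI picks up MCA parameters from variables named OMPI_MCA_<param>.
    env["OMPI_MCA_opal_common_ucx_tls"] = ",".join(transports)
    return env


env = env_with_ucx_tls(["posix", "tcp"])
print(env["OMPI_MCA_opal_common_ucx_tls"])  # posix,tcp

# To launch (assumes mpirun is on PATH and ./a.out is your MPI program):
#   import subprocess
#   subprocess.run(["mpirun", "-np", "2", "./a.out"], env=env, check=True)
```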

--
Jeff Squyres
jsquy...@cisco.com
________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Bernstein, Noam CIV 
USN NRL (6393) Washington DC (USA) via users <users@lists.open-mpi.org>
Sent: Thursday, August 25, 2022 12:27 PM
To: Tim Carlson <timothy.carl...@pnnl.gov>
Cc: Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) 
<noam.bernst...@nrl.navy.mil>; Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] ucx problems

Yeah, that appears to have been the issue - IB is entirely dead (it's a new 
machine, so maybe no subnet manager, or maybe a bad cable). I'll track that 
down, and follow up here if there's still an issue once the low level IB 
problem is fixed.

However, given that ucx says it supports shared memory transports, I'm a bit 
surprised that it cannot operate (at least in OpenMPI) without IB active (it's 
a single node job).  I added some print statements to common_ucx.c, and 
discovered that ucx knows about a few transports like posix and tcp, but 
OpenMPI never tries to use those, so it never finds a match.  Is that expected 
from how OpenMPI tries to use ucx?

thanks,
Noam

On Aug 25, 2022, at 12:10 PM, Tim Carlson <timothy.carl...@pnnl.gov> wrote:

And the output of

ibstat
ibhosts

Is what? Maybe no subnet manager running?
