Yes, that is the intended behavior: Open MPI basically only uses UCX for IB transports (and shared memory -- but only when also used with IB transports).
If IB can't be used, the UCX PML disqualifies itself. This is by design, even though UCX can handle other transports (including TCP and shared memory). The rationale is that the Open MPI community wanted direct control over non-IB transports (e.g., shared memory). Otherwise, a very large portion of the Open MPI code base and functionality would be subsumed by the UCX code base, and we would be reliant on the UCX community for core Open MPI functionality across several different [...]

Hence, by default, UCX is basically used for IB and nothing else. You can override this behavior by setting the opal_common_ucx_tls MCA parameter to a comma-delimited list of UCX transports that the UCX PML will be allowed to use. This MCA param defaults to:

rc_verbs,ud_verbs,rc_mlx5,dc_mlx5,ud_mlx5,cuda_ipc,rocm_ipc

(you'll need to ask the UCX community what each of those does/is)

--
Jeff Squyres
jsquy...@cisco.com

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) via users <users@lists.open-mpi.org>
Sent: Thursday, August 25, 2022 12:27 PM
To: Tim Carlson <timothy.carl...@pnnl.gov>
Cc: Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) <noam.bernst...@nrl.navy.mil>; Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] ucx problems

Yeah, that appears to have been the issue - IB is entirely dead (it's a new machine, so maybe no subnet manager, or maybe a bad cable). I'll track that down, and follow up here if there's still an issue once the low level IB problem is fixed.

However, given that ucx says it supports shared memory transports, I'm a bit surprised that it cannot operate (at least in OpenMPI) without IB active (it's a single node job). I added some print statements to common_ucx.c, and discovered that ucx knows about a few transports like posix and tcp, but OpenMPI never tries to use those, so it never finds a match.
Is that expected from how OpenMPI tries to use ucx?

thanks,
Noam

On Aug 25, 2022, at 12:10 PM, Tim Carlson <timothy.carl...@pnnl.gov> wrote:

And the output of

ibstat
ibhosts

is what? Maybe no subnet manager running?
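
To illustrate the override described in the reply at the top of this thread, here is a minimal Python sketch that builds the comma-delimited transport list and shows two common ways of passing an MCA parameter to Open MPI: the `--mca` option of mpirun and an environment variable with the `OMPI_MCA_` prefix. The helper names (`ucx_tls_value`, `mca_env`, `mpirun_command`) and the program name `./a.out` are made up for illustration. The extra transports `posix` and `tcp` are simply the ones Noam reported seeing; whether adding them is enough for the UCX PML to select itself has not been verified here and may depend on other settings.

```python
import os
import shlex

# Default transport list as quoted in the reply above.
DEFAULT_TLS = "rc_verbs,ud_verbs,rc_mlx5,dc_mlx5,ud_mlx5,cuda_ipc,rocm_ipc"


def ucx_tls_value(extra, base=DEFAULT_TLS):
    """Return a comma-delimited list: the base transports plus any new extras."""
    names = [name for name in base.split(",") if name]
    for name in extra:
        if name not in names:
            names.append(name)
    return ",".join(names)


def mca_env(value, environ=None):
    """Return a copy of the environment with the MCA parameter set.

    Open MPI reads MCA parameters from variables named OMPI_MCA_<param>.
    """
    env = dict(os.environ if environ is None else environ)
    env["OMPI_MCA_opal_common_ucx_tls"] = value
    return env


def mpirun_command(value, nprocs, program):
    """Return an mpirun argument list that sets the parameter via --mca."""
    return ["mpirun", "--mca", "opal_common_ucx_tls", value,
            "-np", str(nprocs), program]


if __name__ == "__main__":
    # Assumption: posix and tcp are the transports to allow in addition
    # to the defaults; adjust to what your UCX build actually reports.
    value = ucx_tls_value(["posix", "tcp"])
    print(shlex.join(mpirun_command(value, 2, "./a.out")))
```

The sketch only constructs the command line and environment; it does not launch anything. The resulting list can be passed to `subprocess.run` once the transport names have been checked against the local UCX installation.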