Yeah, that appears to have been the issue - IB is entirely dead (it's a new machine, so maybe no subnet manager, or maybe a bad cable). I'll track that down, and follow up here if there's still an issue once the low-level IB problem is fixed.
However, given that ucx says it supports shared memory transports, I'm a bit surprised that it cannot operate (at least in OpenMPI) without IB active (it's a single node job). I added some print statements to common_ucx.c, and discovered that ucx knows about a few transports like posix and tcp, but OpenMPI never tries to use those, so it never finds a match. Is that expected from how OpenMPI tries to use ucx?

thanks,
Noam

On Aug 25, 2022, at 12:10 PM, Tim Carlson <timothy.carl...@pnnl.gov> wrote:

And the output of

ibstat
ibhosts

Is what? Maybe no subnet manager running?