Yeah, that appears to have been the issue: IB is entirely dead (it's a new
machine, so maybe there's no subnet manager, or maybe a bad cable). I'll track
that down, and follow up here if there's still an issue once the low-level IB
problem is fixed.

However, given that UCX says it supports shared-memory transports, I'm a bit
surprised that it cannot operate (at least in Open MPI) without IB active, since
this is a single-node job. I added some print statements to common_ucx.c and
discovered that UCX knows about a few transports, such as posix and tcp, but
Open MPI never tries to use those, so it never finds a match. Is that expected
from how Open MPI tries to use UCX?

thanks,
Noam

On Aug 25, 2022, at 12:10 PM, Tim Carlson <timothy.carl...@pnnl.gov> wrote:

And the output of

ibstat
ibhosts

Is what? Maybe no subnet manager running?
