Hello, all!

We have a problem with OpenMPI version 1.10.2 on a cluster with newly installed Mellanox InfiniBand adapters. OpenMPI was re-configured and re-compiled using:

  --with-verbs --with-verbs-libdir=/usr/lib
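For reference, the rebuild went roughly along these lines (the install prefix below is only illustrative; the two --with-verbs options are the ones that matter):

  ./configure --prefix=/opt/openmpi-1.10.2 \
              --with-verbs \
              --with-verbs-libdir=/usr/lib
  make -j && make install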
Our test MPI job returns correct results, but OpenMPI seems to keep using the existing 1 Gbit Ethernet network instead of InfiniBand. The output file contains these lines:

--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           node1
  Local device:         mlx4_0
  Local port:           1
  CPCs attempted:       rdmacm, udcm
--------------------------------------------------------------------------

The InfiniBand network itself seems to be working. "$ ibstat mlx4_0" shows:

CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x7cfe900300bddec0
        System image GUID: 0x7cfe900300bddec3
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 3
                LMC: 0
                SM lid: 3
                Capability mask: 0x0251486a
                Port GUID: 0x7cfe900300bddec1
                Link layer: InfiniBand

ibping also works, and ibnetdiscover shows the correct topology of the IB network.

The cluster runs Ubuntu 16.04 and we use the InfiniBand drivers shipped with the OS (OFED is not installed). Is RDMA alone enough for OpenMPI, or should IPoIB also be installed? What else can we check?

Thanks a lot for any help!
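P.S. In case it is useful, we were planning to re-run the test with the openib BTL forced and verbose BTL output enabled, roughly like this (hostnames and the test binary name are just placeholders):

  mpirun --mca btl openib,self,sm \
         --mca btl_base_verbose 100 \
         -np 2 -host node1,node2 ./mpi_test

and to double-check the userspace verbs stack and the locked-memory limit on the compute nodes:

  ibv_devinfo -d mlx4_0    # from the ibverbs-utils package
  ulimit -l                # openib BTL needs a large/unlimited memlock limit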