We are upgrading a cluster from RHEL6 to RHEL8, and have migrated some
nodes to a new partition and reimaged them with RHEL8.  I am having some
issues getting OpenMPI to work with InfiniBand on the nodes upgraded to RHEL8.

For testing purposes, I am trying to run a simple MPI "hello world" code on
the local RHEL8 host (I am obviously also having issues across multiple
nodes, but am trying to simplify).
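
For reference, the test code is essentially the textbook MPI hello world;
the sketch below is representative of what I am running (not the exact
source, but equivalent in spirit):

    /* hello-world-mpi.c -- representative sketch of the test program */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        /* the hang appears to occur here (during BTL init) when openib,self is used */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);
        printf("Hello world from rank %d of %d on %s\n", rank, size, name);
        MPI_Finalize();
        return 0;
    }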

If I run with the BTL set to vader,self or tcp,self on the command line, the
MPI code runs as expected.  If I set it to openib,self (or leave it unset),
the job just hangs indefinitely, e.g.
bash> mpirun -H localhost -v --mca mpi_cuda_support 0 --mca btl_openib_verbose 1 --mca btl openib,self -n 1 --show-progress -d --debug-daemons ./hello-world-mpi
[compute-a20-3.XXX.YYY.ZZZ:30383] procdir:
/tmp/ompi.compute-a20-3.34676/pid.30383/0/0
[compute-a20-3.XXX.YYY.ZZZ:30383] jobdir:
/tmp/ompi.compute-a20-3.34676/pid.30383/0
[compute-a20-3.XXX.YYY.ZZZ:30383] top:
/tmp/ompi.compute-a20-3.34676/pid.30383
[compute-a20-3.XXX.YYY.ZZZ:30383] top: /tmp/ompi.compute-a20-3.34676
[compute-a20-3.XXX.YYY.ZZZ:30383] tmp: /tmp
[compute-a20-3.XXX.YYY.ZZZ:30383] sess_dir_cleanup: job session dir does
not exist
[compute-a20-3.XXX.YYY.ZZZ:30383] sess_dir_cleanup: top session dir not
empty - leaving
[compute-a20-3.XXX.YYY.ZZZ:30383] procdir:
/tmp/ompi.compute-a20-3.34676/pid.30383/0/0
[compute-a20-3.XXX.YYY.ZZZ:30383] jobdir:
/tmp/ompi.compute-a20-3.34676/pid.30383/0
[compute-a20-3.XXX.YYY.ZZZ:30383] top:
/tmp/ompi.compute-a20-3.34676/pid.30383
[compute-a20-3.XXX.YYY.ZZZ:30383] top: /tmp/ompi.compute-a20-3.34676
[compute-a20-3.XXX.YYY.ZZZ:30383] tmp: /tmp
[compute-a20-3.XXX.YYY.ZZZ:30383] [[29315,0],0] orted_cmd: received
add_local_procs
[compute-a20-3.XXX.YYY.ZZZ:30383] [[29315,0],0] Releasing job data for
[INVALID]
App launch reported: 1 (out of 1) daemons - 0 (out of 1) procs
  MPIR_being_debugged = 0
  MPIR_debug_state = 1
  MPIR_partial_attach_ok = 1
  MPIR_i_am_starter = 0
  MPIR_forward_output = 0
  MPIR_proctable_size = 1
  MPIR_proctable:
    (i, host, exe, pid) = (0, compute-a20-3,
/software/hello-world/1.0/gcc/8.4.0/openmpi/3.1.5/linux-rhel8-x86_64/bin/./hello-world-mpi,
30387)
MPIR_executable_path: NULL
MPIR_server_arguments: NULL
[compute-a20-3.XXX.YYY.ZZZ:30387] procdir:
/tmp/ompi.compute-a20-3.34676/pid.30383/1/0
[compute-a20-3.XXX.YYY.ZZZ:30387] jobdir:
/tmp/ompi.compute-a20-3.34676/pid.30383/1
[compute-a20-3.XXX.YYY.ZZZ:30387] top:
/tmp/ompi.compute-a20-3.34676/pid.30383
[compute-a20-3.XXX.YYY.ZZZ:30387] top: /tmp/ompi.compute-a20-3.34676
[compute-a20-3.XXX.YYY.ZZZ:30387] tmp: /tmp
[compute-a20-3][[29315,1],0][btl_openib_ini.c:172:opal_btl_openib_ini_query]
Querying INI files for vendor 0x02c9, part ID 4099
[compute-a20-3][[29315,1],0][btl_openib_ini.c:188:opal_btl_openib_ini_query]
Found corresponding INI values: Mellanox Hermon
[compute-a20-3][[29315,1],0][btl_openib_ini.c:172:opal_btl_openib_ini_query]
Querying INI files for vendor 0x0000, part ID 0
[compute-a20-3][[29315,1],0][btl_openib_ini.c:188:opal_btl_openib_ini_query]
Found corresponding INI values: default

At this point the code just hangs indefinitely.  I see a PID 30387 named
hello-world-mpi with 3 threads, which is consuming ~100% of a CPU core, but
strace just shows it making epoll_wait calls.

The "Releasing job data for [INVALID]" looks suspicious, but looking at
source code I think that is just because I am running outside of a
scheduler so no job number.  I suspect the problem is the 0 in the line
App launch reported: 1 (out of 1) daemons - 0 (out of 1) procs
but I am at a loss as to why or how to fix it.

I can run the same example above on one of the nodes still at RHEL6
(compiled for the OpenMPI we have on that system) and it works as expected.

I am able to run ibv_rc_pingpong between nodes (between a pair of RHEL8
nodes, between a pair of RHEL6 nodes, between a mixed pair with one RHEL6
and one RHEL8 node, and of course within the same node), so I do not see
any obvious InfiniBand issues.

If anyone could give suggestions/tips/ideas on how to proceed with
diagnosing/fixing this issue, I would be grateful.  Thanks in advance for
any suggestions.

================================================
System/etc details
================================================

The issue is occurring on a RHEL8 system, specifically 8.1 with kernel
4.18.0-147.5.1.el8_1.x86_64, running OpenMPI 3.1.5 (built with gcc 8.4.0
using spack).

The issue is in the openib BTL (the vader and tcp BTLs seem to be working),
and we are using the OpenFabrics stack from Mellanox
(libibverbs-41mlnx1-OFED.5.0.0.0.9.50100.0.src.rpm).
We are using a subnet manager running on a Mellanox FDR IB switch
(SX_PPC_M460EX).

The "working" RHEL6 system are running 6.10, kernel
2.6.32-754.25.1.el6.x86_64, with OpenMPI 1.10.2 built with gcc 6.1.0)

The memorylocked limit on both RHEL8 and RHEL6 is unlimited.

On the RHEL8 node, ibv_devinfo returns:
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.32.5100
        node_guid:                      f452:1403:0070:1c80
        sys_image_guid:                 f452:1403:0070:1c83
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x1
        board_id:                       DEL0A30000019
        phys_port_cnt:                  1
        Device ports:
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 532
                        port_lid:               536
                        port_lmc:               0x00
                        link_layer:             InfiniBand

(The "working" RHEL6 system basically has an identical result from
ibv_devinfo,
with exception of different values node_guid, sys_image_guid, and port_lid.)

The output of ompi_info --all on the RHEL8 node is attached.  As indicated
earlier, I am running on the same node from which the mpirun command is issued.

The result of ifconfig -a on the RHEL8 node is:
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.103.132.13  netmask 255.255.224.0  broadcast 10.103.159.255
        inet6 fe80::3617:ebff:fee6:6a31  prefixlen 64  scopeid 0x20<link>
        ether 34:17:eb:e6:6a:31  txqueuelen 1000  (Ethernet)
        RX packets 1599943  bytes 345477382 (329.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2147871  bytes 3010964444 (2.8 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x91120000-9113ffff

eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 34:17:eb:e6:6a:32  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x91100000-9111ffff

ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2044
        inet 192.168.68.13  netmask 255.255.224.0  broadcast 192.168.95.255
        inet6 fe80::f652:1403:70:1c81  prefixlen 64  scopeid 0x20<link>
Infiniband hardware address can be incorrect! Please read BUGS section in
ifconfig(8).
        infiniband
A0:00:02:20:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256
 (InfiniBand)
        RX packets 49701  bytes 45121502 (43.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 25427  bytes 5740480 (5.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 476287  bytes 23889166 (22.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 476287  bytes 23889166 (22.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0




 --
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads        paye...@umd.edu
5825 University Research Park               (301) 405-6135
University of Maryland
College Park, MD 20740-3831

Attachment: ompi-info.all.a20-3.bz2
Description: application/bzip
