I've been doing a lot of research on this issue (see my next e-mail on this topic, which I'll be posting in a few minutes), and OpenMPI will use either ibverbs or UCX. In OpenMPI 4.0 and later, the ibverbs-based openib BTL is deprecated in favor of UCX.

Prentice

On 7/27/20 7:49 PM, gil...@rist.or.jp wrote:
Prentice,

ibverbs might be used by UCX (either pml/ucx or btl/uct),
so to be 100% sure, you should run

mpirun --mca pml ob1 --mca btl ^openib,uct ...

To force btl/tcp, you need to ensure pml/ob1 is used,
and you always need the btl/self component as well
(it carries messages a process sends to itself):

mpirun --mca pml ob1 --mca btl tcp,self ...
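
When the job is started with srun instead of mpirun, there is no mpirun command line to carry the --mca options. Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so the same selection can be sketched as below. This is only an illustration of that mechanism; whether the variables reach the compute nodes depends on how the scheduler propagates the environment, which is assumed here.

```shell
# Select the ob1 PML and restrict it to the TCP and self BTLs.
# Intended to mirror: mpirun --mca pml ob1 --mca btl tcp,self ...
export OMPI_MCA_pml=ob1
export OMPI_MCA_btl=tcp,self

# Show the MCA settings that are now present in the environment.
env | grep '^OMPI_MCA_' | sort
```

After this, a plain "srun ./mpihello" in the same shell should pick up the two settings.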

Cheers,

Gilles

----- Original Message -----
Can anyone explain why my job still calls libibverbs when I run it
with '-mca btl ^openib'?

If I instead use '-mca btl tcp', my jobs don't segfault. I would assume
'-mca btl ^openib' and '-mca btl tcp' to be essentially equivalent, but
there's obviously a difference between the two.

Prentice

On 7/23/20 3:34 PM, Prentice Bisbal wrote:
I manage a cluster that is very heterogeneous. Some nodes have
InfiniBand, while others have 10 Gb/s Ethernet. We recently upgraded
to CentOS 7, and built a new software stack for CentOS 7. We are using
OpenMPI 4.0.3, and we are using Slurm 19.05.5 as our job scheduler.

We just noticed that when jobs are sent to the nodes with IB, they
segfault immediately, with the segfault appearing to come from
libibverbs.so. This is what I see in the stderr output for one of
these failed jobs:

srun: error: greene021: tasks 0-3: Segmentation fault

And here is what I see in the log messages of the compute node where
that segfault happened:

Jul 23 15:19:41 greene021 kernel: mpihello[7911]: segfault at 7f0635f38910 ip 00007f0635f49405 sp 00007ffe354485a0 error 4
Jul 23 15:19:41 greene021 kernel: mpihello[7912]: segfault at 7f23d51ea910 ip 00007f23d51fb405 sp 00007ffef250a9a0 error 4
Jul 23 15:19:41 greene021 kernel: in libibverbs.so.1.5.22.4[7f23d51ec000+18000]
Jul 23 15:19:41 greene021 kernel:
Jul 23 15:19:41 greene021 kernel: mpihello[7909]: segfault at 7ff504ba5910 ip 00007ff504bb6405 sp 00007ffff917ccb0 error 4
Jul 23 15:19:41 greene021 kernel: in libibverbs.so.1.5.22.4[7ff504ba7000+18000]
Jul 23 15:19:41 greene021 kernel:
Jul 23 15:19:41 greene021 kernel: mpihello[7910]: segfault at 7fa58abc5910 ip 00007fa58abd6405 sp 00007ffdde50c0d0 error 4
Jul 23 15:19:41 greene021 kernel: in libibverbs.so.1.5.22.4[7fa58abc7000+18000]
Jul 23 15:19:41 greene021 kernel:
Jul 23 15:19:41 greene021 kernel: in libibverbs.so.1.5.22.4[7f0635f3a000+18000]
Jul 23 15:19:41 greene021 kernel

Any idea what is going on here, or how to debug further? I've been
using OpenMPI for years, and it usually just works.
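
One possible way to dig further is to work out where inside libibverbs.so the fault occurs. In the kernel messages, the bracketed value after the library name is the start address and size of the mapping, so subtracting the start address from the instruction pointer gives the offset of the faulting instruction within that mapping. The sketch below uses the values logged for mpihello[7912]; it assumes the mapping begins at the start of the library file, which is not confirmed here.

```shell
# Values copied from the kernel log for mpihello[7912].
ip=0x00007f23d51fb405      # faulting instruction pointer
base=0x7f23d51ec000        # start of the libibverbs.so mapping

# Offset of the faulting instruction within the mapping.
offset=$(printf '0x%x' $(( $ip - $base )))
echo "$offset"             # prints 0xf405
```

The other three processes give the same offset, which suggests they all fault at the same instruction. If debug symbols for libibverbs are installed, a tool such as addr2line or gdb may be able to map that offset to a function name.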

I normally start my job with srun like this:

srun ./mpihello

But even if I try to take IB out of the equation by starting the job
like this:

mpirun -mca btl ^openib ./mpihello

I still get a segfault issue, although the message to stderr is now a
little different:

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 8502 on node greene021
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The segfault happens immediately. It seems to happen as soon as
MPI_Init() is called. The program I'm running is a very simple MPI
"Hello world!" program.

The output of ompi_info is below my signature, in case that helps.

Prentice

$ ompi_info
                  Package: Open MPI u...@host.example.com Distribution
                 Open MPI: 4.0.3
   Open MPI repo revision: v4.0.3
    Open MPI release date: Mar 03, 2020
                 Open RTE: 4.0.3
   Open RTE repo revision: v4.0.3
    Open RTE release date: Mar 03, 2020
                     OPAL: 4.0.3
       OPAL repo revision: v4.0.3
        OPAL release date: Mar 03, 2020
                  MPI API: 3.1.0
             Ident string: 4.0.3
                   Prefix: /usr/pppl/gcc/9.3-pkgs/openmpi-4.0.3
  Configured architecture: x86_64-unknown-linux-gnu
           Configure host: dawson027.pppl.gov
            Configured by: lglant
            Configured on: Mon Jun  1 12:37:07 EDT 2020
           Configure host: dawson027.pppl.gov
   Configure command line: '--prefix=/usr/pppl/gcc/9.3-pkgs/openmpi-4.0.3'
                           '--with-ucx' '--with-verbs' '--with-libfabric'
                           '--with-libevent=/usr'
                           '--with-libevent-libdir=/usr/lib64'
                           '--with-pmix=/usr/pppl/pmix/3.1.5' '--with-pmi'
                 Built by: lglant
                 Built on: Mon Jun  1 13:05:40 EDT 2020
               Built host: dawson027.pppl.gov
               C bindings: yes
             C++ bindings: no
              Fort mpif.h: yes (all)
             Fort use mpi: yes (full: ignore TKR)
        Fort use mpi size: deprecated-ompi-info-value
         Fort use mpi_f08: yes
  Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
                           limitations in the gfortran compiler and/or Open
                           MPI, does not support the following: array
                           subsections, direct passthru (where possible) to
                           underlying Open MPI's C functionality
   Fort mpi_f08 subarrays: no
            Java bindings: no
   Wrapper compiler rpath: runpath
               C compiler: gcc
      C compiler absolute: /usr/pppl/gcc/9.3.0/bin/gcc
   C compiler family name: GNU
       C compiler version: 9.3.0
             C++ compiler: g++
    C++ compiler absolute: /usr/pppl/gcc/9.3.0/bin/g++
            Fort compiler: gfortran
        Fort compiler abs: /usr/pppl/gcc/9.3.0/bin/gfortran
          Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
    Fort 08 assumed shape: yes
       Fort optional args: yes
           Fort INTERFACE: yes
     Fort ISO_FORTRAN_ENV: yes
        Fort STORAGE_SIZE: yes
       Fort BIND(C) (all): yes
       Fort ISO_C_BINDING: yes
  Fort SUBROUTINE BIND(C): yes
        Fort TYPE,BIND(C): yes
  Fort T,BIND(C,name="a"): yes
             Fort PRIVATE: yes
           Fort PROTECTED: yes
            Fort ABSTRACT: yes
        Fort ASYNCHRONOUS: yes
           Fort PROCEDURE: yes
          Fort USE...ONLY: yes
            Fort C_FUNLOC: yes
  Fort f08 using wrappers: yes
          Fort MPI_SIZEOF: yes
              C profiling: yes
            C++ profiling: no
    Fort mpif.h profiling: yes
   Fort use mpi profiling: yes
    Fort use mpi_f08 prof: yes
           C++ exceptions: no
           Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                           OMPI progress: no, ORTE progress: yes, Event lib:
                           yes)
            Sparse Groups: no
   Internal debug support: no
   MPI interface warnings: yes
      MPI parameter check: runtime
 Memory profiling support: no
 Memory debugging support: no
               dl support: yes
    Heterogeneous support: no
  mpirun default --prefix: no
        MPI_WTIME support: native
      Symbol vis. support: yes
    Host topology support: yes
             IPv6 support: no
       MPI1 compatibility: no
           MPI extensions: affinity, cuda, pcollreq
    FT Checkpoint support: no (checkpoint thread: no)
    C/R Enabled Debugging: no
   MPI_MAX_PROCESSOR_NAME: 256
     MPI_MAX_ERROR_STRING: 256
      MPI_MAX_OBJECT_NAME: 64
         MPI_MAX_INFO_KEY: 36
         MPI_MAX_INFO_VAL: 256
        MPI_MAX_PORT_NAME: 1024
   MPI_MAX_DATAREP_STRING: 128
            MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v4.0.3)
            MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v4.0.3)
            MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.3)
                  MCA btl: uct (MCA v2.1.0, API v3.1.0, Component v4.0.3)
                  MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.0.3)
                  MCA btl: usnic (MCA v2.1.0, API v3.1.0, Component v4.0.3)
                  MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.0.3)
                  MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.3)
             MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v4.0.3)
             MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA crs: none (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                   MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v4.0.3)
                MCA event: external (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA hwloc: hwloc201 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                   MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                   MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
          MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v4.0.3)
          MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v4.0.3)
               MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v4.0.3)
              MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component v4.0.3)
                 MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA pmix: s2 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA pmix: ext3x (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA pmix: s1 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA pmix: pmix3x (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v4.0.3)
               MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v4.0.3)
            MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v4.0.3)
               MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component v4.0.3)
               MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component v4.0.3)
               MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component v4.0.3)
               MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                  MCA ess: env (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                  MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                  MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                  MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                  MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                  MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v4.0.3)
              MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                  MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA odls: pspawn (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA odls: default (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v4.0.3)
                 MCA regx: naive (MCA v2.1.0, API v1.0.0, Component v4.0.3)
                 MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v4.0.3)
                MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v4.0.3)
               MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v4.0.3)
               MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v4.0.3)
               MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                  MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v4.0.3)
               MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v4.0.3)
               MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v4.0.3)
               MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v4.0.3)
               MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v4.0.3)
                MCA state: orted (MCA v2.1.0, API v1.0.0, Component v4.0.3)
                MCA state: novm (MCA v2.1.0, API v1.0.0, Component v4.0.3)
                MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v4.0.3)
                MCA state: app (MCA v2.1.0, API v1.0.0, Component v4.0.3)
                MCA state: tool (MCA v2.1.0, API v1.0.0, Component v4.0.3)
                  MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA coll: self (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA coll: monitoring (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                   MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                   MCA io: romio321 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                   MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA mtl: psm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                  MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                  MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                  MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                  MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                  MCA pml: v (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v4.0.3)
             MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component v4.0.3)
             MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
             MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component v4.0.3)
                 MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v4.0.3)
            MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component v4.0.3)
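
In a listing this long it is easy to miss which transport components were actually built. A small filter like the sketch below keeps only the btl, mtl and pml lines. The helper name filter_transports is made up for illustration, and three lines rejoined from the listing above stand in for the real command so the snippet runs even where Open MPI is not installed.

```shell
# Keep only the btl, mtl and pml component lines from ompi_info output.
# filter_transports is a hypothetical helper name, not part of Open MPI.
filter_transports() {
    grep -E '^ *MCA (btl|mtl|pml):'
}

# Sample input: three entries rejoined from the listing above.
sample='                  MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.3)
                MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                  MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.3)'

# Prints the btl and pml lines and drops the shmem line.
printf '%s\n' "$sample" | filter_transports
```

On a node with Open MPI installed, the same filter could be fed with "ompi_info | filter_transports". For this build it would show btl/openib and btl/uct next to pml/ucx, which is why excluding openib alone does not keep verbs out of the picture.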


--
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov

