Thanks. That /is/ one solution, and it's what I'll do in the interim, since this 
has to work in at least some fashion. But I'd actually like to use UCX if 
openib is going to be deprecated. How do I find out what's actually wrong?

--
#BlackLivesMatter
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

> On Jul 29, 2021, at 11:35 AM, Ralph Castain via users 
> <users@lists.open-mpi.org> wrote:
> 
> So it _is_ UCX that is the problem! Try using OMPI_MCA_pml=ob1 instead
> 
>> On Jul 29, 2021, at 8:33 AM, Ryan Novosielski <novos...@rutgers.edu> wrote:
>> 
>> Thanks, Ralph. This /does/ change things, but not by much. I wasn't under 
>> the impression that I needed to do that, since when I ran without having 
>> built against UCX, it warned me that the openib BTL was deprecated. By 
>> default, does OpenMPI no longer use either one, so that I need to request 
>> UCX explicitly? That seems strange.
>> 
>> Anyhow, in addition to your suggestion, I still have some variables 
>> defined for verbosity:
>> 
>> [novosirj@amarel-test2 ~]$ env | grep ^OMPI
>> OMPI_MCA_pml=ucx
>> OMPI_MCA_opal_common_ucx_opal_mem_hooks=1
>> OMPI_MCA_pml_ucx_verbose=100
>> 
>> Here goes:
>> 
>> [novosirj@amarel-test2 ~]$ srun -n 2 --mpi=pmi2 -p oarc  --reservation=UCX 
>> ./mpihello-gcc-8-openmpi-4.0.6
>> srun: job 13995650 queued and waiting for resources
>> srun: job 13995650 has been allocated resources
>> --------------------------------------------------------------------------
>> WARNING: There was an error initializing an OpenFabrics device.
>> 
>> Local host:   gpu004
>> Local device: mlx4_0
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> WARNING: There was an error initializing an OpenFabrics device.
>> 
>> Local host:   gpu004
>> Local device: mlx4_0
>> --------------------------------------------------------------------------
>> [gpu004.amarel.rutgers.edu:29823] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:147 using OPAL 
>> memory hooks as external events
>> [gpu004.amarel.rutgers.edu:29824] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:147 using OPAL 
>> memory hooks as external events
>> [gpu004.amarel.rutgers.edu:29823] 
>> ../../../../../openmpi-4.0.6/ompi/mca/pml/ucx/pml_ucx.c:197 
>> mca_pml_ucx_open: UCX version 1.5.2
>> [gpu004.amarel.rutgers.edu:29824] 
>> ../../../../../openmpi-4.0.6/ompi/mca/pml/ucx/pml_ucx.c:197 
>> mca_pml_ucx_open: UCX version 1.5.2
>> [gpu004.amarel.rutgers.edu:29823] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 self/self: 
>> did not match transport list
>> [gpu004.amarel.rutgers.edu:29823] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 tcp/eno1: 
>> did not match transport list
>> [gpu004.amarel.rutgers.edu:29823] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 tcp/ib0: 
>> did not match transport list
>> [gpu004.amarel.rutgers.edu:29824] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 self/self: 
>> did not match transport list
>> [gpu004.amarel.rutgers.edu:29823] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>> rc/mlx4_0:1: did not match transport list
>> [gpu004.amarel.rutgers.edu:29823] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>> ud/mlx4_0:1: did not match transport list
>> [gpu004.amarel.rutgers.edu:29823] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 mm/sysv: 
>> did not match transport list
>> [gpu004.amarel.rutgers.edu:29823] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 mm/posix: 
>> did not match transport list
>> [gpu004.amarel.rutgers.edu:29823] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 cma/cma: 
>> did not match transport list
>> [gpu004.amarel.rutgers.edu:29823] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:311 support 
>> level is none
>> [gpu004.amarel.rutgers.edu:29824] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 tcp/eno1: 
>> did not match transport list
>> [gpu004.amarel.rutgers.edu:29824] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 tcp/ib0: 
>> did not match transport list
>> [gpu004.amarel.rutgers.edu:29824] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>> rc/mlx4_0:1: did not match transport list
>> --------------------------------------------------------------------------
>> No components were able to be opened in the pml framework.
>> 
>> This typically means that either no components of this type were
>> installed, or none of the installed components can be loaded.
>> Sometimes this means that shared libraries required by these
>> components are unable to be found/loaded.
>> 
>> Host:      gpu004
>> Framework: pml
>> --------------------------------------------------------------------------
>> [gpu004.amarel.rutgers.edu:29823] PML ucx cannot be selected
>> [gpu004.amarel.rutgers.edu:29824] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>> ud/mlx4_0:1: did not match transport list
>> [gpu004.amarel.rutgers.edu:29824] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 mm/sysv: 
>> did not match transport list
>> [gpu004.amarel.rutgers.edu:29824] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 mm/posix: 
>> did not match transport list
>> [gpu004.amarel.rutgers.edu:29824] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 cma/cma: 
>> did not match transport list
>> [gpu004.amarel.rutgers.edu:29824] 
>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:311 support 
>> level is none
>> --------------------------------------------------------------------------
>> No components were able to be opened in the pml framework.
>> 
>> This typically means that either no components of this type were
>> installed, or none of the installed components can be loaded.
>> Sometimes this means that shared libraries required by these
>> components are unable to be found/loaded.
>> 
>> Host:      gpu004
>> Framework: pml
>> --------------------------------------------------------------------------
>> [gpu004.amarel.rutgers.edu:29824] PML ucx cannot be selected
>> slurmstepd: error: *** STEP 13995650.0 ON gpu004 CANCELLED AT 
>> 2021-07-29T11:31:19 ***
>> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>> srun: error: gpu004: tasks 0-1: Exited with exit code 1
>> 
>> 
>>> On Jul 29, 2021, at 8:34 AM, Ralph Castain via users 
>>> <users@lists.open-mpi.org> wrote:
>>> 
>>> Ryan - I suspect what Sergey was trying to say was that you need to ensure 
>>> OMPI doesn't try to use the OpenIB driver, or at least that it doesn't 
>>> attempt to initialize it. Try adding
>>> 
>>> OMPI_MCA_pml=ucx
>>> 
>>> to your environment.
>>> 
>>> 
>>>> On Jul 29, 2021, at 1:56 AM, Sergey Oblomov via users 
>>>> <users@lists.open-mpi.org> wrote:
>>>> 
>>>> Hi
>>>> 
>>>> This issue comes from the openib BTL; it is not related to UCX.
>>>> 
>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of Ryan 
>>>> Novosielski via users <users@lists.open-mpi.org>
>>>> Date: Thursday, 29 July 2021, 08:25
>>>> To: users@lists.open-mpi.org <users@lists.open-mpi.org>
>>>> Cc: Ryan Novosielski <novos...@rutgers.edu>
>>>> Subject: [OMPI users] OpenMPI 4.0.6 w/GCC 8.5 on CentOS 7.9; "WARNING: 
>>>> There was an error initializing an OpenFabrics device."
>>>> 
>>>> Hi there,
>>>> 
>>>> I'm new to UCX: I built OpenMPI without it, ran tests, and got warnings 
>>>> telling me to use it. So I installed UCX from the distribution:
>>>> 
>>>> [novosirj@amarel-test2 ~]$ rpm -qa ucx
>>>> ucx-1.5.2-1.el7.x86_64
>>>> 
>>>> …and rebuilt OpenMPI. It built fine. However, I'm getting some pretty 
>>>> unhelpful messages about the IB card not being used. I looked around the 
>>>> internet a bit and set a couple of environment variables to get a little 
>>>> more information:
>>>> 
>>>> export OMPI_MCA_opal_common_ucx_opal_mem_hooks=1
>>>> export OMPI_MCA_pml_ucx_verbose=100
>>>> 
>>>> Here’s what happens:
>>>> 
>>>> [novosirj@amarel-test2 ~]$ srun -n 2 --mpi=pmi2 -p oarc  --reservation=UCX 
>>>> ./mpihello-gcc-8-openmpi-4.0.6 
>>>> srun: job 13993927 queued and waiting for resources
>>>> srun: job 13993927 has been allocated resources
>>>> --------------------------------------------------------------------------
>>>> WARNING: There was an error initializing an OpenFabrics device.
>>>> 
>>>> Local host:   gpu004
>>>> Local device: mlx4_0
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> WARNING: There was an error initializing an OpenFabrics device.
>>>> 
>>>> Local host:   gpu004
>>>> Local device: mlx4_0
>>>> --------------------------------------------------------------------------
>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:147 using 
>>>> OPAL memory hooks as external events
>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>> ../../../../../openmpi-4.0.6/ompi/mca/pml/ucx/pml_ucx.c:197 
>>>> mca_pml_ucx_open: UCX version 1.5.2
>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:147 using 
>>>> OPAL memory hooks as external events
>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>> ../../../../../openmpi-4.0.6/ompi/mca/pml/ucx/pml_ucx.c:197 
>>>> mca_pml_ucx_open: UCX version 1.5.2
>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>> self/self: did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>> tcp/eno1: did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>> self/self: did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 tcp/ib0: 
>>>> did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>> rc/mlx4_0:1: did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>> ud/mlx4_0:1: did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 mm/sysv: 
>>>> did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>> mm/posix: did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 cma/cma: 
>>>> did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:311 support 
>>>> level is none
>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>> ../../../../../openmpi-4.0.6/ompi/mca/pml/ucx/pml_ucx.c:268 
>>>> mca_pml_ucx_close
>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>> tcp/eno1: did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 tcp/ib0: 
>>>> did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>> rc/mlx4_0:1: did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>> ud/mlx4_0:1: did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 mm/sysv: 
>>>> did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 
>>>> mm/posix: did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:304 cma/cma: 
>>>> did not match transport list
>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:311 support 
>>>> level is none
>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>> ../../../../../openmpi-4.0.6/ompi/mca/pml/ucx/pml_ucx.c:268 
>>>> mca_pml_ucx_close
>>>> [gpu004.amarel.rutgers.edu:02326] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:147 using 
>>>> OPAL memory hooks as external events
>>>> [gpu004.amarel.rutgers.edu:02327] 
>>>> ../../../../../openmpi-4.0.6/opal/mca/common/ucx/common_ucx.c:147 using 
>>>> OPAL memory hooks as external events
>>>> Hello world from processor gpu004.amarel.rutgers.edu, rank 0 out of 2 
>>>> processors
>>>> Hello world from processor gpu004.amarel.rutgers.edu, rank 1 out of 2 
>>>> processors
>>>> 
>>>> Here’s the output of a couple more commands that seem to be recommended 
>>>> when looking into this:
>>>> 
>>>> [novosirj@gpu004 ~]$ ucx_info -d
>>>> #
>>>> # Memory domain: self
>>>> #            component: self
>>>> #             register: unlimited, cost: 0 nsec
>>>> #           remote key: 8 bytes
>>>> #
>>>> #   Transport: self
>>>> #
>>>> #   Device: self
>>>> #
>>>> #      capabilities:
>>>> #            bandwidth: 6911.00 MB/sec
>>>> #              latency: 0 nsec
>>>> #             overhead: 10 nsec
>>>> #            put_short: <= 4294967295
>>>> #            put_bcopy: unlimited
>>>> #            get_bcopy: unlimited
>>>> #             am_short: <= 8k
>>>> #             am_bcopy: <= 8k
>>>> #               domain: cpu
>>>> #           atomic_add: 32, 64 bit
>>>> #           atomic_and: 32, 64 bit
>>>> #            atomic_or: 32, 64 bit
>>>> #           atomic_xor: 32, 64 bit
>>>> #          atomic_fadd: 32, 64 bit
>>>> #          atomic_fand: 32, 64 bit
>>>> #           atomic_for: 32, 64 bit
>>>> #          atomic_fxor: 32, 64 bit
>>>> #          atomic_swap: 32, 64 bit
>>>> #         atomic_cswap: 32, 64 bit
>>>> #           connection: to iface
>>>> #             priority: 0
>>>> #       device address: 0 bytes
>>>> #        iface address: 8 bytes
>>>> #       error handling: none
>>>> #
>>>> #
>>>> # Memory domain: tcp
>>>> #            component: tcp
>>>> #
>>>> #   Transport: tcp
>>>> #
>>>> #   Device: eno1
>>>> #
>>>> #      capabilities:
>>>> #            bandwidth: 113.16 MB/sec
>>>> #              latency: 5776 nsec
>>>> #             overhead: 50000 nsec
>>>> #             am_bcopy: <= 8k
>>>> #           connection: to iface
>>>> #             priority: 1
>>>> #       device address: 4 bytes
>>>> #        iface address: 2 bytes
>>>> #       error handling: none
>>>> #
>>>> #   Device: ib0
>>>> #
>>>> #      capabilities:
>>>> #            bandwidth: 6239.81 MB/sec
>>>> #              latency: 5210 nsec
>>>> #             overhead: 50000 nsec
>>>> #             am_bcopy: <= 8k
>>>> #           connection: to iface
>>>> #             priority: 1
>>>> #       device address: 4 bytes
>>>> #        iface address: 2 bytes
>>>> #       error handling: none
>>>> #
>>>> #
>>>> # Memory domain: ib/mlx4_0
>>>> #            component: ib
>>>> #             register: unlimited, cost: 90 nsec
>>>> #           remote key: 16 bytes
>>>> #           local memory handle is required for zcopy
>>>> #
>>>> #   Transport: rc
>>>> #
>>>> #   Device: mlx4_0:1
>>>> #
>>>> #      capabilities:
>>>> #            bandwidth: 6433.22 MB/sec
>>>> #              latency: 900 nsec + 1 * N
>>>> #             overhead: 75 nsec
>>>> #            put_short: <= 88
>>>> #            put_bcopy: <= 8k
>>>> #            put_zcopy: <= 1g, up to 6 iov
>>>> #  put_opt_zcopy_align: <= 512
>>>> #        put_align_mtu: <= 2k
>>>> #            get_bcopy: <= 8k
>>>> #            get_zcopy: 33..1g, up to 6 iov
>>>> #  get_opt_zcopy_align: <= 512
>>>> #        get_align_mtu: <= 2k
>>>> #             am_short: <= 87
>>>> #             am_bcopy: <= 8191
>>>> #             am_zcopy: <= 8191, up to 5 iov
>>>> #   am_opt_zcopy_align: <= 512
>>>> #         am_align_mtu: <= 2k
>>>> #            am header: <= 127
>>>> #               domain: device
>>>> #           connection: to ep
>>>> #             priority: 10
>>>> #       device address: 3 bytes
>>>> #           ep address: 4 bytes
>>>> #       error handling: peer failure
>>>> #
>>>> #
>>>> #   Transport: ud
>>>> #
>>>> #   Device: mlx4_0:1
>>>> #
>>>> #      capabilities:
>>>> #            bandwidth: 6433.22 MB/sec
>>>> #              latency: 910 nsec
>>>> #             overhead: 105 nsec
>>>> #             am_short: <= 172
>>>> #             am_bcopy: <= 4088
>>>> #             am_zcopy: <= 4088, up to 7 iov
>>>> #   am_opt_zcopy_align: <= 512
>>>> #         am_align_mtu: <= 4k
>>>> #            am header: <= 3984
>>>> #           connection: to ep, to iface
>>>> #             priority: 10
>>>> #       device address: 3 bytes
>>>> #        iface address: 3 bytes
>>>> #           ep address: 6 bytes
>>>> #       error handling: peer failure
>>>> #
>>>> #
>>>> # Memory domain: rdmacm
>>>> #            component: rdmacm
>>>> #           supports client-server connection establishment via sockaddr
>>>> #   < no supported devices found >
>>>> #
>>>> # Memory domain: sysv
>>>> #            component: sysv
>>>> #             allocate: unlimited
>>>> #           remote key: 32 bytes
>>>> #
>>>> #   Transport: mm
>>>> #
>>>> #   Device: sysv
>>>> #
>>>> #      capabilities:
>>>> #            bandwidth: 6911.00 MB/sec
>>>> #              latency: 80 nsec
>>>> #             overhead: 10 nsec
>>>> #            put_short: <= 4294967295
>>>> #            put_bcopy: unlimited
>>>> #            get_bcopy: unlimited
>>>> #             am_short: <= 92
>>>> #             am_bcopy: <= 8k
>>>> #               domain: cpu
>>>> #           atomic_add: 32, 64 bit
>>>> #           atomic_and: 32, 64 bit
>>>> #            atomic_or: 32, 64 bit
>>>> #           atomic_xor: 32, 64 bit
>>>> #          atomic_fadd: 32, 64 bit
>>>> #          atomic_fand: 32, 64 bit
>>>> #           atomic_for: 32, 64 bit
>>>> #          atomic_fxor: 32, 64 bit
>>>> #          atomic_swap: 32, 64 bit
>>>> #         atomic_cswap: 32, 64 bit
>>>> #           connection: to iface
>>>> #             priority: 0
>>>> #       device address: 8 bytes
>>>> #        iface address: 16 bytes
>>>> #       error handling: none
>>>> #
>>>> #
>>>> # Memory domain: posix
>>>> #            component: posix
>>>> #             allocate: unlimited
>>>> #           remote key: 37 bytes
>>>> #
>>>> #   Transport: mm
>>>> #
>>>> #   Device: posix
>>>> #
>>>> #      capabilities:
>>>> #            bandwidth: 6911.00 MB/sec
>>>> #              latency: 80 nsec
>>>> #             overhead: 10 nsec
>>>> #            put_short: <= 4294967295
>>>> #            put_bcopy: unlimited
>>>> #            get_bcopy: unlimited
>>>> #             am_short: <= 92
>>>> #             am_bcopy: <= 8k
>>>> #               domain: cpu
>>>> #           atomic_add: 32, 64 bit
>>>> #           atomic_and: 32, 64 bit
>>>> #            atomic_or: 32, 64 bit
>>>> #           atomic_xor: 32, 64 bit
>>>> #          atomic_fadd: 32, 64 bit
>>>> #          atomic_fand: 32, 64 bit
>>>> #           atomic_for: 32, 64 bit
>>>> #          atomic_fxor: 32, 64 bit
>>>> #          atomic_swap: 32, 64 bit
>>>> #         atomic_cswap: 32, 64 bit
>>>> #           connection: to iface
>>>> #             priority: 0
>>>> #       device address: 8 bytes
>>>> #        iface address: 16 bytes
>>>> #       error handling: none
>>>> #
>>>> #
>>>> # Memory domain: cma
>>>> #            component: cma
>>>> #             register: unlimited, cost: 9 nsec
>>>> #
>>>> #   Transport: cma
>>>> #
>>>> #   Device: cma
>>>> #
>>>> #      capabilities:
>>>> #            bandwidth: 11145.00 MB/sec
>>>> #              latency: 80 nsec
>>>> #             overhead: 400 nsec
>>>> #            put_zcopy: unlimited, up to 16 iov
>>>> #  put_opt_zcopy_align: <= 1
>>>> #        put_align_mtu: <= 1
>>>> #            get_zcopy: unlimited, up to 16 iov
>>>> #  get_opt_zcopy_align: <= 1
>>>> #        get_align_mtu: <= 1
>>>> #           connection: to iface
>>>> #             priority: 0
>>>> #       device address: 8 bytes
>>>> #        iface address: 4 bytes
>>>> #       error handling: none
>>>> #
>>>> 
>>>> [novosirj@gpu004 ~]$ ucx_info -p -u t
>>>> #
>>>> # UCP context
>>>> #
>>>> #            md 0  :  self
>>>> #            md 1  :  tcp
>>>> #            md 2  :  ib/mlx4_0
>>>> #            md 3  :  rdmacm
>>>> #            md 4  :  sysv
>>>> #            md 5  :  posix
>>>> #            md 6  :  cma
>>>> #
>>>> #      resource 0  :  md 0  dev 0  flags -- self/self
>>>> #      resource 1  :  md 1  dev 1  flags -- tcp/eno1
>>>> #      resource 2  :  md 1  dev 2  flags -- tcp/ib0
>>>> #      resource 3  :  md 2  dev 3  flags -- rc/mlx4_0:1
>>>> #      resource 4  :  md 2  dev 3  flags -- ud/mlx4_0:1
>>>> #      resource 5  :  md 3  dev 4  flags -s rdmacm/sockaddr
>>>> #      resource 6  :  md 4  dev 5  flags -- mm/sysv
>>>> #      resource 7  :  md 5  dev 6  flags -- mm/posix
>>>> #      resource 8  :  md 6  dev 7  flags -- cma/cma
>>>> #
>>>> # memory: 0.84MB, file descriptors: 2
>>>> # create time: 5.032 ms
>>>> #
>>>> 
>>>> Thanks for any help you can offer. What am I missing?
>>>> 
>>>> 
>>> 
>> 
> 
> 
