We're using CUDA-aware Open MPI 4.1.1, loaded as a module, on a RHEL 8
cluster with a Mellanox Technologies MT28908 Family ConnectX-6 InfiniBand
controller. We see this warning when running mpirun without any MCA
options/parameters:
WARNING: There was an error initializing an OpenFabrics device.
  Local host:   xxxx
  Local device: mlx5_0
---------------------------------------------

I did add 0x02c9 to our mca-btl-openib-device-params.ini file in the
Mellanox ConnectX-6 stanza, since we were getting the following warning,
which no longer appears:

WARNING: No preset parameters were found for the device that Open MPI detected:

  Local host:            xxxx
  Device name:           mlx5_0
  Device vendor ID:      0x02c9
  Device vendor part ID: 4123

I found this referenced in the comments at
<https://accserv.classe.cornell.edu/svn/packages/openmpi/opal/mca/btl/openib/mca-btl-openib-device-params.ini>:

# Note: Several vendors resell Mellanox hardware and put their own firmware
# on the cards, therefore overriding the default Mellanox vendor ID.
#
#     Mellanox      0x02c9
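
For reference, the stanza I ended up with looks roughly like this (the
vendor_part_id of 4123 comes from the warning above; the remaining tuning
values are illustrative, modeled on the neighboring ConnectX stanzas in the
shipped file, so treat them as an approximation rather than the exact
defaults):

  [Mellanox ConnectX6]
  # 0x02c9 is the stock Mellanox vendor ID; resellers may override it
  vendor_id = 0x02c9
  vendor_part_id = 4123
  use_eager_rdma = 1
  mtu = 4096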

Running ompi_info --param btl all, we have:
MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.1.1)
MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.1.1)
MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.1.1)
MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.1.1)
MCA btl: smcuda (MCA v2.1.0, API v3.1.0, Component v4.1.1)
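
One thing I haven't fully verified is whether our build includes the UCX
PML, since openib is deprecated in favor of UCX on Mellanox hardware; I
assume something like the following would show it:

  ompi_info | grep -i ucx
  ompi_info --param pml all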

So I am trying to wrap my head around the various warnings, and what these
various options/parameters available to use can improve performance and/or
when to use them.

I've gone through the Open MPI run-time tuning documentation
<https://www.open-mpi.org/faq/?category=tuning>, and I've used this STREAM
benchmark
<https://anilmaurya.wordpress.com/2016/10/12/stream-benchmarks/> as well as
the OSU Micro-Benchmarks at
https://ulhpc-tutorials.readthedocs.io/en/latest/parallel/mpi/OSU_MicroBenchmarks/


With version 4.1.1, if I use --mca btl 'openib' I get seg faults, which I
believe is expected since openib is deprecated
<https://docs.open-mpi.org/en/v5.0.x/release-notes/networks.html>. I've
tried --mca btl '^openib' and --mca btl 'tcp' (or --mca btl 'tcp,self' for
the OSU benchmarks), and the benchmark results are very similar even when I
use multiple CPUs, threads, and/or nodes. They also run without the warning
messages. If I don't use any --mca option, I get the WARNING above.
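
For concreteness, the kinds of invocations I've been comparing look like
the following (host names and benchmark paths are placeholders; osu_bw is
from the OSU suite linked above):

  # TCP between nodes, shared memory (vader) within a node:
  mpirun -np 2 -host node01,node02 --mca btl tcp,self,vader ./osu_bw

  # InfiniBand via the UCX PML, if it's built in:
  mpirun -np 2 -host node01,node02 --mca pml ucx ./osu_bw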

Does anyone know of a tried and true way to run these benchmarks so I can
tell whether these MCA parameters make a difference, or am I just not
understanding how to use them? Perhaps running these benchmarks on a very
active cluster with shared CPUs/nodes affects the results?
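
On that last point, would pinning ranks help make the runs more repeatable
on shared nodes? I was thinking of something along these lines, which at
least reports where the ranks land (osu_latency again from the OSU suite):

  mpirun -np 2 --map-by node --bind-to core --report-bindings \
      --mca pml ucx ./osu_latency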

I can share results if that would help the discussion.

Thanks!
