This may have changed since, but these used to be relevant points.
Overall, the Open MPI FAQ has lots of good suggestions:
https://www.open-mpi.org/faq/
some specifically for performance tuning:
https://www.open-mpi.org/faq/?category=tuning
https://www.open-mpi.org/faq/?category=openfabrics

1) Make sure you are not running over Ethernet TCP/IP, which is widely
available on compute nodes:

mpirun --mca btl self,sm,openib ...

https://www.open-mpi.org/faq/?category=tuning#selecting-components

However, this may have changed lately:
https://www.open-mpi.org/faq/?category=tcp#tcp-auto-disable
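
A quick sanity check (only a sketch; on newer Open MPI releases the
shared-memory BTL is called vader instead of sm, so adjust the component
names to whatever ompi_info lists on your installation):

mpirun --mca btl ^tcp --mca btl_base_verbose 30 ./osu_allreduce

Excluding the TCP BTL with ^tcp makes the job fail loudly if InfiniBand is
not actually usable, and btl_base_verbose shows which BTL each connection
ends up on.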

2) The maximum locked memory used by IB and its system limit. Start here:
https://www.open-mpi.org/faq/?category=openfabrics#limiting-registered-memory-usage
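
For example, a quick check on a compute node (how the limit is raised,
typically via /etc/security/limits.conf and the daemon that actually
launches the MPI processes, is site specific):

ulimit -l     # should normally report "unlimited" on IB nodes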

3) The eager vs. rendezvous message size threshold.
I wonder whether it sits right where you see the latency spike.
https://www.open-mpi.org/faq/?category=all#ib-locked-pages-user
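
If it does, one hedged experiment (parameter names differ between Open MPI
releases, so confirm them with ompi_info first) is to look up the openib
eager limit and push it past 16384 before re-running the benchmark:

ompi_info --param btl openib --level 9 | grep -i eager
mpirun --mca btl_openib_eager_limit 65536 ...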

4) Processor and memory locality/affinity and binding (please check
the current options and syntax)
https://www.open-mpi.org/faq/?category=tuning#using-paffinity-v1.4
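
For example, with a recent mpirun (the exact options changed between the
1.x and later series, so double-check mpirun --help on your version), one
process per node with explicit binding and a report of where each rank
landed would look roughly like:

mpirun --map-by ppr:1:node --bind-to core --report-bindings ...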


On Mon, Feb 7, 2022 at 11:01 AM Benson Muite via users <
users@lists.open-mpi.org> wrote:

> Following https://www.open-mpi.org/doc/v3.1/man1/mpirun.1.php
>
> mpirun --verbose --display-map
>
> Have you tried newer OpenMPI versions?
>
> Do you get similar behavior for the osu_reduce and osu_gather benchmarks?
>
> Typically, internal buffer sizes as well as your hardware will affect
> performance. Can you give specifications similar to what is available at:
> http://mvapich.cse.ohio-state.edu/performance/collectives/
> where the operating system, switch, node type, and memory are indicated?
>
> If you need good performance, you may also want to specify the algorithm
> used. You can find some of the parameters you can tune using [1]:
>
> ompi_info --all
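>
> To narrow that listing down to the allreduce-related knobs, a plain grep
> over the full output is enough:
>
> ompi_info --all | grep coll_tuned_allreduce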
>
> A particularly helpful parameter is:
>
> MCA coll tuned: parameter "coll_tuned_allreduce_algorithm"
>     (current value: "ignore", data source: default, level: 5 tuner/detail, type: int)
>     Which allreduce algorithm is used. Can be locked down to any of:
>     0 ignore, 1 basic linear, 2 nonoverlapping (tuned reduce + tuned bcast),
>     3 recursive doubling, 4 ring, 5 segmented ring
>     Valid values: 0:"ignore", 1:"basic_linear", 2:"nonoverlapping",
>     3:"recursive_doubling", 4:"ring", 5:"segmented_ring", 6:"rabenseifner"
> MCA coll tuned: parameter "coll_tuned_allreduce_algorithm_segmentsize"
>     (current value: "0", data source: default, level: 5 tuner/detail, type: int)
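>
> For example, to pin allreduce to one algorithm (a sketch only; the fixed
> choice is honoured only when the dynamic rules are enabled, and which
> algorithm is fastest depends on your network and message sizes):
>
> mpirun --mca coll_tuned_use_dynamic_rules 1 \
>        --mca coll_tuned_allreduce_algorithm 3 ...   # 3 = recursive doubling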
>
> For Open MPI 4.0, there is a tuning program [2] that might also be helpful.
>
> [1]
>
> https://stackoverflow.com/questions/36635061/how-to-check-which-mca-parameters-are-used-in-openmpi
> [2] https://github.com/open-mpi/ompi-collectives-tuning
>
> On 2/7/22 4:49 PM, Bertini, Denis Dr. wrote:
> > Hi
> >
> > When I repeat the run, I always get the huge discrepancy at
> > message size 16384.
> >
> > Maybe there is a way to run MPI in verbose mode in order
> > to further investigate this behaviour?
> >
> > Best
> >
> > Denis
> >
> > ------------------------------------------------------------------------
> > *From:* users <users-boun...@lists.open-mpi.org> on behalf of Benson
> > Muite via users <users@lists.open-mpi.org>
> > *Sent:* Monday, February 7, 2022 2:27:34 PM
> > *To:* users@lists.open-mpi.org
> > *Cc:* Benson Muite
> > *Subject:* Re: [OMPI users] Using OSU benchmarks for checking Infiniband
> > network
> > Hi,
> > Do you get similar results when you repeat the test? Another job could
> > have interfered with your run.
> > Benson
> > On 2/7/22 3:56 PM, Bertini, Denis Dr. via users wrote:
> >> Hi
> >>
> >> I am using the OSU microbenchmarks compiled with Open MPI 3.1.6 in order
> >> to check/benchmark the InfiniBand network for our cluster.
> >>
> >> For that I use the collective all_reduce benchmark and run over 200
> >> nodes, using 1 process per node.
> >>
> >> And these are the results I obtained 😎
> >>
> >>
> >> ################################################################
> >>
> >> # OSU MPI Allreduce Latency Test v5.7.1
> >> # Size       Avg Latency(us)   Min Latency(us)   Max Latency(us)  Iterations
> >> 4                     114.65             83.22            147.98        1000
> >> 8                     133.85            106.47            164.93        1000
> >> 16                    116.41             87.57            150.58        1000
> >> 32                    112.17             93.25            130.23        1000
> >> 64                    106.85             81.93            134.74        1000
> >> 128                   117.53             87.50            152.27        1000
> >> 256                   143.08            115.63            173.97        1000
> >> 512                   130.34            100.20            167.56        1000
> >> 1024                  155.67            111.29            188.20        1000
> >> 2048                  151.82            116.03            198.19        1000
> >> 4096                  159.11            122.09            199.24        1000
> >> 8192                  176.74            143.54            221.98        1000
> >> 16384               48862.85          39270.21          54970.96        1000
> >> 32768                2737.37           2614.60           2802.68        1000
> >> 65536                2723.15           2585.62           2813.65        1000
> >>
> >> ####################################################################
> >>
> >> Could someone explain to me what is happening at message size = 16384?
> >> One can notice a huge latency (~300 times larger) compared to message
> >> size = 8192.
> >> I do not really understand what could create such an increase in the
> >> latency.
> >> The reason I use the OSU microbenchmarks is that we sporadically
> >> experience a drop in the bandwidth for typical collective operations
> >> such as MPI_Reduce in our cluster, which is difficult to understand.
> >> I would be grateful if somebody could share their expertise on such a
> >> problem with me.
> >>
> >> Best,
> >> Denis
> >>
> >>
> >>
> >> ---------
> >> Denis Bertini
> >> Abteilung: CIT
> >> Ort: SB3 2.265a
> >>
> >> Tel: +49 6159 71 2240
> >> Fax: +49 6159 71 2986
> >> E-Mail: d.bert...@gsi.de
> >>
> >> GSI Helmholtzzentrum für Schwerionenforschung GmbH
> >> Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
> >>
> >> Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
> >> Managing Directors / Geschäftsführung:
> >> Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
> >> Chairman of the GSI Supervisory Board / Vorsitzender des
> GSI-Aufsichtsrats:
> >> Ministerialdirigent Dr. Volkmar Dietz
> >>
> >
>
>
