I am not aware of anything similar in Open MPI. Maybe OSU-INAM can work with other MPI implementations? Would be worth investigating...

Joseph

On 2/11/22 06:54, Bertini, Denis Dr. wrote:

Hi Joseph

Looking at MVAPICH, I noticed that this MPI implementation provides an
InfiniBand network analysis and profiling tool:

OSU-INAM

Is there something equivalent for Open MPI?

Best

Denis


------------------------------------------------------------------------
*From:* users <users-boun...@lists.open-mpi.org> on behalf of Joseph Schuchart via users <users@lists.open-mpi.org>
*Sent:* Tuesday, February 8, 2022 4:02:53 PM
*To:* users@lists.open-mpi.org
*Cc:* Joseph Schuchart
*Subject:* Re: [OMPI users] Using OSU benchmarks for checking Infiniband network
Hi Denis,

Sorry if I missed it in your previous messages, but could you also try
running a different MPI implementation (MVAPICH) to see whether Open MPI
is at fault or the system is somehow to blame?

Thanks
Joseph

On 2/8/22 03:06, Bertini, Denis Dr. via users wrote:
>
> Hi
>
> Thanks for all this information!
>
> But I have to confess that I got somewhat lost in this space of many
> tuning parameters. Furthermore, it sometimes mixes user-space and
> kernel-space settings, and I can only act on the user space.
>
>
> 1) The max locked memory on the system is already unlimited
>    (ulimit -l unlimited is the default), and I do not see any
>    warnings/errors related to that when launching MPI.
>
>
> 2) I tried different algorithms for the MPI_Allreduce operation; all of
>    them show the drop in bandwidth at size 16384 (see the example
>    commands after point 4).
>
>
> 3) I disabled openib (no RDMA) and used only TCP, and I noticed the
>    same behaviour.
>
>
> 4) I realized that increasing the so-called warm-up parameter of the
>    OSU benchmark (argument -x, default 200) reduces the discrepancy.
>    On the contrary, a lower value (-x 10) can increase the bandwidth
>    discrepancy up to a factor of 300 at message size 16384 compared to
>    message size 8192, for example. Does this mean that there are some
>    caching effects in the internode communication?
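>
> For reference, commands along these lines correspond to points 2)-4)
> (benchmark path and hostfile are illustrative; algorithm 4 = ring is
> just one example):
>
> # 2) force a specific allreduce algorithm in the tuned coll component
> mpirun -np 200 --map-by ppr:1:node \
>        --mca coll_tuned_use_dynamic_rules 1 \
>        --mca coll_tuned_allreduce_algorithm 4 ./osu_allreduce -f
>
> # 3) disable RDMA and run over TCP only
> mpirun -np 200 --map-by ppr:1:node --mca btl self,tcp ./osu_allreduce -f
>
> # 4) lower the number of warm-up iterations
> mpirun -np 200 --map-by ppr:1:node ./osu_allreduce -f -x 10 -i 1000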
>
>
> From my experience, tuning these parameters is a time-consuming and
> cumbersome task.
>
>
> Could it also be that the problem is not really in the Open MPI
> implementation but in the system?
>
>
> Best
>
> Denis
>
> ------------------------------------------------------------------------
> *From:* users <users-boun...@lists.open-mpi.org> on behalf of Gus
> Correa via users <users@lists.open-mpi.org>
> *Sent:* Monday, February 7, 2022 9:14:19 PM
> *To:* Open MPI Users
> *Cc:* Gus Correa
> *Subject:* Re: [OMPI users] Using OSU benchmarks for checking
> Infiniband network
> This may have changed since, but these used to be relevant points.
> Overall, the Open MPI FAQ has lots of good suggestions:
> https://www.open-mpi.org/faq/
> some specific for performance tuning:
> https://www.open-mpi.org/faq/?category=tuning
> https://www.open-mpi.org/faq/?category=openfabrics
>
> 1) Make sure you are not using Ethernet TCP/IP, which is widely
> available on compute nodes:
> mpirun  --mca btl self,sm,openib  ...
>
> https://www.open-mpi.org/faq/?category=tuning#selecting-components
>
> However, this may have changed lately:
> https://www.open-mpi.org/faq/?category=tcp#tcp-auto-disable
> 2) Maximum locked memory used by IB and its system limit. Start
> here:
> https://www.open-mpi.org/faq/?category=openfabrics#limiting-registered-memory-usage
> 3) The eager vs. rendezvous message size threshold. I wonder if it may
> sit right where you see the latency spike (see the example commands
> after this list).
> https://www.open-mpi.org/faq/?category=all#ib-locked-pages-user
> 4) Processor and memory locality/affinity and binding (please check
> the current options and syntax)
> https://www.open-mpi.org/faq/?category=tuning#using-paffinity-v1.4
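>
> A quick way to inspect points 3) and 4) (a sketch assuming the openib
> BTL is in use; verify the exact parameter names with ompi_info on your
> installation):
>
> # where the eager/rendezvous switch sits for the openib BTL
> ompi_info --param btl openib --level 9 | grep eager_limit
>
> # explicit mapping/binding, printing the resulting bindings
> mpirun --map-by node --bind-to core --report-bindings ...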
>
> On Mon, Feb 7, 2022 at 11:01 AM Benson Muite via users
> <users@lists.open-mpi.org> wrote:
>
>     Following https://www.open-mpi.org/doc/v3.1/man1/mpirun.1.php
>
>     mpirun --verbose --display-map
>
>     Have you tried newer OpenMPI versions?
>
>     Do you get similar behavior for the osu_reduce and osu_gather
>     benchmarks?
>
>     Typically internal buffer sizes as well as your hardware will affect
>     performance. Can you give specifications similar to what is
>     available at:
> http://mvapich.cse.ohio-state.edu/performance/collectives/
>     where the operating system, switch, node type and memory are
>     indicated.
>
>     If you need good performance, you may also want to specify the
>     algorithm used. You can find some of the parameters you can tune using:
>
>     ompi_info --all
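>
>     For example, one way to narrow that output down to the allreduce
>     tuning knobs of the tuned collective component (level 9 lists all
>     parameters):
>
>     ompi_info --param coll tuned --level 9 | grep allreduce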
>
>     A particularly helpful parameter is:
>
>     MCA coll tuned: parameter "coll_tuned_allreduce_algorithm"
>         (current value: "ignore", data source: default,
>          level: 5 tuner/detail, type: int)
>         Which allreduce algorithm is used. Can be locked down to any of:
>         0 ignore, 1 basic linear, 2 nonoverlapping (tuned reduce + tuned
>         bcast), 3 recursive doubling, 4 ring, 5 segmented ring
>         Valid values: 0:"ignore", 1:"basic_linear", 2:"nonoverlapping",
>         3:"recursive_doubling", 4:"ring", 5:"segmented_ring",
>         6:"rabenseifner"
>     MCA coll tuned: parameter "coll_tuned_allreduce_algorithm_segmentsize"
>         (current value: "0", data source: default,
>          level: 5 tuner/detail, type: int)
>
>     For OpenMPI 4.0, there is a tuning program [2] that might also be
>     helpful.
>
>     [1]
> https://stackoverflow.com/questions/36635061/how-to-check-which-mca-parameters-are-used-in-openmpi
>     [2] https://github.com/open-mpi/ompi-collectives-tuning
>
>     On 2/7/22 4:49 PM, Bertini, Denis Dr. wrote:
>     > Hi
>     >
>     > When I repeat the test I always get the huge discrepancy at
>     > message size 16384.
>     >
>     > Maybe there is a way to run MPI in verbose mode in order
>     > to further investigate this behaviour?
>     >
>     > Best
>     >
>     > Denis
>     >
>     >
> ------------------------------------------------------------------------
>     > *From:* users <users-boun...@lists.open-mpi.org> on behalf of
>     > Benson Muite via users <users@lists.open-mpi.org>
>     > *Sent:* Monday, February 7, 2022 2:27:34 PM
>     > *To:* users@lists.open-mpi.org
>     > *Cc:* Benson Muite
>     > *Subject:* Re: [OMPI users] Using OSU benchmarks for checking
>     > Infiniband network
>     > Hi,
>     > Do you get similar results when you repeat the test? Another job
>     > could have interfered with your run.
>     > Benson
>     > On 2/7/22 3:56 PM, Bertini, Denis Dr. via users wrote:
>     >> Hi
>     >>
>     >> I am using the OSU microbenchmarks compiled with Open MPI 3.1.6
>     >> in order to check/benchmark the InfiniBand network of our cluster.
>     >>
>     >> For that I use the collective allreduce benchmark and run it over
>     >> 200 nodes, using 1 process per node, and these are the results I
>     >> obtained.
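>     >>
>     >> A launch roughly of this form reproduces the setup (hostfile and
>     >> benchmark path omitted; flags as documented in the OSU benchmark
>     >> help):
>     >>
>     >> mpirun -np 200 --map-by ppr:1:node ./osu_allreduce -f -i 1000 -m 4:65536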
>     >>
>     >>
>     >>
>     >> ################################################################
>     >>
>     >> # OSU MPI Allreduce Latency Test v5.7.1
>     >> # Size       Avg Latency(us)   Min Latency(us)   Max Latency(us)  Iterations
>     >> 4                     114.65             83.22            147.98        1000
>     >> 8                     133.85            106.47            164.93        1000
>     >> 16                    116.41             87.57            150.58        1000
>     >> 32                    112.17             93.25            130.23        1000
>     >> 64                    106.85             81.93            134.74        1000
>     >> 128                   117.53             87.50            152.27        1000
>     >> 256                   143.08            115.63            173.97        1000
>     >> 512                   130.34            100.20            167.56        1000
>     >> 1024                  155.67            111.29            188.20        1000
>     >> 2048                  151.82            116.03            198.19        1000
>     >> 4096                  159.11            122.09            199.24        1000
>     >> 8192                  176.74            143.54            221.98        1000
>     >> 16384               48862.85          39270.21          54970.96        1000
>     >> 32768                2737.37           2614.60           2802.68        1000
>     >> 65536                2723.15           2585.62           2813.65        1000
>     >>
>     >>
> ####################################################################
>     >>
>     >> Could someone explain to me what is happening at message size 16384?
>     >> One can notice a huge latency (~300 times larger) compared to
>     >> message size 8192.
>     >> I do not really understand what could create such an increase in
>     >> the latency.
>     >> The reason I use the OSU microbenchmarks is that we sporadically
>     >> experience a drop in the bandwidth for typical collective
>     >> operations such as MPI_Reduce on our cluster, which is difficult
>     >> to understand.
>     >> I would be grateful if somebody could share their expertise on
>     >> such a problem with me.
>     >>
>     >> Best,
>     >> Denis
>     >>
>     >>
>     >>
>     >> ---------
>     >> Denis Bertini
>     >> Abteilung: CIT
>     >> Ort: SB3 2.265a
>     >>
>     >> Tel: +49 6159 71 2240
>     >> Fax: +49 6159 71 2986
>     >> E-Mail: d.bert...@gsi.de
>     >>
>     >> GSI Helmholtzzentrum für Schwerionenforschung GmbH
>     >> Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
>     >>
>     >> Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
>     >> Managing Directors / Geschäftsführung:
>     >> Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
>     >> Chairman of the GSI Supervisory Board /
>     >> Vorsitzender des GSI-Aufsichtsrats:
>     >> Ministerialdirigent Dr. Volkmar Dietz
>     >>
>     >
>

