Hello, Alina! If I use --map-by node, I get only intra-node communication in osu_mbw_mr, so I use --map-by core instead.
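As far as I understand the benchmark, osu_mbw_mr pairs rank i with rank i + nprocs/2, so the mapping policy decides whether the two ranks of each pair share a node. Below is a minimal stand-alone sketch of that effect; it is not OSU or Open MPI code, and it simply assumes 2 nodes with 16 cores each, round-robin placement for --map-by node and fill-the-cores placement for --map-by core.

/*
 * Why the mapping policy matters for osu_mbw_mr on 2 nodes with 32 ranks:
 * rank i is paired with rank i + nprocs/2, so round-robin placement puts
 * both ranks of every pair on the same node, while core-by-core placement
 * puts them on different nodes.
 */
#include <stdio.h>

#define NODES 2
#define RANKS 32

static int node_round_robin(int rank) { return rank % NODES; }           /* --map-by node */
static int node_fill_cores(int rank)  { return rank / (RANKS / NODES); } /* --map-by core */

int main(void)
{
    int pairs = RANKS / 2, cross_rr = 0, cross_core = 0;
    for (int i = 0; i < pairs; i++) {
        int j = i + pairs; /* osu_mbw_mr partner of rank i */
        cross_rr   += node_round_robin(i) != node_round_robin(j);
        cross_core += node_fill_cores(i)  != node_fill_cores(j);
    }
    printf("inter-node pairs: map-by node = %d/%d, map-by core = %d/%d\n",
           cross_rr, pairs, cross_core, pairs);
    /* prints: inter-node pairs: map-by node = 0/16, map-by core = 16/16 */
    return 0;
}

So with --map-by node all 16 pairs stay inside a node and the test measures shared memory rather than the 4xFDR link.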
I have 2 nodes, each node has 2 sockets with 8 cores per socket. When I run osu_mbw_mr on 2 nodes with 32 MPI processes (commands below), I expect to see roughly the unidirectional bandwidth of a 4xFDR link as the result of this test (4 lanes x 14.0625 Gb/s = 56.25 Gb/s on the wire, about 6.8 GB/s of payload after 64b/66b encoding). With Intel MPI I get 6367 MB/s. With ompi_yalla I get about 3744 MB/s; the problem is that this is half of the Intel MPI result. With Open MPI without MXM (ompi_clear) I get 6321 MB/s. How can I increase the yalla result?

IntelMPI cmd:
/opt/software/intel/impi/4.1.0.030/intel64/bin/mpiexec.hydra -machinefile machines.pYAvuK -n 32 -binding domain=core ../osu_impi/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0

ompi_yalla cmd:
/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-mellanox-fca-v1.8.5/bin/mpirun -report-bindings -display-map -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1 -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla --map-by core --bind-to core --hostfile hostlist ../osu_ompi_hcoll/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0

ompi_clear cmd:
/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-clear-v1.8.5/bin/mpirun -report-bindings -display-map --hostfile hostlist --map-by core --bind-to core ../osu_ompi_clear/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0

I have attached the output files to this letter:
ompi_clear.out, ompi_clear.err - ompi_clear results
ompi_yalla.out, ompi_yalla.err - ompi_yalla results
impi.out, impi.err - Intel MPI results

Best regards,
Timur

Sunday, June 7, 2015, 16:11 +03:00 from Alina Sklarevich <ali...@dev.mellanox.co.il>:
>Hi Timur,
>
>After running the osu_mbw_mr benchmark in our lab, we observed that the binding policy made a difference to the performance.
>Can you please rerun your ompi tests with the following added to your command line (one of them in each run)?
>
>1. --map-by node --bind-to socket
>2. --map-by node --bind-to core
>
>Please attach your results.
>
>Thank you,
>Alina.
>
>On Thu, Jun 4, 2015 at 6:53 PM, Timur Ismagilov < tismagi...@mail.ru > wrote:
>>Hello, Alina.
>>
>>1. Here is my ompi_yalla command line:
>>$HPCX_MPI_DIR/bin/mpirun -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1 -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla --hostfile hostlist $@
>>echo $HPCX_MPI_DIR
>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-mellanox-fca-v1.8.5
>>This MPI was configured with: --with-mxm=/path/to/mxm --with-hcoll=/path/to/hcoll --with-platform=contrib/platform/mellanox/optimized --prefix=/path/to/ompi-mellanox-fca-v1.8.5
>>
>>Here is my ompi_clear command line:
>>$HPCX_MPI_DIR/bin/mpirun --hostfile hostlist $@
>>echo $HPCX_MPI_DIR
>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-clear-v1.8.5
>>This MPI was configured with: --with-platform=contrib/platform/mellanox/optimized --prefix=/path/to/ompi-clear-v1.8.5
>>
>>2. When I run osu_mbw_mr with "-x MXM_TLS=self,shm,rc", it fails with a segmentation fault:
>>the stdout log is in the attached file osu_mbr_mr_n-2_ppn-16.out;
>>the stderr log is in the attached file osu_mbr_mr_n-2_ppn-16.err;
>>cmd line:
>>$HPCX_MPI_DIR/bin/mpirun -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1 -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla -x MXM_TLS=self,shm,rc --hostfile hostlist osu_mbw_mr -v -r=0
>>I have changed WINDOW_SIZES in osu_mbw_mr.c (what the window size controls is sketched after this list):
>>#define WINDOW_SIZES {8, 16, 32, 64, 128, 256, 512, 1024 }
>>
>>3. I have added the results of running osu_mbw_mr with yalla and without hcoll on 32 and 64 nodes (512 and 1024 MPI processes) to mvs10p_mpi.xls, sheet osu_mbr_mr. The results are 20 percent lower than the old results (with hcoll).
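For context on the WINDOW_SIZES change above: as I understand it, the window size in osu_mbw_mr is the number of messages a sender keeps in flight before waiting for them to complete, and the bandwidth and message rate are computed over the same timed loop. Here is a minimal two-rank sketch of that windowed pattern; it is written from scratch rather than taken from the OSU sources, and MSG_SIZE, WINDOW and ITERATIONS are arbitrary illustrative values.

/* Minimal sketch of a windowed bandwidth / message-rate loop in the spirit
 * of osu_mbw_mr, for exactly 2 ranks. This is not the OSU implementation. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MSG_SIZE   65536  /* bytes per message */
#define WINDOW     64     /* messages kept in flight before waiting */
#define ITERATIONS 100

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* one buffer slot per in-flight message */
    char *buf = malloc((size_t)WINDOW * MSG_SIZE);
    memset(buf, rank, (size_t)WINDOW * MSG_SIZE);
    MPI_Request req[WINDOW];
    char ack = 0;

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();

    for (int it = 0; it < ITERATIONS; it++) {
        if (rank == 0) {   /* sender: post a whole window, then wait for it */
            for (int w = 0; w < WINDOW; w++)
                MPI_Isend(buf + (size_t)w * MSG_SIZE, MSG_SIZE, MPI_CHAR,
                          1, 100, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Recv(&ack, 1, MPI_CHAR, 1, 101, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {           /* receiver: pre-post the window, wait, then ack */
            for (int w = 0; w < WINDOW; w++)
                MPI_Irecv(buf + (size_t)w * MSG_SIZE, MSG_SIZE, MPI_CHAR,
                          0, 100, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Send(&ack, 1, MPI_CHAR, 0, 101, MPI_COMM_WORLD);
        }
    }

    double elapsed = MPI_Wtime() - start;
    if (rank == 0) {
        double msgs = (double)ITERATIONS * WINDOW;
        printf("bandwidth %.2f MB/s, message rate %.2f msg/s\n",
               msgs * MSG_SIZE / 1e6 / elapsed, msgs / elapsed);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}

Changing WINDOW_SIZES only changes how many sends are outstanding between synchronizations; it should not change which ranks talk to each other or which links the traffic crosses.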
>>Wednesday, June 3, 2015, 10:29 +03:00 from Alina Sklarevich <ali...@dev.mellanox.co.il>:
>>>Hello Timur,
>>>
>>>I will review your results and try to reproduce them in our lab.
>>>
>>>You are using an old OFED - OFED-1.5.4.1 - and we suspect that this may be causing the performance issues you are seeing.
>>>
>>>In the meantime, could you please:
>>>
>>>1. send us the exact command lines that you were running when you got these results?
>>>
>>>2. add the following to the command line that you are running with 'pml yalla' and attach the results?
>>>"-x MXM_TLS=self,shm,rc"
>>>
>>>3. run your command line with yalla and without hcoll?
>>>
>>>Thanks,
>>>Alina.
>>>
>>>On Tue, Jun 2, 2015 at 4:56 PM, Timur Ismagilov < tismagi...@mail.ru > wrote:
>>>>Hi, Mike!
>>>>I have impi v 4.1.2 (impi).
>>>>I built ompi 1.8.5 with MXM and hcoll (ompi_yalla).
>>>>I built ompi 1.8.5 without MXM and hcoll (ompi_clear).
>>>>I ran the osu_mbw_mr point-to-point test with these MPIs.
>>>>You can find the benchmark results in the attached file (mvs10p_mpi.xls, sheet osu_mbr_mr).
>>>>
>>>>On 64 nodes (and 1024 MPI processes) ompi_yalla gets 2 times worse performance than ompi_clear.
>>>>Does MXM with yalla reduce p2p performance compared with ompi_clear (and impi)?
>>>>Am I doing something wrong?
>>>>P.S. My colleague Alexander Semenov is in CC.
>>>>Best regards,
>>>>Timur
>>>>
>>>>Thursday, May 28, 2015, 20:02 +03:00 from Mike Dubman <mi...@dev.mellanox.co.il>:
>>>>>It is not an apples-to-apples comparison.
>>>>>
>>>>>yalla/mxm is a point-to-point library, not a collective library. The collective algorithm happens on top of yalla.
>>>>>
>>>>>Intel's collective algorithm for a2a is better than OMPI's built-in collective algorithm.
>>>>>
>>>>>To see the benefit of yalla you should run p2p benchmarks (osu_lat/bw/bibw/mr).
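To illustrate Mike's point that the collective algorithm sits on top of the point-to-point layer: an Alltoall can be written as nothing more than a schedule of p2p sends and receives, and it is this schedule, not yalla/MXM itself, that differs between implementations. Below is a naive sketch of such a decomposition; it is illustrative only and is not the algorithm Open MPI or Intel MPI actually uses.

/* Naive Alltoall expressed purely as point-to-point operations: every rank
 * posts one Irecv and one Isend per peer and waits for all of them. Real MPI
 * libraries choose smarter schedules (pairwise exchange, Bruck, etc.), which
 * is where implementations differ even when the p2p layer is the same. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static void alltoall_p2p(int *sendbuf, int *recvbuf, int count, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    MPI_Request *req = malloc(2 * size * sizeof(MPI_Request));
    for (int p = 0; p < size; p++) {
        MPI_Irecv(recvbuf + p * count, count, MPI_INT, p, 0, comm, &req[2 * p]);
        MPI_Isend(sendbuf + p * count, count, MPI_INT, p, 0, comm, &req[2 * p + 1]);
    }
    MPI_Waitall(2 * size, req, MPI_STATUSES_IGNORE);
    free(req);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size, count = 4;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *sendbuf = malloc(size * count * sizeof(int));
    int *recvbuf = malloc(size * count * sizeof(int));
    for (int i = 0; i < size * count; i++)
        sendbuf[i] = rank * 1000 + i;   /* block p of sendbuf goes to rank p */

    alltoall_p2p(sendbuf, recvbuf, count, MPI_COMM_WORLD);

    if (rank == 0)
        printf("rank 0 received first element %d from rank %d\n",
               recvbuf[(size - 1) * count], size - 1);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

This is also why a p2p benchmark such as osu_mbw_mr isolates the transport, while IMB Alltoall additionally measures the collective schedule layered on top of it.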
>>>>>On Thu, May 28, 2015 at 7:35 PM, Timur Ismagilov < tismagi...@mail.ru > wrote:
>>>>>>I compare ompi-1.8.5 (hpcx-1.3.3-icc) with impi v 4.1.4.
>>>>>>
>>>>>>I built ompi with MXM but without HCOLL and without knem (I am working on it). The configure options are:
>>>>>>./configure --prefix=my_prefix --with-mxm=path/to/hpcx/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/mxm --with-platform=contrib/platform/mellanox/optimized
>>>>>>
>>>>>>As a result of the IMB-MPI1 Alltoall test, I got disappointing results: for most message sizes on 64 nodes with 16 processes per node, impi is much (~40%) better.
>>>>>>
>>>>>>You can look at the results in the attached file "mvs10p_mpi.xlsx"; the system configuration is also there.
>>>>>>
>>>>>>What do you think? Is there any way to improve the ompi yalla performance results?
>>>>>>
>>>>>>I attach the output of "IMB-MPI1 Alltoall" for yalla and impi.
>>>>>>
>>>>>>P.S. My colleague Alexander Semenov is in CC.
>>>>>>
>>>>>>Best regards,
>>>>>>Timur
>>>>>
>>>>>--
>>>>>Kind Regards,
>>>>>M.
>>>>
>>>>_______________________________________________
>>>>users mailing list
>>>>us...@open-mpi.org
>>>>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27029.php