With '--bind-to socket' I get the same results as with '--bind-to core': 3813 MB/s. I have attached the ompi_yalla_socket.out and ompi_yalla_socket.err files to this message.
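(For reference, the exact command line for the socket-bound run is not quoted in this thread; presumably it is the ompi_yalla command from the June 16 message below with only the binding flag changed, roughly as in this sketch - paths and hostfile are the ones quoted there:

    /gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-mellanox-fca-v1.8.5/bin/mpirun \
        -report-bindings -display-map -mca coll_hcoll_enable 1 \
        -x HCOLL_MAIN_IB=mlx4_0:1 -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off \
        --mca pml yalla --map-by core --bind-to socket --hostfile hostlist \
        ../osu_ompi_hcoll/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0
)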
Tuesday, June 16, 2015, 18:15 +03:00 from Alina Sklarevich <ali...@dev.mellanox.co.il>:
>Hi Timur,
>
>Can you please try running your ompi_yalla cmd with '--bind-to socket'
>(instead of binding to core) and check if it affects the results?
>We saw that it made a difference in performance in our lab, which is why I
>asked you to try the same.
>
>Thanks,
>Alina.
>
>On Tue, Jun 16, 2015 at 5:53 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>Hello, Alina!
>>
>>If I use --map-by node, I get only intra-node communication in osu_mbw_mr.
>>I use --map-by core instead.
>>
>>I have 2 nodes; each node has 2 sockets with 8 cores per socket.
>>
>>When I run osu_mbw_mr on 2 nodes with 32 MPI processes (commands below), I
>>expect to see the unidirectional bandwidth of a 4xFDR link as the result of
>>this test.
>>
>>With IntelMPI I get 6367 MB/s.
>>With ompi_yalla I get about 3744 MB/s (problem: it is about half of the IntelMPI result).
>>With openmpi without mxm (ompi_clear) I get 6321 MB/s.
>>
>>How can I improve the yalla results?
>>
>>IntelMPI cmd:
>>/opt/software/intel/impi/4.1.0.030/intel64/bin/mpiexec.hydra -machinefile machines.pYAvuK
>>-n 32 -binding domain=core
>>../osu_impi/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0
>>
>>ompi_yalla cmd:
>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-mellanox-fca-v1.8.5/bin/mpirun
>>-report-bindings -display-map -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1
>>-x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla
>>--map-by core --bind-to core --hostfile hostlist
>>../osu_ompi_hcoll/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0
>>
>>ompi_clear cmd:
>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-clear-v1.8.5/bin/mpirun
>>-report-bindings -display-map --hostfile hostlist --map-by core --bind-to core
>>../osu_ompi_clear/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0
>>
>>I have attached the output files to this message:
>>ompi_clear.out, ompi_clear.err - ompi_clear results
>>ompi_yalla.out, ompi_yalla.err - ompi_yalla results
>>impi.out, impi.err - IntelMPI results
>>
>>Best regards,
>>Timur
>>
>>Sunday, June 7, 2015, 16:11 +03:00 from Alina Sklarevich <ali...@dev.mellanox.co.il>:
>>>Hi Timur,
>>>
>>>After running the osu_mbw_mr benchmark in our lab, we observed that the
>>>binding policy made a difference in performance.
>>>Can you please rerun your ompi tests with the following added to your
>>>command line? (one of them in each run)
>>>
>>>1. --map-by node --bind-to socket
>>>2. --map-by node --bind-to core
>>>
>>>Please attach your results.
>>>
>>>Thank you,
>>>Alina.
>>>
>>>On Thu, Jun 4, 2015 at 6:53 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>Hello, Alina.
>>>>1. Here is my ompi_yalla command line:
>>>>$HPCX_MPI_DIR/bin/mpirun -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1
>>>>-x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla
>>>>--hostfile hostlist $@
>>>>echo $HPCX_MPI_DIR
>>>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-mellanox-fca-v1.8.5
>>>>This MPI was configured with: --with-mxm=/path/to/mxm
>>>>--with-hcoll=/path/to/hcoll
>>>>--with-platform=contrib/platform/mellanox/optimized
>>>>--prefix=/path/to/ompi-mellanox-fca-v1.8.5
>>>>
>>>>ompi_clear command line:
>>>>$HPCX_MPI_DIR/bin/mpirun --hostfile hostlist $@
>>>>echo $HPCX_MPI_DIR
>>>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-clear-v1.8.5
>>>>This MPI was configured with:
>>>>--with-platform=contrib/platform/mellanox/optimized
>>>>--prefix=/path/to/ompi-clear-v1.8.5
>>>>
>>>>2. When I run osu_mbw_mr with "-x MXM_TLS=self,shm,rc", it fails with a
>>>>segmentation fault:
>>>>the stdout log is in the attached file osu_mbr_mr_n-2_ppn-16.out;
>>>>the stderr log is in the attached file osu_mbr_mr_n-2_ppn-16.err;
>>>>cmd line:
>>>>$HPCX_MPI_DIR/bin/mpirun -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1
>>>>-x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla
>>>>-x MXM_TLS=self,shm,rc --hostfile hostlist osu_mbw_mr -v -r=0
>>>>
>>>>In osu_mbw_mr.c, I have changed WINDOW_SIZES:
>>>>#define WINDOW_SIZES {8, 16, 32, 64, 128, 256, 512, 1024}
>>>>
>>>>3. I have added the results of running osu_mbw_mr with yalla and without hcoll
>>>>on 32 and 64 nodes (512 and 1024 MPI processes) to mvs10p_mpi.xls, sheet osu_mbr_mr.
>>>>The results are about 20 percent lower than the old results (with hcoll).
>>>>
>>>>
>>>>Wednesday, June 3, 2015, 10:29 +03:00 from Alina Sklarevich <ali...@dev.mellanox.co.il>:
>>>>>Hello Timur,
>>>>>
>>>>>I will review your results and try to reproduce them in our lab.
>>>>>
>>>>>You are using an old OFED (OFED-1.5.4.1) and we suspect that this may be
>>>>>causing the performance issues you are seeing.
>>>>>
>>>>>In the meantime, could you please:
>>>>>
>>>>>1. send us the exact command lines that you were running when you got
>>>>>these results?
>>>>>
>>>>>2. add the following to the command line that you are running with 'pml
>>>>>yalla' and attach the results?
>>>>>"-x MXM_TLS=self,shm,rc"
>>>>>
>>>>>3. run your command line with yalla and without hcoll?
>>>>>
>>>>>Thanks,
>>>>>Alina.
>>>>>
>>>>>
>>>>>On Tue, Jun 2, 2015 at 4:56 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>>>Hi, Mike!
>>>>>>I have Intel MPI v 4.1.2 (impi).
>>>>>>I built ompi 1.8.5 with MXM and hcoll (ompi_yalla).
>>>>>>I built ompi 1.8.5 without MXM and hcoll (ompi_clear).
>>>>>>I ran the OSU point-to-point osu_mbw_mr test with these MPIs.
>>>>>>You can find the benchmark results in the attached file (mvs10p_mpi.xls,
>>>>>>sheet osu_mbr_mr).
>>>>>>
>>>>>>On 64 nodes (1024 MPI processes) ompi_yalla gets 2x worse performance
>>>>>>than ompi_clear.
>>>>>>Does mxm with yalla reduce p2p performance compared with ompi_clear (and impi)?
>>>>>>Am I doing something wrong?
>>>>>>P.S. My colleague Alexander Semenov is in CC.
>>>>>>Best regards,
>>>>>>Timur
>>>>>>
>>>>>>Thursday, May 28, 2015, 20:02 +03:00 from Mike Dubman <mi...@dev.mellanox.co.il>:
>>>>>>>It is not an apples-to-apples comparison.
>>>>>>>
>>>>>>>yalla/mxm is a point-to-point library; it is not a collective library.
>>>>>>>The collective algorithm happens on top of yalla.
>>>>>>>
>>>>>>>Intel's collective algorithm for a2a (alltoall) is better than OMPI's
>>>>>>>built-in collective algorithm.
>>>>>>>
>>>>>>>To see the benefit of yalla, you should run p2p benchmarks
>>>>>>>(osu_lat/bw/bibw/mr).
>>>>>>>
>>>>>>>
>>>>>>>On Thu, May 28, 2015 at 7:35 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>>>>>I am comparing ompi-1.8.5 (hpcx-1.3.3-icc) with impi v 4.1.4.
>>>>>>>>
>>>>>>>>I built ompi with MXM but without HCOLL and without knem (I am working
>>>>>>>>on it). The configure options are:
>>>>>>>>./configure --prefix=my_prefix
>>>>>>>>--with-mxm=path/to/hpcx/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/mxm
>>>>>>>>--with-platform=contrib/platform/mellanox/optimized
>>>>>>>>
>>>>>>>>The IMB-MPI1 Alltoall test gave disappointing results: for most message
>>>>>>>>sizes on 64 nodes with 16 processes per node, impi is much (~40%) better.
>>>>>>>>
>>>>>>>>You can look at the results in the attached file "mvs10p_mpi.xlsx".
>>>>>>>>The system configuration is also there.
>>>>>>>>
>>>>>>>>What do you think about this? Is there any way to improve the ompi yalla
>>>>>>>>performance?
>>>>>>>>
>>>>>>>>I attach the output of "IMB-MPI1 Alltoall" for yalla and impi.
>>>>>>>>
>>>>>>>>P.S. My colleague Alexander Semenov is in CC.
>>>>>>>>
>>>>>>>>Best regards,
>>>>>>>>Timur
>>>>>>>
>>>>>>>
>>>>>>>--
>>>>>>>
>>>>>>>Kind Regards,
>>>>>>>
>>>>>>>M.
>>>>>>
>>>>>>
>>>>>>_______________________________________________
>>>>>>users mailing list
>>>>>>us...@open-mpi.org
>>>>>>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>Link to this post:
>>>>>>http://www.open-mpi.org/community/lists/users/2015/06/27029.php
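A note on the --map-by discussion above: osu_mbw_mr pairs rank i with rank i + N/2
(the first half of the ranks send, the second half receive). The sketch below shows
where those pairs land for the 32-rank / 2-node case from the thread, assuming the
usual Open MPI semantics (--map-by node assigns ranks round-robin across nodes,
--map-by core fills the first node's 16 cores before the second); the variable names
are illustrative only:

    # Sketch: compute the node of each osu_mbw_mr pair member for 32 ranks on 2 nodes (16 cores each).
    N=32
    for ((i=0; i<N/2; i++)); do
      peer=$((i + N/2))
      # --map-by node: ranks alternate between the 2 nodes -> node = rank % 2
      # --map-by core: ranks 0-15 fill node 0, ranks 16-31 fill node 1 -> node = rank / 16
      echo "pair $i<->$peer  map-by node: $((i % 2)) vs $((peer % 2))  map-by core: $((i / 16)) vs $((peer / 16))"
    done

With --map-by node, both members of every pair land on the same node (peer = i + 16,
and 16 is a multiple of the node count 2), so the test measures shared memory; with
--map-by core, every pair crosses the FDR link, which is why --map-by core was used
in the thread to measure network bandwidth.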