Hello, Alina!

If I use --map-by node, I get only intra-node communication in osu_mbw_mr, so 
I use --map-by core instead.
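
For reference: if I read osu_mbw_mr right, it pairs rank i with rank i + nprocs/2 
(first half senders, second half receivers), so the mapping decides whether a pair 
crosses the link. A minimal sketch of that reasoning for our two 16-core nodes (my 
own illustration, not code from the benchmark; it assumes --map-by node places 
ranks round-robin across nodes and --map-by core fills node 0 first):

/* Illustration only: for 2 nodes x 16 cores and 32 ranks, print whether each
 * osu_mbw_mr pair (i, i + nprocs/2) is intra- or inter-node under the two
 * mapping policies. */
#include <stdio.h>

int main(void)
{
    const int nprocs = 32, nodes = 2, cores_per_node = 16;
    for (int i = 0; i < nprocs / 2; i++) {
        int peer = i + nprocs / 2;
        /* --map-by core: ranks fill node 0 first, then node 1 */
        int by_core = (i / cores_per_node) == (peer / cores_per_node);
        /* --map-by node: ranks are placed round-robin across nodes */
        int by_node = (i % nodes) == (peer % nodes);
        printf("pair %2d-%2d: map-by core -> %s, map-by node -> %s\n",
               i, peer, by_core ? "intra-node" : "inter-node",
               by_node ? "intra-node" : "inter-node");
    }
    return 0;
}

With --map-by core every pair lands on different nodes (all traffic crosses the 
link); with --map-by node every pair stays on one node, which is why that mapping 
is not useful for this test.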

I have 2 nodes, each with 2 sockets and 8 cores per socket.

When I run osu_mbw_mr on 2 nodes with 32 MPI processes (see the commands 
below), I expect the result of this test to be the unidirectional bandwidth of 
the 4x FDR link.
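
(A rough sanity check on the expected number, based on my own arithmetic rather 
than anything measured here: 4x FDR signals at 4 lanes x 14.0625 Gb/s = 56.25 Gb/s; 
after 64b/66b encoding that is about 54.5 Gb/s, i.e. roughly 6.8 GB/s of payload, 
so results around 6.3-6.4 GB/s should be close to line rate.)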

With Intel MPI I get 6367 MB/s.
With ompi_yalla I get about 3744 MB/s (the problem: this is only half of the 
Intel MPI result).
With Open MPI without MXM (ompi_clear) I get 6321 MB/s.

How can I improve the yalla result?

IntelMPI cmd:
/opt/software/intel/impi/4.1.0.030/intel64/bin/mpiexec.hydra  -machinefile 
machines.pYAvuK -n 32 -binding domain=core  
../osu_impi/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0

ompi_yalla cmd:
/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-mellanox-fca-v1.8.5/bin/mpirun
  -report-bindings -display-map -mca coll_hcoll_enable 1 -x  
HCOLL_MAIN_IB=mlx4_0:1 -x     MXM_IB_PORTS=mlx4_0:1 -x  MXM_SHM_KCOPY_MODE=off 
--mca pml yalla --map-by core --bind-to core  --hostfile hostlist  
../osu_ompi_hcoll/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v  -r=0

ompi_clear cmd:
/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-clear-v1.8.5/bin/mpirun
  -report-bindings -display-map --hostfile hostlist --map-by core  --bind-to 
core  ../osu_ompi_clear/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v  
-r=0

I have attached the output files to this message:
ompi_clear.out, ompi_clear.err - the ompi_clear results
ompi_yalla.out, ompi_yalla.err - the ompi_yalla results
impi.out, impi.err - the Intel MPI results

Best regards,
Timur

Sunday, June 7, 2015, 16:11 +03:00 from Alina Sklarevich 
<ali...@dev.mellanox.co.il>:
>Hi Timur,
>
>After running the osu_mbw_mr benchmark in our lab, we observed that the 
>binding policy made a difference to the performance.
>Can you please rerun your ompi tests with the following options added to your 
>command line (one of them in each run)?
>
>1. --map-by node --bind-to socket
>2. --map-by node --bind-to core
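>
>For example, option 1 applied to the yalla command line you sent earlier would 
>look something like this (your flags unchanged, only the mapping/binding added):
>$HPCX_MPI_DIR/bin/mpirun -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1 
>-x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla 
>--map-by node --bind-to socket --hostfile hostlist osu_mbw_mr -v -r=0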
>
>Please attach your results.
>
>Thank you,
>Alina.
>
>On Thu, Jun 4, 2015 at 6:53 PM, Timur Ismagilov  < tismagi...@mail.ru > wrote:
>>Hello, Alina.
>>1. Here is my ompi_yalla command line:
>>$HPCX_MPI_DIR/bin/mpirun -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1 
>>-x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla --hostfile 
>>hostlist $@
>>echo $HPCX_MPI_DIR 
>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-mellanox-fca-v1.8.5
>>This MPI was configured with: --with-mxm=/path/to/mxm 
>>--with-hcoll=/path/to/hcoll 
>>--with-platform=contrib/platform/mellanox/optimized 
>>--prefix=/path/to/ompi-mellanox-fca-v1.8.5
>>ompi_clear command line:
>>$HPCX_MPI_DIR/bin/mpirun --hostfile hostlist $@
>>echo $HPCX_MPI_DIR 
>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-clear-v1.8.5
>>This MPI was configured with: 
>>--with-platform=contrib/platform/mellanox/optimized 
>>--prefix=/path/to/ompi-clear-v1.8.5
>>2. When I run osu_mbw_mr with the option "-x MXM_TLS=self,shm,rc", it fails 
>>with a segmentation fault:
>>the stdout log is in the attached file osu_mbr_mr_n-2_ppn-16.out;
>>the stderr log is in the attached file osu_mbr_mr_n-2_ppn-16.err.
>>Command line:
>>$HPCX_MPI_DIR/bin/mpirun -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1 
>>-x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla -x 
>>MXM_TLS=self,shm,rc --hostfile hostlist osu_mbw_mr -v -r=0
>>I have changed WINDOW_SIZES in osu_mbw_mr.c:
>>#define WINDOW_SIZES {8, 16, 32, 64,  128, 256, 512, 1024 }  
>>3. I have added the results of running osu_mbw_mr with yalla and without 
>>hcoll on 32 and 64 nodes (512 and 1024 MPI processes) to mvs10p_mpi.xls, 
>>sheet osu_mbr_mr.
>>The results are about 20 percent lower than the old results (with hcoll).
>>
>>
>>
>>Wednesday, June 3, 2015, 10:29 +03:00 from Alina Sklarevich 
>><ali...@dev.mellanox.co.il>:
>>>Hello Timur,
>>>
>>>I will review your results and try to reproduce them in our lab.
>>>
>>>You are using an old OFED (OFED-1.5.4.1), and we suspect that this may be 
>>>causing the performance issues you are seeing.
>>>
>>>In the meantime, could you please:
>>>
>>>1. send us the exact command lines that you were running when you got these 
>>>results?
>>>
>>>2. add the following to the command line that you are running with 'pml 
>>>yalla' and attach the results?
>>>"-x MXM_TLS=self,shm,rc"
>>>
>>>3. run your command line with yalla and without hcoll?
>>>
>>>Thanks,
>>>Alina.
>>>
>>>
>>>
>>>On Tue, Jun 2, 2015 at 4:56 PM, Timur Ismagilov  < tismagi...@mail.ru > 
>>>wrote:
>>>>Hi, Mike!
>>>>I have Intel MPI v 4.1.2 (impi).
>>>>I built ompi 1.8.5 with MXM and hcoll (ompi_yalla).
>>>>I built ompi 1.8.5 without MXM and hcoll (ompi_clear).
>>>>I ran the OSU p2p osu_mbw_mr test with these MPIs.
>>>>You can find the benchmark results in the attached file (mvs10p_mpi.xls, 
>>>>sheet osu_mbr_mr).
>>>>
>>>>On 64 nodes (1024 MPI processes), ompi_yalla gets 2x worse performance than 
>>>>ompi_clear.
>>>>Does MXM with yalla reduce p2p performance compared with ompi_clear (and 
>>>>impi)?
>>>>Am I doing something wrong?
>>>>P.S. My colleague Alexander Semenov is in CC
>>>>Best regards,
>>>>Timur
>>>>
>>>>Thursday, May 28, 2015, 20:02 +03:00 from Mike Dubman 
>>>><mi...@dev.mellanox.co.il>:
>>>>>It is not an apples-to-apples comparison.
>>>>>
>>>>>yalla/mxm is a point-to-point library, not a collective library; the 
>>>>>collective algorithms run on top of yalla.
>>>>>
>>>>>Intel's collective algorithm for a2a is better than OMPI's built-in 
>>>>>collective algorithm.
>>>>>
>>>>>To see the benefit of yalla, you should run the p2p benchmarks 
>>>>>(osu_lat/bw/bibw/mr).
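>>>>>
>>>>>For example, something like this (an illustrative command line, not from 
>>>>>this mail; <osu-install> stands for wherever the OSU benchmarks are installed):
>>>>>$HPCX_MPI_DIR/bin/mpirun --mca pml yalla -x MXM_IB_PORTS=mlx4_0:1 --hostfile hostlist <osu-install>/mpi/pt2pt/osu_bw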
>>>>>
>>>>>
>>>>>On Thu, May 28, 2015 at 7:35 PM, Timur Ismagilov  < tismagi...@mail.ru > 
>>>>>wrote:
>>>>>>I compared ompi-1.8.5 (hpcx-1.3.3-icc) with impi v 4.1.4.
>>>>>>
>>>>>>I built ompi with MXM but without HCOLL and without knem (I am working on 
>>>>>>that). The configure options are:
>>>>>> ./configure  --prefix=my_prefix   
>>>>>>--with-mxm=path/to/hpcx/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/mxm
>>>>>>   --with-platform=contrib/platform/mellanox/optimized
>>>>>>
>>>>>>The IMB-MPI1 Alltoall test gave disappointing results: for most message 
>>>>>>sizes on 64 nodes with 16 processes per node, impi is much (~40%) better.
>>>>>>
>>>>>>You can look at the results in the attached file "mvs10p_mpi.xlsx"; the 
>>>>>>system configuration is also there.
>>>>>>
>>>>>>What do you think about this? Is there any way to improve the ompi yalla 
>>>>>>performance results?
>>>>>>
>>>>>>I attach the output of  "IMB-MPI1 Alltoall" for yalla and impi.
>>>>>>
>>>>>>P.S. My colleague Alexander Semenov is in CC
>>>>>>
>>>>>>Best regards,
>>>>>>Timur
>>>>>
>>>>>
>>>>>-- 
>>>>>
>>>>>Kind Regards,
>>>>>
>>>>>M.
>>>>
>>>>
>>>>
>>>>_______________________________________________
>>>>users mailing list
>>>>us...@open-mpi.org
>>>>Subscription:  http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>Link to this post:  
>>>>http://www.open-mpi.org/community/lists/users/2015/06/27029.php
>>>
>>
>>
>>
>




