Hi Josh,

Thanks for your reply. I did try setting MXM_RDMA_PORTS=mlx4_0:1 for all my MPI
processes and it did improve performance, but the performance I obtain still
isn't completely satisfactory.

When I run the IMB 4.1 PingPong and Sendrecv benchmarks between two nodes with
Open MPI 1.10.3, I get:

 without MXM_RDMA_PORTS

   comm       lat_min      bw_max      bw_max
              pingpong     pingpong    sendrecv
              (us)         (MB/s)      (MB/s)
   -------------------------------------------
   openib     1.79         5947.07    11534
   mxm        2.51         5166.96     8079.18
   yalla      2.47         5167.29     8278.15


 with MXM_RDMA_PORTS=mlx4_0:1

   comm       lat_min      bw_max      bw_max
              pingpong     pingpong    sendrecv
              (us)         (MB/s)      (MB/s)
   -------------------------------------------
   openib     1.79         5827.93    11552.4
   mxm        2.23         5191.77     8201.76
   yalla      2.18         5200.55     8109.48


openib means: pml=ob1                 btl=openib,vader,self  btl_openib_include_if=mlx4_0
mxm    means: pml=cm,ob1     mtl=mxm  btl=vader,self
yalla  means: pml=yalla,ob1           btl=vader,self
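
In case it helps, these are (roughly) the command lines behind those three
labels; the host names are placeholders and the IMB arguments are abbreviated:

   # openib: ob1 PML over the openib BTL, restricted to mlx4_0
   mpirun -np 2 -H node1,node2 --mca pml ob1 \
          --mca btl openib,vader,self --mca btl_openib_include_if mlx4_0 \
          ./IMB-MPI1 PingPong Sendrecv

   # mxm: cm PML over the mxm MTL
   mpirun -np 2 -H node1,node2 --mca pml cm,ob1 --mca mtl mxm \
          --mca btl vader,self -x MXM_RDMA_PORTS=mlx4_0:1 \
          ./IMB-MPI1 PingPong Sendrecv

   # yalla: yalla PML talking to MXM directly
   mpirun -np 2 -H node1,node2 --mca pml yalla,ob1 --mca btl vader,self \
          -x MXM_RDMA_PORTS=mlx4_0:1 ./IMB-MPI1 PingPong Sendrecv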

lspci reports for our FDR InfiniBand HCA:
  InfiniBand controller: Mellanox Technologies MT27500 Family [ConnectX-3]

and 16 lines like:
  InfiniBand controller: Mellanox Technologies MT27500/MT27520 Family
                         [ConnectX-3/ConnectX-3 Pro Virtual Function]

The nodes have two octa-core Xeon E5-2650 v2 (Ivy Bridge-EP, 2.67 GHz) sockets.

ofed_info reports that the MXM version is 3.4.3cce223-0.32200.
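
For completeness, this is roughly how I collected that hardware/software
information (plain lspci and ofed_info, nothing exotic):

   lspci | grep -i infiniband    # shows the ConnectX-3 HCA and its 16 virtual functions
   ofed_info -s                  # installed MLNX_OFED release
   ofed_info | grep -i mxm       # bundled MXM package version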

As you can see, the results are not very good. I would expect mxm and yalla to
perform better than openib both in terms of latency and bandwidth (note: the
sendrecv bandwidth is full duplex). In particular, I would expect the yalla
latency to be around 1.1 us, as shown here
https://www.open-mpi.org/papers/sc-2014/Open-MPI-SC14-BOF.pdf (page 33).

I also ran mxm_perftest (located in /opt/mellanox/bin) and it reports the
following latency between two nodes:

 without MXM_RDMA_PORTS                1.92 us
 with    MXM_RDMA_PORTS=mlx4_0:1       1.65 us
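
In case it matters, I ran it roughly like this (server first, then the client;
node1/node2 are placeholders and the -t send_lat test name is from memory, it
may differ between MXM versions):

   node1$ /opt/mellanox/bin/mxm_perftest
   node2$ MXM_RDMA_PORTS=mlx4_0:1 /opt/mellanox/bin/mxm_perftest node1 -t send_lat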

Again, I think we can expect better latency from our configuration; 1.65 us is
not a very good result.

Note, however, that the 0.27 us reduction in raw mxm latency (1.92 - 1.65 = 0.27)
corresponds to the reduction observed above in the Open MPI latencies with mxm
(2.51 - 2.23 = 0.28) and yalla (2.47 - 2.18 = 0.29).

Another detail: everything runs inside LXC containers, and SR-IOV is probably
in use.
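
If it's relevant, I can also dump what the verbs layer sees from inside a
container with the standard tools, e.g.:

   ibv_devinfo    # device, firmware and port state as seen inside the container
   ibstat         # CA list, port state and link rate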

Does anyone have any idea what's wrong with our cluster?

Martin Audet


> Hi, Martin
>
> The environment variable:
>
> MXM_RDMA_PORTS=device:port
>
> is what you're looking for. You can specify a device/port pair on your OMPI
> command line like:
>
> mpirun -np 2 ... -x MXM_RDMA_PORTS=mlx4_0:1 ...
>
>
> Best,
>
> Josh
