Hi Josh,

Thanks for your reply. I did try setting MXM_RDMA_PORTS=mlx4_0:1 for all my MPI processes and it did improve performance, but the performance I now get is still not satisfying.
When I run the IMB 4.1 PingPong and Sendrecv benchmarks between two nodes with Open MPI 1.10.3, I get:

without MXM_RDMA_PORTS

  comm     lat_min    bw_max     bw_max
           pingpong   pingpong   sendrecv
           (us)       (MB/s)     (MB/s)
  -------------------------------------------
  openib   1.79       5947.07    11534
  mxm      2.51       5166.96    8079.18
  yalla    2.47       5167.29    8278.15

with MXM_RDMA_PORTS=mlx4_0:1

  comm     lat_min    bw_max     bw_max
           pingpong   pingpong   sendrecv
           (us)       (MB/s)     (MB/s)
  -------------------------------------------
  openib   1.79       5827.93    11552.4
  mxm      2.23       5191.77    8201.76
  yalla    2.18       5200.55    8109.48

where:

  openib means: pml=ob1       btl=openib,vader,self   btl_openib_include_if=mlx4_0
  mxm    means: pml=cm,ob1    mtl=mxm                 btl=vader,self
  yalla  means: pml=yalla,ob1                         btl=vader,self

lspci reports for our FDR InfiniBand HCA:

  InfiniBand controller: Mellanox Technologies MT27500 Family [ConnectX-3]

and 16 lines like:

  InfiniBand controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

The nodes have two octa-core Xeon E5-2650 v2 (Ivy Bridge-EP, 2.67 GHz) sockets. ofed_info reports that the mxm version is 3.4.3cce223-0.32200.

As you can see, the results are not very good. I would expect mxm and yalla to perform better than openib, both in terms of latency and bandwidth (note: the Sendrecv bandwidth is full duplex). I would expect the yalla latency to be around 1.1 us, as shown on page 33 of https://www.open-mpi.org/papers/sc-2014/Open-MPI-SC14-BOF.pdf .

I also ran mxm_perftest (located in /opt/mellanox/bin) and it reports the following latency between two nodes:

  without MXM_RDMA_PORTS          1.92 us
  with MXM_RDMA_PORTS=mlx4_0:1    1.65 us

Again, I think we can expect better latency with our configuration; 1.65 us is not a very good result. Note however that the 0.27 us reduction in raw mxm latency (1.92 - 1.65 = 0.27) corresponds to the reductions seen in the Open MPI latencies with mxm (2.51 - 2.23 = 0.28) and yalla (2.47 - 2.18 = 0.29).

Another detail: everything runs inside LXC containers, and SR-IOV is probably used.

Does anyone have any idea what's wrong with our cluster?

Martin Audet

> Hi, Martin
>
> The environment variable:
>
>    MXM_RDMA_PORTS=device:port
>
> is what you're looking for. You can specify a device/port pair on your OMPI
> command line like:
>
>    mpirun -np 2 ... -x MXM_RDMA_PORTS=mlx4_0:1 ...
>
>
> Best,
>
> Josh
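P.S. In case the exact invocations matter: the three configurations above correspond to mpirun command lines roughly like the ones below. The hostfile name and the IMB-MPI1 path are placeholders; the MCA settings are simply the ones listed in the table key above.

  # openib: ob1 PML over the openib BTL, restricted to mlx4_0
  mpirun -np 2 --map-by node -hostfile ./hosts \
         --mca pml ob1 --mca btl openib,vader,self \
         --mca btl_openib_include_if mlx4_0 \
         ./IMB-MPI1 PingPong Sendrecv

  # mxm: cm PML with the mxm MTL
  mpirun -np 2 --map-by node -hostfile ./hosts \
         --mca pml cm,ob1 --mca mtl mxm --mca btl vader,self \
         -x MXM_RDMA_PORTS=mlx4_0:1 \
         ./IMB-MPI1 PingPong Sendrecv

  # yalla: yalla PML (MXM driven directly by the PML)
  mpirun -np 2 --map-by node -hostfile ./hosts \
         --mca pml yalla,ob1 --mca btl vader,self \
         -x MXM_RDMA_PORTS=mlx4_0:1 \
         ./IMB-MPI1 PingPong Sendrecv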
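P.P.S. The mxm_perftest numbers come from the usual server/client run, something along these lines (the test name is from memory, so treat it as approximate):

  # on the first node (server side)
  /opt/mellanox/bin/mxm_perftest

  # on the second node (client side)
  /opt/mellanox/bin/mxm_perftest <server_hostname> -t send_lat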