I'm not sure why nobody else has run into this on the mailing list. After some fiddling I was finally able to isolate it to a performance regression introduced between 2.0.1 and 2.0.2. While we were trying to bisect the exact commit that introduced the regression, my colleague brought this to my attention:

https://github.com/open-mpi/ompi/issues/3003

Yes, this is exactly the issue! So, just in case anybody runs into the same problem again, that is the place to look.
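For anyone who needs to chase down a regression like this themselves, below is roughly the bisect helper we had been putting together before the GitHub issue turned up. It is only a sketch: the install prefix, the host names, the assumption that osu_bw is in PATH and built against each test install, and the 8000 MB/s "good" cutoff are all placeholders for our environment, not anything authoritative.

  #!/bin/sh
  # bisect-osu.sh: crude "is this commit fast enough?" probe for git bisect.
  # PREFIX, HOSTS, the osu_bw location, and the 8000 MB/s cutoff are
  # placeholders for our setup; adjust them before using this anywhere.

  PREFIX=$HOME/ompi-bisect
  HOSTS=host1,host2

  # Rebuild the currently checked-out commit (git checkouts need autogen.pl).
  # Any build failure is reported as "skip" (exit 125) rather than "bad".
  ./autogen.pl                  > /dev/null 2>&1 || exit 125
  ./configure --prefix="$PREFIX" > /dev/null 2>&1 || exit 125
  make -j 16 install            > /dev/null 2>&1 || exit 125

  # Run the same osu_bw command as in the thread below (osu_bw assumed to be
  # in PATH and built against this install) and grab the 4 MB bandwidth.
  BW=$("$PREFIX"/bin/mpirun -np 2 -H "$HOSTS" -mca btl openib,sm,self osu_bw \
        | awk '$1 == 4194304 { print $2 }')
  [ -z "$BW" ] && exit 125

  # Exit 0 ("good") if we see healthy EDR numbers, 1 ("bad") otherwise.
  awk -v bw="$BW" 'BEGIN { exit (bw < 8000) ? 1 : 0 }'

Driven in the usual way from the top of an ompi git checkout:

  git bisect start
  git bisect bad v2.0.2
  git bisect good v2.0.1
  git bisect run /path/to/bisect-osu.sh

We dropped this once the issue above surfaced, but the skeleton may save somebody some typing.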
On Tue, Mar 7, 2017 at 12:59 PM, Yong Qin <yong....@gmail.com> wrote:
> OK, did some testing with MVAPICH and everything is normal, so this is
> clearly an OMPI issue. Is there anything that I should try?
>
> Thanks,
>
> Yong Qin
>
> On Mon, Mar 6, 2017 at 11:46 AM, Yong Qin <yong....@gmail.com> wrote:
>
>> Hi,
>>
>> I'm wondering if anybody who has done perf testing on Mellanox EDR with
>> OMPI can shed some light here?
>>
>> We have a pair of EDR HCAs connected back to back. We are testing with
>> two dual-socket Intel Xeon E5-2670v3 (Haswell) nodes @2.30GHz, 64GB
>> memory. OS is Scientific Linux 6.7 with kernel
>> 2.6.32-642.6.2.el6.x86_64, vanilla OFED 3.18-2. The HCAs are running the
>> latest FW. OMPI 2.0.2.
>>
>> The OSU bandwidth test only delivers ~5.5 GB/s at a 4 MB message size,
>> and latency is ~2.7 us at a 0 B message size. Both are far behind the
>> claimed values. RDMA perf on the same setup was not too shabby:
>> bandwidth ~10.6 GB/s, latency ~1.0 us.
>>
>> So I'm wondering if I'm missing anything in the OMPI setup that causes
>> such a huge delta? The OMPI command was simply: mpirun -np 2 -H
>> host1,host2 -mca btl openib,sm,self osu_bw
>>
>> Thanks,
>>
>> Yong Qin