I'm not sure why nobody else has run into this issue on the mailing list.
After some fiddling I was finally able to isolate it to a performance
regression introduced between 2.0.1 and 2.0.2. While I was trying to bisect
the exact commit that caused the regression, my colleague brought this to
my attention:

https://github.com/open-mpi/ompi/issues/3003

Yes, this is exactly the issue!

So, just in case anybody runs into the same issue again ...
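
For anybody who wants to confirm it locally, here is a minimal sketch of the
bisect we were attempting, assuming the v2.0.1/v2.0.2 release tags in the
ompi repo; the install prefix and host names are placeholders, and osu_bw
should be rebuilt against each bisected install:

  git clone https://github.com/open-mpi/ompi.git && cd ompi
  git bisect start v2.0.2 v2.0.1    # mark v2.0.2 as bad, v2.0.1 as good
  # at each bisect step, rebuild, rerun the benchmark, then mark the commit:
  ./autogen.pl && ./configure --prefix=$HOME/ompi-bisect && make -j8 install
  $HOME/ompi-bisect/bin/mpirun -np 2 -H host1,host2 \
      -mca btl openib,sm,self ./osu_bw
  # git bisect good   (if bandwidth matches 2.0.1)
  # git bisect bad    (if bandwidth matches 2.0.2)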

On Tue, Mar 7, 2017 at 12:59 PM, Yong Qin <yong....@gmail.com> wrote:

> OK, I did some testing with MVAPICH and everything is normal, so this is
> clearly an OMPI issue. Is there anything I should try?
>
> Thanks,
>
> Yong Qin
>
> On Mon, Mar 6, 2017 at 11:46 AM, Yong Qin <yong....@gmail.com> wrote:
>
>> Hi,
>>
>> I'm wondering if anybody who has done perf testing on Mellanox EDR with
>> OMPI can shed some light here?
>>
>> We have a pair of EDR HCAs connected back to back. We are testing with
>> two dual-socket Intel Xeon E5-2670v3 (Haswell) nodes @ 2.30 GHz, 64 GB
>> memory. The OS is Scientific Linux 6.7 with kernel
>> 2.6.32-642.6.2.el6.x86_64 and vanilla OFED 3.18-2. The HCAs are running
>> the latest FW. OMPI is 2.0.2.
>>
>> The OSU bandwidth test only delivers ~5.5 GB/s at a 4 MB message size,
>> and latency is ~2.7 us at a 0 B message size. Both are far behind the
>> claimed values. RDMA perf on the same setup was not too shabby:
>> bandwidth ~10.6 GB/s, latency ~1.0 us.
>>
>> So I'm wondering if I'm missing anything in the OMPI setup that causes
>> such a huge delta. The OMPI command was simply: mpirun -np 2 -H
>> host1,host2 -mca btl openib,sm,self osu_bw
>>
>> Thanks,
>>
>> Yong Qin
>>
>
>
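
For reference, here is a rough sketch of the two measurements being compared
in the quoted messages, assuming the raw RDMA numbers came from the perftest
tools (ib_write_bw / ib_send_lat); the host names, the IB device name
(mlx5_0), and the benchmark locations are placeholders:

  # MPI level: OSU micro-benchmarks over Open MPI (as quoted above)
  mpirun -np 2 -H host1,host2 -mca btl openib,sm,self ./osu_bw
  mpirun -np 2 -H host1,host2 -mca btl openib,sm,self ./osu_latency
  # raw verbs level: perftest suite, server on host1, client on host2
  ib_write_bw -d mlx5_0 -a              # on host1
  ib_write_bw -d mlx5_0 -a host1        # on host2
  ib_send_lat -d mlx5_0                 # on host1
  ib_send_lat -d mlx5_0 host1           # on host2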