Re: [OMPI devel] rdmacm and udcm for 2.0.1 and RoCE

2017-01-12 Thread Jeff Squyres (jsquyres)
I checked:
1. This option existed in v2.0.1, but it no longer exists in the soon-to-be-released v2.0.2.
2. Here's where we removed it: https://github.com/open-mpi/ompi/pull/2350
There's no rationale listed on that PR, but the reason is that the option is stale and no longer works. Sorry Dave. :-\

Re: [OMPI devel] rdmacm and udcm for 2.0.1 and RoCE

2017-01-12 Thread Jeff Squyres (jsquyres)
Did we just recently discuss the openib BTL failover capability and decide that it had bit-rotted? If so, we need to amend our documentation and disable the code.
> On Jan 11, 2017, at 3:11 PM, Dave Turner wrote:
>
> The btl_openib_receive_queues parameters that Howard provided fixed ...

Re: [OMPI devel] rdmacm and udcm for 2.0.1 and RoCE

2017-01-11 Thread Dave Turner
The btl_openib_receive_queues parameters that Howard provided fixed our problem with getting 2.0.1 working with RoCE, so thanks for all the help. However, we are seeing segfaults with this when configured with --enable-btl-openib-failover. I've included the configuration below that the packag...
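
For context, a build that compiles in this code path would be configured along roughly these lines. This is only a sketch: only --enable-btl-openib-failover comes from the message above; the install prefix, --with-verbs, and the make invocation are illustrative assumptions, not Dave's actual configuration (which is truncated).

    # Sketch of a build with the openib failover code enabled.
    # Only --enable-btl-openib-failover is taken from the message above;
    # the prefix and other options are placeholders.
    ./configure --prefix=$HOME/openmpi-2.0.1 \
                --enable-btl-openib-failover \
                --with-verbs
    make -j8 && make install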

Re: [OMPI devel] rdmacm and udcm for 2.0.1 and RoCE

2017-01-06 Thread Dave Turner
Howard, The btl_openib_receive_queues parameters you provided work for me on a version of 2.0.1 I compiled in my home directory, but segfault on our globally installed version for some reason. Anyway, thanks for the info. I'll get together with my sys admin and we should be able to figure out ...
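
One generic way to chase down a discrepancy like this is to compare how the two installations were built, for example by diffing their ompi_info output. The paths below are placeholders, not the actual install locations:

    # Compare the build configuration of the home-directory build and
    # the globally installed build; both paths are hypothetical.
    $HOME/openmpi-2.0.1/bin/ompi_info --all | grep -i configure > home_build.txt
    /opt/openmpi-2.0.1/bin/ompi_info --all | grep -i configure > global_build.txt
    diff home_build.txt global_build.txt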

Re: [OMPI devel] rdmacm and udcm for 2.0.1 and RoCE

2017-01-05 Thread Howard Pritchard
Hi Dave, Sorry for the delayed response. Anyway, you have to use rdmacm for connection management when using RoCE. However, with 2.0.1 and later, you have to specify per-peer QP info manually on the mpirun command line. Could you try rerunning with mpirun --mca btl_openib_receive_queues P,128,6...
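
A minimal sketch of what such an invocation could look like, selecting the rdmacm connection manager and a per-peer (P) receive-queue specification. The queue sizes, BTL list, process count, and application name here are illustrative assumptions, not the exact values from Howard's truncated message:

    # Illustrative only: the receive_queues values, BTL list, and
    # application name are placeholder assumptions, not Howard's
    # exact recommendation.
    mpirun --mca btl openib,vader,self \
           --mca btl_openib_cpc_include rdmacm \
           --mca btl_openib_receive_queues P,65536,256,192,128 \
           -np 4 ./my_app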