Hi,
When the segmentation fault happens, I get the following trace:
(gdb) bt
#0 0x7fffee4f007d in ibv_close_xrcd (xrcd=0x2) at
/usr/include/infiniband/verbs.h:1227
#1 0x7fffee4f055f in mca_btl_openib_close_xrc_domain (device=0xfb20c0)
at btl_openib_xrc.c:104
#2 0x7fffee4da0
In 1.10.x it is possible for the BTLs to be in use by either ob1 or an
oshmem component. In 2.x one-sided components can also use BTLs. The MTL
interface does not provide support for accessing hardware atomics and
RDMA. As for UD, it stands for Unreliable Datagram. Its usage gets
better message ra
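For anyone following the thread, the selection can also be forced explicitly on the mpirun command line; the flags below are the standard MCA selection options, but the btl list, process count, and binary name are only illustrative:
$ mpirun --mca pml ob1 --mca btl self,sm,openib -n 16 ./a.out   # ob1 over the openib btl
$ mpirun --mca pml cm -n 16 ./a.out                             # cm over an MTL (e.g. mxm)
$ mpirun --mca pml yalla -n 16 ./a.out                          # MXM via the yalla pml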
Hey Nathan,
I thought only one pml could be loaded at a time, and the only pml that
could use btls was ob1. If that is the case, how can the openib btl run
at the same time as cm and yalla?
Also, what is UD?
Thanks,
David
On 04/21/2016 09:25 AM, Nathan Hjelm wrote:
The openib btl should be able to run alongside cm/mxm or yalla. If I
have time this weekend I will get on Mustang and see what the
problem is. The best answer is to change the openmpi-mca-params.conf in
the install to have pml = ob1. I have seen little to no benefit from
using MXM on Mustang.
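If it helps, a minimal sketch of that change, assuming the default etc/ directory under the install prefix (the path below is a placeholder for the actual install location):
$ echo "pml = ob1" >> /path/to/openmpi-1.10.2/etc/openmpi-mca-params.conf
$ grep pml /path/to/openmpi-1.10.2/etc/openmpi-mca-params.conf
pml = ob1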
David, thanks for the info you provided.
I will try to dig in further to see what might be causing this issue.
In the meantime, perhaps Nathan can comment on the openib btl
behavior here?
Thanks,
Alina.
On Wed, Apr 20, 2016 at 8:01 PM, David Shrader wrote:
Hello Alina,
Thank you for the information about how the pml components work. I knew
that the other components were being opened and ultimately closed in
favor of yalla, but I didn't realize that initial open would cause a
persistent change in the ompi runtime.
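(For the record, one way to keep a component from being opened at all is the ^ exclusion syntax on the command line; the process count and binary name here are only placeholders:
$ mpirun --mca btl ^openib -n 16 ./a.out
That should keep the openib btl from ever being initialized when yalla is going to be selected anyway.)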
Here's the information you req
Hi David,
I was able to reproduce the issue you reported.
When the command line doesn't specify the components to use, ompi will try
to load/open all the ones available (and close them in the end) and then
choose the components according to their priority and whether or not they
were opened successfully.
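One hedged way to watch that open/close/select sequence is to raise the pml framework verbosity; the exact messages vary by version, and the process count and binary are just placeholders:
$ mpirun --mca pml_base_verbose 100 -n 2 ./a.out 2>&1 | grep -i pml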
Hi, David
We are looking into your report.
Best,
Josh
On Tue, Apr 19, 2016 at 4:41 PM, David Shrader wrote:
Hello,
I have been investigating using XRC on a cluster with a Mellanox
interconnect. I have found that in a certain situation I get a seg
fault. I am using 1.10.2 compiled with gcc 5.3.0, and the simplest
configure line that I have found that still results in the seg fault is
as follows:
$