Sorry -- I missed many of these mails today due to mail filtering (don't ask).
FWIW:
- I'm not opposed to adding LOOPBACK checks into OMPI to avoid this problem
(I'm waiting for a patch, actually). I'm just saying that we're not going to
get a release out immediately with this fix. Our next
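For reference, a loopback check of the kind discussed here could look roughly like the sketch below. This is not taken from the OMPI source; `addr_is_loopback` is a hypothetical helper name, and treating all of 127.0.0.0/8 (plus ::1 for IPv6) as loopback is an assumption about what such a check should cover.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper (not from Open MPI): returns true when the
 * address is an IPv4 address in 127.0.0.0/8 or the IPv6 address ::1. */
bool addr_is_loopback(const struct sockaddr *sa)
{
    if (sa == NULL) {
        return false;
    }
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;
        /* s_addr is in network byte order; the top octet is the net. */
        return (ntohl(in4->sin_addr.s_addr) >> 24) == 127;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;
        return IN6_IS_ADDR_LOOPBACK(&in6->sin6_addr) != 0;
    }
    /* Unknown address family: do not treat it as loopback. */
    return false;
}
```

A caller could skip any interface for which this returns true before ever handing the address to rdma_bind_addr().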
On Feb 8, 2010, at 5:09 PM, Jason Gunthorpe wrote:
DESCRIPTION
Associates a source address with an rdma_cm_id. The address may be
wildcarded. If binding to a specific local address, the rdma_cm_id
will also be bound to a local RDMA device.
This statement is trying
On Feb 8, 2010, at 5:13 PM, Sean Hefty wrote:
Are you certain that rdma_bind_addr does NOT work with 127.0.0.1, and that
this is now the problem?
It does appear to work on OFED 1.4 and on 2.6.26 based on ucmatose. Is the
problem really with rdma_bind_addr succeeding, or with
On my OFED 1.4.1 RHEL4u6 systems, rdma_bind_addr() fails when attempting to
bind to 127.0.0.1 per the email I sent Friday:
http://www.spinics.net/lists/linux-rdma/msg02568.html
This is what I see over IB on 2.6.26, with a couple extra prints added to
cmatose:
Sean, can you try openmpi? It fails for me, and yet ucmatose succeeds.
I don't understand the difference yet...
Sean Hefty wrote:
I believe the issue is that rdma_bind_addr succeeds (returns 0), but no device
is assigned to the rdma_cm_id (verbs field is NULL).
This was a change from commit
On Feb 8, 2010, at 6:48 PM, Sean Hefty wrote:
rc = rdma_bind_addr(cm_id, ipaddr);
if (rc || !cm_id->verbs) {
rc = OMPI_SUCCESS;
goto out3;
}
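To make the condition in that snippet explicit: a bind is only useful to the caller if rdma_bind_addr() returned 0 and a device was actually attached to the id. The sketch below isolates that test so it can be compiled without librdmacm. `struct cm_id_stub` is a stand-in for `struct rdma_cm_id` (only the `verbs` pointer is modelled), and `bind_is_usable` is a hypothetical helper name, not an OMPI or librdmacm function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct rdma_cm_id. Only the field the check reads is
 * modelled; in librdmacm, verbs is a struct ibv_context pointer. */
struct cm_id_stub {
    void *verbs;   /* NULL when no RDMA device is attached to the id */
};

/* Hypothetical helper: true only when the bind call succeeded AND a
 * device was assigned. A return code of 0 alone is not enough, which
 * is the case seen when binding to 127.0.0.1. */
bool bind_is_usable(int bind_rc, const struct cm_id_stub *id)
{
    return bind_rc == 0 && id != NULL && id->verbs != NULL;
}
```

With the real library, the same test would be written against the return value of rdma_bind_addr() and the cm_id->verbs field, as in the snippet above.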
Ah, yes! Per the OMPI code you cited, I amended my printf's and see:
[svbu-mpi.cisco.com:19315] FAILED to bind to 127.0.0.1:
Jeff Squyres wrote:
On Feb 8, 2010, at 7:30 PM, Pradeep Satyanarayana wrote:
elm3b199:/usr/lib # /usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --bynode
--mca btl_openib_cpc_include rdmacm ring
--
mpirun was unable to
On Feb 8, 2010, at 7:50 PM, Pradeep Satyanarayana wrote:
No, there is none. I got this command from one of the mails in the thread.
What should I use instead?
You need to compile and run an MPI program. ring is a typical test program
that sends a message around in a ring. I think that OFED