Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
Sorry -- I missed many of these mails today due to mail filtering (don't ask). FWIW: - I'm not opposed to adding LOOPBACK checks into OMPI to avoid this problem (I'm waiting for a patch, actually). I'm just saying that we're not going to get a release out immediately with this fix. Our next
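Jeff mentions adding LOOPBACK checks to OMPI so that it never tries to use a loopback address with the RDMA CM. Below is a minimal C sketch of such a check. It assumes the candidate address arrives as a `struct sockaddr`. The helper names `addr_is_loopback` and `parse_addr` are hypothetical and are not taken from the OMPI source or from any patch posted in this thread. The check treats all of 127.0.0.0/8 and IPv6 `::1` as loopback.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not from OMPI): returns 1 if sa is an IPv4
 * loopback address (127.0.0.0/8) or the IPv6 loopback address (::1),
 * and 0 for any other address or address family. */
static int addr_is_loopback(const struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
        /* s_addr is in network byte order; convert before testing the top octet. */
        return (ntohl(sin->sin_addr.s_addr) >> 24) == 127;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
        return IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr) != 0;
    }
    return 0;
}

/* Hypothetical helper: parse a numeric IPv4 or IPv6 address string into
 * *out. Returns 0 on success and -1 if the text is neither form. */
static int parse_addr(const char *text, struct sockaddr_storage *out)
{
    struct sockaddr_in *sin = (struct sockaddr_in *)out;
    struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)out;

    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET, text, &sin->sin_addr) == 1) {
        sin->sin_family = AF_INET;
        return 0;
    }

    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET6, text, &sin6->sin6_addr) == 1) {
        sin6->sin6_family = AF_INET6;
        return 0;
    }
    return -1;
}
```

A caller would run `addr_is_loopback()` on each candidate interface address and skip the address before calling `rdma_bind_addr()` on it. Filtering up front avoids relying on how a given kernel or OFED release handles 127.0.0.1.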

Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
On Feb 8, 2010, at 5:09 PM, Jason Gunthorpe wrote: DESCRIPTION Associates a source address with an rdma_cm_id. The address may be wildcarded. If binding to a specific local address, the rdma_cm_id will also be bound to a local RDMA device. This statement is trying

Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
On Feb 8, 2010, at 5:13 PM, Sean Hefty wrote: Are you certain that rdma_bind_addr does NOT work with 127.0.0.1, and that this is now the problem? It does appear to work on OFED 1.4 and on 2.6.26 based on ucmatose. Is the problem really with rdma_bind_addr succeeding, or with

RE: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Sean Hefty
On my OFED 1.4.1 RHEL4u6 systems, rdma_bind_addr() fails when attempting to bind to 127.0.0.1 per the email I sent Friday: http://www.spinics.net/lists/linux-rdma/msg02568.html This is what I see over IB on 2.6.26, with a couple extra prints added to cmatose:

Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Steve Wise
Sean, can you try openmpi? It fails for me, and yet ucmatose succeeds. I don't understand the difference yet... Sean Hefty wrote: On my OFED 1.4.1 RHEL4u6 systems, rdma_bind_addr() fails when attempting to bind to 127.0.0.1 per the email I sent Friday:

RE: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Sean Hefty
Sean, can you try openmpi? It fails for me, and yet ucmatose succeeds. I don't understand the difference yet... I believe the issue is that rdma_bind_addr succeeds (returns 0), but no device is assigned to the rdma_cm_id (verbs field is NULL). This was a change from commit

Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
On Feb 8, 2010, at 6:48 PM, Sean Hefty wrote: rc = rdma_bind_addr(cm_id, ipaddr); if (rc || !cm_id->verbs) { rc = OMPI_SUCCESS; goto out3; } Ah, yes! Per the OMPI code you cited, I amended my printf's and see: [svbu-mpi.cisco.com:19315] FAILED to bind to 127.0.0.1:

Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Pradeep Satyanarayana
Jeff Squyres wrote: On Feb 8, 2010, at 7:30 PM, Pradeep Satyanarayana wrote: elm3b199:/usr/lib # /usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --bynode --mca btl_openib_cpc_include rdmacm ring -- mpirun was unable to

Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
On Feb 8, 2010, at 7:50 PM, Pradeep Satyanarayana wrote: No, there is none. I got this command from one of the mails in the thread. What should I use instead? You need to compile and run an MPI program. ring is a typical test program that sends a message around in a ring. I think that OFED