Re: Issue with RDMA_CM on systems with multiple IB HCA's.

2010-07-23 Thread Larry
On Thu, Jul 22, 2010 at 3:15 PM, Or Gerlitz ogerl...@voltaire.com wrote:
> Hari Subramoni wrote:
>> [subra...@amd6 perftest]$ ./ib_rdma_bw -c 172.16.1.5
>> 11928: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=1 |
>> 11928: Local address:  LID , QPN 00, PSN

[PATCH 0/4] RDMA/cxgb4: bug fixes.

2010-07-23 Thread Steve Wise
Here are some bug fixes for 2.6.36.

Steve Wise (4):
  RDMA/cxgb4: Add timeouts when waiting for FW responses.
  RDMA/cxgb4: Set/Reset the EP timer inside EP lock.
  RDMA/cxgb4: Use correct control txq.
  RDMA/cxgb4: Fix race in fini path.

 drivers/infiniband/hw/cxgb4/cm.c

[PATCH 1/4] RDMA/cxgb4: Fix race in fini path.

2010-07-23 Thread Steve Wise
There exists a race condition where the app disconnects, initiating an orderly close (via rdma_fini()), concurrently with an ingress abort condition that initiates an abortive close operation. Since rdma_fini() must be called without IRQs disabled, the fini can be called after the QP has
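The excerpt above is cut off before it describes the actual fix. As a generic illustration of the race it names (an orderly close and an abortive close starting concurrently), one common pattern is to let exactly one close path claim the connection with a single atomic compare-and-swap, so the losing path backs off rather than running a second, conflicting close. This is a minimal userspace C11 sketch under that assumption; `sketch_conn`, `claim_close()` and `enum close_kind` are hypothetical names, not the cxgb4 code, and a kernel driver would express the same idea with its own locking or atomic primitives.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical close kinds for a connection (not the driver's QP states). */
enum close_kind { CLOSE_NONE = 0, CLOSE_ORDERLY = 1, CLOSE_ABORTIVE = 2 };

struct sketch_conn {
    atomic_int close_state;   /* CLOSE_NONE until one close path claims it */
};

static void sketch_conn_init(struct sketch_conn *c)
{
    atomic_init(&c->close_state, CLOSE_NONE);
}

/* Try to claim the connection for one kind of close.
 * Returns 1 if this caller won and must perform the close,
 * 0 if another close path already claimed it and this one should back off. */
static int claim_close(struct sketch_conn *c, enum close_kind kind)
{
    int expected = CLOSE_NONE;

    assert(kind != CLOSE_NONE);
    /* Succeeds only for the first caller; later callers see a non-NONE state. */
    return atomic_compare_exchange_strong(&c->close_state, &expected, (int)kind);
}
```

In this sketch the disconnect path (the one that would go on to call rdma_fini()) and the abort handler each call claim_close() first and continue only on a return of 1. Because the claim does not hold a lock across the close itself, it may suit a case like the one described, where rdma_fini() cannot run with IRQs disabled.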

[PATCH 2/4] RDMA/cxgb4: Use correct control txq.

2010-07-23 Thread Steve Wise
There is only one control txq per tx channel. So use the port number as the queue index when sending.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
---
 drivers/infiniband/hw/cxgb4/cm.c       | 13 +
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |  1 +
 2 files changed, 10

[PATCH 3/4] RDMA/cxgb4: Set/Reset the EP timer inside EP lock.

2010-07-23 Thread Steve Wise
Endpoint timer manipulation needs to be done inside the lock. Otherwise we can get into a situation where a timer is stopped before it is started, which hits the WARN_ON() in stop_ep_timer().

Signed-off-by: Steve Wise sw...@opengridcomputing.com
---
 drivers/infiniband/hw/cxgb4/cm.c | 34
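To make the failure mode concrete: if the state change that makes a stop legal and the timer start are not done under the same lock, another thread can observe the new state in the gap and stop a timer that was never started. The following minimal sketch models this in userspace C with a pthread mutex and a simulated timer flag. `sketch_ep`, `connect_path()`, `reply_path()` and the state names are hypothetical stand-ins, not the driver's endpoint structure or its real timer helpers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical endpoint states; a userspace stand-in, not the cxgb4 driver's. */
enum sketch_state { EP_IDLE, EP_CONNECTING, EP_CONNECTED };

struct sketch_ep {
    pthread_mutex_t lock;     /* protects state and timer_armed together */
    enum sketch_state state;
    int timer_armed;          /* 1 while the simulated timeout timer is pending */
    int warn_count;           /* counts what would be WARN_ON() hits */
};

static void sketch_ep_init(struct sketch_ep *ep)
{
    pthread_mutex_init(&ep->lock, NULL);
    ep->state = EP_IDLE;
    ep->timer_armed = 0;
    ep->warn_count = 0;
}

/* Invariant the lock protects: an endpoint waiting for a reply has its timer armed. */
static void check_invariant_locked(const struct sketch_ep *ep)
{
    assert(ep->state != EP_CONNECTING || ep->timer_armed);
}

/* Caller holds ep->lock. */
static void stop_timer_locked(struct sketch_ep *ep)
{
    if (!ep->timer_armed) {
        /* The situation the WARN_ON() in stop_ep_timer() flags. */
        ep->warn_count++;
        fprintf(stderr, "warning: stopping a timer that was never started\n");
        return;
    }
    ep->timer_armed = 0;
}

/* Connect path: the state change and the timer start happen under one lock,
 * so no other thread can observe EP_CONNECTING with the timer still unarmed. */
static void *connect_path(void *arg)
{
    struct sketch_ep *ep = arg;

    pthread_mutex_lock(&ep->lock);
    ep->state = EP_CONNECTING;
    ep->timer_armed = 1;              /* start the timer inside the lock */
    check_invariant_locked(ep);
    pthread_mutex_unlock(&ep->lock);
    return NULL;
}

/* Reply path: if a reply is awaited, stop the timer and advance the state. */
static void *reply_path(void *arg)
{
    struct sketch_ep *ep = arg;

    pthread_mutex_lock(&ep->lock);
    if (ep->state == EP_CONNECTING) {
        stop_timer_locked(ep);        /* stop the timer inside the same lock */
        ep->state = EP_CONNECTED;
    }
    check_invariant_locked(ep);
    pthread_mutex_unlock(&ep->lock);
    return NULL;
}
```

Both path functions have the `void *(*)(void *)` shape, so they can be handed to pthread_create(); in either order warn_count stays 0. If connect_path() instead released the lock after setting EP_CONNECTING and only then armed the timer, reply_path() could run in that gap and take the warning branch.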

[PATCH 4/4] RDMA/cxgb4: Add timeouts when waiting for FW responses.

2010-07-23 Thread Steve Wise
Don't hang a host thread if the FW stops responding.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
---
 drivers/infiniband/hw/cxgb4/cm.c | 22 ++
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c
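One way to keep a host thread from hanging on an unresponsive FW is to bound every wait for a reply with a deadline and return an error when it expires. This is a minimal userspace C sketch of that idea using pthread_cond_timedwait(). `fw_wait`, `fw_complete()` and `fw_wait_for_reply()` are hypothetical names; the patch itself (whose body is not shown here) would use kernel wait primitives rather than pthreads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical reply slot for one firmware request (not the driver's structure). */
struct fw_wait {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int done;      /* set by the completion path when the FW answers */
    int status;    /* FW-reported status, valid once done == 1 */
};

static void fw_wait_init(struct fw_wait *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->done = 0;
    w->status = 0;
}

/* Called from the (simulated) FW reply handler. */
static void fw_complete(struct fw_wait *w, int status)
{
    pthread_mutex_lock(&w->lock);
    w->status = status;
    w->done = 1;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Wait at most timeout_ms for the reply instead of blocking forever.
 * Returns the FW status, or -ETIMEDOUT if the FW never answered. */
static int fw_wait_for_reply(struct fw_wait *w, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    assert(timeout_ms >= 0);
    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline by default. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&w->lock);
    /* Re-check after each wakeup (they can be spurious); stop on any error,
     * including ETIMEDOUT once the deadline passes. */
    while (!w->done && rc == 0)
        rc = pthread_cond_timedwait(&w->cond, &w->lock, &deadline);
    rc = w->done ? w->status : -ETIMEDOUT;
    pthread_mutex_unlock(&w->lock);
    return rc;
}
```

A caller that issues a FW request would treat -ETIMEDOUT as "the FW stopped responding" and fail the operation rather than sleep forever. Checking `done` after the loop also covers a reply that lands just as the deadline expires.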

Re: Issue with RDMA_CM on systems with multiple IB HCA's.

2010-07-23 Thread Or Gerlitz
Hari Subramoni subra...@cse.ohio-state.edu wrote:
> The nodes have LIDs assigned to them and OpenSM is running fine. I've attached the configurations of the two hosts along with this e-mail. As Jonathan mentioned, we are able to ping between them.

Are the two HCAs on each of the nodes