Re: [openib-general] RDMA_CM_EVENT_UNREACHABLE(-ETIMEDOUT)

2006-08-09 Thread Or Gerlitz
Sean Hefty wrote: Or Gerlitz wrote: Conceptually, do we agree that it would be better not to expose the IB reject code to the CMA consumers? That is in the spirit of the CMA being a framework for doing connection management in an RDMA-transport-independent fashion, etc. My concern is that I do

Re: [openib-general] RDMA_CM_EVENT_UNREACHABLE(-ETIMEDOUT)

2006-08-08 Thread Or Gerlitz
Sean Hefty wrote: Or Gerlitz wrote: 0) ones that are of no interest to the CMA or to the ULP above it, but only to the local CM (are there any?) 1) ones that *must* be handled internally by the CMA (are there any?) 2) ones that *can* be handled internally by the CMA (e.g. stale-conn)

Re: [openib-general] RDMA_CM_EVENT_UNREACHABLE(-ETIMEDOUT)

2006-08-08 Thread Sean Hefty
Or Gerlitz wrote: Conceptually, do we agree that it would be better not to expose the IB reject code to the CMA consumers? That is in the spirit of the CMA being a framework for doing connection management in an RDMA-transport-independent fashion, etc. My concern is that I do not want to mask the

Re: [openib-general] RDMA_CM_EVENT_UNREACHABLE(-ETIMEDOUT)

2006-08-07 Thread Or Gerlitz
Sean Hefty wrote: Or Gerlitz wrote: Is it correct that with the gen2 code, the remote **CM** will reconnect on that case? I don't think so. The QP needs to move into timewait, so a new connection request is needed with a different QPN. Just to make sure, you replaced CM id with QP and

Re: [openib-general] RDMA_CM_EVENT_UNREACHABLE(-ETIMEDOUT)

2006-08-07 Thread Sean Hefty
Or Gerlitz wrote: 0) ones that are of no interest to the CMA or to the ULP above it, but only to the local CM (are there any?) 1) ones that *must* be handled internally by the CMA (are there any?) 2) ones that *can* be handled internally by the CMA (e.g. stale-conn) 3) ones that

Re: [openib-general] RDMA_CM_EVENT_UNREACHABLE(-ETIMEDOUT)

2006-08-03 Thread Or Gerlitz
Sean Hefty wrote: I agree. This sounds like an issue where the CM is treating the REQ as an old REQ for the established connection, versus a REQ for a new connection. The desired behavior in this situation would be to reject the new request, and force the remote side to disconnect. Sean,

Re: [openib-general] RDMA_CM_EVENT_UNREACHABLE(-ETIMEDOUT)

2006-08-03 Thread Sean Hefty
Or Gerlitz wrote: Is it correct that with the gen2 code, the remote **CM** will reconnect on that case? I don't think so. The QP needs to move into timewait, so a new connection request is needed with a different QPN. I see in cm.c :: cm_rej_handler() that when the state is IB_CM_REQ_SENT

[openib-general] RDMA_CM_EVENT_UNREACHABLE(-ETIMEDOUT)

2006-08-02 Thread Eric Barton
I've had a report of rdma_connect() failing with a callback event type of RDMA_CM_EVENT_UNREACHABLE and status -ETIMEDOUT although the peer node was up and running at the time. It seems this can be reproduced as follows... 1. Establish a connection between nodes A and B 2. Reboot node A 3.

Re: [openib-general] RDMA_CM_EVENT_UNREACHABLE(-ETIMEDOUT)

2006-08-02 Thread Or Gerlitz
Eric Barton wrote: I've had a report of rdma_connect() failing with a callback event type of RDMA_CM_EVENT_UNREACHABLE and status -ETIMEDOUT although the peer node was up and running at the time. It seems this can be reproduced as follows... 1. Establish a connection between nodes A and B

Re: [openib-general] RDMA_CM_EVENT_UNREACHABLE(-ETIMEDOUT)

2006-08-02 Thread Sean Hefty
Or Gerlitz wrote: My guess is that this is related to the CM, not the SM. I think there is a chance that the CM on node B does not treat the REQ sent by A after the reboot as a stale-connection situation and hence just **silently** drops it, that is, no REJ is sent. I agree. This sounds like an