Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-22 Thread Sean Hefty
>Cool, I would go for XOR-ing a random value with the **local id**. > >Sean, my understanding is that it can be narrowed down to doing so in: > >1) cm_alloc_id() after calling idr_get_new_above() >2) cm_free_id() before calling idr_remove() >3) cm_get_id() before calling idr_find() > >and initializing the ran
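
A minimal user-space sketch of the XOR idea discussed in this thread, not the actual ib_cm code. A small fixed-size table stands in for the kernel IDR, and every name here (`sketch_id_table`, `sketch_alloc_id`, `sketch_get_id`, `sketch_free_id`, `random_operand`) is hypothetical. It only illustrates the three points listed above: XOR the random value in after allocating an index, and XOR it back out before looking up or removing an entry. The sketch assumes stored pointers are never NULL, since NULL marks a free slot.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TABLE_SIZE 64 /* hypothetical capacity; the real IDR grows dynamically */

/* Stand-in for the IDR plus the random value chosen once at init time. */
struct sketch_id_table {
    void *slots[SKETCH_TABLE_SIZE];
    uint32_t random_operand;
};

void sketch_init(struct sketch_id_table *t, uint32_t random_operand)
{
    for (size_t i = 0; i < SKETCH_TABLE_SIZE; i++)
        t->slots[i] = NULL;
    t->random_operand = random_operand;
}

/*
 * Point 1: allocate the lowest free index, then XOR it with the random
 * operand to form the ID that goes on the wire.
 * Returns 0 on success, -1 if the table is full.
 */
int sketch_alloc_id(struct sketch_id_table *t, void *ptr, uint32_t *local_id)
{
    for (uint32_t i = 0; i < SKETCH_TABLE_SIZE; i++) {
        if (t->slots[i] == NULL) {
            t->slots[i] = ptr;
            *local_id = i ^ t->random_operand;
            return 0;
        }
    }
    return -1;
}

/* Point 3: undo the XOR before the lookup; unknown IDs yield NULL. */
void *sketch_get_id(const struct sketch_id_table *t, uint32_t local_id)
{
    uint32_t index = local_id ^ t->random_operand;

    if (index >= SKETCH_TABLE_SIZE)
        return NULL;
    return t->slots[index];
}

/* Point 2: undo the XOR before removing the entry. */
void sketch_free_id(struct sketch_id_table *t, uint32_t local_id)
{
    uint32_t index = local_id ^ t->random_operand;

    if (index < SKETCH_TABLE_SIZE)
        t->slots[index] = NULL;
}
```

Because XOR is its own inverse, the same operation maps an index to a wire ID and back, so there is no add-versus-subtract direction to keep straight. The underlying indices stay small and dense, which is the property the IDR sizing concern raised later in the thread depends on.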

Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-22 Thread Or Gerlitz
Sean Hefty wrote: > When a new REQ is received, we enter its timewait structure into two trees: one > sorted by remote ID, one sorted by remote QPN. If the REQ is new, both would > succeed, and timewait_info would be NULL. Since timewait_info is not NULL, we > are dealing with a REQ that re-us

Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-22 Thread Or Gerlitz
Roland Dreier wrote: > Sean> If we record a base offset, we can start at any random > Sean> number. We just need to always add/subtract the base when > Sean> getting a value from the IDR. > > Good point -- or better still, we could XOR in a random bit pattern. > That way we don't have

Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Roland Dreier
Sean> If we record a base offset, we can start at any random Sean> number. We just need to always add/subtract the base when Sean> getting a value from the IDR. Good point -- or better still, we could XOR in a random bit pattern. That way we don't have to keep straight when to add and

Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Sean Hefty
>> If we get here, this means that the REQ was a new REQ and not a >> duplicate, but the remote_id or remote_qpn is already in use. We need >> to reject the new REQ as containing stale data. > >I don't follow; if we get to the else case, it's because of cm_get_id() >returning NULL. This holds when idr_fi

Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Sean Hefty
>> Just to emphasize what Sean has pointed out, you are asking how can a CM >> consumer know that a **local** QPN is not in the timewait state >> according to the **remote** CM. Since the issue is with the remote CM, >> it seems to me that pushing down timewait into verbs is not the correct >> dire

Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Sean Hefty
>How about (for the meantime, until this rework is designed && done) >projecting the initial random local id into the range of (say) >[0-1022] (I think 1023 is prime; if not, choose a prime near it)? This way, >with very good probability and with very little overhead on memory >consumption a c

Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Roland Dreier
Or> How about (for the meantime, until this rework is designed && Or> done) projecting the initial random local id into the Or> range of (say) [0-1022] (I think 1023 is prime; if not, choose Or> a prime near it)? This way, with very good probability and with Or> very little

Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Or Gerlitz
This email appears in the archive but seems not to have been distributed to the subscribers, so I am reposting it. Or Gerlitz wrote: > Sean Hefty wrote: >> Even if we pushed timewait handling under verbs, a user could always >> get a QP that the remote side thinks is connected. The original >> connec

Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Or Gerlitz
This email appears in the archive but seems not to have been distributed to the subscribers, so I am reposting it. Or Gerlitz wrote: > Arlin Davis wrote: >> We are running into connection reject issues (IB_CM_REJ_STALE_CONN) >> with our application under heavy load and lots of connections. >> >> We occa

Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Or Gerlitz
>>> + } else >>> + cm_issue_rej(work->port, work->mad_recv_wc, >>> +IB_CM_REJ_STALE_CONN, >>> CM_MSG_RESPONSE_REQ, >>> +NULL, 0); >> >> >> what is this case? there is no entry but there is

Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Or Gerlitz
Sean Hefty wrote: > Or Gerlitz wrote: >> If you don't mind (also related to the patch you have sent Eric of >> randomizing the initial local cm id) to get into this deeper, can we do > There's an issue trying to randomize the initial local CM ID. The way > the IDR works, if you start at a hig

Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-17 Thread Sean Hefty
Or Gerlitz wrote: > If you don't mind (also related to the patch you have sent Eric of > randomizing the initial local cm id) to get into this deeper, can we do There's an issue trying to randomize the initial local CM ID. The way the IDR works, if you start at a high value, then the IDR size

Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-16 Thread Sean Hefty
Arlin Davis wrote: > How can a consumer know for sure that the new QP will not be in a > timewait state according to the CM? Given that the QP may have been in use by another process, I don't think that there's any way for the new owner to know. > Does it make sense to push the timewait functio

[openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-16 Thread Arlin Davis
We are running into connection reject issues (IB_CM_REJ_STALE_CONN) with our application under heavy load and lots of connections. We occasionally get a reject based on the QP being in timewait state, left over from a prior connection. It appears that the CM keeps track of the QP's in timewait