Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-05 Thread Roland Dreier
Todd> Fix the broken endpoint and document the potential issue. Todd> As a potential workaround, permit a configuration option in Todd> OFED which sets an upper bound on CM related timeouts such Todd> that broken endpoints can be worked around. No, let's not have any "unbreak_my_sy

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-05 Thread Rimmer, Todd
> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED] > Sent: Wednesday, October 04, 2006 5:18 PM > To: Rimmer, Todd > Cc: Sean Hefty; Ishai Rabinovitz; openib-general@openib.org > Subject: Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout > > Quoting r. Rimmer, T

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Michael S. Tsirkin
Quoting r. Sean Hefty <[EMAIL PROTECTED]>: > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout > > Michael S. Tsirkin wrote: > >>That sounds simple enough for now. (maybe set to 21 = 8 seconds = 2 > >>minutes > >>with retries?) Having the maximum apply at least to remote CM timeout + > >>serv

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Sean Hefty
Michael S. Tsirkin wrote: >>That sounds simple enough for now. (maybe set to 21 = 8 seconds = 2 minutes >>with retries?) Having the maximum apply at least to remote CM timeout + >>service >>timeout would be good. (It appears that Intel MPI just hit into this issue >>after setting the remote

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Michael S. Tsirkin
Quoting r. Sean Hefty <[EMAIL PROTECTED]>: > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout > > Michael S. Tsirkin wrote: > > So, let's just have a #define for now? And maybe print a warning so we can > > figure out what's wrong ... > > That sounds simple enough for now. (maybe set to 21 = 8

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Sean Hefty
Michael S. Tsirkin wrote: > So, let's just have a #define for now? And maybe print a warning so we can > figure out what's wrong ... That sounds simple enough for now. (maybe set to 21 = 8 seconds = 2 minutes with retries?) Having the maximum apply at least to remote CM timeout + service time

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Michael S. Tsirkin
Quoting r. Sean Hefty <[EMAIL PROTECTED]>: > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout > > Michael S. Tsirkin wrote: > > The way I see it, we trust e.g. the SRP target anyway. > > So I'm not sure there's much value in range-checking everything. > > The only reason we are touching this is b

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Sean Hefty
Michael S. Tsirkin wrote: > The way I see it, we trust e.g. the SRP target anyway. > So I'm not sure there's much value in range-checking everything. > The only reason we are touching this is because we see a > target reporting an obviously broken service timeout value in MRA - > in the hours range

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Michael S. Tsirkin
Quoting r. Rimmer, Todd <[EMAIL PROTECTED]>: > I recommend sticking with the IB spec for the various timeouts. So what do you suggest, wait a day or so to timeout the MRA? -- MST ___ openib-general mailing list openib-general@openib.org http://openib.

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Rimmer, Todd
> From: Michael S. Tsirkin > Sent: Wednesday, October 04, 2006 4:37 PM > To: Sean Hefty > Cc: Ishai Rabinovitz; openib-general@openib.org > Subject: Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout > > Quoting r. Sean Hefty <[EMAIL PROTECTED]>: > > Subj

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>: > Should we just chop off too-big timeout > values unconditionally? That's the approach we are discussing with Sean. -- MST

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Michael S. Tsirkin
Quoting r. Sean Hefty <[EMAIL PROTECTED]>: > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout > > Michael S. Tsirkin wrote: > > For remote cm timeout and service timeout this makes sense - they seem > > currently mostly taken out of the blue on implementations I've seen. > > > > But since the pa

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Sean Hefty
Michael S. Tsirkin wrote: > For remote cm timeout and service timeout this makes sense - they seem > currently mostly taken out of the blue on implementations I've seen. > > But since the packet lifetime comes from the SM, it actually has a chance > to reflect some knowledge about the network topo

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Roland Dreier
Ishai> There is a bug in SRP Engenio target that send a large Ishai> value as service timeout. (It gets 30 which mean timeout of Ishai> (2^(30-8))=4195 sec.) Such a long timeout is not Ishai> reasonable and it may leave the kernel module waiting on Ishai> wait_for_completion an

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Michael S. Tsirkin
Quoting r. Sean Hefty <[EMAIL PROTECTED]>: > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout > > Michael S. Tsirkin wrote: > >>There's several timeout values transfered and used by the cm, most notably > >>the > >>remote cm response timeout and packet life time. Does it make more sense > >>t

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-04 Thread Sean Hefty
Michael S. Tsirkin wrote: >>There's several timeout values transfered and used by the cm, most notably >>the >>remote cm response timeout and packet life time. Does it make more sense to >>have a single, generic timeout maximum instead? > > Hmm. I'm not sure - we are working around an actual b

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-03 Thread Michael S. Tsirkin
Quoting r. Sean Hefty <[EMAIL PROTECTED]>: > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout > > Ishai Rabinovitz wrote: > > There is a bug in SRP Engenio target that send a large value as service > > timeout. (It gets 30 which mean timeout of (2^(30-8))=4195 sec.) Such a long > > timeout is no

Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-03 Thread Sean Hefty
Ishai Rabinovitz wrote: > There is a bug in SRP Engenio target that send a large value as service > timeout. (It gets 30 which mean timeout of (2^(30-8))=4195 sec.) Such a long > timeout is not reasonable and it may leave the kernel module waiting on > wait_for_completion and may stuck a lot of pr

[openib-general] [PATCH] IB_CM: Limit the MRA timeout

2006-10-03 Thread Ishai Rabinovitz
There is a bug in SRP Engenio target that send a large value as service timeout. (It gets 30 which mean timeout of (2^(30-8))=4195 sec.) Such a long timeout is not reasonable and it may leave the kernel module waiting on wait_for_completion and may stuck a lot of processes. The following patch a
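
For readers following the arithmetic in this thread, here is a minimal sketch of the timeout encoding and of the kind of upper bound being discussed. It assumes the usual InfiniBand encoding (timeout = 4.096 us * 2^n, with n carried in a 5-bit field) and approximates that as 2^(n-8) ms, the same rounding the posts above use. The names MAX_CM_TIMEOUT_EXP, timeout_exp_to_ms and clamp_timeout_exp are hypothetical and are not taken from the patch; the cap of 21 is simply the value floated in the discussion, not a confirmed choice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap: 21 is the value suggested in the thread (about 8 s). */
#define MAX_CM_TIMEOUT_EXP 21u

/*
 * Approximate 4.096 us * 2^exp in milliseconds as 2^(exp - 8).
 * Exponents of 8 or less are rounded up to 1 ms.
 */
static uint32_t timeout_exp_to_ms(unsigned int exp)
{
    assert(exp <= 31u); /* assumption: the encoded field is 5 bits wide */
    return exp > 8u ? UINT32_C(1) << (exp - 8u) : 1u;
}

/* Bound an exponent received from the remote side before using it. */
static unsigned int clamp_timeout_exp(unsigned int exp)
{
    return exp > MAX_CM_TIMEOUT_EXP ? MAX_CM_TIMEOUT_EXP : exp;
}
```

With these helpers, timeout_exp_to_ms(30) yields 4194304 ms (roughly the 4195 sec quoted above), while timeout_exp_to_ms(clamp_timeout_exp(30)) yields 8192 ms, i.e. the "21 = 8 seconds" figure. Clamping the exponent rather than the millisecond value keeps the bound in the same units as the wire field.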