Todd> Fix the broken endpoint and document the potential issue.
Todd> As a potential workaround, permit a configuration option in
Todd> OFED which sets an upper bound on CM related timeouts such
Todd> that broken endpoints can be worked around.
No, let's not have any "unbreak_my_sy
> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 04, 2006 5:18 PM
> To: Rimmer, Todd
> Cc: Sean Hefty; Ishai Rabinovitz; openib-general@openib.org
> Subject: Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout
>
> Quoting r. Rimmer, T
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] IB_CM: Limit the MRA timeout
>
> Michael S. Tsirkin wrote:
> >>That sounds simple enough for now. (maybe set to 21, i.e. about 8 seconds, or about 2
> >>minutes
> >>with retries?) Having the maximum apply at least to remote CM timeout +
> >>serv
Michael S. Tsirkin wrote:
>>That sounds simple enough for now. (maybe set to 21, i.e. about 8 seconds, or about 2 minutes
>>with retries?) Having the maximum apply at least to remote CM timeout +
>>service
>>timeout would be good. (It appears that Intel MPI just ran into this issue
>>after setting the remote
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] IB_CM: Limit the MRA timeout
>
> Michael S. Tsirkin wrote:
> > So, let's just have a #define for now? And maybe print a warning so we can
> > figure out what's wrong ...
>
> That sounds simple enough for now. (maybe set to 21 = 8
Michael S. Tsirkin wrote:
> So, let's just have a #define for now? And maybe print a warning so we can
> figure out what's wrong ...
That sounds simple enough for now. (maybe set to 21, i.e. about 8 seconds, or about 2 minutes
with retries?) Having the maximum apply at least to remote CM timeout +
service
time
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] IB_CM: Limit the MRA timeout
>
> Michael S. Tsirkin wrote:
> > The way I see it, we trust e.g. the SRP target anyway.
> > So I'm not sure there's much value in range-checking everything.
> > The only reason we are touching this is b
Michael S. Tsirkin wrote:
> The way I see it, we trust e.g. the SRP target anyway.
> So I'm not sure there's much value in range-checking everything.
> The only reason we are touching this is because we see a
> target reporting an obviously broken service timeout value in MRA -
> in the hours range
Quoting r. Rimmer, Todd <[EMAIL PROTECTED]>:
> I recommend sticking with the IB spec for the various timeouts.
So what do you suggest, wait a day or so to timeout the MRA?
--
MST
___
openib-general mailing list
openib-general@openib.org
http://openib.
> From: Michael S. Tsirkin
> Sent: Wednesday, October 04, 2006 4:37 PM
> To: Sean Hefty
> Cc: Ishai Rabinovitz; openib-general@openib.org
> Subject: Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout
>
> Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> > Subj
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Should we just chop off too-big timeout
> values unconditionally?
That's the approach we are discussing with Sean.
--
MST
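
For readers following along, the "chop off too-big values" idea can be sketched as below. This is only an illustration, not the patch under discussion: the cap of 21 is the value floated elsewhere in this thread, and the names MAX_CM_TIMEOUT_EXP and clamp_timeout_exp are hypothetical.

```c
#include <stdint.h>

/* Assumed cap: 21 is the exponent floated in the thread (about 8 seconds).
 * The macro name is hypothetical. */
#define MAX_CM_TIMEOUT_EXP 21

/* Hypothetical helper: clamp a timeout exponent received from a peer.
 * *clamped is set to 1 when the value was cut down, so the caller can
 * print a warning about the misbehaving endpoint, and to 0 otherwise. */
static uint8_t clamp_timeout_exp(uint8_t exponent, int *clamped)
{
    *clamped = exponent > MAX_CM_TIMEOUT_EXP;
    return *clamped ? MAX_CM_TIMEOUT_EXP : exponent;
}
```

Clamping the exponent rather than the converted time keeps the check to a single comparison and leaves in-range values untouched.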
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] IB_CM: Limit the MRA timeout
>
> Michael S. Tsirkin wrote:
> > For remote cm timeout and service timeout this makes sense - they seem
> > currently mostly taken out of the blue on implementations I've seen.
> >
> > But since the pa
Michael S. Tsirkin wrote:
> For remote cm timeout and service timeout this makes sense - they seem
> currently mostly taken out of the blue on implementations I've seen.
>
> But since the packet lifetime comes from the SM, it actually has a chance
> to reflect some knowledge about the network topo
Ishai> There is a bug in the SRP Engenio target that sends a large
Ishai> value as service timeout. (It gets 30, which means a timeout of
Ishai> 2^(30-8) ms, about 4194 sec.) Such a long timeout is not
Ishai> reasonable and it may leave the kernel module waiting on
Ishai> wait_for_completion an
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] IB_CM: Limit the MRA timeout
>
> Michael S. Tsirkin wrote:
> >>There are several timeout values transferred and used by the cm, most notably
> >>the
> >>remote cm response timeout and packet life time. Does it make more sense
> >>t
Michael S. Tsirkin wrote:
>>There are several timeout values transferred and used by the cm, most notably
>>the remote cm response timeout and packet life time. Does it make more sense to
>>have a single, generic timeout maximum instead?
>
> Hmm. I'm not sure - we are working around an actual b
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] IB_CM: Limit the MRA timeout
>
> Ishai Rabinovitz wrote:
> > There is a bug in the SRP Engenio target that sends a large value as service
> > timeout. (It gets 30, which means a timeout of 2^(30-8) ms, about 4194 sec.) Such a long
> > timeout is no
Ishai Rabinovitz wrote:
> There is a bug in the SRP Engenio target that sends a large value as service
> timeout. (It gets 30, which means a timeout of 2^(30-8) ms, about 4194 sec.) Such a long
> timeout is not reasonable and it may leave the kernel module waiting on
> wait_for_completion and may stuck a lot of pr
There is a bug in the SRP Engenio target that sends a large value as service
timeout. (It gets 30, which means a timeout of 2^(30-8) ms, about 4194 sec.)
Such a long timeout is not reasonable: it may leave the kernel module
waiting on wait_for_completion and leave a lot of processes stuck.
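
As a worked check of the arithmetic above, here is a minimal sketch that uses the 2^(exponent - 8) millisecond approximation quoted in this message. The function name timeout_exp_to_ms is hypothetical, and the exponent is assumed to be a 5-bit field (0 to 31).

```c
#include <stdint.h>

/* Hypothetical helper: convert a timeout exponent to milliseconds using
 * the approximation 2^(exponent - 8) ms. Exponents of 8 or less are
 * rounded up to 1 ms. Assumes exponent <= 31 (a 5-bit field). */
static uint64_t timeout_exp_to_ms(unsigned int exponent)
{
    return exponent > 8 ? (uint64_t)1 << (exponent - 8) : 1;
}
```

With this approximation, an exponent of 30 gives 4194304 ms (roughly 70 minutes), while an exponent of 21 gives 8192 ms (about 8 seconds).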
The following patch a