Re: Work completion error: "transport retry counter exceeded"

2012-08-01 Thread Ira Weiny
On Fri, 27 Jul 2012 10:42:12 -0700 Ira Weiny wrote:
> First, I have gotten pulled into another project so I have not been able to
> debug this further.
>
> I __really__ appreciate all the responses and will report back when I have
> found more information.

I feel really stupid admitting this.

RE: Work completion error: "transport retry counter exceeded"

2012-07-27 Thread Paul Grun
Of Albert Strasheim > Sent: Friday, July 27, 2012 10:33 AM > To: Paul Grun > Cc: Roland Dreier; Ira Weiny; linux-rdma@vger.kernel.org > Subject: Re: Work completion error: "transport retry counter exceeded" > > Hello > > On Fri, Jul 27, 2012 at 6:50 PM, Paul Grun >

RE: Work completion error: "transport retry counter exceeded"

2012-07-27 Thread Paul Grun
M > To: Albert Strasheim > Cc: Ira Weiny; linux-rdma@vger.kernel.org > Subject: Re: Work completion error: "transport retry counter exceeded" > > > I wonder if I might be seeing the same thing... > > > > How does one choose a good value for this setting? >

Re: Work completion error: "transport retry counter exceeded"

2012-07-27 Thread Ira Weiny
First, I have gotten pulled into another project so I have not been able to debug this further.

I __really__ appreciate all the responses and will report back when I have found more information.

Thanks!

On Fri, 27 Jul 2012 19:33:18 +0200 Albert Strasheim wrote:
> Hello
>
> On Fri, Jul 27, 2

Re: Work completion error: "transport retry counter exceeded"

2012-07-27 Thread Albert Strasheim
Hello

On Fri, Jul 27, 2012 at 6:50 PM, Paul Grun wrote:
> Ira, are you by any chance sending immediate data with your RDMA Write?

Out of curiosity, what would be the significance if the answer to this question was yes?

Regards

Albert

Re: Work completion error: "transport retry counter exceeded"

2012-07-27 Thread Roland Dreier
On Fri, Jul 27, 2012 at 9:50 AM, Paul Grun wrote:
> Note that both an "RNR-NAK retry count exceeded" and a "timeout" error are
> reported in the same way, as a locally detected error.

Not quite right. There are two different work completion statuses:

IBV_WC_RETRY_EXC_ERR
IBV_WC_

Re: Work completion error: "transport retry counter exceeded"

2012-07-26 Thread Roland Dreier
> I wonder if I might be seeing the same thing...
>
> How does one choose a good value for this setting?
>
> Apparently it maps to 4.096 x 2 ^ attr.timeout microseconds.
>
> What's the maximum value one can set here?
>
> What can go wrong if one goes for the maximum value?

In theory you want a tim

Re: Work completion error: "transport retry counter exceeded"

2012-07-26 Thread Albert Strasheim
Hello

On Thu, Jul 26, 2012 at 9:15 AM, Roland Dreier wrote:
> On Wed, Jul 25, 2012 at 7:07 PM, Ira Weiny wrote:
>> attr.timeout = 14;
> Is this timeout sufficient to account for the round trip on
> the fabric and the ack delay on the remote HCA?
> I don't think there are any othe

Re: Work completion error: "transport retry counter exceeded"

2012-07-26 Thread Roland Dreier
On Wed, Jul 25, 2012 at 7:07 PM, Ira Weiny wrote:
> attr.timeout = 14;

Is this timeout sufficient to account for the round trip on the fabric and the ack delay on the remote HCA?

I don't think there are any other attributes that would affect getting transport retries.

- R.
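
For readers trying to judge whether a given attr.timeout is "sufficient", here is a minimal sketch of the formula quoted elsewhere in this thread (4.096 x 2 ^ attr.timeout microseconds). The helper name ack_timeout_usec is hypothetical and not part of libibverbs. The range check assumes the field is 5 bits wide (0..31) with 0 meaning "no timeout", which is my understanding of the InfiniBand spec and the ibv_modify_qp documentation; please verify against your own headers and HCA documentation.

```c
/* Hypothetical helper, NOT part of libibverbs.
 *
 * Converts an attr.timeout exponent into the local ACK timeout in
 * microseconds, using the formula quoted in the thread:
 *     4.096 us * 2^timeout
 *
 * Assumptions (check against the spec):
 *   - timeout is a 5-bit field, so valid values are 0..31
 *   - 0 means "wait forever"; this sketch returns 0.0 for it
 * Returns -1.0 for values outside 0..31.
 */
double ack_timeout_usec(unsigned int timeout)
{
    if (timeout > 31)
        return -1.0;            /* out of range for a 5-bit field */
    if (timeout == 0)
        return 0.0;             /* special case: no timeout */
    return 4.096 * (double)(1ULL << timeout);
}
```

With attr.timeout = 14 this works out to roughly 67 ms per attempt, and the largest exponent (31) to well over two hours, so the real wait before a "transport retry counter exceeded" completion also depends on how many retries are configured.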

Work completion error: "transport retry counter exceeded"

2012-07-25 Thread Ira Weiny
I am at a loss. I am hacking some RDMA code to do an RDMA write from a server to a client. I have it working perfectly on a small 2 node test system. When I move the code to another system I am getting a "transport retry counter exceeded" error. I just can't figure out why an RDMA Write is timi