Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-13 Thread Bart Van Assche
On Wed, Jan 13, 2010 at 12:24 AM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: I'd actually think that the decline caused by exhausting credit on the initiator side is far worse than ib_post_recv.. If ib_post_recv really is noticable then maybe just force loading the RQ after XX

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-13 Thread Roland Dreier
The impact of the ib_post_recv() call is small but measurable. I'll publish measurement results soon. Interesting. If there is overhead you could amortize it by keeping some number of spare buffers posting, and batching your calls to ib_post_recv() (eg post a list of 8 buffers every 8

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-13 Thread Jason Gunthorpe
On Wed, Jan 13, 2010 at 08:23:27AM +0100, Bart Van Assche wrote: On Wed, Jan 13, 2010 at 12:24 AM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: [ ... ] Also, I couldn't tell for sure from a cursory examination of the patch, but the initiator must be designed to not stall

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-13 Thread Bart Van Assche
On Wed, Jan 13, 2010 at 7:16 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Jan 13, 2010 at 08:23:27AM +0100, Bart Van Assche wrote: On Wed, Jan 13, 2010 at 12:24 AM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: [ ... ] Also, I couldn't tell for sure

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-13 Thread Jason Gunthorpe
On Wed, Jan 13, 2010 at 07:57:26PM +0100, Bart Van Assche wrote: On Wed, Jan 13, 2010 at 7:16 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Jan 13, 2010 at 08:23:27AM +0100, Bart Van Assche wrote: On Wed, Jan 13, 2010 at 12:24 AM, Jason Gunthorpe

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-12 Thread Roland Dreier
Note: I do not know of any SRP targets that implement approach (2). As far as I know all SRP targets use approach (1). But no other SRP targets send SRP_CRED_REQ -- and that includes all native IB storage arrays that I know of. So there must be some other way around this (and having

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-12 Thread Bart Van Assche
On Tue, Jan 12, 2010 at 6:13 PM, Roland Dreier rdre...@cisco.com wrote:   Note: I do not know of any SRP targets that implement approach (2). As   far as I know all SRP targets use approach (1). But no other SRP targets send SRP_CRED_REQ -- and that includes all native IB storage arrays that

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-12 Thread Roland Dreier
Having separate buffer sets for command receives and response sends might seem a sensible way to design a target. But calling ib_post_recv() before sending a response back to the initiator is not acceptable because this increases communication latency. When a target uses separate buffer

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-12 Thread Jason Gunthorpe
On Tue, Jan 12, 2010 at 02:57:35PM -0800, Roland Dreier wrote: I doubt you could benchmark the overhead of calling ib_post_recv() in the full SRP protocol. Really, I bet it's less than 100 nanoseconds to form the work request and call ib_post_recv(). Maybe I'm wrong but I really expect the

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-12 Thread Bart Van Assche
On Wed, Jan 13, 2010 at 12:24 AM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: [ ... ] Also, I couldn't tell for sure from a cursory examination of the patch, but the initiator must be designed to not stall processing the RQ dependent on the SQ or the credit level, when using a

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-12 Thread Bart Van Assche
On Tue, Jan 12, 2010 at 11:57 PM, Roland Dreier rdre...@cisco.com wrote:   Having separate buffer sets for command receives and response sends   might seem a sensible way to design a target. But calling   ib_post_recv() before sending a response back to the initiator is not   acceptable

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-10 Thread Bart Van Assche
On Sun, Jan 10, 2010 at 7:41 AM, Roland Dreier rdre...@cisco.com wrote:   The patch I posted is really fixing the original bug. The problem was   that neither the SRP target nor the SRP initiator had support for   SRP_CRED_REQ. Support for SRP_CRED_REQ has to be added to both   software

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-09 Thread Roland Dreier
The patch I posted is really fixing the original bug. The problem was that neither the SRP target nor the SRP initiator had support for SRP_CRED_REQ. Support for SRP_CRED_REQ has to be added to both software components in order to fix this bug. There's no way for the target to return

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-07 Thread Bart Van Assche
On Thu, Jan 7, 2010 at 8:57 AM, Roland Dreier rdre...@cisco.com wrote:   It was only recently that I had the time to analyze this issue in   depth and that I added SRP_CRED_REQ support in SCST (checked in that   support about one week ago). Before SRP_CRED_REQ support was added in   SCST, the

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-07 Thread Bart Van Assche
On Thu, Jan 7, 2010 at 9:03 AM, Bart Van Assche bart.vanass...@gmail.com wrote: On Thu, Jan 7, 2010 at 8:57 AM, Roland Dreier rdre...@cisco.com wrote:   It was only recently that I had the time to analyze this issue in   depth and that I added SRP_CRED_REQ support in SCST (checked in that  

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-07 Thread David Dillow
On Thu, 2010-01-07 at 08:59 +0100, Bart Van Assche wrote: On Wed, Jan 6, 2010 at 10:40 PM, Roland Dreier rdre...@cisco.com wrote: Is that regular kernel coding practice, to run away with the work someone else did and to claim authorship ? As far as I know this is considered as

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-06 Thread Roland Dreier
When the SRP initiator is communicating with an SRP target under load it can happen that the SRP initiator locks up. The communication pattern that causes the lockup is as follows: * SRP initiator sends (req_lim - 2) SRP_CMD requests to the target. * The REQUEST LIMIT DELTA field of

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-06 Thread Roland Dreier
I agree that we should add support for SRP_CRED_REQ, but I'm not thrilled with the mix of changes in the patch, as well as the general aesthetics of the result. How about something like the following series -- posted as a follow up to this message -- with proper credit for Bart? I'll

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-06 Thread Roland Dreier
It was only recently that I had the time to analyze this issue in depth and that I added SRP_CRED_REQ support in SCST (checked in that support about one week ago). Before SRP_CRED_REQ support was added in SCST, the SRP initiator could not log the Unhandled SRP opcode message because it

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-06 Thread Bart Van Assche
On Wed, Jan 6, 2010 at 10:40 PM, Roland Dreier rdre...@cisco.com wrote:   Is that regular kernel coding practice, to run away with the work   someone else did and to claim authorship ? As far as I know this is   considered as impolite. I don't see with proper credit for Bart as claiming

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-04 Thread David Dillow
On Mon, 2010-01-04 at 08:13 +0100, Bart Van Assche wrote: On Mon, Jan 4, 2010 at 2:34 AM, David Dillow d...@thedillows.org wrote: I agree that we should add support for SRP_CRED_REQ, but I'm not thrilled with the mix of changes in the patch, as well as the general aesthetics of the result.

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-03 Thread David Dillow
On Sat, 2010-01-02 at 13:19 +0100, Bart Van Assche wrote: This patch fixes this issue by adding support for SRP_CRED_REQ information units in the SRP initiator. Additionally, this patch makes the per-session variable req_lim visible through sysfs, which makes observing the initiator state

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-03 Thread Bart Van Assche
On Mon, Jan 4, 2010 at 2:34 AM, David Dillow d...@thedillows.org wrote: On Sat, 2010-01-02 at 13:19 +0100, Bart Van Assche wrote: This patch fixes this issue by adding support for SRP_CRED_REQ information units in the SRP initiator. Additionally, this patch makes the per-session variable

[PATCH] IB/srp: Fix initiator lockup

2010-01-02 Thread Bart Van Assche
When the SRP initiator is communicating with an SRP target under load it can happen that the SRP initiator locks up. The communication pattern that causes the lockup is as follows: * SRP initiator sends (req_lim - 2) SRP_CMD requests to the target. * The REQUEST LIMIT DELTA field of each SRP_RSP

Re: [PATCH] IB/srp: Fix initiator lockup

2010-01-02 Thread Bart Van Assche
On Sat, Jan 2, 2010 at 5:05 PM, Chris Worley worl...@gmail.com wrote: Would this fix effect the Window's SRP initiator too? I'm not familiar with the WinOF codebase. But at first sight it looks like the WinOF SRP initiator doesn't support processing SRP_CRED_REQ information units either, which