Re: NFS over RDMA crashing

2014-03-12 Thread Jeff Layton
On Sat, 08 Mar 2014 14:13:44 -0600 Steve Wise sw...@opengridcomputing.com wrote: On 3/8/2014 1:20 PM, Steve Wise wrote: I removed your change and started debugging original crash that happens on top-o-tree. Seems like rq_next_page is screwed up. It should always be >= rq_respages,

Re: NFS over RDMA crashing

2014-03-12 Thread Trond Myklebust
On Mar 12, 2014, at 9:33, Jeff Layton jlay...@redhat.com wrote: On Sat, 08 Mar 2014 14:13:44 -0600 Steve Wise sw...@opengridcomputing.com wrote: On 3/8/2014 1:20 PM, Steve Wise wrote: I removed your change and started debugging original crash that happens on top-o-tree. Seems like

Re: NFS over RDMA crashing

2014-03-12 Thread Tom Tucker
Hi Trond, I think this patch is still 'off-by-one'. We'll take a look at this today. Thanks, Tom On 3/12/14 9:05 AM, Trond Myklebust wrote: On Mar 12, 2014, at 9:33, Jeff Layton jlay...@redhat.com wrote: On Sat, 08 Mar 2014 14:13:44 -0600 Steve Wise sw...@opengridcomputing.com wrote: On

Re: NFS over RDMA crashing

2014-03-12 Thread Jeffrey Layton
On Wed, 12 Mar 2014 10:05:24 -0400 Trond Myklebust trond.mykleb...@primarydata.com wrote: On Mar 12, 2014, at 9:33, Jeff Layton jlay...@redhat.com wrote: On Sat, 08 Mar 2014 14:13:44 -0600 Steve Wise sw...@opengridcomputing.com wrote: On 3/8/2014 1:20 PM, Steve Wise wrote: I

Re: NFS over RDMA crashing

2014-03-12 Thread Trond Myklebust
On Mar 12, 2014, at 10:28, Jeffrey Layton jlay...@redhat.com wrote: On Wed, 12 Mar 2014 10:05:24 -0400 Trond Myklebust trond.mykleb...@primarydata.com wrote: On Mar 12, 2014, at 9:33, Jeff Layton jlay...@redhat.com wrote: On Sat, 08 Mar 2014 14:13:44 -0600 Steve Wise

Re: NFS over RDMA crashing

2014-03-12 Thread Jeffrey Layton
On Wed, 12 Mar 2014 11:03:52 -0400 Trond Myklebust trond.mykleb...@primarydata.com wrote: On Mar 12, 2014, at 10:28, Jeffrey Layton jlay...@redhat.com wrote: On Wed, 12 Mar 2014 10:05:24 -0400 Trond Myklebust trond.mykleb...@primarydata.com wrote: On Mar 12, 2014, at 9:33, Jeff

Re: NFS over RDMA crashing

2014-03-08 Thread Steve Wise
On 3/7/2014 2:41 PM, Steve Wise wrote: Does this help? They must have added this for some reason, but I'm not seeing how it could have ever done anything --b. diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index 0ce7552..e8f25ec 100644 ---

Re: NFS over RDMA crashing

2014-03-08 Thread Steve Wise
I removed your change and started debugging the original crash that happens on top-o-tree. Seems like rq_next_page is screwed up. It should always be >= rq_respages, yes? I added a BUG_ON() to assert this in rdma_read_xdr(), and we hit the BUG_ON(). Look crash svc_rqst.rq_next_page
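
For readers skimming the thread, the invariant being asserted can be sketched in userspace C. This is only an illustration under stated assumptions: mock_page, mock_svc_rqst, MOCK_MAXPAGES and next_page_is_sane() are hypothetical stand-ins, not the kernel's definitions; only the field names rq_respages and rq_next_page come from the message above. A kernel-side BUG_ON() would presumably fire on the negation of this condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array size for this sketch; not the kernel's value. */
#define MOCK_MAXPAGES 8

/* Stand-in for a page descriptor; assumption for illustration only. */
struct mock_page {
    int id;
};

/* Simplified mock of the request structure discussed in the thread.
 * Both pointers are assumed to point into the rq_pages array. */
struct mock_svc_rqst {
    struct mock_page *rq_pages[MOCK_MAXPAGES];
    struct mock_page **rq_respages;   /* first page slot of the reply */
    struct mock_page **rq_next_page;  /* next free reply page slot */
};

/* Returns nonzero when rq_next_page has not fallen below rq_respages,
 * which is the condition the message says should always hold. */
static int next_page_is_sane(const struct mock_svc_rqst *rqstp)
{
    return rqstp->rq_next_page >= rqstp->rq_respages;
}
```

A caller would point rq_respages at some slot of rq_pages, advance rq_next_page from there, and check next_page_is_sane() wherever the crash is suspected; a zero result corresponds to the state reported in this message.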

Re: NFS over RDMA crashing

2014-03-08 Thread Steve Wise
On 3/8/2014 1:20 PM, Steve Wise wrote: I removed your change and started debugging the original crash that happens on top-o-tree. Seems like rq_next_page is screwed up. It should always be >= rq_respages, yes? I added a BUG_ON() to assert this in rdma_read_xdr(), and we hit the BUG_ON(). Look

RE: NFS over RDMA crashing

2014-03-07 Thread Steve Wise
...@opengridcomputing.com; linux-r...@vger.kernel.org; Or Gerlitz Subject: Re: NFS over RDMA crashing On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote: On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: When killing mount command that got stuck

RE: NFS over RDMA crashing

2014-03-07 Thread Steve Wise
Does this help? They must have added this for some reason, but I'm not seeing how it could have ever done anything --b. diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index 0ce7552..e8f25ec 100644 ---

RE: NFS over RDMA crashing

2013-02-18 Thread Yan Burman
-----Original Message----- From: J. Bruce Fields [mailto:bfie...@fieldses.org] Sent: Friday, February 15, 2013 17:28 To: Yan Burman Cc: linux-...@vger.kernel.org; sw...@opengridcomputing.com; linux-r...@vger.kernel.org; Or Gerlitz Subject: Re: NFS over RDMA crashing On Mon, Feb 11

Re: NFS over RDMA crashing

2013-02-15 Thread J. Bruce Fields
; Or Gerlitz Subject: Re: NFS over RDMA crashing On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote: On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: When killing mount command that got stuck: --- BUG: unable

RE: NFS over RDMA crashing

2013-02-11 Thread Yan Burman
-----Original Message----- From: J. Bruce Fields [mailto:bfie...@fieldses.org] Sent: Thursday, February 07, 2013 18:42 To: Yan Burman Cc: linux-...@vger.kernel.org; sw...@opengridcomputing.com; linux-r...@vger.kernel.org; Or Gerlitz Subject: Re: NFS over RDMA crashing On Wed, Feb 06

Re: NFS over RDMA crashing

2013-02-11 Thread J. Bruce Fields
; Or Gerlitz Subject: Re: NFS over RDMA crashing On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote: On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: When killing mount command that got stuck: --- BUG: unable

RE: NFS over RDMA crashing

2013-02-07 Thread Yan Burman
-----Original Message----- From: Jeff Becker [mailto:jeffrey.c.bec...@nasa.gov] Sent: Wednesday, February 06, 2013 19:07 To: Steve Wise Cc: Yan Burman; bfie...@fieldses.org; linux-...@vger.kernel.org; linux-r...@vger.kernel.org; Or Gerlitz; Tom Tucker Subject: Re: NFS over RDMA crashing

Re: NFS over RDMA crashing

2013-02-07 Thread J. Bruce Fields
On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote: On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: When killing mount command that got stuck: --- BUG: unable to handle kernel paging request at 880324dc7ff8 IP:

Re: NFS over RDMA crashing

2013-02-07 Thread Tom Tucker
On 2/6/13 3:28 PM, Steve Wise wrote: On 2/6/2013 4:24 PM, J. Bruce Fields wrote: On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: When killing mount command that got stuck: --- BUG: unable to handle kernel paging request at 880324dc7ff8

NFS over RDMA crashing

2013-02-06 Thread Yan Burman
Hi. I have been trying to create a setup with NFS/RDMA, but I am getting crashes. I am using Mellanox ConnectX 3 HCA with SRIOV enabled with two KVM VMs with RHEL 6.3 getting one VF each. My test case is trying to use one VM's storage from another using NFS over RDMA (192.168.20.210 server,

Re: NFS over RDMA crashing

2013-02-06 Thread Steve Wise
On 2/6/2013 9:48 AM, Yan Burman wrote: When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the rest of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2). +tom tucker I'd try going back a few kernels,

Re: NFS over RDMA crashing

2013-02-06 Thread Jeff Becker
Hi. In case you're interested, I did the NFS/RDMA backports for OFED. I tested that NFS/RDMA in OFED 3.5 works on kernel 3.5, and also the RHEL 6.3 kernel. However, I did not test it with SRIOV. If you test it (OFED-3.5-rc6 was released last week), I'd like to know how it goes. Thanks. Jeff

Re: NFS over RDMA crashing

2013-02-06 Thread J. Bruce Fields
On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: When killing mount command that got stuck: --- BUG: unable to handle kernel paging request at 880324dc7ff8 IP: [a05f3dfb] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD 1a0c063 PUD

Re: NFS over RDMA crashing

2013-02-06 Thread Steve Wise
On 2/6/2013 4:24 PM, J. Bruce Fields wrote: On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: When killing mount command that got stuck: --- BUG: unable to handle kernel paging request at 880324dc7ff8 IP: [a05f3dfb]