Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-10 Thread Doug Ledford
On 07/10/2015 07:34 PM, Jason Gunthorpe wrote: > On Fri, Jul 10, 2015 at 06:27:59PM -0400, Doug Ledford wrote: > >> 1) An RDMA connection exists or can be created (TOE and iSCSI offload do >> not do this, so something else would have to be listening and accepting >> incoming RDMA connections) >> 2

Re: [PATCH v1 09/12] xprtrdma: Prepare rpcrdma_ep_post() for RDMA_NOMSG calls

2015-07-10 Thread Jason Gunthorpe
On Fri, Jul 10, 2015 at 10:53:40AM -0400, Chuck Lever wrote: > It is certainly possible to examine the device’s max_sge field > in rpcrdma_ep_create() and fail transport creation if the > device’s max_sge is less than RPC_MAX_IOVS. I just want to draw Sagi's attention to this problem, when consid

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-10 Thread Doug Ledford
On 07/10/2015 03:54 PM, Jason Gunthorpe wrote: > On Fri, Jul 10, 2015 at 02:42:45PM -0400, Tom Talpey wrote: > For the proposed iSER patch the problem is very acute, iser makes a single PD and phys MR at boot time for each device. This means there is a single machine wide unchanging

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-10 Thread Doug Ledford
On 07/10/2015 02:42 PM, Tom Talpey wrote: > However, it's an extremely bad idea to codify writable privileged rmr's > in the API as best practice. So under no circumstance should that become > the long term plan. Agree 100%. Which is why I requested a big warning in the dmesg output that pointed

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-10 Thread Doug Ledford
On 07/10/2015 04:57 PM, Jason Gunthorpe wrote: > On Fri, Jul 10, 2015 at 01:56:36PM -0400, Doug Ledford wrote: > >> Are there security issues? Yes. Are we going to solve them in this >> patch set? No. Especially since those security issues extend beyond >> iSER + iWARP. > > I think my biggest

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-10 Thread Jason Gunthorpe
On Fri, Jul 10, 2015 at 01:56:36PM -0400, Doug Ledford wrote: > Are there security issues? Yes. Are we going to solve them in this > patch set? No. Especially since those security issues extend beyond > iSER + iWARP. I think my biggest concern is we don't inadvertently open a security hole on

Re: [PATCH v1 09/12] xprtrdma: Prepare rpcrdma_ep_post() for RDMA_NOMSG calls

2015-07-10 Thread Chuck Lever
> On Jul 10, 2015, at 4:43 PM, Anna Schumaker wrote: > > Hi Chuck, > > On 07/09/2015 04:43 PM, Chuck Lever wrote: >> Only the RPC/RDMA header is sent when making an RDMA_NOMSG call. >> That header resides in the first element of the iovec array >> passed to rpcrdma_ep_post(). >> >> Instead of

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-10 Thread Jason Gunthorpe
On Fri, Jul 10, 2015 at 01:54:20PM -0600, Jason Gunthorpe wrote: > On Fri, Jul 10, 2015 at 02:42:45PM -0400, Tom Talpey wrote: > > > >>For the proposed iSER patch the problem is very acute, iser makes a > > >>single PD and phys MR at boot time for each device. This means there > > >>is a single ma

Re: [PATCH v1 09/12] xprtrdma: Prepare rpcrdma_ep_post() for RDMA_NOMSG calls

2015-07-10 Thread Anna Schumaker
Hi Chuck, On 07/09/2015 04:43 PM, Chuck Lever wrote: > Only the RPC/RDMA header is sent when making an RDMA_NOMSG call. > That header resides in the first element of the iovec array > passed to rpcrdma_ep_post(). > > Instead of special casing the iovec element with the pad, just > sync all the el

Re: [PATCH v1 05/12] xprtrdma: Account for RPC/RDMA header size when deciding to inline

2015-07-10 Thread Chuck Lever
On Jul 10, 2015, at 4:08 PM, Anna Schumaker wrote: > Hi Chuck, > > On 07/09/2015 04:42 PM, Chuck Lever wrote: >> When marshaling RPC/RDMA requests, ensure the combined size of >> RPC/RDMA header and RPC header do not exceed the inline threshold. >> Endpoints typically reject RPC/RDMA messages t

Re: [PATCH v1 05/12] xprtrdma: Account for RPC/RDMA header size when deciding to inline

2015-07-10 Thread Anna Schumaker
Hi Chuck, On 07/09/2015 04:42 PM, Chuck Lever wrote: > When marshaling RPC/RDMA requests, ensure the combined size of > RPC/RDMA header and RPC header do not exceed the inline threshold. > Endpoints typically reject RPC/RDMA messages that exceed the size > of their receive buffers. > > Signed-off

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-10 Thread Jason Gunthorpe
On Fri, Jul 10, 2015 at 02:42:45PM -0400, Tom Talpey wrote: > >>For the proposed iSER patch the problem is very acute, iser makes a > >>single PD and phys MR at boot time for each device. This means there > >>is a single machine wide unchanging rkey that allows remote physical > >>memory access. A

Re: [PATCH v1 02/12] xprtrdma: Raise maximum payload size to one megabyte

2015-07-10 Thread Anna Schumaker
On 07/10/2015 03:33 PM, Chuck Lever wrote: > > On Jul 10, 2015, at 3:21 PM, Anna Schumaker wrote: > >> Hi Chuck, >> >> On 07/09/2015 04:41 PM, Chuck Lever wrote: >>> The point of larger rsize and wsize is to reduce the per-byte cost >>> of memory registration and deregistration. Modern HCAs can

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-10 Thread Christoph Hellwig
On Thu, Jul 09, 2015 at 09:52:59AM -0400, Chuck Lever wrote: > There is one remaining kernel user of ib_reg_phys_mr() in 4.2: Lustre. It's in the staging tree, which proper in-tree code doesn't have to cater for. So as soon as sunrpc is done using the interface we can and should kill it off. -- T

Re: [PATCH v1 02/12] xprtrdma: Raise maximum payload size to one megabyte

2015-07-10 Thread Chuck Lever
On Jul 10, 2015, at 3:21 PM, Anna Schumaker wrote: > Hi Chuck, > > On 07/09/2015 04:41 PM, Chuck Lever wrote: >> The point of larger rsize and wsize is to reduce the per-byte cost >> of memory registration and deregistration. Modern HCAs can typically >> handle a megabyte or more with a single

Re: [PATCH v1 02/12] xprtrdma: Raise maximum payload size to one megabyte

2015-07-10 Thread Anna Schumaker
Hi Chuck, On 07/09/2015 04:41 PM, Chuck Lever wrote: > The point of larger rsize and wsize is to reduce the per-byte cost > of memory registration and deregistration. Modern HCAs can typically > handle a megabyte or more with a single registration operation. > > Signed-off-by: Chuck Lever > ---

[BUG] mellanox IB driver fails to load on large config

2015-07-10 Thread andrew banman
I'm seeing a large number of allocation errors originating from the Mellanox IB driver when booting the 4.2-rc1 kernel on a 4096cpu 32TB memory system: 8<--- mlx4_ib_alloc_eqs: Can't allocate EQ 64; reverting to legacy mlx4_ib_alloc_eqs: Can't allocate EQ 65; reverting to legacy mlx4_ib_alloc_e

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-10 Thread Tom Talpey
On 7/10/2015 1:56 PM, Doug Ledford wrote: On 07/10/2015 12:11 PM, Jason Gunthorpe wrote: On Fri, Jul 10, 2015 at 09:22:24AM -0400, Tom Talpey wrote: and it is enabled only when the RDMA Read is active. ??? How is that done? ib_get_dma_mr is defined to return a remote usable rkey that is valid

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-10 Thread Chuck Lever
On Jul 10, 2015, at 1:56 PM, Doug Ledford wrote: > On 07/10/2015 12:11 PM, Jason Gunthorpe wrote: >> On Fri, Jul 10, 2015 at 09:22:24AM -0400, Tom Talpey wrote: >>> and it is enabled only when the RDMA Read is active. >> >> ??? How is that done? ib_get_dma_mr is defined to return a remote >> us

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-10 Thread Doug Ledford
On 07/10/2015 12:11 PM, Jason Gunthorpe wrote: > On Fri, Jul 10, 2015 at 09:22:24AM -0400, Tom Talpey wrote: >> and it is enabled only when the RDMA Read is active. > > ??? How is that done? ib_get_dma_mr is defined to return a remote > usable rkey that is valid from when it returns until the M

Re: kernel memory registration (was: RDMA/core: Transport-independent access flags)

2015-07-10 Thread Jason Gunthorpe
On Fri, Jul 10, 2015 at 11:55:29AM +0300, Sagi Grimberg wrote: > IMHO, memory registration is memory registration. The fact that we are > distinguishing between local and remote might be a sign that this might > be a wrong direction to take. Sorry. I believe they are very different, yes, if you si

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-10 Thread Jason Gunthorpe
On Fri, Jul 10, 2015 at 09:22:24AM -0400, Tom Talpey wrote: > and it is enabled only when the RDMA Read is active. ??? How is that done? ib_get_dma_mr is defined to return a remote usable rkey that is valid from when it returns until the MR is destroyed. NFS creates one mr early on, it does not

Re: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum

2015-07-10 Thread J. Bruce Fields
On Fri, Jul 10, 2015 at 11:59:20AM -0400, Chuck Lever wrote: > > On Jul 10, 2015, at 11:54 AM, J. Bruce Fields wrote: > > > On Fri, Jul 10, 2015 at 10:59:48AM -0400, Chuck Lever wrote: > >> > >> On Jul 10, 2015, at 10:18 AM, bfie...@fieldses.org wrote: > >> > >>> On Thu, Jul 09, 2015 at 04:45:

Re: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum

2015-07-10 Thread Chuck Lever
On Jul 10, 2015, at 11:54 AM, J. Bruce Fields wrote: > On Fri, Jul 10, 2015 at 10:59:48AM -0400, Chuck Lever wrote: >> >> On Jul 10, 2015, at 10:18 AM, bfie...@fieldses.org wrote: >> >>> On Thu, Jul 09, 2015 at 04:45:47PM -0400, Chuck Lever wrote: Increased to 1 megabyte. >>> >>> Why not

Re: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum

2015-07-10 Thread J. Bruce Fields
On Fri, Jul 10, 2015 at 10:59:48AM -0400, Chuck Lever wrote: > > On Jul 10, 2015, at 10:18 AM, bfie...@fieldses.org wrote: > > > On Thu, Jul 09, 2015 at 04:45:47PM -0400, Chuck Lever wrote: > >> Increased to 1 megabyte. > > > > Why not more or less? > > > > Why do we even have this constant, wh

Re: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum

2015-07-10 Thread Chuck Lever
On Jul 10, 2015, at 10:18 AM, bfie...@fieldses.org wrote: > On Thu, Jul 09, 2015 at 04:45:47PM -0400, Chuck Lever wrote: >> Increased to 1 megabyte. > > Why not more or less? > > Why do we even have this constant, why shouldn't we just use > RPCSVC_MAXPAYLOAD? The payload size maximum for RDMA

Re: [PATCH v1 09/12] xprtrdma: Prepare rpcrdma_ep_post() for RDMA_NOMSG calls

2015-07-10 Thread Chuck Lever
On Jul 10, 2015, at 10:11 AM, Devesh Sharma wrote: > On Fri, Jul 10, 2015 at 6:28 PM, Tom Talpey wrote: >> On 7/10/2015 7:29 AM, Devesh Sharma wrote: >>> >>> we need to honor the max limits of device by checking >>> dev_attr.max_sge? a vendor may not support 4 sges. >> >> >> iWARP requires a

Re: [PATCH v1 03/12] xprtrdma: Increase default credit limit

2015-07-10 Thread Devesh Sharma
Yes, we are covered here, I took reference of 4.1-rc4 and that series was pulled in 4.1-rc7. I will update my test-bench and re-validate the numbers. -Regards On Fri, Jul 10, 2015 at 8:03 PM, Chuck Lever wrote: > > On Jul 10, 2015, at 6:45 AM, Devesh Sharma > wrote: > >> Increasing the defau

Re: [PATCH v1 03/12] xprtrdma: Increase default credit limit

2015-07-10 Thread Chuck Lever
On Jul 10, 2015, at 6:45 AM, Devesh Sharma wrote: > Increasing the default slot table entries will increase the MR > requirements per mount. Yes, but: > Currently, with 32 as default Client ends up allocating 2178 frmrs > (ref: kernel 4.1-rc4) for a single mount. With 128 frmr requirement > fo

Re: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum

2015-07-10 Thread J. Bruce Fields
On Fri, Jul 10, 2015 at 10:18:14AM -0400, J. Bruce Fields wrote: > On Thu, Jul 09, 2015 at 04:45:47PM -0400, Chuck Lever wrote: > > Increased to 1 megabyte. > > Why not more or less? > > Why do we even have this constant, why shouldn't we just use > RPCSVC_MAXPAYLOAD? (That one question aside th

Re: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum

2015-07-10 Thread J. Bruce Fields
On Thu, Jul 09, 2015 at 04:45:47PM -0400, Chuck Lever wrote: > Increased to 1 megabyte. Why not more or less? Why do we even have this constant, why shouldn't we just use RPCSVC_MAXPAYLOAD? --b. > > Signed-off-by: Chuck Lever > --- > > include/linux/sunrpc/svc_rdma.h |2 +- > 1 files ch

Re: [PATCH v1 09/12] xprtrdma: Prepare rpcrdma_ep_post() for RDMA_NOMSG calls

2015-07-10 Thread Devesh Sharma
On Fri, Jul 10, 2015 at 6:28 PM, Tom Talpey wrote: > On 7/10/2015 7:29 AM, Devesh Sharma wrote: >> >> we need to honor the max limits of device by checking >> dev_attr.max_sge? a vendor may not support 4 sges. > > > iWARP requires a minimum of 4 send SGEs (draft-hilland-verbs 8.1.3.2) > >An RI

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-10 Thread Tom Talpey
On 7/9/2015 6:53 PM, Jason Gunthorpe wrote: On Thu, Jul 09, 2015 at 06:18:26PM -0400, Doug Ledford wrote: A lot of this discussion is interesting and has gone off in an area that I think is worth pursuing in the long run. However, in the short run, this patch was a minor cleanup/simplification

Re: [PATCH v1 09/12] xprtrdma: Prepare rpcrdma_ep_post() for RDMA_NOMSG calls

2015-07-10 Thread Tom Talpey
On 7/10/2015 7:29 AM, Devesh Sharma wrote: we need to honor the max limits of device by checking dev_attr.max_sge? a vendor may not support 4 sges. iWARP requires a minimum of 4 send SGEs (draft-hilland-verbs 8.1.3.2) An RI MUST support at least four Scatter/Gather Elements per Scatter/G

Re: [PATCH] IB/core: Destroy ocrdma_dev_id IDR on module exit

2015-07-10 Thread Devesh Sharma
We missed acking this patch. Thanks Doug and Johannes. Acked-by: Devesh Sharma On Thu, Jul 9, 2015 at 3:13 AM, Doug Ledford wrote: > On 07/08/2015 11:23 AM, Johannes Thumshirn wrote: >> Destroy ocrdma_dev_id IDR on module exit, reclaiming the allocated memory. >> > > Thanks, applied. > > > --

Re: [PATCH v1 09/12] xprtrdma: Prepare rpcrdma_ep_post() for RDMA_NOMSG calls

2015-07-10 Thread Devesh Sharma
we need to honor the max limits of device by checking dev_attr.max_sge? a vendor may not support 4 sges. On Fri, Jul 10, 2015 at 2:13 AM, Chuck Lever wrote: > Only the RPC/RDMA header is sent when making an RDMA_NOMSG call. > That header resides in the first element of the iovec array > passed to

Re: [PATCH v1 06/12] xprtrdma: Always provide a write list when sending NFS READ

2015-07-10 Thread Devesh Sharma
Looks good Reviewed-By: Devesh Sharma On Fri, Jul 10, 2015 at 2:12 AM, Chuck Lever wrote: > The client has been setting up a reply chunk for NFS READs that are > smaller than the inline threshold. This is not efficient: both the > server and client CPUs have to copy the reply's data payload int

Re: [PATCH v1 05/12] xprtrdma: Account for RPC/RDMA header size when deciding to inline

2015-07-10 Thread Devesh Sharma
Looks good Reviewed-By: Devesh Sharma On Fri, Jul 10, 2015 at 2:12 AM, Chuck Lever wrote: > When marshaling RPC/RDMA requests, ensure the combined size of > RPC/RDMA header and RPC header do not exceed the inline threshold. > Endpoints typically reject RPC/RDMA messages that exceed the size > o

Re: [PATCH v1 04/12] xprtrdma: Remove last ib_reg_phys_mr() call site

2015-07-10 Thread Devesh Sharma
Looks good. Reviewed-By: Devesh Sharma On Fri, Jul 10, 2015 at 2:12 AM, Chuck Lever wrote: > All HCA providers have an ib_get_dma_mr() verb. Thus > rpcrdma_ia_open() will either grab the device's local_dma_key if one > is available, or it will call ib_get_dma_mr() which is a 100% > guaranteed f

Re: [PATCH v1 03/12] xprtrdma: Increase default credit limit

2015-07-10 Thread Devesh Sharma
Increasing the default slot table entries will increase the MR requirements per mount. Currently, with 32 as the default, the client ends up allocating 2178 frmrs (ref: kernel 4.1-rc4) for a single mount. With 128, the frmr requirement for startup would be 8448. 8K+ MRs per mount just for start-up, I am a litt

Re: [PATCH v1 02/12] xprtrdma: Raise maximum payload size to one megabyte

2015-07-10 Thread Devesh Sharma
Looks good Reviewed-By: Devesh Sharma On Fri, Jul 10, 2015 at 2:11 AM, Chuck Lever wrote: > The point of larger rsize and wsize is to reduce the per-byte cost > of memory registration and deregistration. Modern HCAs can typically > handle a megabyte or more with a single registration operation.

Kernel fast memory registration API proposal [RFC]

2015-07-10 Thread Sagi Grimberg
Hi, Given the last discussions on our in-kernel memory registration API I thought I'd propose another approach to address this. As I said before, I think the stack needs to consolidate on a single memory registration scheme. That scheme is the standard FRWR. As you know, MRs have a consumers re

kernel memory registration (was: RDMA/core: Transport-independent access flags)

2015-07-10 Thread Sagi Grimberg
On 7/9/2015 8:01 PM, Jason Gunthorpe wrote: On Thu, Jul 09, 2015 at 02:02:03PM +0300, Sagi Grimberg wrote: We have protocol that involves remote memory keys transfer in their standards so I don't see how we can remove it altogether from ULPs. This is why I've been talking about local and remo