Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Sagi Grimberg
On 7/15/2015 5:32 PM, Chuck Lever wrote: On Jul 15, 2015, at 4:01 AM, Sagi Grimberg wrote: On 7/14/2015 8:09 PM, Jason Gunthorpe wrote: On Tue, Jul 14, 2015 at 07:55:39PM +0300, Sagi Grimberg wrote: But, if people think that it's better to have an API that does implicit posting always with

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-15 Thread Jason Gunthorpe
On Wed, Jul 15, 2015 at 01:12:57PM -0600, Jason Gunthorpe wrote: > I might find time to type this in, but I won't be able to find time to > do any testing on the ULPs.. Here is the typing, I'll look more carefully at it later and send it via email: https://github.com/jgunthorpe/linux/commits/rem

Re: [BUG] mellanox IB driver fails to load on large config

2015-07-15 Thread Or Gerlitz
On 7/14/2015 11:28 PM, Alex Thorlton wrote: We see the same exact messages on 4.1-rc8. does this solves the problem? diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index ad31e47..c8ae3b9 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -

Re: [Ksummit-discuss] [TECH TOPIC] IRQ affinity

2015-07-15 Thread Michael S. Tsirkin
On Wed, Jul 15, 2015 at 02:48:00PM -0400, Matthew Wilcox wrote: > On Wed, Jul 15, 2015 at 11:25:55AM -0600, Jens Axboe wrote: > > On 07/15/2015 11:19 AM, Keith Busch wrote: > > >On Wed, 15 Jul 2015, Bart Van Assche wrote: > > >>* With blk-mq and scsi-mq optimal performance can only be achieved if >

Re: [PULL REQUEST] Please pull rdma.git

2015-07-15 Thread Linus Torvalds
On Tue, Jul 14, 2015 at 12:42 PM, Doug Ledford wrote: > > This comprises a number of various fixes (including the ones you've > requested): Hmm. I've pulled this, but quite frankly, I don't think this was appropriate for post-merge-window. It's not at all obvious that that "rdma_cap_ib_switch hel

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Jason Gunthorpe
On Wed, Jul 15, 2015 at 05:25:11PM -0400, Chuck Lever wrote: > NFS READ and WRITE data payloads are mapped with ib_map_phys_mr() > just before the RPC is sent, and those payloads are unmapped > with ib_unmap_fmr() as soon as the client sees the server’s RPC > reply. Okay.. but.. ib_unmap_fmr is t

Re: [PATCH v2 07/14] xprtrdma: Remove logic that constructs RDMA_MSGP type calls

2015-07-15 Thread Chuck Lever
Thanks Devesh, will note that in the next versions of these series. On Jul 15, 2015, at 2:26 PM, Devesh Sharma wrote: > with MAX_IOVS set to 2 iozone passes with ocrdma device. My testing > includes both the series of svcrdma and xprtrdma. > > On Wed, Jul 15, 2015 at 12:31 AM, Chuck Lever wro

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Chuck Lever
On Jul 15, 2015, at 1:19 PM, Jason Gunthorpe wrote: > On Wed, Jul 15, 2015 at 10:32:55AM -0400, Chuck Lever wrote: > >> I would rather not build a non-deterministic delay into the >> unmap interface. Using a pool or having map do an implicit >> unmap are both solutions I’d rather avoid. > > C

Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM

2015-07-15 Thread Jason Gunthorpe
On Wed, Jul 15, 2015 at 08:27:06PM +, Liran Liss wrote: > If you want to restrict a container to a specific set of pkeys, use > cgroups. Ideally yes, but in the absence of a cgroup the set of pkeys assigned to the container via ipoib is a reasonable alternate. > This would apply both to CM MA

RE: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM

2015-07-15 Thread Liran Liss
> From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com] > > > What is really missing here I guess is a mechanism that would > > enforce containers to only use certain pkeys - perhaps with > > something like an RDMA cgroup. It could force containers to only > > use approved pkeys not on

RE: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Steve Wise
> -Original Message- > From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com] > Sent: Wednesday, July 15, 2015 2:10 PM > To: Steve Wise > Cc: 'Sagi Grimberg'; 'Christoph Hellwig'; linux-rdma@vger.kernel.org; 'Or > Gerlitz'; 'Oren Duer'; 'Chuck Lever'; 'Bart Van Assche'; 'Liran >

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-15 Thread Jason Gunthorpe
On Wed, Jul 15, 2015 at 05:19:26AM -0700, 'Christoph Hellwig' wrote: > On Wed, Jul 15, 2015 at 11:47:52AM +0300, Sagi Grimberg wrote: > > > struct ib_pd *ib_alloc_pd(struct ib_device *device) > > > { > > > struct ib_pd *pd; > > >+ struct ib_device_attr devattr; > > >+ int rc; > > >+ > > >+ r

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Jason Gunthorpe
On Wed, Jul 15, 2015 at 11:33:39AM +0300, Sagi Grimberg wrote: > >Call this rdma_mr to fit the scheme we use for "generic" APIs in the > >RDMA stack? > > Umm, I think this can become weird given all other primitives have > ib_ prefix. I'd prefer to keep that prefix to stay consistent, and have >

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-15 Thread Jason Gunthorpe
On Tue, Jul 14, 2015 at 11:50:57PM -0700, 'Christoph Hellwig' wrote: > On Tue, Jul 14, 2015 at 02:29:43PM -0600, Jason Gunthorpe wrote: > > local_dma_lkey appears to be global, it works with any PD. > > > > ib_get_dma_mr is tied to a PD, so it cannot replace local_dma_lkey at > > the struct device

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Jason Gunthorpe
> > - ULP calls a 'rdma_post_close_rkey' helper > > * For FRWR this posts the INVALIDATE > > Note: Some send operations automatically invalidate an rkey (and the > lkey for IB?). This is intended to avoid having to post the > invalidate WR explicitly. Namely IB_WR_READ_WITH_INV and > IB_W
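
Steve's note means a generic "close rkey" helper should only post an explicit invalidate when the rkey has not already been invalidated as a side effect of another operation. A minimal userspace model of that bookkeeping follows; the state names and the `sketch_*` functions are hypothetical stand-ins for the proposed `rdma_post_close_rkey` helper, not an existing kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_rkey_state {
	SKETCH_RKEY_FREE,		/* not registered / already invalidated */
	SKETCH_RKEY_VALID,		/* registered and usable by the peer */
	SKETCH_RKEY_INVALIDATING,	/* explicit invalidate posted, not yet done */
};

struct sketch_rkey {
	uint32_t rkey;
	enum sketch_rkey_state state;
};

/* Hypothetical hook: a completion reported that the peer invalidated 'rkey'
 * (for example via a send-with-invalidate). */
void sketch_remote_invalidated(struct sketch_rkey *k, uint32_t rkey)
{
	if (k->state == SKETCH_RKEY_VALID && k->rkey == rkey)
		k->state = SKETCH_RKEY_FREE;
}

/* Hypothetical close step: returns true if the caller must still post an
 * explicit local invalidate work request. */
bool sketch_close_rkey(struct sketch_rkey *k)
{
	switch (k->state) {
	case SKETCH_RKEY_VALID:
		k->state = SKETCH_RKEY_INVALIDATING;
		return true;
	case SKETCH_RKEY_FREE:
		return false;	/* already invalidated, nothing to post */
	default:
		return false;	/* an earlier invalidate is still in flight */
	}
}

/* Completion of the explicit invalidate makes the rkey reusable. */
void sketch_local_inv_done(struct sketch_rkey *k)
{
	assert(k->state == SKETCH_RKEY_INVALIDATING);
	k->state = SKETCH_RKEY_FREE;
}
```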

RE: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Steve Wise
> -Original Message- > From: linux-rdma-ow...@vger.kernel.org > [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Jason Gunthorpe > Sent: Wednesday, July 15, 2015 1:31 PM > To: Sagi Grimberg > Cc: Christoph Hellwig; linux-rdma@vger.kernel.org; Steve Wise; Or Gerlitz; > Oren Duer; C

Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM

2015-07-15 Thread Jason Gunthorpe
On Wed, Jul 15, 2015 at 01:57:48PM +0300, Haggai Eran wrote: > On 13/07/2015 21:14, Jason Gunthorpe wrote: > > On Mon, Jun 22, 2015 at 03:42:37PM +0300, Haggai Eran wrote: > >> + switch (ib_event->event) { > >> + case IB_CM_REQ_RECEIVED: > >> + req->device = req_param->listen_id->dev

Re: [Ksummit-discuss] [TECH TOPIC] IRQ affinity

2015-07-15 Thread Matthew Wilcox
On Wed, Jul 15, 2015 at 11:25:55AM -0600, Jens Axboe wrote: > On 07/15/2015 11:19 AM, Keith Busch wrote: > >On Wed, 15 Jul 2015, Bart Van Assche wrote: > >>* With blk-mq and scsi-mq optimal performance can only be achieved if > >> the relationship between MSI-X vector and NUMA node does not change

RE: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Steve Wise
> -Original Message- > From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com] > Sent: Wednesday, July 15, 2015 12:19 PM > To: Chuck Lever > Cc: Sagi Grimberg; Christoph Hellwig; linux-rdma@vger.kernel.org; Steve Wise; > Or Gerlitz; Oren Duer; Bart Van Assche; Liran Liss; Hefty, >

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Jason Gunthorpe
On Wed, Jul 15, 2015 at 11:01:46AM +0300, Sagi Grimberg wrote: > On 7/14/2015 8:09 PM, Jason Gunthorpe wrote: > >On Tue, Jul 14, 2015 at 07:55:39PM +0300, Sagi Grimberg wrote: > > > >>But, if people think that it's better to have an API that does implicit > >>posting always without notification, an

Re: [PATCH v2 07/14] xprtrdma: Remove logic that constructs RDMA_MSGP type calls

2015-07-15 Thread Devesh Sharma
with MAX_IOVS set to 2 iozone passes with ocrdma device. My testing includes both the series of svcrdma and xprtrdma. On Wed, Jul 15, 2015 at 12:31 AM, Chuck Lever wrote: > > On Jul 14, 2015, at 3:00 PM, Tom Talpey wrote: > >> On 7/13/2015 12:30 PM, Chuck Lever wrote: >>> RDMA_MSGP type calls in

Re: [Ksummit-discuss] [TECH TOPIC] IRQ affinity

2015-07-15 Thread Sagi Grimberg
On 7/15/2015 8:25 PM, Jens Axboe wrote: On 07/15/2015 11:19 AM, Keith Busch wrote: On Wed, 15 Jul 2015, Bart Van Assche wrote: * With blk-mq and scsi-mq optimal performance can only be achieved if the relationship between MSI-X vector and NUMA node does not change over time. This is necessary

[PATCHv3 infiniband-diags] iblinkinfo.c: Close additional file descriptor in advance

2015-07-15 Thread Hal Rosenstock
Additional file descriptor for SMP MADs should be closed before running ibnd_discover_fabric() to avoid parallel usage of two SMP file descriptors Signed-off-by: Vladimir Koushnir Signed-off-by: Hal Rosenstock --- Change since v2: Only SMP query for NodeInfo if (!all && dr_path) - same as curre

Re: [Ksummit-discuss] [TECH TOPIC] IRQ affinity

2015-07-15 Thread Jens Axboe
On 07/15/2015 11:19 AM, Keith Busch wrote: On Wed, 15 Jul 2015, Bart Van Assche wrote: * With blk-mq and scsi-mq optimal performance can only be achieved if the relationship between MSI-X vector and NUMA node does not change over time. This is necessary to allow a blk-mq/scsi-mq driver to ens

Re: [Ksummit-discuss] [TECH TOPIC] IRQ affinity

2015-07-15 Thread Keith Busch
On Wed, 15 Jul 2015, Bart Van Assche wrote: * With blk-mq and scsi-mq optimal performance can only be achieved if the relationship between MSI-X vector and NUMA node does not change over time. This is necessary to allow a blk-mq/scsi-mq driver to ensure that interrupts are processed on the sam

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Jason Gunthorpe
On Wed, Jul 15, 2015 at 10:32:55AM -0400, Chuck Lever wrote: > I would rather not build a non-deterministic delay into the > unmap interface. Using a pool or having map do an implicit > unmap are both solutions I’d rather avoid. Can you explain how NFS is using FMR today? When does it unmap a FMR

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Jason Gunthorpe
On Wed, Jul 15, 2015 at 12:32:33AM -0700, Christoph Hellwig wrote: > int rdma_create_mr(struct ib_pd *pd, enum rdma_mr_type mr, > u32 max_pages, int flags); > > > * array from a SG list > > * @mr: memory region > > * @sg: sg list > > * @sg_nents:number of el
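
The call quoted above fills an MR's page array from an SG list (`@mr`, `@sg`, `@sg_nents`). The following is a minimal userspace sketch of that mapping step only, not the proposed kernel API: `struct sketch_sg` and `sketch_map_sg()` are hypothetical, and the sketch assumes the usual fast-registration constraint that only the first element may start mid-page and only the last may end mid-page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one scatterlist entry (DMA address + length). */
struct sketch_sg {
	uint64_t addr;
	uint32_t len;
};

/*
 * Hypothetical helper: fill page_list with the page-aligned addresses that
 * cover sg[0..nents). Returns the number of pages written, or -1 if the list
 * has a "gap" (an inner boundary that is not page aligned) or does not fit
 * in max_pages.
 */
int sketch_map_sg(const struct sketch_sg *sg, int nents, uint64_t page_size,
		  uint64_t *page_list, int max_pages)
{
	int npages = 0;

	assert(page_size > 0 && (page_size & (page_size - 1)) == 0);

	for (int i = 0; i < nents; i++) {
		uint64_t start = sg[i].addr;
		uint64_t end = start + sg[i].len;

		/* Every element but the first must start on a page boundary. */
		if (i > 0 && (start & (page_size - 1)))
			return -1;
		/* Every element but the last must end on a page boundary. */
		if (i < nents - 1 && (end & (page_size - 1)))
			return -1;

		/* Record each page touched by this element. */
		for (uint64_t pg = start & ~(page_size - 1); pg < end;
		     pg += page_size) {
			if (npages == max_pages)
				return -1;
			page_list[npages++] = pg;
		}
	}
	return npages;
}
```

Returning an error on a gap, rather than silently splitting the registration, matches the thread's general preference for keeping the mapping step deterministic; how the real API should report this is still under discussion above.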

Re: [Ksummit-discuss] [TECH TOPIC] IRQ affinity

2015-07-15 Thread Michael S. Tsirkin
On Wed, Jul 15, 2015 at 05:07:08AM -0700, Christoph Hellwig wrote: > Many years ago we decided to move setting of IRQ to core affinities to > userspace with the irqbalance daemon. > > These days we have systems with lots of MSI-X vectors, and we have > hardware and subsystem support for per-CPU I/O

[PATCHv2 infiniband-diags] iblinkinfo.c: Close additional file descriptor in advance

2015-07-15 Thread Hal Rosenstock
Additional file descriptor for SMP MADs should be closed before running ibnd_discover_fabric() to avoid parallel usage of two SMP file descriptors Signed-off-by: Vladimir Koushnir Signed-off-by: Hal Rosenstock --- Change since v1: Per Ira's comment, moved location of SMP query for NodeInfo so p

Re: [PATCH v2 2/4 infiniband-diags] ibqueryerrors: Close global file descriptor before running ibnd_discover_fabric

2015-07-15 Thread Hal Rosenstock
On 7/3/2015 11:24 AM, ira.weiny wrote: > On Sun, Apr 26, 2015 at 03:33:50PM -0400, Hal Rosenstock wrote: >> From: Vladimir Koushnir >> Date: Sun, 26 Apr 2015 12:24:06 +0300 >> >> Global file descriptor for SMPs and GMPs should be closed before running >> ibnd_discover_fabric() to avoid parallel us

Re: [PATCH libibmad] portid.c: Preserve routepath string in str2drpath

2015-07-15 Thread ira.weiny
On Wed, Apr 29, 2015 at 07:12:07AM -0400, Hal Rosenstock wrote: > From: Vladimir Koushnir > Date: Tue, 28 Apr 2015 18:35:28 +0300 > > In str2drpath, preserve routepath string rather than > modifying it. > > Signed-off-by: Vladimir Koushnir > Signed-off-by: Hal Rosenstock Thanks applied, Ira
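
The patch makes str2drpath leave the routepath string it is given unmodified. One way to get that property, shown here as a hypothetical standalone parser rather than the libibmad code, is to walk the string with `strtol()` and its end pointer instead of writing NUL terminators into it (as `strtok()` does). The comma-separated input format and the 0..255 port range are assumptions of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical parser for a comma-separated directed-route path such as
 * "0,1,3,2". The input is only read, never written, so the caller's
 * routepath buffer is preserved. Returns the number of hops parsed, or -1
 * on malformed input or if the path does not fit in max_hops.
 */
int sketch_parse_drpath(const char *routepath, int *hops, int max_hops)
{
	const char *p = routepath;
	int n = 0;

	assert(routepath != NULL && hops != NULL);

	while (*p != '\0') {
		char *end;
		long port;

		errno = 0;
		port = strtol(p, &end, 10);
		if (end == p || errno != 0 || port < 0 || port > 255)
			return -1;	/* not a number, or not a valid port */
		if (n == max_hops)
			return -1;	/* path longer than the caller's array */
		hops[n++] = (int)port;

		if (*end == ',') {
			end++;
			if (*end == '\0')
				return -1;	/* trailing comma */
		} else if (*end != '\0') {
			return -1;	/* unexpected separator */
		}
		p = end;
	}
	return n;
}
```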

Re: [Ksummit-discuss] [TECH TOPIC] IRQ affinity

2015-07-15 Thread Bart Van Assche
On 07/15/2015 05:12 AM, Thomas Gleixner wrote: On Wed, 15 Jul 2015, Christoph Hellwig wrote: Many years ago we decided to move setting of IRQ to core affinities to userspace with the irqbalance daemon. These days we have systems with lots of MSI-X vectors, and we have hardware and subsystem suppo

Re: [Ksummit-discuss] [TECH TOPIC] IRQ affinity

2015-07-15 Thread Marc Zyngier
On 15/07/15 13:07, Christoph Hellwig wrote: > Many years ago we decided to move setting of IRQ to core affinities to > userspace with the irqbalance daemon. > > These days we have systems with lots of MSI-X vectors, and we have > hardware and subsystem support for per-CPU I/O queues in the block > l

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Chuck Lever
On Jul 15, 2015, at 10:32 AM, Chuck Lever wrote: > > On Jul 15, 2015, at 4:01 AM, Sagi Grimberg wrote: > >> On 7/14/2015 8:09 PM, Jason Gunthorpe wrote: >>> On Tue, Jul 14, 2015 at 07:55:39PM +0300, Sagi Grimberg wrote: >>> But, if people think that it's better to have an API that does

Re: [TECH TOPIC] IRQ affinity

2015-07-15 Thread Christoph Lameter
On Wed, 15 Jul 2015, Christoph Hellwig wrote: > Many years ago we decided to move setting of IRQ to core affinities to > userspace with the irqbalance daemon. > > These days we have systems with lots of MSI-X vectors, and we have > hardware and subsystem support for per-CPU I/O queues in the block >

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Chuck Lever
On Jul 15, 2015, at 4:01 AM, Sagi Grimberg wrote: > On 7/14/2015 8:09 PM, Jason Gunthorpe wrote: >> On Tue, Jul 14, 2015 at 07:55:39PM +0300, Sagi Grimberg wrote: >> >>> But, if people think that it's better to have an API that does implicit >>> posting always without notification, and then sil

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-15 Thread 'Christoph Hellwig'
On Wed, Jul 15, 2015 at 11:47:52AM +0300, Sagi Grimberg wrote: > > struct ib_pd *ib_alloc_pd(struct ib_device *device) > > { > > struct ib_pd *pd; > >+struct ib_device_attr devattr; > >+int rc; > >+ > >+rc = ib_query_device(device, &devattr); > >+if (rc) > >+return

Re: [Ksummit-discuss] [TECH TOPIC] IRQ affinity

2015-07-15 Thread Thomas Gleixner
On Wed, 15 Jul 2015, Christoph Hellwig wrote: > Many years ago we decided to move setting of IRQ to core affinities to > userspace with the irqbalance daemon. > > These days we have systems with lots of MSI-X vectors, and we have > hardware and subsystem support for per-CPU I/O queues in the block

[TECH TOPIC] IRQ affinity

2015-07-15 Thread Christoph Hellwig
Many years ago we decided to move setting of IRQ to core affinities to userspace with the irqbalance daemon. These days we have systems with lots of MSI-X vectors, and we have hardware and subsystem support for per-CPU I/O queues in the block layer, the RDMA subsystem and probably the network stack
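
The topic is about the kernel, rather than irqbalance, choosing a vector-to-CPU mapping for per-CPU I/O queues, and Bart's follow-up stresses that the mapping must stay stable over time. As a rough illustration only, the sketch below shows one simple static policy: each vector serves a contiguous, near-equal block of CPUs. The helper name is hypothetical, and the sketch deliberately ignores NUMA topology, which a real policy would need to take into account first.

```c
#include <assert.h>

/*
 * Hypothetical helper: spread nr_cpus CPUs over nr_vecs MSI-X vectors so
 * that each vector serves a contiguous block of CPUs, with block sizes
 * differing by at most one. Writes the vector index for each CPU into
 * cpu_to_vec. The result depends only on the two counts, so it does not
 * change at runtime.
 */
void sketch_spread_vectors(int nr_cpus, int nr_vecs, int *cpu_to_vec)
{
	assert(nr_cpus > 0 && nr_vecs > 0);

	/* With more vectors than CPUs, use only the first nr_cpus vectors. */
	if (nr_vecs > nr_cpus)
		nr_vecs = nr_cpus;

	for (int cpu = 0; cpu < nr_cpus; cpu++)
		cpu_to_vec[cpu] = (int)((long long)cpu * nr_vecs / nr_cpus);
}
```

For 8 CPUs and 3 vectors this yields blocks of 3, 3 and 2 CPUs; with as many vectors as CPUs it degenerates to one vector per CPU.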

Re: [BUG] mellanox IB driver fails to load on large config

2015-07-15 Thread Matan Barak
On 7/14/2015 11:28 PM, Alex Thorlton wrote: On Tue, Jul 14, 2015 at 11:06:26PM +0300, Or Gerlitz wrote: On Tue, Jul 14, 2015 at 9:48 PM, Alex Thorlton wrote: On Tue, Jul 14, 2015 at 01:22:34PM -0500, andrew banman wrote: On Sat, Jul 11, 2015 at 11:20:19PM +0300, Or Gerlitz wrote: On Fri, J

Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM

2015-07-15 Thread Haggai Eran
On 13/07/2015 21:14, Jason Gunthorpe wrote: > On Mon, Jun 22, 2015 at 03:42:37PM +0300, Haggai Eran wrote: >> +switch (ib_event->event) { >> +case IB_CM_REQ_RECEIVED: >> +req->device = req_param->listen_id->device; >> +req->port = req_param->port; >> +

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Christoph Hellwig
On Wed, Jul 15, 2015 at 11:33:39AM +0300, Sagi Grimberg wrote: > Umm, I think this can become weird given all other primitives have > ib_ prefix. I'd prefer to keep that prefix to stay consistent, and have > an incremental change to do it for all the primitives (structs & verbs). Fine with me, we'

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Sagi Grimberg
On 7/15/2015 6:05 AM, Doug Ledford wrote: On 07/14/2015 01:08 PM, Jason Gunthorpe wrote: On Tue, Jul 14, 2015 at 07:46:50PM +0300, Sagi Grimberg wrote: Which drivers don't support FRWR that we need to do other things? ipath - deprecated We have permission to move this to staging and then RM

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-15 Thread Sagi Grimberg
On 7/14/2015 11:29 PM, Jason Gunthorpe wrote: On Tue, Jul 14, 2015 at 12:55:11PM -0700, 'Christoph Hellwig' wrote: On Tue, Jul 14, 2015 at 02:32:31PM -0500, Steve Wise wrote: You mean "should not", yea? Ok. I'll check for iWARP. But don't tell me to remove the transport-specific hacks in th

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Sagi Grimberg
On 7/15/2015 10:32 AM, Christoph Hellwig wrote: Hi Sagi, I went over your proposal based on reviewing the ongoing MR threads and my implementation of a similar in-driver abstraction, so here are some proposed updates. struct provider_mr { u64 *page_list; // or what ever the

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Sagi Grimberg
On 7/14/2015 8:09 PM, Jason Gunthorpe wrote: On Tue, Jul 14, 2015 at 07:55:39PM +0300, Sagi Grimberg wrote: But, if people think that it's better to have an API that does implicit posting always without notification, and then silently consume error or flush completions. I can try and look at it

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Christoph Hellwig
Hi Sagi, I went over your proposal based on reviewing the ongoing MR threads and my implementation of a similar in-driver abstraction, so here are some proposed updates. > struct provider_mr { > u64 *page_list; // or what ever the HW uses > ... ... > struct ib_mr
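
The proposal quoted here has the provider keep its own MR structure, holding the HW page list, alongside the core `struct ib_mr`. The usual kernel idiom for that kind of layering is to embed the core object in the provider object and recover the wrapper with `container_of()`. The userspace sketch below illustrates only that idiom; every `sketch_*` type, field, and function is hypothetical and is not the layout in the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as the kernel's container_of(), written out for userspace. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical core-visible MR: what a ULP would be handed. */
struct sketch_ib_mr {
	uint32_t lkey;
	uint32_t rkey;
	uint64_t iova;
	uint64_t length;
};

/* Hypothetical provider MR: HW-specific state wrapped around the core MR. */
struct sketch_provider_mr {
	uint64_t *page_list;		/* or whatever the HW uses */
	int npages;
	struct sketch_ib_mr ibmr;	/* embedded, not a pointer */
};

/* Provider-side conversion from the core object back to its wrapper. */
static struct sketch_provider_mr *to_provider_mr(struct sketch_ib_mr *ibmr)
{
	assert(ibmr != NULL);
	return sketch_container_of(ibmr, struct sketch_provider_mr, ibmr);
}
```

Embedding lets the core allocate and pass around only `struct ib_mr` pointers while each driver recovers its private state without an extra lookup or allocation.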

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-15 Thread Sagi Grimberg
On 7/14/2015 8:26 PM, Jason Gunthorpe wrote: On Tue, Jul 14, 2015 at 12:05:53PM +0300, Sagi Grimberg wrote: iser has it too. I have a similar patch with a flag for iser (its behind a bulk of patches that are still pending though). Do we all agree and understand that stuff like this in driver
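
Part of what these "transport-specific hacks" cover is that iWARP, unlike IB, requires the destination (sink) buffer of an RDMA READ to be registered with remote-write access. A hedged sketch of a helper that hides that difference from the ULP follows. The flag values, the transport enum, and `sketch_read_sink_access()` are all hypothetical and defined locally; they are not taken from `ib_verbs.h` or from the patch under review.

```c
#include <assert.h>

/* Hypothetical flag values for this sketch only. */
enum {
	SKETCH_ACCESS_LOCAL_WRITE  = 1 << 0,
	SKETCH_ACCESS_REMOTE_WRITE = 1 << 1,
	SKETCH_ACCESS_REMOTE_READ  = 1 << 2,
};

enum sketch_transport { SKETCH_TRANSPORT_IB, SKETCH_TRANSPORT_IWARP };

/*
 * Hypothetical helper: access flags for a buffer that will be the sink of
 * an RDMA READ. On iWARP the read response is placed into the sink much
 * like a tagged write, so the sink must also allow remote write; IB only
 * needs local write.
 */
int sketch_read_sink_access(enum sketch_transport t)
{
	int flags = SKETCH_ACCESS_LOCAL_WRITE;

	assert(t == SKETCH_TRANSPORT_IB || t == SKETCH_TRANSPORT_IWARP);

	if (t == SKETCH_TRANSPORT_IWARP)
		flags |= SKETCH_ACCESS_REMOTE_WRITE;
	return flags;
}
```

Keeping this decision in one core helper, instead of repeating the transport check in iser, NFS/RDMA and other ULPs, is the direction argued for in the thread.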