Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Logan Gunthorpe
On 23/11/16 02:55 PM, Jason Gunthorpe wrote:
>>> Only ODP hardware allows changing the DMA address on the fly, and it
>>> works at the page table level. We do not need special handling for
>>> RDMA.
>>
>> I am aware of ODP but, noted by others, it doesn't provide a general
>> solution to the

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Sagalovitch, Serguei
On Wed, Nov 23, 2016 at 02:11:29PM -0700, Logan Gunthorpe wrote:
> Perhaps I am not following what Serguei is asking for, but I
> understood the desire was for a complex GPU allocator that could
> migrate pages between GPU and CPU memory under control of the GPU
> driver, among other things. The

Re: [PATCH] x86: fix kaslr and memmap collision

2016-11-23 Thread Dave Chinner
On Tue, Nov 22, 2016 at 11:01:32AM -0800, Dan Williams wrote:
> On Tue, Nov 22, 2016 at 10:54 AM, Kees Cook wrote:
> > On Tue, Nov 22, 2016 at 9:26 AM, Dan Williams wrote:
> >> No, you're right, we need to handle multiple ranges. Since the
>
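The "handle multiple ranges" point can be illustrated as an interval check: a candidate KASLR slot is only usable if it overlaps none of the memmap= reservations, not just the first one. This is a minimal sketch assuming half-open [start, start + size) ranges; `struct mem_range`, `ranges_overlap()` and `slot_avoids_ranges()` are hypothetical names, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserved range [start, start + size), e.g. one memmap= entry. */
struct mem_range {
    uint64_t start;
    uint64_t size;
};

/* True if [a_start, a_start + a_size) overlaps range r (both half-open). */
bool ranges_overlap(uint64_t a_start, uint64_t a_size, const struct mem_range *r)
{
    return a_start < r->start + r->size && r->start < a_start + a_size;
}

/* Hypothetical check: a candidate slot is usable only if it avoids EVERY range. */
bool slot_avoids_ranges(uint64_t slot, uint64_t size,
                        const struct mem_range *ranges, size_t n)
{
    assert(ranges != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (ranges_overlap(slot, size, &ranges[i]))
            return false;
    return true;
}
```

Looping over the whole array, rather than comparing against a single stored range, is what lets a second or third memmap= entry be honored.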

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 02:42:12PM -0800, Dan Williams wrote:
> > The crucial part for this discussion is the ability to fence and block
> > DMA for a specific range. This is the hardware capability that lets
> > page migration happen: fence DMA, migrate page, update page
> > table in HCA, unblock
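The four-step sequence quoted here (fence DMA, migrate page, update page table in HCA, unblock) can be modeled in a few lines. This is a single-threaded sketch under stated assumptions: `struct hca_pte`, `hca_dma_allowed()`, `migrate_registered_page()` and the 64-byte "page" are hypothetical stand-ins, and real hardware would also have to drain in-flight DMA before the copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64  /* tiny "page" for the model, not a real page size */

/* Hypothetical model of one HCA page-table entry for a registered range. */
struct hca_pte {
    unsigned char *page;  /* backing page the device currently targets */
    bool fenced;          /* true while DMA to this range is blocked */
};

/* Device-side access check: DMA is refused while the range is fenced. */
bool hca_dma_allowed(const struct hca_pte *pte)
{
    return !pte->fenced;
}

/* Hypothetical migration: fence DMA, migrate page, update HCA table, unblock. */
void migrate_registered_page(struct hca_pte *pte, unsigned char *new_page)
{
    assert(pte->page != NULL && new_page != NULL && !pte->fenced);
    pte->fenced = true;                            /* 1. fence and block DMA */
    memcpy(new_page, pte->page, MODEL_PAGE_SIZE);  /* 2. migrate the page */
    pte->page = new_page;                          /* 3. update HCA page table */
    pte->fenced = false;                           /* 4. unblock DMA */
}
```

The ordering is the point: the table entry only changes while DMA is fenced, so the device never sees a half-copied page.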

Re: [PATCH] ndctl: introduce 4k allocation support for creating namespace

2016-11-23 Thread Dan Williams
Some needed changes I noticed while trying to take this onto the 'pending' branch

On Mon, Oct 24, 2016 at 4:21 PM, Dave Jiang wrote:
> Existing implementation defaults all pages allocated as 2M superpages.
> For nfit_test dax device we need 4k pages allocated to work
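For context on the 2M versus 4k distinction, the sketch below shows how the chosen page size changes the rounding and page count for the same region. It assumes power-of-two page sizes; `align_up()` and `pages_needed()` are hypothetical helpers, not ndctl functions, and `SZ_4K`/`SZ_2M` are defined locally for this standalone example.

```c
#include <assert.h>
#include <stdint.h>

/* Locally defined sizes for this standalone sketch. */
#define SZ_4K (4096ULL)
#define SZ_2M (2ULL * 1024 * 1024)

/* Hypothetical helper: round size up to a power-of-two alignment. */
uint64_t align_up(uint64_t size, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: pages of page_size needed to back size bytes. */
uint64_t pages_needed(uint64_t size, uint64_t page_size)
{
    return align_up(size, page_size) / page_size;
}
```

For a 5 MiB region, 2M pages round the backing up to 6 MiB (3 pages), while 4k pages need 1280 pages with no rounding waste.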

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 02:11:29PM -0700, Logan Gunthorpe wrote:
> > As I said, there is no possible special handling. Standard IB hardware
> > does not support changing the DMA address once a MR is created. Forget
> > about doing that.
>
> Yeah, that's essentially the point I was trying to make.

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Logan Gunthorpe
On 23/11/16 01:33 PM, Jason Gunthorpe wrote:
> On Wed, Nov 23, 2016 at 02:58:38PM -0500, Serguei Sagalovitch wrote:
>
>>We do not want to have "highly" dynamic translation due to
>>performance cost. We need to support "overcommit" but would
>>like to minimize impact. To support

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 02:58:38PM -0500, Serguei Sagalovitch wrote:
>We do not want to have "highly" dynamic translation due to
>performance cost. We need to support "overcommit" but would
>like to minimize impact. To support RDMA MRs for GPU/VRAM/PCIe
>device memory (which is

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Serguei Sagalovitch
On 2016-11-23 02:32 PM, Jason Gunthorpe wrote:
On Wed, Nov 23, 2016 at 02:14:40PM -0500, Serguei Sagalovitch wrote:
On 2016-11-23 02:05 PM, Jason Gunthorpe wrote:
As Bart says, it would be best to be combined with something like Mellanox's ODP MRs, which allows a page to be evicted and then

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 02:14:40PM -0500, Serguei Sagalovitch wrote:
>
> On 2016-11-23 02:05 PM, Jason Gunthorpe wrote:
> >As Bart says, it would be best to be combined with something like
> >Mellanox's ODP MRs, which allows a page to be evicted and then trigger
> >a CPU interrupt if a DMA is

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Serguei Sagalovitch
On 2016-11-23 03:51 AM, Christian König wrote:
On 23.11.2016 at 08:49, Daniel Vetter wrote:
On Tue, Nov 22, 2016 at 01:21:03PM -0800, Dan Williams wrote:
On Tue, Nov 22, 2016 at 1:03 PM, Daniel Vetter wrote:
On Tue, Nov 22, 2016 at 9:35 PM, Serguei Sagalovitch

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Serguei Sagalovitch
On 2016-11-23 02:05 PM, Jason Gunthorpe wrote:
On Wed, Nov 23, 2016 at 10:13:03AM -0700, Logan Gunthorpe wrote:
an MR would be very tricky. The MR may be relied upon by another host and the kernel would have to inform user-space the MR was invalid then user-space would have to tell the remote

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 10:40:47AM -0800, Dan Williams wrote:
> I don't think that was designed for the case where the backing memory
> is a special/static physical address range rather than anonymous
> "System RAM", right?

The hardware doesn't care where the memory is. ODP is just a generic

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Jason Gunthorpe
On Wed, Nov 23, 2016 at 10:13:03AM -0700, Logan Gunthorpe wrote:
> an MR would be very tricky. The MR may be relied upon by another host
> and the kernel would have to inform user-space the MR was invalid then
> user-space would have to tell the remote application.

As Bart says, it would be best

[PATCH 3/6] dax: add tracepoint infrastructure, PMD tracing

2016-11-23 Thread Ross Zwisler
Tracepoints are the standard way to capture debugging and tracing information in many parts of the kernel, including the XFS and ext4 filesystems. Create a tracepoint header for FS DAX and add the first DAX tracepoints to the PMD fault handler. This allows the tracing for DAX to be done in the

[PATCH 6/6] dax: add tracepoints to dax_pmd_insert_mapping()

2016-11-23 Thread Ross Zwisler
Add tracepoints to dax_pmd_insert_mapping(), following the same logging conventions as the tracepoints in dax_iomap_pmd_fault().

Here is an example PMD fault showing the new tracepoints:

big-1544 [006] 48.153479: dax_pmd_fault: shared mapping write address 0x10505000 vm_start 0x1020
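To post-process output like the example above, the leading fields can be split off before the event-specific text. A minimal C sketch, assuming the `comm-pid [cpu] timestamp: event: details` shape shown in these examples (output variants that add per-CPU flag columns are not handled); `struct trace_rec` and `parse_trace_line()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for a line shaped like "comm-pid [cpu] ts: event: details". */
struct trace_rec {
    char comm[64];
    int pid;
    int cpu;
    double ts;
    char event[64];
    const char *details;  /* points into the caller's line */
};

/* Hypothetical parser; returns 0 on success, -1 if the line has another shape. */
int parse_trace_line(const char *line, struct trace_rec *rec)
{
    assert(line != NULL && rec != NULL);
    while (*line == ' ')  /* comm may be padded with leading spaces */
        line++;

    const char *bracket = strstr(line, " [");
    if (!bracket)
        return -1;

    /* comm may itself contain '-', so the pid follows the LAST '-' before " [". */
    const char *dash = NULL;
    for (const char *p = line; p < bracket; p++)
        if (*p == '-')
            dash = p;
    if (!dash || (size_t)(dash - line) >= sizeof(rec->comm))
        return -1;
    memcpy(rec->comm, line, (size_t)(dash - line));
    rec->comm[dash - line] = '\0';
    rec->pid = atoi(dash + 1);

    /* "[cpu] timestamp: event: " followed by the event-specific text. */
    int consumed = 0;
    if (sscanf(bracket, " [%d] %lf: %63[^:]: %n",
               &rec->cpu, &rec->ts, rec->event, &consumed) != 3 || consumed == 0)
        return -1;
    rec->details = bracket + consumed;
    return 0;
}
```

Searching for the last '-' before the CPU column keeps task names such as `read_big` or names containing dashes intact.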

[PATCH 4/6] dax: update MAINTAINERS entries for FS DAX

2016-11-23 Thread Ross Zwisler
Add the new include/trace/events/fs_dax.h tracepoint header, update Matthew's email address and add myself as a maintainer for filesystem DAX.

Signed-off-by: Ross Zwisler
Suggested-by: Matthew Wilcox
---
 MAINTAINERS | 4 +++-
 1 file

[PATCH 5/6] dax: add tracepoints to dax_pmd_load_hole()

2016-11-23 Thread Ross Zwisler
Add tracepoints to dax_pmd_load_hole(), following the same logging conventions as the tracepoints in dax_iomap_pmd_fault().

Here is an example PMD fault showing the new tracepoints:

read_big-1393 [007] 32.133809: dax_pmd_fault: shared mapping read address 0x1040 vm_start 0x1020

[PATCH 1/6] dax: fix build breakage with ext4, dax and !iomap

2016-11-23 Thread Ross Zwisler
With the current Kconfig setup it is possible to have the following:

CONFIG_EXT4_FS=y
CONFIG_FS_DAX=y
CONFIG_FS_IOMAP=n # this is in fs/Kconfig & isn't user accessible

With this config we get build failures in ext4_dax_fault() because the iomap functions in fs/dax.c are missing:

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Dan Williams
On Wed, Nov 23, 2016 at 9:27 AM, Bart Van Assche wrote:
> On 11/23/2016 09:13 AM, Logan Gunthorpe wrote:
>>
>> IMO any memory that has been registered for a P2P transaction should be
>> locked from being evicted. So if there's a get_user_pages call it needs
>> to be

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Bart Van Assche
On 11/23/2016 09:13 AM, Logan Gunthorpe wrote:
IMO any memory that has been registered for a P2P transaction should be locked from being evicted. So if there's a get_user_pages call it needs to be pinned until the put_page. The main issue being with the RDMA case: handling an eviction when a
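The pinning rule in this snippet (pin from get_user_pages until the matching put_page, never evict while registered) amounts to a reference count that eviction must respect. This is a userspace model only; `struct p2p_page` and the `p2p_*` functions are hypothetical and do not reflect the kernel's get_user_pages()/put_page() internals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state for memory registered for P2P DMA. */
struct p2p_page {
    int pin_count;  /* one count per outstanding get, dropped by each put */
    bool resident;  /* false once the page has been evicted */
};

/* Hypothetical get_user_pages-style pin: page must be resident to pin it. */
void p2p_pin(struct p2p_page *pg)
{
    assert(pg->resident);
    pg->pin_count++;
}

/* Hypothetical put_page-style release of one pin. */
void p2p_unpin(struct p2p_page *pg)
{
    assert(pg->pin_count > 0);
    pg->pin_count--;
}

/* Eviction is refused while any pin is outstanding; returns true if evicted. */
bool p2p_try_evict(struct p2p_page *pg)
{
    if (pg->pin_count > 0)
        return false;
    pg->resident = false;
    return true;
}
```

This is exactly the property that makes the RDMA case hard: as long as an MR holds a pin, the memory owner cannot reclaim the page.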

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Dave Hansen
On 11/22/2016 11:49 PM, Daniel Vetter wrote:
> Yes, agreed. My idea with exposing vram sections using numa nodes wasn't
> to reuse all the existing allocation policies directly, those won't work.
> So at boot-up your default numa policy would exclude any vram nodes.
>
> But I think (as an -mm

Re: Enabling peer to peer device transactions for PCIe devices

2016-11-23 Thread Christian König
On 23.11.2016 at 08:49, Daniel Vetter wrote:
On Tue, Nov 22, 2016 at 01:21:03PM -0800, Dan Williams wrote:
On Tue, Nov 22, 2016 at 1:03 PM, Daniel Vetter wrote:
On Tue, Nov 22, 2016 at 9:35 PM, Serguei Sagalovitch wrote:
On 2016-11-22 03:10 PM,