Re: [PATCH] mm: Remove double faults once write a device pfn

2024-01-25 Thread Alistair Popple
"Zhou, Xianrong" writes: > [AMD Official Use Only - General] > >> > The vmf_insert_pfn_prot could cause unnecessary double faults on a >> > device pfn. Because currently the vmf_insert_pfn_prot does not >> > make the pfn writable so the pte entry is normally read-only or >> > di

Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-12-04 Thread Alistair Popple
Christian König writes: > Am 01.12.23 um 06:48 schrieb Zeng, Oak: >> [SNIP] >> Besides memory eviction/oversubscription, there are a few other pain points >> when I use hmm: >> >> 1) hmm doesn't support file-back memory, so it is hard to share > memory b/t process in a gpu environment. You me

Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-12-01 Thread Alistair Popple
"Zeng, Oak" writes: > See inline comments > >> -Original Message- >> From: dri-devel On Behalf Of >> zhuweixi >> Sent: Thursday, November 30, 2023 5:48 AM >> To: Christian König ; Zeng, Oak >> ; Christian König ; linux- >> m...@kvack.org; linux-ker...@vger.kernel.org; a...@linux-founda

Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-12-01 Thread Alistair Popple
zhuweixi writes: > Glad to know that there is a common demand for a new syscall like > hmadvise(). I expect it would also be useful for homogeneous NUMA > cases. Credits to cudaMemAdvise() API which brought this idea to > GMEM's design. It's not clear to me that this would need to be a new sys

Re: [PATCH v2 0/8] Fix several device private page reference counting issues

2022-10-26 Thread Alistair Popple
"Vlastimil Babka (SUSE)" writes: > On 9/28/22 14:01, Alistair Popple wrote: >> This series aims to fix a number of page reference counting issues in >> drivers dealing with device private ZONE_DEVICE pages. These result in >> use-after-free type bugs, either fro

Re: [PATCH v2 1/8] mm/memory.c: Fix race when faulting a device private page

2022-10-03 Thread Alistair Popple
Felix Kuehling writes: > On 2022-09-28 08:01, Alistair Popple wrote: >> When the CPU tries to access a device private page the migrate_to_ram() >> callback associated with the pgmap for the page is called. However no >> reference is taken on the faulting page. T

Re: [PATCH 2/7] mm: Free device private pages have zero refcount

2022-09-30 Thread Alistair Popple
Dan Williams writes: > Alistair Popple wrote: >> >> Jason Gunthorpe writes: >> >> > On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote: >> >> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page >> >>

Re: [PATCH v2 8/8] hmm-tests: Add test for migrate_device_range()

2022-09-29 Thread Alistair Popple
Andrew Morton writes: > On Wed, 28 Sep 2022 22:01:22 +1000 Alistair Popple wrote: > >> @@ -1401,22 +1494,7 @@ static int dmirror_device_init(struct dmirror_device >> *mdevice, int id) >> >> static void dmirror_device_remove(struct dmirror_device *mdevice

Re: [PATCH 1/7] mm/memory.c: Fix race when faulting a device private page

2022-09-28 Thread Alistair Popple
Michael Ellerman writes: > Alistair Popple writes: >> When the CPU tries to access a device private page the migrate_to_ram() >> callback associated with the pgmap for the page is called. However no >> reference is taken on the faulting page. Therefore a concurrent >>

[PATCH v2 1/8] mm/memory.c: Fix race when faulting a device private page

2022-09-28 Thread Alistair Popple
o see if it's expected or not. Signed-off-by: Alistair Popple Cc: Jason Gunthorpe Cc: John Hubbard Cc: Ralph Campbell Cc: Michael Ellerman Cc: Felix Kuehling Cc: Lyude Paul --- arch/powerpc/kvm/book3s_hv_uvmem.c | 15 ++- drivers/gpu/drm/amd/amdkfd/kfd_migr

[PATCH v2 5/8] mm/migrate_device.c: Add migrate_device_range()

2022-09-28 Thread Alistair Popple
n free up device memory. To allow that this patch introduces the migrate_device family of functions which are functionally similar to migrate_vma but which skips the initial lookup based on mapping. Signed-off-by: Alistair Popple Cc: "Huang, Ying" Cc: Zi Yan Cc: Matthew Wilcox Cc:

[PATCH v2 4/8] mm/migrate_device.c: Refactor migrate_vma and migrate_deivce_coherent_page()

2022-09-28 Thread Alistair Popple
this isn't true for device private memory, and a future change requires similar functionality for device private memory. So refactor the code into something more sensible for migrating device memory without a vma. Signed-off-by: Alistair Popple Cc: "Huang, Ying" Cc: Zi Yan Cc: Mat

[PATCH v2 8/8] hmm-tests: Add test for migrate_device_range()

2022-09-28 Thread Alistair Popple
Signed-off-by: Alistair Popple Cc: Jason Gunthorpe Cc: Ralph Campbell Cc: John Hubbard Cc: Alex Sierra Cc: Felix Kuehling --- lib/test_hmm.c | 120 +- lib/test_hmm_uapi.h| 1 +- tools/testing/selftests/vm/hmm-tests.c

[PATCH v2 6/8] nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()

2022-09-28 Thread Alistair Popple
. Refactor out the core functionality so that it is not specific to fault handling. Signed-off-by: Alistair Popple Reviewed-by: Lyude Paul Cc: Ben Skeggs Cc: Ralph Campbell Cc: John Hubbard --- drivers/gpu/drm/nouveau/nouveau_dmem.c | 58 +-- 1 file changed, 28 insertions

[PATCH v2 3/8] mm/memremap.c: Take a pgmap reference on page allocation

2022-09-28 Thread Alistair Popple
ough pages are still mapped by the kernel which can lead to kernel crashes, particularly if a driver frees the pagemap. To fix this drivers should take a pagemap reference when allocating the page. This reference can then be returned when the page is freed. Signed-off-by: Alistair Popple Fixes: 27

[PATCH v2 7/8] nouveau/dmem: Evict device private memory during release

2022-09-28 Thread Alistair Popple
device pages have been freed which may never happen. Fix this by migrating device mappings back to normal CPU memory prior to freeing the GPU memory chunks and associated device private pages. Signed-off-by: Alistair Popple Cc: Lyude Paul Cc: Ben Skeggs Cc: Ralph Campbell Cc: John Hubbard

[PATCH v2 0/8] Fix several device private page reference counting issues

2022-09-28 Thread Alistair Popple
-gfx@lists.freedesktop.org Cc: nouv...@lists.freedesktop.org Cc: dri-de...@lists.freedesktop.org Alistair Popple (8): mm/memory.c: Fix race when faulting a device private page mm: Free device private pages have zero refcount mm/memremap.c: Take a pgmap reference on page allocation mm

[PATCH v2 2/8] mm: Free device private pages have zero refcount

2022-09-28 Thread Alistair Popple
ns such as get_page_unless_zero(). Signed-off-by: Alistair Popple Cc: Jason Gunthorpe Cc: Michael Ellerman Cc: Felix Kuehling Cc: Alex Deucher Cc: Christian König Cc: Ben Skeggs Cc: Lyude Paul Cc: Ralph Campbell Cc: Alex Sierra Cc: John Hubbard Cc: Dan Williams --- This will conflict with Dan'

Re: [PATCH 5/7] nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()

2022-09-28 Thread Alistair Popple
Lyude Paul writes: > On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote: >> nouveau_dmem_fault_copy_one() is used during handling of CPU faults via >> the migrate_to_ram() callback and is used to copy data from GPU to CPU >> memory. It is currently specific to faul

Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-27 Thread Alistair Popple
Felix Kuehling writes: > On 2022-09-26 17:35, Lyude Paul wrote: >> On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote: >>> When the module is unloaded or a GPU is unbound from the module it is >>> possible for device private pages to be left mapped in curr

Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-27 Thread Alistair Popple
John Hubbard writes: > On 9/26/22 14:35, Lyude Paul wrote: >>> + for (i = 0; i < npages; i++) { >>> + if (src_pfns[i] & MIGRATE_PFN_MIGRATE) { >>> + struct page *dpage; >>> + >>> + /* >>> +* _GFP_NOFAIL because the GPU is going

Re: [PATCH 2/7] mm: Free device private pages have zero refcount

2022-09-27 Thread Alistair Popple
Jason Gunthorpe writes: > On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote: >> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page >> refcount") device private pages have no longer had an extra reference >> count when the page is in u

[PATCH 4/7] mm/migrate_device.c: Add migrate_device_range()

2022-09-26 Thread Alistair Popple
n free up device memory. To allow that this patch introduces the migrate_device family of functions which are functionally similar to migrate_vma but which skips the initial lookup based on mapping. Signed-off-by: Alistair Popple --- include/linux/migrate.h | 7 +++- mm/migrate_device.c

[PATCH 0/7] Fix several device private page reference counting issues

2022-09-26 Thread Alistair Popple
. Unfortunately I lack the hardware to test on either of these so would appreciate it if someone with access could test those. Alistair Popple (7): mm/memory.c: Fix race when faulting a device private page mm: Free device private pages have zero refcount mm/migrate_device.c: Refactor

[PATCH 7/7] hmm-tests: Add test for migrate_device_range()

2022-09-26 Thread Alistair Popple
Signed-off-by: Alistair Popple --- lib/test_hmm.c | 119 +- lib/test_hmm_uapi.h| 1 +- tools/testing/selftests/vm/hmm-tests.c | 49 +++- 3 files changed, 148 insertions(+), 21 deletions(-) diff --git a/lib/test_hmm.c

[PATCH 3/7] mm/migrate_device.c: Refactor migrate_vma and migrate_deivce_coherent_page()

2022-09-26 Thread Alistair Popple
this isn't true for device private memory, and a future change requires similar functionality for device private memory. So refactor the code into something more sensible for migrating device memory without a vma. Signed-off-by: Alistair Popple --- mm/migrate_device.c

[PATCH 1/7] mm/memory.c: Fix race when faulting a device private page

2022-09-26 Thread Alistair Popple
o see if it's expected or not. Signed-off-by: Alistair Popple --- arch/powerpc/kvm/book3s_hv_uvmem.c | 15 ++- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +++-- drivers/gpu/drm/amd/amdkfd/kfd_migrate.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +--

[PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-26 Thread Alistair Popple
callbacks have all been freed. Fix this by migrating any mappings back to normal CPU memory prior to freeing the GPU memory chunks and associated device private pages. Signed-off-by: Alistair Popple --- I assume the AMD driver might have a similar issue. However I can't see where device privat

[PATCH 5/7] nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()

2022-09-26 Thread Alistair Popple
. Refactor out the core functionality so that it is not specific to fault handling. Signed-off-by: Alistair Popple --- drivers/gpu/drm/nouveau/nouveau_dmem.c | 59 +-- 1 file changed, 29 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b

[PATCH 2/7] mm: Free device private pages have zero refcount

2022-09-26 Thread Alistair Popple
ns such as get_page_unless_zero(). Signed-off-by: Alistair Popple --- arch/powerpc/kvm/book3s_hv_uvmem.c | 1 + drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 + drivers/gpu/drm/nouveau/nouveau_dmem.c | 1 + lib/test_hmm.c | 1 + mm/memremap.c| 5

[PATCH] mm/gup.c: Fix formating in check_and_migrate_movable_page()

2022-07-20 Thread Alistair Popple
Commit b05a79d4377f ("mm/gup: migrate device coherent pages when pinning instead of failing") added a badly formatted if statement. Fix it. Signed-off-by: Alistair Popple Reported-by: David Hildenbrand --- Apologies Andrew for missing this. Hopefully this fixes things. mm/gup.c |

[PATCH] mm/gup: migrate device coherent pages when pinning instead of failing

2022-07-16 Thread Alistair Popple
: Alistair Popple Acked-by: Felix Kuehling Signed-off-by: Christoph Hellwig --- This patch hopefully addresses all of David's comments. It replaces both my "mm: remove the vma check in migrate_vma_setup()" and "mm/gup: migrate device coherent pages when pinning instead of failing"

Re: [PATCH v8 06/15] mm: remove the vma check in migrate_vma_setup()

2022-07-14 Thread Alistair Popple
David Hildenbrand writes: > On 07.07.22 21:03, Alex Sierra wrote: >> From: Alistair Popple >> >> migrate_vma_setup() checks that a valid vma is passed so that the page >> tables can be walked to find the pfns associated with a given address >> range. However i

Re: [PATCH v8 07/15] mm/gup: migrate device coherent pages when pinning instead of failing

2022-07-14 Thread Alistair Popple
David Hildenbrand writes: > On 07.07.22 21:03, Alex Sierra wrote: >> From: Alistair Popple >> >> Currently any attempts to pin a device coherent page will fail. This is >> because device coherent pages need to be managed by a device driver, and >> pinning

Re: [PATCH v7 04/14] mm: add device coherent vma selection for memory migration

2022-06-30 Thread Alistair Popple
David Hildenbrand writes: > On 29.06.22 05:54, Alex Sierra wrote: >> This case is used to migrate pages from device memory, back to system >> memory. Device coherent type memory is cache coherent from device and CPU >> point of view. >> >> Signed-off-by: Alex Sierra >> Acked-by: Felix Kuehling

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-22 Thread Alistair Popple
David Hildenbrand writes: > On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote: >> >> On 6/21/2022 7:25 AM, David Hildenbrand wrote: >>> On 21.06.22 13:55, Alistair Popple wrote: >>>> David Hildenbrand writes: >>>> >>>>> On 2

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread Alistair Popple
ew. >>>>>>>> This is used on platforms that have an advanced system bus (like CAPI >>>>>>>> or CXL). Any page of a process can be migrated to such memory. However, >>>>>>>> no one should be allowed to pin such memory so that it c

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-20 Thread Alistair Popple
Oded Gabbay writes: > On Mon, Jun 20, 2022 at 3:33 AM Alistair Popple wrote: >> >> >> Oded Gabbay writes: >> >> > On Fri, Jun 17, 2022 at 8:20 PM Sierra Guiza, Alejandro (Alex) >> > wrote: >> >> >> >> >> >>

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-20 Thread Alistair Popple
> >> evicted. >> >> >> >> Signed-off-by: Alex Sierra >> >> Acked-by: Felix Kuehling >> >> Reviewed-by: Alistair Popple >> >> [hch: rebased ontop of the refcount changes, >> >>removed is_dev_private_or_coherent_page]

Re: [PATCH v5 02/13] mm: handling Non-LRU pages returned by vm_normal_pages

2022-06-08 Thread Alistair Popple
I can't see any issues with this now so: Reviewed-by: Alistair Popple Alex Sierra writes: > With DEVICE_COHERENT, we'll soon have vm_normal_pages() return > device-managed anonymous pages that are not LRU pages. Although they > behave like normal pages for purposes of

Re: [PATCH v3 02/13] mm: handling Non-LRU pages returned by vm_normal_pages

2022-05-27 Thread Alistair Popple
Felix Kuehling writes: > Am 2022-05-25 um 00:11 schrieb Alistair Popple: >> Alex Sierra writes: >> >>> With DEVICE_COHERENT, we'll soon have vm_normal_pages() return >>> device-managed anonymous pages that are not LRU pages. Although they >>> b

Re: [PATCH v3 02/13] mm: handling Non-LRU pages returned by vm_normal_pages

2022-05-27 Thread Alistair Popple
"Sierra Guiza, Alejandro (Alex)" writes: > On 5/24/2022 11:11 PM, Alistair Popple wrote: >> Alex Sierra writes: >> >>> With DEVICE_COHERENT, we'll soon have vm_normal_pages() return >>> device-managed anonymous pages that are not LRU pages

Re: [PATCH v3 02/13] mm: handling Non-LRU pages returned by vm_normal_pages

2022-05-25 Thread Alistair Popple
Alex Sierra writes: > With DEVICE_COHERENT, we'll soon have vm_normal_pages() return > device-managed anonymous pages that are not LRU pages. Although they > behave like normal pages for purposes of mapping in CPU page, and for > COW. They do not support LRU lists, NUMA migration or THP. > > We

Re: [PATCH v2 11/13] mm: handling Non-LRU pages returned by vm_normal_pages

2022-05-23 Thread Alistair Popple
Technically I think this patch should be earlier in the series. As I understand it patch 1 allows DEVICE_COHERENT pages to be inserted in the page tables and therefore makes it possible for page table walkers to see non-LRU pages. Some more comments below: Alex Sierra writes: > With DEVICE_CO

Re: [PATCH v1 14/15] tools: add hmm gup tests for device coherent type

2022-05-16 Thread Alistair Popple
type(variant->device_number)) { > + ASSERT_EQ(HMM_DMIRROR_PROT_DEV_COHERENT_LOCAL | > HMM_DMIRROR_PROT_WRITE, m[0]); > + ASSERT_EQ(HMM_DMIRROR_PROT_DEV_COHERENT_LOCAL | > HMM_DMIRROR_PROT_WRITE, m[1]); > + } else { > +

Re: [PATCH v1 01/15] mm: add zone device coherent type memory support

2022-05-11 Thread Alistair Popple
Alex Sierra writes: [...] > diff --git a/mm/rmap.c b/mm/rmap.c > index fedb82371efe..d57102cd4b43 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1995,7 +1995,8 @@ void try_to_migrate(struct folio *folio, enum ttu_flags > flags) > TTU_SYNC))) >

Re: [PATCH v1 04/15] mm: add device coherent checker to remove migration pte

2022-05-11 Thread Alistair Popple
"Sierra Guiza, Alejandro (Alex)" writes: > @apop...@nvidia.com Could you please check this patch? It's somehow related > to migrate_device_page() for long term device coherent pages. > > Regards, > Alex Sierra >> -Original Message- >> From: amd-gfx On Behalf Of Alex >> Sierra >> Sent:

Re: [PATCH v1 04/15] mm: add device coherent checker to remove migration pte

2022-05-06 Thread Alistair Popple
"Sierra Guiza, Alejandro (Alex)" writes: > @apop...@nvidia.com Could you please check this patch? It's somehow related to > migrate_device_page() for long term device coherent pages. Sure thing. This whole series is in my queue of things to review once I make it home from LSF/MM. - Alistair >

Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling

2022-03-16 Thread Alistair Popple
Felix Kuehling writes: > On 2022-03-11 04:16, David Hildenbrand wrote: >> On 10.03.22 18:26, Alex Sierra wrote: >>> DEVICE_COHERENT pages introduce a subtle distinction in the way >>> "normal" pages can be used by various callers throughout the kernel. >>> They behave like normal pages for purpos

Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling

2022-03-16 Thread Alistair Popple
Felix Kuehling writes: > Am 2022-03-10 um 14:25 schrieb Matthew Wilcox: >> On Thu, Mar 10, 2022 at 11:26:31AM -0600, Alex Sierra wrote: >>> @@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, >>> unsigned long addr, >>>* PFNMAP mappings in order to support COWable mappi

Re: [PATCH v6 01/10] mm: add zone device coherent type memory support

2022-02-18 Thread Alistair Popple
Felix Kuehling writes: > Am 2022-02-16 um 07:26 schrieb Jason Gunthorpe: >> The other place that needs careful audit is all the callers using >> vm_normal_page() - they must all be able to accept a ZONE_DEVICE page >> if we don't set pte_devmap. > > How much code are we talking about here? A quic

Re: [PATCH v6 01/10] mm: add zone device coherent type memory support

2022-02-16 Thread Alistair Popple
Jason Gunthorpe writes: > On Wed, Feb 16, 2022 at 09:31:03AM +0100, David Hildenbrand wrote: >> On 16.02.22 03:36, Alistair Popple wrote: >> > On Wednesday, 16 February 2022 1:03:57 PM AEDT Jason Gunthorpe wrote: >> >> On Wed, Feb 16, 2022 at 12:23:44P

Re: [PATCH v6 01/10] mm: add zone device coherent type memory support

2022-02-16 Thread Alistair Popple
Jason Gunthorpe writes: > On Tue, Feb 15, 2022 at 04:35:56PM -0500, Felix Kuehling wrote: >> >> On 2022-02-15 14:41, Jason Gunthorpe wrote: >> > On Tue, Feb 15, 2022 at 07:32:09PM +0100, Christoph Hellwig wrote: >> > > On Tue, Feb 15, 2022 at 10:45:24AM -0400, Jason Gunthorpe wrote: >> > > > > Do

Re: [PATCH v6 01/10] mm: add zone device coherent type memory support

2022-02-16 Thread Alistair Popple
On Wednesday, 16 February 2022 1:03:57 PM AEDT Jason Gunthorpe wrote: > On Wed, Feb 16, 2022 at 12:23:44PM +1100, Alistair Popple wrote: > > > Device private and device coherent pages are not marked with pte_devmap and > > they > > are backed by a struct page. The only

Re: [PATCH v2 2/3] mm/gup.c: Migrate device coherent pages when pinning instead of failing

2022-02-14 Thread Alistair Popple
John Hubbard writes: > On 2/11/22 18:51, Alistair Popple wrote: […] >>> See below… >>> >>>> + } >>>> + >>>> + pages[i] = migrate_device_page(head, gup_flags); >> migrate_device_page() will return

Re: [PATCH v6 01/10] mm: add zone device coherent type memory support

2022-02-14 Thread Alistair Popple
>>> or CXL). Any page of a process can be migrated to such memory. However, >>> no one should be allowed to pin such memory so that it can always be >>> evicted. >>> >>> Signed-off-by: Alex Sierra >>> Acked-by: Felix Kuehling >>> Reviewed

Re: [PATCH v2 2/3] mm/gup.c: Migrate device coherent pages when pinning instead of failing

2022-02-14 Thread Alistair Popple
On Saturday, 12 February 2022 1:10:29 PM AEDT John Hubbard wrote: > On 2/6/22 20:26, Alistair Popple wrote: > > Currently any attempts to pin a device coherent page will fail. This is > > because device coherent pages need to be managed by a device driver, and > > pinning

Re: [PATCH v2 2/3] mm/gup.c: Migrate device coherent pages when pinning instead of failing

2022-02-11 Thread Alistair Popple
On Thursday, 10 February 2022 10:47:35 PM AEDT David Hildenbrand wrote: > On 10.02.22 12:39, Alistair Popple wrote: > > On Thursday, 10 February 2022 9:53:38 PM AEDT David Hildenbrand wrote: > >> On 07.02.22 05:26, Alistair Popple wrote: > >>> Currently any attempts

Re: [PATCH v2 2/3] mm/gup.c: Migrate device coherent pages when pinning instead of failing

2022-02-10 Thread Alistair Popple
On Thursday, 10 February 2022 9:53:38 PM AEDT David Hildenbrand wrote: > On 07.02.22 05:26, Alistair Popple wrote: > > Currently any attempts to pin a device coherent page will fail. This is > > because device coherent pages need to be managed by a device driver, and > > pinni

Re: start sorting out the ZONE_DEVICE refcount mess v2

2022-02-10 Thread Alistair Popple
On Thursday, 10 February 2022 6:28:01 PM AEDT Christoph Hellwig wrote: [...] > Changes since v1: > - add a missing memremap.h include in memcontrol.c > - include rebased versions of the device coherent support and >device coherent migration support series as well as additional >cleanup

Re: [PATCH 11/27] mm: refactor the ZONE_DEVICE handling in migrate_vma_insert_page

2022-02-10 Thread Alistair Popple
Reviewed-by: Alistair Popple On Thursday, 10 February 2022 6:28:12 PM AEDT Christoph Hellwig wrote: > Make the flow a little more clear and prepare for adding a new > ZONE_DEVICE memory type. > > Signed-off-by: Christoph Hellwig > --- > mm/migrate.c | 31 +++---

Re: [PATCH 12/27] mm: refactor the ZONE_DEVICE handling in migrate_vma_pages

2022-02-10 Thread Alistair Popple
Reviewed-by: Alistair Popple On Thursday, 10 February 2022 6:28:13 PM AEDT Christoph Hellwig wrote: > Make the flow a little more clear and prepare for adding a new > ZONE_DEVICE memory type. > > Signed-off-by: Christoph Hellwig > --- > mm/migrate.c | 27 ---

Re: [PATCH 14/27] mm: build migrate_vma_* for all configs with ZONE_DEVICE support

2022-02-10 Thread Alistair Popple
Thanks, it's also better than more stubbed functions. Reviewed-by: Alistair Popple On Thursday, 10 February 2022 6:28:15 PM AEDT Christoph Hellwig wrote: > This code will be used for device coherent memory as well in a bit, > so relax the ifdef a bit. > > Signed-off-by:

Re: [PATCH 13/27] mm: move the migrate_vma_* device migration code into it's own file

2022-02-10 Thread Alistair Popple
I got the following build error: /data/source/linux/mm/migrate_device.c: In function ‘migrate_vma_collect_pmd’: /data/source/linux/mm/migrate_device.c:242:3: error: implicit declaration of function ‘flush_tlb_range’; did you mean ‘flush_pmd_tlb_range’? [-Werror=implicit-function-declaration] 2

Re: [PATCH 6/8] mm: don't include in

2022-02-09 Thread Alistair Popple
On Thursday, 10 February 2022 4:48:36 AM AEDT Christoph Hellwig wrote: > On Mon, Feb 07, 2022 at 04:19:29PM -0500, Felix Kuehling wrote: > > > > Am 2022-02-07 um 01:32 schrieb Christoph Hellwig: > >> Move the check for the actual pgmap types that need the free at refcount > >> one behavior into the

[PATCH v2 1/3] migrate.c: Remove vma check in migrate_vma_setup()

2022-02-07 Thread Alistair Popple
't required. Signed-off-by: Alistair Popple Acked-by: Felix Kuehling --- Changes for v2: - Added Felix's Acked-by mm/migrate.c | 34 +- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index a9aed12..0d6570d 10064

[PATCH v2 2/3] mm/gup.c: Migrate device coherent pages when pinning instead of failing

2022-02-07 Thread Alistair Popple
accessible from the CPU so can be migrated just like pinning ZONE_MOVABLE pages. So instead of failing all attempts to pin them first try migrating them out of ZONE_DEVICE. Signed-off-by: Alistair Popple Acked-by: Felix Kuehling --- Changes for v2: - Added Felix's Acked-by - Fixed mi

Re: [PATCH 2/3] mm/gup.c: Migrate device coherent pages when pinning instead of failing

2022-02-07 Thread Alistair Popple
On Wednesday, 2 February 2022 2:03:01 AM AEDT Felix Kuehling wrote: > > Am 2022-02-01 um 02:05 schrieb Alistair Popple: > > Currently any attempts to pin a device coherent page will fail. This is > > because device coherent pages need to be managed by a device driver, and >

[PATCH v2 0/3] Migrate device coherent pages on get_user_pages()

2022-02-07 Thread Alistair Popple
- Rebased on to linux-next-20220204 Alex Sierra (1): tools: add hmm gup test for long term pinned device pages Alistair Popple (2): migrate.c: Remove vma check in migrate_vma_setup() mm/gup.c: Migrate device coherent pages when pinning instead of failing mm/gup.c

[PATCH v2 3/3] tools: add hmm gup test for long term pinned device pages

2022-02-07 Thread Alistair Popple
From: Alex Sierra The intention is to test device coherent type pages that have been called through get user pages with PIN_LONGTERM flag set. These pages should get migrated back to normal system memory. Signed-off-by: Alex Sierra Signed-off-by: Alistair Popple Reviewed-by: Felix Kuehling

Re: [PATCH v5 09/10] tools: update hmm-test to support device coherent type

2022-02-01 Thread Alistair Popple
Oh sorry, I had looked at this but forgotten to add my reviewed by: Reviewed-by: Alistair Popple On Tuesday, 1 February 2022 10:27:25 AM AEDT Sierra Guiza, Alejandro (Alex) wrote: > Hi Alistair, > This is the last patch to be reviewed from this series. It already has > the changes fr

[PATCH 2/3] mm/gup.c: Migrate device coherent pages when pinning instead of failing

2022-02-01 Thread Alistair Popple
accessible from the CPU so can be migrated just like pinning ZONE_MOVABLE pages. So instead of failing all attempts to pin them first try migrating them out of ZONE_DEVICE. Signed-off-by: Alistair Popple --- mm/gup.c | 105 ++-- 1 file changed

[PATCH 0/3] Migrate device coherent pages on get_user_pages()

2022-02-01 Thread Alistair Popple
ong term pinned device pages Alistair Popple (2): migrate.c: Remove vma check in migrate_vma_setup() mm/gup.c: Migrate device coherent pages when pinning instead of failing mm/gup.c | 105 +++--- mm/migrate.c | 34

[PATCH 1/3] migrate.c: Remove vma check in migrate_vma_setup()

2022-02-01 Thread Alistair Popple
't required. Signed-off-by: Alistair Popple --- mm/migrate.c | 34 +- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index d3cc358..31ba8ca 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2581,24 +2581,24 @

[PATCH 3/3] tools: add hmm gup test for long term pinned device pages

2022-02-01 Thread Alistair Popple
From: Alex Sierra The intention is to test device coherent type pages that have been called through get user pages with PIN_LONGTERM flag set. These pages should get migrated back to normal system memory. Signed-off-by: Alex Sierra Signed-off-by: Alistair Popple --- tools/testing/selftests

Re: [PATCH] mm: add device coherent vma selection for memory migration

2022-01-31 Thread Alistair Popple
Thanks for fixing. I'm guessing Andrew will want you to resend this as part of a new v6 series, but please add: Reviewed-by: Alistair Popple On Tuesday, 1 February 2022 6:48:13 AM AEDT Alex Sierra wrote: > This case is used to migrate pages from device memory, back to system > mem

Re: [PATCH v5 01/10] mm: add zone device coherent type memory support

2022-01-31 Thread Alistair Popple
Looks good, feel free to add: Reviewed-by: Alistair Popple On Saturday, 29 January 2022 7:08:16 AM AEDT Alex Sierra wrote: > Device memory that is cache coherent from device and CPU point of view. > This is used on platforms that have an advanced system bus (like CAPI > or CXL). Any

Re: [PATCH v5 02/10] mm: add device coherent vma selection for memory migration

2022-01-31 Thread Alistair Popple
On Saturday, 29 January 2022 7:08:17 AM AEDT Alex Sierra wrote: [...] > struct migrate_vma { > diff --git a/mm/migrate.c b/mm/migrate.c > index cd137aedcfe5..d3cc3589e1e8 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -2264,7 +2264,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, >

Re: [PATCH v4 02/10] mm: add device coherent vma selection for memory migration

2022-01-28 Thread Alistair Popple
On Thursday, 27 January 2022 2:09:41 PM AEDT Alex Sierra wrote: [...] > diff --git a/mm/migrate.c b/mm/migrate.c > index 277562cd4cf5..2b3375e165b1 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -2340,8 +2340,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, > if (

Re: [PATCH v4 06/10] lib: test_hmm add ioctl to get zone device type

2022-01-28 Thread Alistair Popple
Reviewed-by: Alistair Popple On Thursday, 27 January 2022 2:09:45 PM AEDT Alex Sierra wrote: > new ioctl cmd added to query zone device type. This will be > used once the test_hmm adds zone device coherent type. > > Signed-off-by: Alex Sierra > --- > lib/te

Re: [PATCH v4 07/10] lib: test_hmm add module param for zone device type

2022-01-28 Thread Alistair Popple
Thanks for the updates, looks good now. Reviewed-by: Alistair Popple On Thursday, 27 January 2022 2:09:46 PM AEDT Alex Sierra wrote: > In order to configure device coherent in test_hmm, two module parameters > should be passed, which correspond to the SP start address of each >

Re: [PATCH v4 01/10] mm: add zone device coherent type memory support

2022-01-28 Thread Alistair Popple
On Thursday, 27 January 2022 2:09:40 PM AEDT Alex Sierra wrote: [...] > diff --git a/mm/migrate.c b/mm/migrate.c > index 1852d787e6ab..277562cd4cf5 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -362,7 +362,7 @@ static int expected_page_refs(struct address_space > *mapping, struct page *pa

Re: [PATCH v4 04/10] drm/amdkfd: add SPM support for SVM

2022-01-28 Thread Alistair Popple
On Thursday, 27 January 2022 2:09:43 PM AEDT Alex Sierra wrote: [...] > @@ -984,3 +990,4 @@ int svm_migrate_init(struct amdgpu_device *adev) > > return 0; > } > + > git-am complained about this when I applied the series. Given you have to rebase anyway it would be worth fixing this.

Re: [PATCH v4 03/10] mm/gup: fail get_user_pages for LONGTERM dev coherent type

2022-01-28 Thread Alistair Popple
On Thursday, 27 January 2022 2:09:42 PM AEDT Alex Sierra wrote: > Avoid long term pinning for Coherent device type pages. This could > interfere with their own device memory manager. For now, we are just > returning error for PIN_LONGTERM Coherent device type pages. Eventually, > these type of page

Re: [PATCH v4 08/10] lib: add support for device coherent type in test_hmm

2022-01-28 Thread Alistair Popple
I haven't tested the change which checks that pages migrated back to sysmem, but it looks ok so: Reviewed-by: Alistair Popple On Thursday, 27 January 2022 2:09:47 PM AEDT Alex Sierra wrote: > Device Coherent type uses device memory that is coherently accessible by > the CPU. This cou

Re: [PATCH v3 03/10] mm/gup: fail get_user_pages for LONGTERM dev coherent type

2022-01-20 Thread Alistair Popple
On Thursday, 20 January 2022 11:36:21 PM AEDT Joao Martins wrote: > On 1/10/22 22:31, Alex Sierra wrote: > > Avoid long term pinning for Coherent device type pages. This could > > interfere with their own device memory manager. For now, we are just > > returning error for PIN_LONGTERM Coherent devi

Re: [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping

2022-01-20 Thread Alistair Popple
On Wednesday, 12 January 2022 10:06:03 PM AEDT Alistair Popple wrote: > I have been looking at this in relation to the migration code and noticed we > have the following in try_to_migrate(): > > if (is_zone_device_page(page) && !is_device_private_page(page)) >

Re: [PATCH v3 08/10] lib: add support for device coherent type in test_hmm

2022-01-20 Thread Alistair Popple
On Tuesday, 11 January 2022 9:31:59 AM AEDT Alex Sierra wrote: > Device Coherent type uses device memory that is coherently accessible by > the CPU. This could be shown as SP (special purpose) memory range > at the BIOS-e820 memory enumeration. If no SP memory is supported in > system, this could be

Re: [PATCH v3 10/10] tools: update test_hmm script to support SP config

2022-01-20 Thread Alistair Popple
Looks good, Reviewed-by: Alistair Popple On Tuesday, 11 January 2022 9:32:01 AM AEDT Alex Sierra wrote: > Add two more parameters to set spm_addr_dev0 & spm_addr_dev1 > addresses. These two parameters configure the start SP > addresses for each device in test_hmm driver. > Co

Re: [PATCH v3 07/10] lib: test_hmm add module param for zone device type

2022-01-20 Thread Alistair Popple
Thanks for splitting the coherent devices into separate device nodes. Couple of comments below. On Tuesday, 11 January 2022 9:31:58 AM AEDT Alex Sierra wrote: > In order to configure device coherent in test_hmm, two module parameters > should be passed, which correspond to the SP start address of

Re: [PATCH v3 06/10] lib: test_hmm add ioctl to get zone device type

2022-01-20 Thread Alistair Popple
On Tuesday, 11 January 2022 9:31:57 AM AEDT Alex Sierra wrote: [...] > +enum { > + /* 0 is reserved to catch uninitialized type fields */ This seems unnecessary and can be dropped to start at zero. Reviewed-by: Alistair Popple > + HMM_DMIRROR_MEMORY_DEVICE_PR

Re: [PATCH v3 09/10] tools: update hmm-test to support device coherent type

2022-01-20 Thread Alistair Popple
On Tuesday, 11 January 2022 9:32:00 AM AEDT Alex Sierra wrote: > Test cases such as migrate_fault and migrate_multiple were modified to > explicitly migrate from device to sys memory without the need of page > faults, when using device coherent type. > > Snapshot test case updated to read memory de

Re: [PATCH v3 01/10] mm: add zone device coherent type memory support

2022-01-19 Thread Alistair Popple
On Tuesday, 11 January 2022 9:31:52 AM AEDT Alex Sierra wrote: > Device memory that is cache coherent from device and CPU point of view. > This is used on platforms that have an advanced system bus (like CAPI > or CXL). Any page of a process can be migrated to such memory. However, > no one should

Re: [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping

2022-01-12 Thread Alistair Popple
I have been looking at this in relation to the migration code and noticed we have the following in try_to_migrate(): if (is_zone_device_page(page) && !is_device_private_page(page)) return; Which if I'm understanding correctly means that migration of device coherent pages w

Re: [PATCH v2 03/11] mm/gup: migrate PIN_LONGTERM dev coherent pages to system

2021-12-09 Thread Alistair Popple
On Friday, 10 December 2021 3:54:31 AM AEDT Sierra Guiza, Alejandro (Alex) wrote: > > On 12/9/2021 10:29 AM, Felix Kuehling wrote: > > On 2021-12-09 at 5:53 a.m., Alistair Popple wrote: > >> On Thursday, 9 December 2021 5:55:26 AM AEDT Sierra Guiza, Alejandro > >>

Re: [PATCH v2 03/11] mm/gup: migrate PIN_LONGTERM dev coherent pages to system

2021-12-09 Thread Alistair Popple
On Thursday, 9 December 2021 12:53:45 AM AEDT Jason Gunthorpe wrote: > > I think a similar problem exists for device private fault handling as well > > and > > it has been on my list of things to fix for a while. I think the solution > > is to > > call try_get_page(), except it doesn't work with

Re: [PATCH v2 03/11] mm/gup: migrate PIN_LONGTERM dev coherent pages to system

2021-12-09 Thread Alistair Popple
On Thursday, 9 December 2021 5:55:26 AM AEDT Sierra Guiza, Alejandro (Alex) wrote: > > On 12/8/2021 11:30 AM, Felix Kuehling wrote: > > On 2021-12-08 at 11:58 a.m., Felix Kuehling wrote: > >> On 2021-12-08 at 6:31 a.m., Alistair Popple wrote: > >>> On Tuesday

Re: [PATCH v2 03/11] mm/gup: migrate PIN_LONGTERM dev coherent pages to system

2021-12-09 Thread Alistair Popple
On Tuesday, 7 December 2021 5:52:43 AM AEDT Alex Sierra wrote: > Avoid long term pinning for Coherent device type pages. This could > interfere with their own device memory manager. > If caller tries to get user device coherent pages with PIN_LONGTERM flag > set, those pages will be migrated back t

Re: [PATCH v1 1/9] mm: add zone device coherent type memory support

2021-11-23 Thread Alistair Popple
On Tuesday, 23 November 2021 4:16:55 AM AEDT Felix Kuehling wrote: [...] > > Right, so long as my fix goes in I don't think there is anything wrong with > > pinning device public pages. Agree that we should avoid FOLL_LONGTERM pins > > for > > device memory though. I think the way to do that is
