Re: [PATCH 2/2] mm/selftests: Don't prefault in gup_longterm tests

2024-04-29 Thread David Hildenbrand
On 29.04.24 15:10, Peter Xu wrote: On Mon, Apr 29, 2024 at 09:28:15AM +0200, David Hildenbrand wrote: On 28.04.24 21:01, Peter Xu wrote: Prefault, especially with RW, makes the GUP test too easy, and may not yet reach the core of the test. For example, R/O longterm pins will just hit

Re: [PATCH 2/2] mm/selftests: Don't prefault in gup_longterm tests

2024-04-29 Thread David Hildenbrand
. This tortures more code paths at least to cover the unshare case for R/O longterm pins, in which case the first R/O GUP attempt will fault in the page R/O first, then the 2nd will go through the unshare path, checking whether an unshare is needed. Cc: David Hildenbrand Signed-off-by: Peter Xu
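
A minimal sketch of the "prefault" idea under discussion, as an illustration only (the real gup_longterm selftest drives the pinning through the kernel's gup_test facility, which is not shown): whether anonymous memory is touched before pinning changes which fault/GUP paths a later R/O long-term pin exercises.

#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 4096;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;

	/* Prefaulting with a write maps the page R/W up front, making a
	 * later R/O long-term pin "too easy" per the patch above. */
	/* memset(buf, 1, len); */

	/* Without the prefault, the first R/O access populates the page
	 * R/O; a second R/O GUP then runs the unshare check the patch
	 * wants to cover. */
	volatile char c = buf[0];
	(void)c;

	return munmap(buf, len);
}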

Re: [PATCH 1/2] mm/gup: Fix hugepd handling in hugetlb rework

2024-04-29 Thread David Hildenbrand
Reported-by: David Hildenbrand Fixes: a12083d721d7 ("mm/gup: handle hugepd for follow_page()") Signed-off-by: Peter Xu --- LGTM Reviewed-by: David Hildenbrand -- Cheers, David / dhildenb

Re: [PATCH v1 1/3] mm/gup: consistently name GUP-fast functions

2024-04-27 Thread David Hildenbrand
On 26.04.24 23:58, Peter Xu wrote: On Fri, Apr 26, 2024 at 11:33:08PM +0200, David Hildenbrand wrote: I raised this topic in the past, and IMHO we either (a) never should have added COW support; or (b) added COW support by using ordinary anonymous memory (hey, partial mappings of hugetlb pages

Re: [PATCH v1 1/3] mm/gup: consistently name GUP-fast functions

2024-04-26 Thread David Hildenbrand
th hugetlb (2048 kB) not ok 323 No leak from child into parent And it looks like it was always failing.. perhaps since the start? We Yes! commit 7dad331be7816103eba8c12caeb88fbd3599c0b9 Author: David Hildenbrand Date: Tue Sep 27 13:01:17 2022 +0200 selftests/vm: anon_cow: hug

Re: [PATCH v1 1/3] mm/gup: consistently name GUP-fast functions

2024-04-26 Thread David Hildenbrand
On 26.04.24 18:12, Peter Xu wrote: On Fri, Apr 26, 2024 at 09:44:58AM -0400, Peter Xu wrote: On Fri, Apr 26, 2024 at 09:17:47AM +0200, David Hildenbrand wrote: On 02.04.24 14:55, David Hildenbrand wrote: Let's consistently call the "fast-only" part of GUP "GUP-fast" and

Re: [PATCH v1 1/3] mm/gup: consistently name GUP-fast functions

2024-04-26 Thread David Hildenbrand
On 02.04.24 14:55, David Hildenbrand wrote: Let's consistently call the "fast-only" part of GUP "GUP-fast" and rename all relevant internal functions to start with "gup_fast", to make it clearer that this is not ordinary GUP. The current mixture of "lockless"

Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-12 Thread David Hildenbrand
On 11.04.24 18:55, Paolo Bonzini wrote: On Mon, Apr 8, 2024 at 3:56 PM Peter Xu wrote: Paolo, I may miss a bunch of details here (as I still remember some change_pte patches previously on the list..), however I'm not sure whether we considered enabling it? Asked because I remember Andrea used to

Re: [PATCH 4/4] mm: replace set_pte_at_notify() with just set_pte_at()

2024-04-08 Thread David Hildenbrand
-by: David Hildenbrand -- Cheers, David / dhildenb

Re: [PATCH 3/4] mmu_notifier: remove the .change_pte() callback

2024-04-08 Thread David Hildenbrand
be removed. For now, leave in place set_pte_at_notify() even though it is just a synonym for set_pte_at(). Signed-off-by: Paolo Bonzini --- Reviewed-by: David Hildenbrand -- Cheers, David / dhildenb

Re: [PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code

2024-04-02 Thread David Hildenbrand
On 02.04.24 19:57, Peter Xu wrote: On Tue, Apr 02, 2024 at 06:39:31PM +0200, David Hildenbrand wrote: On 02.04.24 18:20, Peter Xu wrote: On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote: On 02.04.24 16:48, Ryan Roberts wrote: Hi Peter, Hey, Ryan, Thanks for the report

Re: [PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code

2024-04-02 Thread David Hildenbrand
On 02.04.24 18:00, Matthew Wilcox wrote: On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote: The oops trigger is at mm/gup.c:778: VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page); So 2M passed ok, and it's failing for 32M, which is cont-pmd. I'm

Re: [PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code

2024-04-02 Thread David Hildenbrand
On 02.04.24 18:20, Peter Xu wrote: On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote: On 02.04.24 16:48, Ryan Roberts wrote: Hi Peter, Hey, Ryan, Thanks for the report! On 27/03/2024 15:23, pet...@redhat.com wrote: From: Peter Xu Now follow_page() is ready to handle

Re: [PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code

2024-04-02 Thread David Hildenbrand
On 02.04.24 16:48, Ryan Roberts wrote: Hi Peter, On 27/03/2024 15:23, pet...@redhat.com wrote: From: Peter Xu Now follow_page() is ready to handle hugetlb pages in whatever form, and over all architectures. Switch to the generic code path. Time to retire hugetlb_follow_page_mask(),

[PATCH v1 3/3] mm: use "GUP-fast" instead "fast GUP" in remaining comments

2024-04-02 Thread David Hildenbrand
Let's fixup the remaining comments to consistently call that thing "GUP-fast". With this change, we consistently call it "GUP-fast". Reviewed-by: Mike Rapoport (IBM) Signed-off-by: David Hildenbrand --- mm/filemap.c | 2 +- mm/khugepaged.c | 2 +- 2 files changed

[PATCH v1 2/3] mm/treewide: rename CONFIG_HAVE_FAST_GUP to CONFIG_HAVE_GUP_FAST

2024-04-02 Thread David Hildenbrand
Nowadays, we call it "GUP-fast", the external interface includes functions like "get_user_pages_fast()", and we renamed all internal functions to reflect that as well. Let's make the config option reflect that. Reviewed-by: Mike Rapoport (IBM) Signed-off-by: David Hilden

[PATCH v1 0/3] mm/gup: consistently call it GUP-fast

2024-04-02 Thread David Hildenbrand
r.kernel.org Cc: linux...@vger.kernel.org Cc: linux...@kvack.org Cc: linux-perf-us...@vger.kernel.org Cc: linux-fsde...@vger.kernel.org Cc: linux-ri...@lists.infradead.org Cc: x...@kernel.org David Hildenbrand (3): mm/gup: consistently name GUP-fast functions mm/treewide: rename CONFIG_H

[PATCH v1 1/3] mm/gup: consistently name GUP-fast functions

2024-04-02 Thread David Hildenbrand
fast_permitted() is already properly named. With "gup_fast()", we now even have a function that is referred to in a comment in mm/mmu_gather.c. Reviewed-by: Jason Gunthorpe Reviewed-by: Mike Rapoport (IBM) Signed-off-by: David Hildenbrand --- mm/gup.c | 205 ---

Re: [PATCH v4 06/13] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2024-03-28 Thread David Hildenbrand
On 27.03.24 16:23, pet...@redhat.com wrote: From: Peter Xu Hugepd format for GUP is only used in PowerPC with hugetlbfs. There are some kernel usages of hugepd (can refer to hugepd_populate_kernel() for PPC_8XX), however those pages are not candidates for GUP. Commit a6e79df92e4a ("mm/gup:

Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast

2024-03-28 Thread David Hildenbrand
On 28.03.24 08:15, Mike Rapoport wrote: On Thu, Mar 28, 2024 at 07:09:13AM +0100, Arnd Bergmann wrote: On Thu, Mar 28, 2024, at 06:51, Vineet Gupta wrote: On 3/27/24 09:22, Arnd Bergmann wrote: On Wed, Mar 27, 2024, at 16:39, David Hildenbrand wrote: On 27.03.24 16:21, Peter Xu wrote

Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast

2024-03-27 Thread David Hildenbrand
On 27.03.24 16:46, Ryan Roberts wrote: Some of them look like mm-unstable issues. For example, arm64 fails with: CC arch/arm64/mm/extable.o In file included from ./include/linux/hugetlb.h:828, from security/commoncap.c:19: ./arch/arm64/include/asm/hugetlb.h:25:34:

Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast

2024-03-27 Thread David Hildenbrand
On 27.03.24 16:21, Peter Xu wrote: On Wed, Mar 27, 2024 at 02:05:35PM +0100, David Hildenbrand wrote: Some cleanups around function names, comments and the config option of "GUP-fast" -- GUP without "lock" safety belts on. With this cleanup it's easy to judge which fu

Re: [PATCH RFC 1/3] mm/gup: consistently name GUP-fast functions

2024-03-27 Thread David Hildenbrand
On 27.03.24 14:52, Jason Gunthorpe wrote: On Wed, Mar 27, 2024 at 02:05:36PM +0100, David Hildenbrand wrote: Let's consistently call the "fast-only" part of GUP "GUP-fast" and rename all relevant internal functions to start with "gup_fast", to make it clearer

[PATCH RFC 3/3] mm: use "GUP-fast" instead "fast GUP" in remaining comments

2024-03-27 Thread David Hildenbrand
Let's fixup the remaining comments to consistently call that thing "GUP-fast". With this change, we consistently call it "GUP-fast". Signed-off-by: David Hildenbrand --- mm/filemap.c | 2 +- mm/khugepaged.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --

[PATCH RFC 2/3] mm/treewide: rename CONFIG_HAVE_FAST_GUP to CONFIG_HAVE_GUP_FAST

2024-03-27 Thread David Hildenbrand
Nowadays, we call it "GUP-fast", the external interface includes functions like "get_user_pages_fast()", and we renamed all internal functions to reflect that as well. Let's make the config option reflect that. Signed-off-by: David Hildenbrand --- arch/arm/Kconfig

[PATCH RFC 1/3] mm/gup: consistently name GUP-fast functions

2024-03-27 Thread David Hildenbrand
d -> gup_fast_devmap_pmd_leaf() * __gup_device_huge() -> gup_fast_devmap_leaf() Helper functions: * unpin_user_pages_lockless() -> gup_fast_unpin_user_pages() * gup_fast_folio_allowed() is already properly named * gup_fast_permitted() is alre

[PATCH RFC 0/3] mm/gup: consistently call it GUP-fast

2024-03-27 Thread David Hildenbrand
inux-m...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: linux...@kvack.org Cc: linux-perf-us...@vger.kernel.org Cc: linux-fsde...@vger.kernel.org Cc: x...@kernel.org David Hildenbrand (3): mm/gup: consistently name GUP-fast f

Re: [PATCH 2/4] mm: pgalloc: support address-conditional pmd allocation

2024-02-21 Thread David Hildenbrand
On 21.02.24 08:13, Christophe Leroy wrote: On 20/02/2024 at 21:32, Maxwell Bland wrote: While other descriptors (e.g. pud) allow

Re: [PATCH v6 06/18] mm: Tidy up pte_next_pfn() definition

2024-02-15 Thread David Hildenbrand
) pte_advance_pfn(pte, 1) -#endif #ifndef set_ptes /** Acked-by: David Hildenbrand -- Cheers, David / dhildenb

Re: [PATCH v6 05/18] x86/mm: Convert pte_next_pfn() to pte_advance_pfn()

2024-02-15 Thread David Hildenbrand
FN_PTE_SHIFT)); + return __pte(pte_val(pte) - (nr << PFN_PTE_SHIFT)); + return __pte(pte_val(pte) + (nr << PFN_PTE_SHIFT)); } -#define pte_next_pfn pte_next_pfn +#define pte_advance_pfn pte_advance_pfn static inline int pte_present(pte_t a) { Reviewed-by: David Hildenb

Re: [PATCH v6 04/18] arm64/mm: Convert pte_next_pfn() to pte_advance_pfn()

2024-02-15 Thread David Hildenbrand
= pte_next_pfn(pte); + pte = pte_advance_pfn(pte, 1); Acked-by: David Hildenbrand -- Cheers, David / dhildenb

Re: [PATCH v6 03/18] mm: Introduce pte_advance_pfn() and use for pte_next_pfn()

2024-02-15 Thread David Hildenbrand
def set_ptes /** * set_ptes - Map consecutive pages to a contiguous range of addresses. Acked-by: David Hildenbrand -- Cheers, David / dhildenb

Re: [PATCH v3 12/15] mm/memory: pass PTE to copy_present_pte()

2024-02-14 Thread David Hildenbrand
On 29.01.24 13:46, David Hildenbrand wrote: We already read it, let's just forward it. This patch is based on work by Ryan Roberts. Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- mm/memory.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/mm

[PATCH v3 02/10] mm/memory: handle !page case in zap_present_pte() separately

2024-02-14 Thread David Hildenbrand
detect shadow stack entries. But for shadow stack entries, the HW dirty bit (in combination with non-writable PTEs) is set by software. So for the arch_check_zapped_pte() check, we don't have to sync against HW setting the HW dirty bit concurrently, it is always set. Reviewed-by: Ryan Roberts

[PATCH v3 10/10] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-02-14 Thread David Hildenbrand
force inlining of a specialized variant using __always_inline with nr=1. Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- include/linux/pgtable.h | 70 +++ mm/memory.c | 92 + 2 files changed, 136

[PATCH v3 09/10] mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing

2024-02-14 Thread David Hildenbrand
-by: Ryan Roberts Signed-off-by: David Hildenbrand --- mm/mmu_gather.c | 58 - 1 file changed, 43 insertions(+), 15 deletions(-) diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c index d175c0f1e2c8..99b3e9408aa0 100644 --- a/mm/mmu_gather.c +++ b/mm

[PATCH v3 08/10] mm/mmu_gather: add __tlb_remove_folio_pages()

2024-02-14 Thread David Hildenbrand
wed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- arch/s390/include/asm/tlb.h | 17 +++ include/asm-generic/tlb.h | 8 + include/linux/mm_types.h| 20 mm/mmu_gather.c | 61 +++-- mm/swap.c

[PATCH v3 07/10] mm/mmu_gather: add tlb_remove_tlb_entries()

2024-02-14 Thread David Hildenbrand
() a macro). Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- arch/powerpc/include/asm/tlb.h | 2 ++ include/asm-generic/tlb.h | 20 2 files changed, 22 insertions(+) diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h index b3de6102a907

[PATCH v3 06/10] mm/mmu_gather: define ENCODED_PAGE_FLAG_DELAY_RMAP

2024-02-14 Thread David Hildenbrand
d-by: Ryan Roberts Signed-off-by: David Hildenbrand --- include/linux/mm_types.h | 17 +++-- mm/mmu_gather.c | 5 +++-- 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 8b611e13153e..1b89eec0d6df 100644 ---

[PATCH v3 05/10] mm/mmu_gather: pass "delay_rmap" instead of encoded page to __tlb_remove_page_size()

2024-02-14 Thread David Hildenbrand
that the next encoded page pointer in an array is actually "nr_pages". So pass page + delay_rmap flag instead of an encoded page, to handle the encoding internally. Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- arch/s390/include/asm/tlb.h | 13 ++--- include/asm-gen

[PATCH v3 04/10] mm/memory: factor out zapping folio pte into zap_present_folio_pte()

2024-02-14 Thread David Hildenbrand
Let's prepare for further changes by factoring it out into a separate function. Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- mm/memory.c | 53 - 1 file changed, 32 insertions(+), 21 deletions(-) diff --git a/mm/memory.c b

[PATCH v3 03/10] mm/memory: further separate anon and pagecache folio handling in zap_present_pte()

2024-02-14 Thread David Hildenbrand
and RSS. While at it, only call zap_install_uffd_wp_if_needed() if there is even any chance that pte_install_uffd_wp_if_needed() would do *something*. That is, just don't bother if uffd-wp does not apply. Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- mm/memory.c | 16

[PATCH v3 01/10] mm/memory: factor out zapping of present pte into zap_present_pte()

2024-02-14 Thread David Hildenbrand
Let's prepare for further changes by factoring out processing of present PTEs. Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- mm/memory.c | 94 ++--- 1 file changed, 53 insertions(+), 41 deletions(-) diff --git a/mm/memory.c b/mm

[PATCH v3 00/10] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-02-14 Thread David Hildenbrand
erts Cc: Catalin Marinas Cc: Yin Fengwei Cc: Michal Hocko Cc: Will Deacon Cc: "Aneesh Kumar K.V" Cc: Nick Piggin Cc: Peter Zijlstra Cc: Michael Ellerman Cc: Christophe Leroy Cc: "Naveen N. Rao" Cc: Heiko Carstens Cc: Vasily Gorbik Cc: Alexander Gordeev Cc: C

Re: [PATCH v5 19/25] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-13 Thread David Hildenbrand
On 13.02.24 15:02, Ryan Roberts wrote: On 13/02/2024 13:45, David Hildenbrand wrote: On 13.02.24 14:33, Ard Biesheuvel wrote: On Tue, 13 Feb 2024 at 14:21, Ryan Roberts wrote: On 13/02/2024 13:13, David Hildenbrand wrote: On 13.02.24 14:06, Ryan Roberts wrote: On 13/02/2024 12:19, David

Re: [PATCH v5 19/25] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-13 Thread David Hildenbrand
On 13.02.24 14:33, Ard Biesheuvel wrote: On Tue, 13 Feb 2024 at 14:21, Ryan Roberts wrote: On 13/02/2024 13:13, David Hildenbrand wrote: On 13.02.24 14:06, Ryan Roberts wrote: On 13/02/2024 12:19, David Hildenbrand wrote: On 13.02.24 13:06, Ryan Roberts wrote: On 12/02/2024 20:38, Ryan

Re: [PATCH v5 19/25] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-13 Thread David Hildenbrand
On 13.02.24 14:20, Ryan Roberts wrote: On 13/02/2024 13:13, David Hildenbrand wrote: On 13.02.24 14:06, Ryan Roberts wrote: On 13/02/2024 12:19, David Hildenbrand wrote: On 13.02.24 13:06, Ryan Roberts wrote: On 12/02/2024 20:38, Ryan Roberts wrote: [...] +static inline bool mm_is_user

Re: [PATCH v5 19/25] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-13 Thread David Hildenbrand
On 13.02.24 14:06, Ryan Roberts wrote: On 13/02/2024 12:19, David Hildenbrand wrote: On 13.02.24 13:06, Ryan Roberts wrote: On 12/02/2024 20:38, Ryan Roberts wrote: [...] +static inline bool mm_is_user(struct mm_struct *mm) +{ + /* + * Don't attempt to apply the contig bit to kernel

Re: [PATCH v5 19/25] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-13 Thread David Hildenbrand
On 13.02.24 13:06, Ryan Roberts wrote: On 12/02/2024 20:38, Ryan Roberts wrote: [...] +static inline bool mm_is_user(struct mm_struct *mm) +{ + /* + * Don't attempt to apply the contig bit to kernel mappings, because + * dynamically adding/removing the contig bit can cause

Re: [PATCH v5 19/25] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-13 Thread David Hildenbrand
On 12.02.24 21:38, Ryan Roberts wrote: [...] +static inline bool mm_is_user(struct mm_struct *mm) +{ + /* + * Don't attempt to apply the contig bit to kernel mappings, because + * dynamically adding/removing the contig bit can cause page faults. + * These racing

Re: [PATCH v5 03/25] mm: Make pte_next_pfn() a wrapper around pte_advance_pfn()

2024-02-13 Thread David Hildenbrand
On 12.02.24 22:34, Ryan Roberts wrote: On 12/02/2024 14:29, David Hildenbrand wrote: On 12.02.24 15:10, Ryan Roberts wrote: On 12/02/2024 12:14, David Hildenbrand wrote: On 02.02.24 09:07, Ryan Roberts wrote: The goal is to be able to advance a PTE by an arbitrary number of PFNs. So

Re: [PATCH v5 22/25] mm: Add pte_batch_hint() to reduce scanning in folio_pte_batch()

2024-02-12 Thread David Hildenbrand
On 12.02.24 16:47, Ryan Roberts wrote: On 12/02/2024 13:43, David Hildenbrand wrote: On 02.02.24 09:07, Ryan Roberts wrote: Some architectures (e.g. arm64) can tell from looking at a pte, if some follow-on ptes also map contiguous physical memory with the same pgprot. (for arm64

Re: [PATCH v5 19/25] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-12 Thread David Hildenbrand
On 12.02.24 16:34, Ryan Roberts wrote: On 12/02/2024 15:26, David Hildenbrand wrote: On 12.02.24 15:45, Ryan Roberts wrote: On 12/02/2024 13:54, David Hildenbrand wrote: If so, I wonder if we could instead do that comparison modulo the access/dirty bits, I think that would work

Re: [PATCH v5 19/25] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-12 Thread David Hildenbrand
On 12.02.24 15:45, Ryan Roberts wrote: On 12/02/2024 13:54, David Hildenbrand wrote: If so, I wonder if we could instead do that comparison modulo the access/dirty bits, I think that would work - but will need to think a bit more on it. and leave ptep_get_lockless() only reading a single

Re: [PATCH v5 03/25] mm: Make pte_next_pfn() a wrapper around pte_advance_pfn()

2024-02-12 Thread David Hildenbrand
On 12.02.24 15:10, Ryan Roberts wrote: On 12/02/2024 12:14, David Hildenbrand wrote: On 02.02.24 09:07, Ryan Roberts wrote: The goal is to be able to advance a PTE by an arbitrary number of PFNs. So introduce a new API that takes a nr param. We are going to remove pte_next_pfn() and replace

Re: [PATCH] mm/hugetlb: Move page order check inside hugetlb_cma_reserve()

2024-02-12 Thread David Hildenbrand
AGE_ORDER); cma_reserve_called = true; if (!hugetlb_cma_size) Reviewed-by: David Hildenbrand -- Cheers, David / dhildenb

Re: [PATCH v5 19/25] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-12 Thread David Hildenbrand
If so, I wonder if we could instead do that comparison modulo the access/dirty bits, I think that would work - but will need to think a bit more on it. and leave ptep_get_lockless() only reading a single entry? I think we will need to do something a bit less fragile. ptep_get() does collect

Re: [PATCH v5 23/25] arm64/mm: Implement pte_batch_hint()

2024-02-12 Thread David Hildenbrand
; + + return CONT_PTES - (((unsigned long)ptep >> 3) & (CONT_PTES - 1)); +} + /* * The below functions constitute the public API that arm64 presents to the * core-mm to manipulate PTE entries within their page tables (or at least this Reviewed-by: David Hildenbrand -- Cheers, David / dhildenb
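
A worked example of the hint arithmetic quoted above. CONT_PTES = 16 is an assumption (arm64 with 4K base pages) and the pointer values are made up; since each PTE is 8 bytes, ptep >> 3 is the entry index, so the expression yields how many entries remain up to the next CONT_PTES-aligned boundary.

#include <stdio.h>

#define CONT_PTES 16UL	/* assumption: arm64, 4K base pages */

/* Mirrors the return expression from the hunk above, with the pte_t
 * pointer stood in by a plain address. */
static unsigned long batch_hint(unsigned long ptep)
{
	return CONT_PTES - ((ptep >> 3) & (CONT_PTES - 1));
}

int main(void)
{
	printf("%lu\n", batch_hint(0x1000));	/* block start: 16 remain */
	printf("%lu\n", batch_hint(0x1018));	/* 3 entries in: 13 remain */
	return 0;
}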

Re: [PATCH v5 22/25] mm: Add pte_batch_hint() to reduce scanning in folio_pte_batch()

2024-02-12 Thread David Hildenbrand
(expected_pte, 1); - ptep++; + nr = pte_batch_hint(ptep, pte); + expected_pte = pte_advance_pfn(expected_pte, nr); + ptep += nr; } - return ptep - start_ptep; + return min(ptep - start_ptep, max_nr); } Acked-by: David Hildenb

Re: [PATCH v5 18/25] arm64/mm: Split __flush_tlb_range() to elide trailing DSB

2024-02-12 Thread David Hildenbrand
On 12.02.24 14:05, Ryan Roberts wrote: On 12/02/2024 12:44, David Hildenbrand wrote: On 02.02.24 09:07, Ryan Roberts wrote: Split __flush_tlb_range() into __flush_tlb_range_nosync() + __flush_tlb_range(), in the same way as the existing flush_tlb_page() arrangement. This allows calling

Re: [PATCH v5 18/25] arm64/mm: Split __flush_tlb_range() to elide trailing DSB

2024-02-12 Thread David Hildenbrand
dsb(ish); mmu_notifier_arch_invalidate_secondary_tlbs(); So I *suspect* having that DSB before mmu_notifier_arch_invalidate_secondary_tlbs() is fine. Hopefully, nothing in there relies on that placement. Maybe worth spelling out in the patch description. Reviewed-by: David Hildenbrand -- Cheers, David / dhildenb

Re: [PATCH v5 03/25] mm: Make pte_next_pfn() a wrapper around pte_advance_pfn()

2024-02-12 Thread David Hildenbrand
On 02.02.24 09:07, Ryan Roberts wrote: The goal is to be able to advance a PTE by an arbitrary number of PFNs. So introduce a new API that takes a nr param. We are going to remove pte_next_pfn() and replace it with pte_advance_pfn(). As a first step, implement pte_next_pfn() as a wrapper around
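
A self-contained model of the wrapper relationship described above. pte_t, PFN_PTE_SHIFT and both helpers are stand-ins for the in-kernel definitions (the PFN-at-bit-12 layout mirrors the x86 hunk quoted elsewhere in this thread).

#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t val; } pte_t;

#define PFN_PTE_SHIFT 12	/* assumption: x86-like PFN placement */

/* New API: advance the PFN encoded in a PTE by an arbitrary nr of pages. */
static pte_t pte_advance_pfn(pte_t pte, unsigned int nr)
{
	return (pte_t){ .val = pte.val + ((uint64_t)nr << PFN_PTE_SHIFT) };
}

/* The old single-step helper becomes a trivial wrapper, as the patch does. */
static pte_t pte_next_pfn(pte_t pte)
{
	return pte_advance_pfn(pte, 1);
}

int main(void)
{
	pte_t pte = { .val = 0x1000 };

	printf("0x%llx\n", (unsigned long long)pte_next_pfn(pte).val);       /* 0x2000 */
	printf("0x%llx\n", (unsigned long long)pte_advance_pfn(pte, 4).val); /* 0x5000 */
	return 0;
}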

Re: [PATCH v5 01/25] mm: Clarify the spec for set_ptes()

2024-02-12 Thread David Hildenbrand
* May be overridden by the architecture, or the architecture can define * set_pte() and PFN_PTE_SHIFT. * Acked-by: David Hildenbrand -- Cheers, David / dhildenb
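
For context, a sketch of the generic set_ptes() loop this spec describes, reconstructed from the thread (the upstream <linux/pgtable.h> version may differ in detail). It shows why an architecture only needs set_pte() plus a way to advance the PFN:

static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
		pte_t *ptep, pte_t pte, unsigned int nr)
{
	arch_enter_lazy_mmu_mode();
	for (;;) {
		set_pte(ptep, pte);
		if (--nr == 0)
			break;
		ptep++;
		/* Architectures provide PFN_PTE_SHIFT or override
		 * pte_next_pfn() for non-contiguous PFN layouts. */
		pte = pte_next_pfn(pte);
	}
	arch_leave_lazy_mmu_mode();
}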

Re: [PATCH v2 09/10] mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing

2024-02-12 Thread David Hildenbrand
On 12.02.24 12:21, Ryan Roberts wrote: On 12/02/2024 11:05, David Hildenbrand wrote: On 12.02.24 11:56, David Hildenbrand wrote: On 12.02.24 11:32, Ryan Roberts wrote: On 12/02/2024 10:11, David Hildenbrand wrote: Hi Ryan, -static void tlb_batch_pages_flush(struct mmu_gather *tlb) +static

Re: [PATCH v2 09/10] mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing

2024-02-12 Thread David Hildenbrand
On 12.02.24 11:56, David Hildenbrand wrote: On 12.02.24 11:32, Ryan Roberts wrote: On 12/02/2024 10:11, David Hildenbrand wrote: Hi Ryan, -static void tlb_batch_pages_flush(struct mmu_gather *tlb) +static void __tlb_batch_free_encoded_pages(struct mmu_gather_batch *batch) { - struct

Re: [PATCH v2 09/10] mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing

2024-02-12 Thread David Hildenbrand
On 12.02.24 11:32, Ryan Roberts wrote: On 12/02/2024 10:11, David Hildenbrand wrote: Hi Ryan, -static void tlb_batch_pages_flush(struct mmu_gather *tlb) +static void __tlb_batch_free_encoded_pages(struct mmu_gather_batch *batch) { - struct mmu_gather_batch *batch; - - for (batch

Re: [PATCH v2 09/10] mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing

2024-02-12 Thread David Hildenbrand
Hi Ryan, -static void tlb_batch_pages_flush(struct mmu_gather *tlb) +static void __tlb_batch_free_encoded_pages(struct mmu_gather_batch *batch) { - struct mmu_gather_batch *batch; - - for (batch = &tlb->local; batch && batch->nr; batch = batch->next) { - struct

Re: [PATCH v2 08/10] mm/mmu_gather: add __tlb_remove_folio_pages()

2024-02-12 Thread David Hildenbrand
On 12.02.24 09:51, Ryan Roberts wrote: On 09/02/2024 22:15, David Hildenbrand wrote: Add __tlb_remove_folio_pages(), which will remove multiple consecutive pages that belong to the same large folio, instead of only a single page. We'll be using this function when optimizing unmapping/zapping

Re: [PATCH v3 01/15] arm64/mm: Make set_ptes() robust when OAs cross 48-bit boundary

2024-02-09 Thread David Hildenbrand
On 08.02.24 07:10, Mike Rapoport wrote: On Mon, Jan 29, 2024 at 01:46:35PM +0100, David Hildenbrand wrote: From: Ryan Roberts Since the high bits [51:48] of an OA are not stored contiguously in the PTE, there is a theoretical bug in set_ptes(), which just adds PAGE_SIZE to the pte to get
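
A worked illustration of the hazard described above, assuming the arm64 64K-granule LPA layout where OA bits [51:48] live in PTE bits [15:12]: adding PAGE_SIZE to the raw PTE value cannot carry into that detached field when the OA crosses the 48-bit boundary.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE (1ULL << 16)	/* 64K pages, where LPA applies */

/* Pack a 52-bit OA: bits [47:16] stay in place, bits [51:48] go to [15:12]. */
static uint64_t oa_to_pte(uint64_t oa)
{
	return (oa & 0x0000ffffffff0000ULL) | ((oa >> 36) & 0xf000ULL);
}

static uint64_t pte_to_oa(uint64_t pte)
{
	return (pte & 0x0000ffffffff0000ULL) | ((pte & 0xf000ULL) << 36);
}

int main(void)
{
	uint64_t oa = 0x0000ffffffff0000ULL;	/* last page below 2^48 */

	/* Naive "next pte": the carry out of bit 47 lands in PTE bit 48,
	 * not in the [15:12] field holding OA bits [51:48]. */
	uint64_t naive = oa_to_pte(oa) + PAGE_SIZE;
	/* Robust: advance the OA itself, then re-encode. */
	uint64_t fixed = oa_to_pte(oa + PAGE_SIZE);

	printf("naive OA: 0x%llx\n", (unsigned long long)pte_to_oa(naive));
	printf("fixed OA: 0x%llx\n", (unsigned long long)pte_to_oa(fixed));
	return 0;
}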

Re: [PATCH v5 00/25] Transparent Contiguous PTEs for User Mappings

2024-02-09 Thread David Hildenbrand
1) Convert READ_ONCE() -> ptep_get() 2) Convert set_pte_at() -> set_ptes() 3) All the "New layer" renames and addition of the trivial wrappers Yep that makes sense. I'll start prepping that today. I'll hold off reposting until I have your comments on 19-25. I'm also hoping that David will

[PATCH v2 09/10] mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing

2024-02-09 Thread David Hildenbrand
running into soft lockups, something else is already completely bogus. In the future, we might want to detect if handling cond_resched() is required at all, and just not do any of that with full preemption enabled. Signed-off-by: David Hildenbrand --- mm/mmu_gather.c | 50

[PATCH v2 10/10] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-02-09 Thread David Hildenbrand
force inlining of a specialized variant using __always_inline with nr=1. Signed-off-by: David Hildenbrand --- include/linux/pgtable.h | 70 +++ mm/memory.c | 92 + 2 files changed, 136 insertions(+), 26 deletions

[PATCH v2 08/10] mm/mmu_gather: add __tlb_remove_folio_pages()

2024-02-09 Thread David Hildenbrand
nts. As long as page freeing time primarily only depends on the number of involved folios, there is no effective change for !preempt configurations. However, we'll adjust tlb_batch_pages_flush() separately to handle corner cases where page freeing time grows proportionally with the actual memory size.

[PATCH v2 07/10] mm/mmu_gather: add tlb_remove_tlb_entries()

2024-02-09 Thread David Hildenbrand
() a macro). Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- arch/powerpc/include/asm/tlb.h | 2 ++ include/asm-generic/tlb.h | 20 2 files changed, 22 insertions(+) diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h index b3de6102a907

[PATCH v2 06/10] mm/mmu_gather: define ENCODED_PAGE_FLAG_DELAY_RMAP

2024-02-09 Thread David Hildenbrand
d-by: Ryan Roberts Signed-off-by: David Hildenbrand --- include/linux/mm_types.h | 17 +++-- mm/mmu_gather.c | 5 +++-- 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 8b611e13153e..1b89eec0d6df 100644 ---

[PATCH v2 05/10] mm/mmu_gather: pass "delay_rmap" instead of encoded page to __tlb_remove_page_size()

2024-02-09 Thread David Hildenbrand
that the next encoded page pointer in an array is actually "nr_pages". So pass page + delay_rmap flag instead of an encoded page, to handle the encoding internally. Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- arch/s390/include/asm/tlb.h | 13 ++--- include/asm-gen

[PATCH v2 04/10] mm/memory: factor out zapping folio pte into zap_present_folio_pte()

2024-02-09 Thread David Hildenbrand
Let's prepare for further changes by factoring it out into a separate function. Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- mm/memory.c | 53 - 1 file changed, 32 insertions(+), 21 deletions(-) diff --git a/mm/memory.c b

[PATCH v2 03/10] mm/memory: further separate anon and pagecache folio handling in zap_present_pte()

2024-02-09 Thread David Hildenbrand
and RSS. While at it, only call zap_install_uffd_wp_if_needed() if there is even any chance that pte_install_uffd_wp_if_needed() would do *something*. That is, just don't bother if uffd-wp does not apply. Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- mm/memory.c | 16

[PATCH v2 02/10] mm/memory: handle !page case in zap_present_pte() separately

2024-02-09 Thread David Hildenbrand
detect shadow stack entries. But for shadow stack entries, the HW dirty bit (in combination with non-writable PTEs) is set by software. So for the arch_check_zapped_pte() check, we don't have to sync against HW setting the HW dirty bit concurrently, it is always set. Reviewed-by: Ryan Roberts

[PATCH v2 01/10] mm/memory: factor out zapping of present pte into zap_present_pte()

2024-02-09 Thread David Hildenbrand
Let's prepare for further changes by factoring out processing of present PTEs. Signed-off-by: David Hildenbrand --- mm/memory.c | 94 ++--- 1 file changed, 53 insertions(+), 41 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index

[PATCH v2 00/10] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-02-09 Thread David Hildenbrand
umar K.V" Cc: Nick Piggin Cc: Peter Zijlstra Cc: Michael Ellerman Cc: Christophe Leroy Cc: "Naveen N. Rao" Cc: Heiko Carstens Cc: Vasily Gorbik Cc: Alexander Gordeev Cc: Christian Borntraeger Cc: Sven Schnelle Cc: Arnd Bergmann Cc: linux-a...@vger.kernel.org Cc: li

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
dontneed should hopefully/likely see a speedup. Yes, but that's almost exactly the same path as munmap, so I'm not sure it really adds much for this particular series. Right, that's why I'm not including these measurements. dontneed vs. munmap is more about measuring the overhead of VMA

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
On 31.01.24 16:02, Ryan Roberts wrote: On 31/01/2024 14:29, David Hildenbrand wrote: Note that regarding NUMA effects, I mean when some memory access within the same socket is faster/slower even with only a single node. On AMD EPYC that's possible, depending on which core you are running

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
Note that regarding NUMA effects, I mean when some memory access within the same socket is faster/slower even with only a single node. On AMD EPYC that's possible, depending on which core you are running and on which memory controller the memory you want to access is located. If both are in

Re: [PATCH v1 0/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
On 31.01.24 15:08, Michal Hocko wrote: On Wed 31-01-24 10:26:13, Ryan Roberts wrote: IIRC there is an option to zero memory when it is freed back to the buddy? So that could be a place where time is proportional to size rather than proportional to folio count? But I think that option is

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
Nope: looks the same. I've taken my test harness out of the picture and done everything manually from the ground up, with the old tests and the new. Headline is that I see similar numbers from both. It took me a while to get really reproducible numbers on Intel. Most importantly: * Set a fixed

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
I'm also surprised about the dontneed vs. munmap numbers. You mean the ones for Altra that I posted? (I didn't post any for M2). The altra numbers look ok to me; dontneed has no change, and munmap has no change for order-0 and is massively improved for order-9. I would expect that dontneed

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
On 31.01.24 13:37, Ryan Roberts wrote: On 31/01/2024 11:49, Ryan Roberts wrote: On 31/01/2024 11:28, David Hildenbrand wrote: On 31.01.24 12:16, Ryan Roberts wrote: On 31/01/2024 11:06, David Hildenbrand wrote: On 31.01.24 11:43, Ryan Roberts wrote: On 29/01/2024 12:46, David Hildenbrand

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
On 31.01.24 12:16, Ryan Roberts wrote: On 31/01/2024 11:06, David Hildenbrand wrote: On 31.01.24 11:43, Ryan Roberts wrote: On 29/01/2024 12:46, David Hildenbrand wrote: Now that the rmap overhaul[1] is upstream that provides a clean interface for rmap batching, let's implement PTE batching

Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
- folio_remove_rmap_pte(folio, page, vma); + folio_remove_rmap_ptes(folio, page, nr, vma); + + /* Only sanity-check the first page in a batch. */ if (unlikely(page_mapcount(page) < 0)) print_bad_pte(vma, addr, ptent, page); Is there a case for

Re: [PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
On 31.01.24 11:43, Ryan Roberts wrote: On 29/01/2024 12:46, David Hildenbrand wrote: Now that the rmap overhaul[1] is upstream that provides a clean interface for rmap batching, let's implement PTE batching during fork when processing PTE-mapped THPs. This series is partially based on Ryan's

Re: [PATCH v1 0/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
On 31.01.24 03:20, Yin Fengwei wrote: On 1/29/24 22:32, David Hildenbrand wrote: This series is based on [1] and must be applied on top of it. Similar to what we did with fork(), let's implement PTE batching during unmap/zap when processing PTE-mapped THPs. We collect consecutive PTEs that map

Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
On 31.01.24 03:30, Yin Fengwei wrote: On 1/29/24 22:32, David Hildenbrand wrote: +static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm, + unsigned long addr, pte_t *ptep, unsigned int nr, int full) +{ + pte_t pte, tmp_pte; + + pte
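
The complete generic fallback whose opening lines are quoted above, sketched from the fragments in this thread (the version merged into <linux/pgtable.h> may differ in detail). It clears nr consecutive PTEs and folds the dirty/accessed bits of every cleared entry into the returned PTE:

static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
		unsigned long addr, pte_t *ptep, unsigned int nr, int full)
{
	pte_t pte, tmp_pte;

	pte = ptep_get_and_clear_full(mm, addr, ptep, full);
	while (--nr) {
		ptep++;
		addr += PAGE_SIZE;
		tmp_pte = ptep_get_and_clear_full(mm, addr, ptep, full);
		/* Collect dirty/accessed bits across the whole batch. */
		if (pte_dirty(tmp_pte))
			pte = pte_mkdirty(pte);
		if (pte_young(tmp_pte))
			pte = pte_mkyoung(pte);
	}
	return pte;
}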

Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
+ +#ifndef clear_full_ptes +/** + * clear_full_ptes - Clear PTEs that map consecutive pages of the same folio. I know it's implied from "pages of the same folio" (and even more so for the above variant due to mention of access/dirty), but I wonder if it's useful to explicitly state that "all

Re: [PATCH v1 0/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-31 Thread David Hildenbrand
On 31.01.24 03:20, Yin Fengwei wrote: On 1/29/24 22:32, David Hildenbrand wrote: This series is based on [1] and must be applied on top of it. Similar to what we did with fork(), let's implement PTE batching during unmap/zap when processing PTE-mapped THPs. We collect consecutive PTEs that map

Re: [PATCH v1 7/9] mm/mmu_gather: add __tlb_remove_folio_pages()

2024-01-30 Thread David Hildenbrand
On 30.01.24 10:21, Ryan Roberts wrote: On 29/01/2024 14:32, David Hildenbrand wrote: Add __tlb_remove_folio_pages(), which will remove multiple consecutive pages that belong to the same large folio, instead of only a single page. We'll be using this function when optimizing unmapping/zapping

Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-30 Thread David Hildenbrand
Re-reading the docs myself: +#ifndef get_and_clear_full_ptes +/** + * get_and_clear_full_ptes - Clear PTEs that map consecutive pages of the same + * folio, collecting dirty/accessed bits. + * @mm: Address space the pages are mapped into. + * @addr: Address the first

Re: [PATCH v1 1/9] mm/memory: factor out zapping of present pte into zap_present_pte()

2024-01-30 Thread David Hildenbrand
On 30.01.24 09:46, Ryan Roberts wrote: On 30/01/2024 08:41, David Hildenbrand wrote: On 30.01.24 09:13, Ryan Roberts wrote: On 29/01/2024 14:32, David Hildenbrand wrote: Let's prepare for further changes by factoring out processing of present PTEs. Signed-off-by: David Hildenbrand --- mm

Re: [PATCH v1 3/9] mm/memory: further separate anon and pagecache folio handling in zap_present_pte()

2024-01-30 Thread David Hildenbrand
On 30.01.24 09:45, Ryan Roberts wrote: On 30/01/2024 08:37, David Hildenbrand wrote: On 30.01.24 09:31, Ryan Roberts wrote: On 29/01/2024 14:32, David Hildenbrand wrote: We don't need up-to-date accessed-dirty information for anon folios and can simply work with the ptent we already have

Re: [PATCH v1 1/9] mm/memory: factor out zapping of present pte into zap_present_pte()

2024-01-30 Thread David Hildenbrand
On 30.01.24 09:13, Ryan Roberts wrote: On 29/01/2024 14:32, David Hildenbrand wrote: Let's prepare for further changes by factoring out processing of present PTEs. Signed-off-by: David Hildenbrand --- mm/memory.c | 92 ++--- 1 file changed
