On 29.04.24 15:10, Peter Xu wrote:
On Mon, Apr 29, 2024 at 09:28:15AM +0200, David Hildenbrand wrote:
On 28.04.24 21:01, Peter Xu wrote:
Prefault, especially with RW, makes the GUP test too easy, and may not yet
reach the core of the test.
For example, R/O longterm pins will just hit [...]. This tortures more code
paths, at least to cover the unshare case for R/O longterm pins: the first
R/O GUP attempt will fault the page in R/O first, and the 2nd will then go
through the unshare path, checking whether an unshare is needed.
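To illustrate (a minimal userspace sketch, not the actual selftest code; names are made up):

#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Sketch only: with prefault, all pages are already mapped R/W before
 * the pin, so GUP finds present, writable PTEs and takes the easy path.
 * Without prefault, the first R/O longterm pin has to fault the pages
 * in R/O, and a second R/O pin then exercises the unshare check.
 */
static void *alloc_test_area(size_t size, bool prefault)
{
        void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (mem == MAP_FAILED)
                return NULL;
        if (prefault)
                memset(mem, 0, size);   /* populate all PTEs R/W upfront */
        return mem;
}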
Cc: David Hildenbrand
Reported-by: David Hildenbrand
Fixes: a12083d721d7 ("mm/gup: handle hugepd for follow_page()")
Signed-off-by: Peter Xu
---
LGTM
Reviewed-by: David Hildenbrand
--
Cheers,
David / dhildenb
On 26.04.24 23:58, Peter Xu wrote:
On Fri, Apr 26, 2024 at 11:33:08PM +0200, David Hildenbrand wrote:
I raised this topic in the past, and IMHO we either (a) never should have
added COW support; or (b) added COW support by using ordinary anonymous
memory (hey, partial mappings of hugetlb pages
... with hugetlb (2048 kB)
not ok 323 No leak from child into parent
And it looks like it was always failing.. perhaps since the start? We
Yes!
commit 7dad331be7816103eba8c12caeb88fbd3599c0b9
Author: David Hildenbrand
Date: Tue Sep 27 13:01:17 2022 +0200
selftests/vm: anon_cow: hug
On 26.04.24 18:12, Peter Xu wrote:
On Fri, Apr 26, 2024 at 09:44:58AM -0400, Peter Xu wrote:
On Fri, Apr 26, 2024 at 09:17:47AM +0200, David Hildenbrand wrote:
On 02.04.24 14:55, David Hildenbrand wrote:
Let's consistently call the "fast-only" part of GUP "GUP-fast" and
On 02.04.24 14:55, David Hildenbrand wrote:
Let's consistently call the "fast-only" part of GUP "GUP-fast" and rename
all relevant internal functions to start with "gup_fast", to make it
clearer that this is not ordinary GUP. The current mixture of
"lockless&
On 11.04.24 18:55, Paolo Bonzini wrote:
On Mon, Apr 8, 2024 at 3:56 PM Peter Xu wrote:
Paolo,
I may be missing a bunch of details here (I still remember some change_pte
patches previously on the list..), but I'm not sure whether we considered
enabling it? Asking because I remember Andrea used to
-by: David Hildenbrand
--
Cheers,
David / dhildenb
be removed. For now, leave in place
set_pte_at_notify() even though it is just a synonym for set_pte_at().
Signed-off-by: Paolo Bonzini
---
Reviewed-by: David Hildenbrand
--
Cheers,
David / dhildenb
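With the .change_pte() notifier gone, the synonym boils down to something
like this (sketch of the end state; the exact form in the merged patch may
differ):

/* No notification left to do: simply forward to set_pte_at(). */
#define set_pte_at_notify(mm, addr, ptep, pte) set_pte_at(mm, addr, ptep, pte)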
On 02.04.24 19:57, Peter Xu wrote:
On Tue, Apr 02, 2024 at 06:39:31PM +0200, David Hildenbrand wrote:
On 02.04.24 18:20, Peter Xu wrote:
On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote:
On 02.04.24 16:48, Ryan Roberts wrote:
Hi Peter,
Hey, Ryan,
Thanks for the report
On 02.04.24 18:00, Matthew Wilcox wrote:
On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote:
The oops trigger is at mm/gup.c:778:
VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
So 2M passed ok, and it's failing for 32M, which is cont-pmd. I'm
On 02.04.24 18:20, Peter Xu wrote:
On Tue, Apr 02, 2024 at 05:26:28PM +0200, David Hildenbrand wrote:
On 02.04.24 16:48, Ryan Roberts wrote:
Hi Peter,
Hey, Ryan,
Thanks for the report!
On 27/03/2024 15:23, pet...@redhat.com wrote:
From: Peter Xu
Now follow_page() is ready to handle
On 02.04.24 16:48, Ryan Roberts wrote:
Hi Peter,
On 27/03/2024 15:23, pet...@redhat.com wrote:
From: Peter Xu
Now follow_page() is ready to handle hugetlb pages in whatever form, and
over all architectures. Switch to the generic code path.
Time to retire hugetlb_follow_page_mask(),
Let's fixup the remaining comments to consistently call that thing
"GUP-fast". With this change, we consistently call it "GUP-fast".
Reviewed-by: Mike Rapoport (IBM)
Signed-off-by: David Hildenbrand
---
 mm/filemap.c    | 2 +-
 mm/khugepaged.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
Nowadays, we call it "GUP-fast", the external interface includes
functions like "get_user_pages_fast()", and we renamed all internal
functions to reflect that as well.
Let's make the config option reflect that.
Reviewed-by: Mike Rapoport (IBM)
Signed-off-by: David Hildenbrand
r.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-perf-us...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: x...@kernel.org
David Hildenbrand (3):
mm/gup: consistently name GUP-fast functions
mm/treewide: rename CONFIG_HAVE_FAST_GUP to CONFIG_HAVE_GUP_FAST
* gup_fast_permitted() is already properly named
With "gup_fast()", we now even have a function that is referred to in
comment in mm/mmu_gather.c.
Reviewed-by: Jason Gunthorpe
Reviewed-by: Mike Rapoport (IBM)
Signed-off-by: David Hildenbrand
---
mm/gup.c | 205 ---
On 27.03.24 16:23, pet...@redhat.com wrote:
From: Peter Xu
Hugepd format for GUP is only used in PowerPC with hugetlbfs. There is
some kernel usage of hugepd (see hugepd_populate_kernel() for
PPC_8XX); however, those pages are not candidates for GUP.
Commit a6e79df92e4a ("mm/gup:
On 28.03.24 08:15, Mike Rapoport wrote:
On Thu, Mar 28, 2024 at 07:09:13AM +0100, Arnd Bergmann wrote:
On Thu, Mar 28, 2024, at 06:51, Vineet Gupta wrote:
On 3/27/24 09:22, Arnd Bergmann wrote:
On Wed, Mar 27, 2024, at 16:39, David Hildenbrand wrote:
On 27.03.24 16:21, Peter Xu wrote:
On 27.03.24 16:46, Ryan Roberts wrote:
Some of them look like an mm-unstable issue. For example, arm64 fails with
CC arch/arm64/mm/extable.o
In file included from ./include/linux/hugetlb.h:828,
from security/commoncap.c:19:
./arch/arm64/include/asm/hugetlb.h:25:34:
On 27.03.24 16:21, Peter Xu wrote:
On Wed, Mar 27, 2024 at 02:05:35PM +0100, David Hildenbrand wrote:
Some cleanups around function names, comments and the config option of
"GUP-fast" -- GUP without "lock" safety belts on.
With this cleanup it's easy to judge which fu
On 27.03.24 14:52, Jason Gunthorpe wrote:
On Wed, Mar 27, 2024 at 02:05:36PM +0100, David Hildenbrand wrote:
Let's consistently call the "fast-only" part of GUP "GUP-fast" and rename
all relevant internal functions to start with "gup_fast", to make it
clearer
Let's fixup the remaining comments to consistently call that thing
"GUP-fast". With this change, we consistently call it "GUP-fast".
Signed-off-by: David Hildenbrand
---
 mm/filemap.c    | 2 +-
 mm/khugepaged.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
diff --
Nowadays, we call it "GUP-fast", the external interface includes
functions like "get_user_pages_fast()", and we renamed all internal
functions to reflect that as well.
Let's make the config option reflect that.
Signed-off-by: David Hildenbrand
---
arch/arm/Kconfig
* __gup_device_huge_pmd() -> gup_fast_devmap_pmd_leaf()
* __gup_device_huge() -> gup_fast_devmap_leaf()
Helper functions:
* unpin_user_pages_lockless() -> gup_fast_unpin_user_pages()
* gup_fast_folio_allowed() is already properly named
* gup_fast_permitted() is already properly named
Cc: linux-m...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-perf-us...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: x...@kernel.org
David Hildenbrand (3):
mm/gup: consistently name GUP-fast f
On 21.02.24 08:13, Christophe Leroy wrote:
On 20/02/2024 at 21:32, Maxwell Bland wrote:
While other descriptors (e.g. pud) allow
-#define pte_next_pfn(pte) pte_advance_pfn(pte, 1)
-#endif
 #ifndef set_ptes
 /**
Acked-by: David Hildenbrand
--
Cheers,
David / dhildenb
-        return __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
+        return __pte(pte_val(pte) + (nr << PFN_PTE_SHIFT));
 }
-#define pte_next_pfn        pte_next_pfn
+#define pte_advance_pfn        pte_advance_pfn
static inline int pte_present(pte_t a)
{
Reviewed-by: David Hildenbrand
-        pte = pte_next_pfn(pte);
+        pte = pte_advance_pfn(pte, 1);
Acked-by: David Hildenbrand
--
Cheers,
David / dhildenb
#ifndef set_ptes
/**
* set_ptes - Map consecutive pages to a contiguous range of addresses.
Acked-by: David Hildenbrand
--
Cheers,
David / dhildenb
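For reference, the generic set_ptes() loop looks roughly like this (sketch;
page-table checks and lazy-MMU bracketing omitted, see
include/linux/pgtable.h for the authoritative version):

static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
                pte_t *ptep, pte_t pte, unsigned int nr)
{
        for (;;) {
                set_pte(ptep, pte);     /* write the current entry */
                if (--nr == 0)
                        break;
                ptep++;
                pte = pte_next_pfn(pte);  /* advance the PFN by one page */
        }
}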
On 29.01.24 13:46, David Hildenbrand wrote:
We already read it, let's just forward it.
This patch is based on work by Ryan Roberts.
Reviewed-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
mm/memory.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm
detect shadow stack entries. But for shadow stack entries, the HW dirty
bit (in combination with non-writable PTEs) is set by software. So for the
arch_check_zapped_pte() check, we don't have to sync against HW setting
the HW dirty bit concurrently; it is always set.
Reviewed-by: Ryan Roberts
force inlining of a specialized
variant using __always_inline with nr=1.
Reviewed-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
 include/linux/pgtable.h | 70 +++
 mm/memory.c             | 92 +
 2 files changed, 136 insertions(+), 26 deletions(-)
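The pattern reads roughly like this (illustrative userspace sketch with
made-up names, compilable with GCC/Clang):

#include <stdio.h>

#define __always_inline inline __attribute__((__always_inline__))

/* Worker that handles an arbitrary number of entries. */
static int process_entries(unsigned int nr)
{
        unsigned int i, done = 0;

        for (i = 0; i < nr; i++)
                done++;         /* stand-in for per-entry work */
        return done;
}

/*
 * Forced-inline nr==1 specialization: with nr constant, the compiler
 * can fold the loop away, so the common single-entry path stays cheap.
 */
static __always_inline int process_entry(void)
{
        return process_entries(1);
}

int main(void)
{
        printf("%d\n", process_entry());
        return 0;
}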
Reviewed-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
mm/mmu_gather.c | 58 -
1 file changed, 43 insertions(+), 15 deletions(-)
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index d175c0f1e2c8..99b3e9408aa0 100644
--- a/mm/mmu_gather.c
+++ b/mm
Reviewed-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
 arch/s390/include/asm/tlb.h | 17 +++
 include/asm-generic/tlb.h   | 8 +
 include/linux/mm_types.h    | 20
 mm/mmu_gather.c             | 61 +++--
 mm/swap.c
() a
macro).
Reviewed-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
arch/powerpc/include/asm/tlb.h | 2 ++
include/asm-generic/tlb.h | 20
2 files changed, 22 insertions(+)
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index b3de6102a907
d-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
include/linux/mm_types.h | 17 +++--
mm/mmu_gather.c | 5 +++--
2 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8b611e13153e..1b89eec0d6df 100644
---
that the next encoded page
pointer in an array is actually "nr_pages". So pass page + delay_rmap flag
instead of an encoded page, to handle the encoding internally.
Reviewed-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
arch/s390/include/asm/tlb.h | 13 ++---
include/asm-gen
Let's prepare for further changes by factoring it out into a separate
function.
Reviewed-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
mm/memory.c | 53 -
1 file changed, 32 insertions(+), 21 deletions(-)
diff --git a/mm/memory.c b
and RSS.
While at it, only call zap_install_uffd_wp_if_needed() if there is even
any chance that pte_install_uffd_wp_if_needed() would do *something*.
That is, just don't bother if uffd-wp does not apply.
Reviewed-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
mm/memory.c | 16
Let's prepare for further changes by factoring out processing of present
PTEs.
Reviewed-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
mm/memory.c | 94 ++---
1 file changed, 53 insertions(+), 41 deletions(-)
diff --git a/mm/memory.c b/mm
erts
Cc: Catalin Marinas
Cc: Yin Fengwei
Cc: Michal Hocko
Cc: Will Deacon
Cc: "Aneesh Kumar K.V"
Cc: Nick Piggin
Cc: Peter Zijlstra
Cc: Michael Ellerman
Cc: Christophe Leroy
Cc: "Naveen N. Rao"
Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: Alexander Gordeev
Cc: Christian Borntraeger
On 13.02.24 15:02, Ryan Roberts wrote:
On 13/02/2024 13:45, David Hildenbrand wrote:
On 13.02.24 14:33, Ard Biesheuvel wrote:
On Tue, 13 Feb 2024 at 14:21, Ryan Roberts wrote:
On 13/02/2024 13:13, David Hildenbrand wrote:
On 13.02.24 14:06, Ryan Roberts wrote:
On 13/02/2024 12:19, David
On 13.02.24 14:33, Ard Biesheuvel wrote:
On Tue, 13 Feb 2024 at 14:21, Ryan Roberts wrote:
On 13/02/2024 13:13, David Hildenbrand wrote:
On 13.02.24 14:06, Ryan Roberts wrote:
On 13/02/2024 12:19, David Hildenbrand wrote:
On 13.02.24 13:06, Ryan Roberts wrote:
On 12/02/2024 20:38, Ryan
On 13.02.24 14:20, Ryan Roberts wrote:
On 13/02/2024 13:13, David Hildenbrand wrote:
On 13.02.24 14:06, Ryan Roberts wrote:
On 13/02/2024 12:19, David Hildenbrand wrote:
On 13.02.24 13:06, Ryan Roberts wrote:
On 12/02/2024 20:38, Ryan Roberts wrote:
[...]
+static inline bool mm_is_user(struct mm_struct *mm)
On 13.02.24 14:06, Ryan Roberts wrote:
On 13/02/2024 12:19, David Hildenbrand wrote:
On 13.02.24 13:06, Ryan Roberts wrote:
On 12/02/2024 20:38, Ryan Roberts wrote:
[...]
+static inline bool mm_is_user(struct mm_struct *mm)
+{
+        /*
+         * Don't attempt to apply the contig bit to kernel
On 13.02.24 13:06, Ryan Roberts wrote:
On 12/02/2024 20:38, Ryan Roberts wrote:
[...]
+static inline bool mm_is_user(struct mm_struct *mm)
+{
+        /*
+         * Don't attempt to apply the contig bit to kernel mappings, because
+         * dynamically adding/removing the contig bit can cause
On 12.02.24 21:38, Ryan Roberts wrote:
[...]
+static inline bool mm_is_user(struct mm_struct *mm)
+{
+        /*
+         * Don't attempt to apply the contig bit to kernel mappings, because
+         * dynamically adding/removing the contig bit can cause page faults.
+         * These racing
On 12.02.24 22:34, Ryan Roberts wrote:
On 12/02/2024 14:29, David Hildenbrand wrote:
On 12.02.24 15:10, Ryan Roberts wrote:
On 12/02/2024 12:14, David Hildenbrand wrote:
On 02.02.24 09:07, Ryan Roberts wrote:
The goal is to be able to advance a PTE by an arbitrary number of PFNs.
So
On 12.02.24 16:47, Ryan Roberts wrote:
On 12/02/2024 13:43, David Hildenbrand wrote:
On 02.02.24 09:07, Ryan Roberts wrote:
Some architectures (e.g. arm64) can tell from looking at a pte, if some
follow-on ptes also map contiguous physical memory with the same pgprot.
(for arm64
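Without such architecture knowledge, the generic fallback can only report a
single PTE (sketch of the default):

#ifndef pte_batch_hint
/* Generic default: no hint available, batch size is one PTE. */
static inline unsigned int pte_batch_hint(pte_t *ptep, pte_t pte)
{
        return 1;
}
#endif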
On 12.02.24 16:34, Ryan Roberts wrote:
On 12/02/2024 15:26, David Hildenbrand wrote:
On 12.02.24 15:45, Ryan Roberts wrote:
On 12/02/2024 13:54, David Hildenbrand wrote:
If so, I wonder if we could instead do that comparison modulo the access/dirty
bits,
I think that would work
On 12.02.24 15:45, Ryan Roberts wrote:
On 12/02/2024 13:54, David Hildenbrand wrote:
If so, I wonder if we could instead do that comparison modulo the access/dirty
bits,
I think that would work - but will need to think a bit more on it.
and leave ptep_get_lockless() only reading a single
On 12.02.24 15:10, Ryan Roberts wrote:
On 12/02/2024 12:14, David Hildenbrand wrote:
On 02.02.24 09:07, Ryan Roberts wrote:
The goal is to be able to advance a PTE by an arbitrary number of PFNs.
So introduce a new API that takes a nr param.
We are going to remove pte_next_pfn() and replace
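The new API boils down to this shape (sketch of the generic fallback,
assuming the architecture provides PFN_PTE_SHIFT):

#ifndef pte_advance_pfn
static inline pte_t pte_advance_pfn(pte_t pte, unsigned long nr)
{
        /* Advancing by nr pages is just adding nr << PFN_PTE_SHIFT. */
        return __pte(pte_val(pte) + (nr << PFN_PTE_SHIFT));
}
#endif

/* pte_next_pfn() then becomes a trivial nr == 1 wrapper: */
#define pte_next_pfn(pte) pte_advance_pfn(pte, 1)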
AGE_ORDER);
cma_reserve_called = true;
if (!hugetlb_cma_size)
Reviewed-by: David Hildenbrand
--
Cheers,
David / dhildenb
If so, I wonder if we could instead do that comparison modulo the access/dirty
bits,
I think that would work - but will need to think a bit more on it.
and leave ptep_get_lockless() only reading a single entry?
I think we will need to do something a bit less fragile. ptep_get() does collect
+        if (!pte_valid_cont(pte))
+                return 1;
+
+        return CONT_PTES - (((unsigned long)ptep >> 3) & (CONT_PTES - 1));
+}
+
+
/*
* The below functions constitute the public API that arm64 presents to the
* core-mm to manipulate PTE entries within their page tables (or at least
this
Reviewed-by: David Hildenbrand
--
Cheers,
David / dhildenb
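To see what that expression computes, here is an illustrative userspace demo
(CONT_PTES = 16 assumes arm64 with 4K pages and 8-byte PTEs):

#include <assert.h>
#include <stdint.h>

#define CONT_PTES 16    /* arm64, 4K pages: 16 contiguous PTEs per block */

/* Number of PTEs from ptep to the end of its contig block. */
static unsigned int remaining_in_block(uintptr_t ptep)
{
        /* >> 3 turns a byte address into a PTE index (8-byte PTEs). */
        return CONT_PTES - ((ptep >> 3) & (CONT_PTES - 1));
}

int main(void)
{
        assert(remaining_in_block(0x1000) == 16);       /* block start */
        assert(remaining_in_block(0x1008) == 15);       /* second PTE */
        assert(remaining_in_block(0x1078) == 1);        /* last PTE */
        return 0;
}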
-                expected_pte = pte_advance_pfn(expected_pte, 1);
-                ptep++;
+                nr = pte_batch_hint(ptep, pte);
+                expected_pte = pte_advance_pfn(expected_pte, nr);
+                ptep += nr;
         }
-        return ptep - start_ptep;
+        return min(ptep - start_ptep, max_nr);
 }
Acked-by: David Hildenbrand
On 12.02.24 14:05, Ryan Roberts wrote:
On 12/02/2024 12:44, David Hildenbrand wrote:
On 02.02.24 09:07, Ryan Roberts wrote:
Split __flush_tlb_range() into __flush_tlb_range_nosync() +
__flush_tlb_range(), in the same way as the existing flush_tlb_page()
arrangement. This allows calling
dsb(ish);
mmu_notifier_arch_invalidate_secondary_tlbs();
So I *suspect* having that DSB before
mmu_notifier_arch_invalidate_secondary_tlbs() is fine. Hopefully,
nothing in there relies on that placement.
Maybe worth spelling that out in the patch description.
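For reference, the split has roughly this shape (sketch with arm64-style
signatures from memory; details may differ from the posted patch):

static inline void __flush_tlb_range(struct vm_area_struct *vma,
                                     unsigned long start, unsigned long end,
                                     unsigned long stride, bool last_level,
                                     int tlb_level)
{
        /* TLBI loop and MMU notifier live in the _nosync variant ... */
        __flush_tlb_range_nosync(vma, start, end, stride,
                                 last_level, tlb_level);
        /* ... so batching callers can defer this barrier and issue it once. */
        dsb(ish);
}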
Reviewed-by: David Hildenbrand
--
Cheers,
David / dhildenb
On 02.02.24 09:07, Ryan Roberts wrote:
The goal is to be able to advance a PTE by an arbitrary number of PFNs.
So introduce a new API that takes a nr param.
We are going to remove pte_next_pfn() and replace it with
pte_advance_pfn(). As a first step, implement pte_next_pfn() as a
wrapper around
* May be overridden by the architecture, or the architecture can define
* set_pte() and PFN_PTE_SHIFT.
*
Acked-by: David Hildenbrand
--
Cheers,
David / dhildenb
On 12.02.24 12:21, Ryan Roberts wrote:
On 12/02/2024 11:05, David Hildenbrand wrote:
On 12.02.24 11:56, David Hildenbrand wrote:
On 12.02.24 11:32, Ryan Roberts wrote:
On 12/02/2024 10:11, David Hildenbrand wrote:
Hi Ryan,
-static void tlb_batch_pages_flush(struct mmu_gather *tlb)
+static
On 12.02.24 11:56, David Hildenbrand wrote:
On 12.02.24 11:32, Ryan Roberts wrote:
On 12/02/2024 10:11, David Hildenbrand wrote:
Hi Ryan,
-static void tlb_batch_pages_flush(struct mmu_gather *tlb)
+static void __tlb_batch_free_encoded_pages(struct mmu_gather_batch *batch)
{
- struct
On 12.02.24 11:32, Ryan Roberts wrote:
On 12/02/2024 10:11, David Hildenbrand wrote:
Hi Ryan,
-static void tlb_batch_pages_flush(struct mmu_gather *tlb)
+static void __tlb_batch_free_encoded_pages(struct mmu_gather_batch *batch)
 {
-        struct mmu_gather_batch *batch;
-
-        for (batch
Hi Ryan,
-static void tlb_batch_pages_flush(struct mmu_gather *tlb)
+static void __tlb_batch_free_encoded_pages(struct mmu_gather_batch *batch)
 {
-        struct mmu_gather_batch *batch;
-
-        for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
-                struct
On 12.02.24 09:51, Ryan Roberts wrote:
On 09/02/2024 22:15, David Hildenbrand wrote:
Add __tlb_remove_folio_pages(), which will remove multiple consecutive
pages that belong to the same large folio, instead of only a single
page. We'll be using this function when optimizing unmapping/zapping
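The new helper's signature has roughly this shape (as described above; the
merged prototype may differ in detail):

/* Remove nr_pages consecutive pages of one large folio in a single call. */
bool __tlb_remove_folio_pages(struct mmu_gather *tlb, struct page *page,
                              unsigned int nr_pages, bool delay_rmap);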
On 08.02.24 07:10, Mike Rapoport wrote:
On Mon, Jan 29, 2024 at 01:46:35PM +0100, David Hildenbrand wrote:
From: Ryan Roberts
Since the high bits [51:48] of an OA are not stored contiguously in the
PTE, there is a theoretical bug in set_ptes(), which just adds PAGE_SIZE
to the pte to get
1) Convert READ_ONCE() -> ptep_get()
2) Convert set_pte_at() -> set_ptes()
3) All the "New layer" renames and addition of the trivial wrappers
Yep that makes sense. I'll start prepping that today. I'll hold off reposting
until I have your comments on 19-25. I'm also hoping that David will
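Steps 1) and 2) are mechanical conversions of this shape (illustrative):

pte_t pte = ptep_get(ptep);             /* step 1: was READ_ONCE(*ptep) */

set_ptes(mm, addr, ptep, pte, 1);       /* step 2: was set_pte_at(mm, addr, ptep, pte) */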
running into
soft lockups, something else is already completely bogus.
In the future, we might want to detect if handling cond_resched() is
required at all, and just not do any of that with full preemption enabled.
Signed-off-by: David Hildenbrand
---
mm/mmu_gather.c | 50
force inlining of a specialized
variant using __always_inline with nr=1.
Signed-off-by: David Hildenbrand
---
 include/linux/pgtable.h | 70 +++
 mm/memory.c             | 92 +
 2 files changed, 136 insertions(+), 26 deletions(-)
nts. As long as page freeing time primarily depends on the
number of involved folios, there is no effective change for !preempt
configurations. However, we'll adjust tlb_batch_pages_flush() separately to
handle corner cases where page freeing time grows proportionally with the
actual memory size.
() a
macro).
Reviewed-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
arch/powerpc/include/asm/tlb.h | 2 ++
include/asm-generic/tlb.h | 20
2 files changed, 22 insertions(+)
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index b3de6102a907
d-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
include/linux/mm_types.h | 17 +++--
mm/mmu_gather.c | 5 +++--
2 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8b611e13153e..1b89eec0d6df 100644
---
that the next encoded page
pointer in an array is actually "nr_pages". So pass page + delay_rmap flag
instead of an encoded page, to handle the encoding internally.
Reviewed-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
arch/s390/include/asm/tlb.h | 13 ++---
include/asm-gen
Let's prepare for further changes by factoring it out into a separate
function.
Reviewed-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
mm/memory.c | 53 -
1 file changed, 32 insertions(+), 21 deletions(-)
diff --git a/mm/memory.c b
and RSS.
While at it, only call zap_install_uffd_wp_if_needed() if there is even
any chance that pte_install_uffd_wp_if_needed() would do *something*.
That is, just don't bother if uffd-wp does not apply.
Reviewed-by: Ryan Roberts
Signed-off-by: David Hildenbrand
---
mm/memory.c | 16
detect shadow stack entries. But for shadow stack entries, the HW dirty
bit (in combination with non-writable PTEs) is set by software. So for the
arch_check_zapped_pte() check, we don't have to sync against HW setting
the HW dirty bit concurrently, it is always set.
Reviewed-by: Ryan Roberts
Let's prepare for further changes by factoring out processing of present
PTEs.
Signed-off-by: David Hildenbrand
---
mm/memory.c | 94 ++---
1 file changed, 53 insertions(+), 41 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index
umar K.V"
Cc: Nick Piggin
Cc: Peter Zijlstra
Cc: Michael Ellerman
Cc: Christophe Leroy
Cc: "Naveen N. Rao"
Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: Alexander Gordeev
Cc: Christian Borntraeger
Cc: Sven Schnelle
Cc: Arnd Bergmann
Cc: linux-a...@vger.kernel.org
Cc: li
dontneed should hopefully/likely see a speedup.
Yes, but that's almost exactly the same path as munmap, so I'm not sure it
really adds much for this particular series.
Right, that's why I'm not including these measurements. dontneed vs.
munmap is more about measuring the overhead of VMA
On 31.01.24 16:02, Ryan Roberts wrote:
On 31/01/2024 14:29, David Hildenbrand wrote:
Note that regarding NUMA effects, I mean when some memory access within the same
socket is faster/slower even with only a single node. On AMD EPYC that's
possible, depending on which core you are running
Note that regarding NUMA effects, I mean when some memory access within the same
socket is faster/slower even with only a single node. On AMD EPYC that's
possible, depending on which core you are running and on which memory controller
the memory you want to access is located. If both are in
On 31.01.24 15:08, Michal Hocko wrote:
On Wed 31-01-24 10:26:13, Ryan Roberts wrote:
IIRC there is an option to zero memory when it is freed back to the buddy? So
that could be a place where time is proportional to size rather than
proportional to folio count? But I think that option is
Nope: looks the same. I've taken my test harness out of the picture and done
everything manually from the ground up, with the old tests and the new. Headline
is that I see similar numbers from both.
It took me a while to get really reproducible numbers on Intel. Most importantly:
* Set a fixed
I'm also surprised about the dontneed vs. munmap numbers.
You mean the ones for Altra that I posted? (I didn't post any for M2). The altra
numbers look ok to me; dontneed has no change, and munmap has no change for
order-0 and is massively improved for order-9.
I would expect that dontneed
On 31.01.24 13:37, Ryan Roberts wrote:
On 31/01/2024 11:49, Ryan Roberts wrote:
On 31/01/2024 11:28, David Hildenbrand wrote:
On 31.01.24 12:16, Ryan Roberts wrote:
On 31/01/2024 11:06, David Hildenbrand wrote:
On 31.01.24 11:43, Ryan Roberts wrote:
On 29/01/2024 12:46, David Hildenbrand
On 31.01.24 12:16, Ryan Roberts wrote:
On 31/01/2024 11:06, David Hildenbrand wrote:
On 31.01.24 11:43, Ryan Roberts wrote:
On 29/01/2024 12:46, David Hildenbrand wrote:
Now that the rmap overhaul[1] is upstream that provides a clean interface
for rmap batching, let's implement PTE batching
-                folio_remove_rmap_pte(folio, page, vma);
+                folio_remove_rmap_ptes(folio, page, nr, vma);
+
+                /* Only sanity-check the first page in a batch. */
                 if (unlikely(page_mapcount(page) < 0))
                         print_bad_pte(vma, addr, ptent, page);
Is there a case for
On 31.01.24 11:43, Ryan Roberts wrote:
On 29/01/2024 12:46, David Hildenbrand wrote:
Now that the rmap overhaul[1] is upstream that provides a clean interface
for rmap batching, let's implement PTE batching during fork when processing
PTE-mapped THPs.
This series is partially based on Ryan's
On 31.01.24 03:20, Yin Fengwei wrote:
On 1/29/24 22:32, David Hildenbrand wrote:
This series is based on [1] and must be applied on top of it.
Similar to what we did with fork(), let's implement PTE batching
during unmap/zap when processing PTE-mapped THPs.
We collect consecutive PTEs that map
On 31.01.24 03:30, Yin Fengwei wrote:
On 1/29/24 22:32, David Hildenbrand wrote:
+static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
+                unsigned long addr, pte_t *ptep, unsigned int nr, int full)
+{
+        pte_t pte, tmp_pte;
+
+        pte
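Filled out, the generic fallback under discussion looks roughly like this
(reconstructed sketch; see include/linux/pgtable.h for the authoritative
version):

static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
                unsigned long addr, pte_t *ptep, unsigned int nr, int full)
{
        pte_t pte, tmp_pte;

        pte = ptep_get_and_clear_full(mm, addr, ptep, full);
        while (--nr) {
                ptep++;
                addr += PAGE_SIZE;
                tmp_pte = ptep_get_and_clear_full(mm, addr, ptep, full);
                /* Fold dirty/accessed bits of every cleared PTE into one. */
                if (pte_dirty(tmp_pte))
                        pte = pte_mkdirty(pte);
                if (pte_young(tmp_pte))
                        pte = pte_mkyoung(pte);
        }
        return pte;
}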
+
+#ifndef clear_full_ptes
+/**
+ * clear_full_ptes - Clear PTEs that map consecutive pages of the same folio.
I know it's implied from "pages of the same folio" (and even more so for the
above variant due to the mention of access/dirty), but I wonder if it's useful to
explicitly state that "all
On 31.01.24 03:20, Yin Fengwei wrote:
On 1/29/24 22:32, David Hildenbrand wrote:
This series is based on [1] and must be applied on top of it.
Similar to what we did with fork(), let's implement PTE batching
during unmap/zap when processing PTE-mapped THPs.
We collect consecutive PTEs that map
On 30.01.24 10:21, Ryan Roberts wrote:
On 29/01/2024 14:32, David Hildenbrand wrote:
Add __tlb_remove_folio_pages(), which will remove multiple consecutive
pages that belong to the same large folio, instead of only a single
page. We'll be using this function when optimizing unmapping/zapping
Re-reading the docs myself:
+#ifndef get_and_clear_full_ptes
+/**
+ * get_and_clear_full_ptes - Clear PTEs that map consecutive pages of the same
+ * folio, collecting dirty/accessed bits.
+ * @mm: Address space the pages are mapped into.
+ * @addr: Address the first
On 30.01.24 09:46, Ryan Roberts wrote:
On 30/01/2024 08:41, David Hildenbrand wrote:
On 30.01.24 09:13, Ryan Roberts wrote:
On 29/01/2024 14:32, David Hildenbrand wrote:
Let's prepare for further changes by factoring out processing of present
PTEs.
Signed-off-by: David Hildenbrand
---
mm
On 30.01.24 09:45, Ryan Roberts wrote:
On 30/01/2024 08:37, David Hildenbrand wrote:
On 30.01.24 09:31, Ryan Roberts wrote:
On 29/01/2024 14:32, David Hildenbrand wrote:
We don't need up-to-date accessed-dirty information for anon folios and can
simply work with the ptent we already have
On 30.01.24 09:13, Ryan Roberts wrote:
On 29/01/2024 14:32, David Hildenbrand wrote:
Let's prepare for further changes by factoring out processing of present
PTEs.
Signed-off-by: David Hildenbrand
---
mm/memory.c | 92 ++---
1 file changed