On Thu, Apr 11, 2024 at 06:55:44PM +0200, Paolo Bonzini wrote:
> On Mon, Apr 8, 2024 at 3:56 PM Peter Xu wrote:
> > Paolo,
> >
> > I may miss a bunch of details here (as I still remember some change_pte
> > patches previously on the list..), however not sure wheth
ecause I remember Andrea used to have a custom tree
maintaining that part:
https://github.com/aagit/aa/commit/c761078df7a77d13ddfaeebe56a0f4bc128b1968
Maybe it can't be enabled for some reason that I overlooked in the current
tree, or we just decided not to?
Thanks,
--
Peter Xu
th shared and private VMAs.
> */
> -static int mcopy_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> - struct vm_area_struct *dst_vma,
> - unsigned long dst_addr, struct page *page,
> - bool newly_allocated, bool wp_copy)
> +int mcopy_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> + struct vm_area_struct *dst_vma,
> + unsigned long dst_addr, struct page *page,
> + bool newly_allocated, bool wp_copy)
> {
> int ret;
> pte_t _dst_pte, *dst_pte;
> --
> 2.31.1.368.gbe11c130af-goog
>
--
Peter Xu
On Tue, Apr 20, 2021 at 06:24:50PM +0200, Paolo Bonzini wrote:
> On 20/04/21 17:32, Peter Xu wrote:
> > On Tue, Apr 20, 2021 at 10:37:39AM -0400, Peter Xu wrote:
> > > On Tue, Apr 20, 2021 at 04:16:14AM -0400, Paolo Bonzini wrote:
> > > > The main thread could sta
on receiving a SIGUSR1 without a handler (when vcpu runs far slower than main).
Signed-off-by: Peter Xu
---
tools/testing/selftests/kvm/dirty_log_test.c | 8
1 file changed, 8 insertions(+)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c
b/tools/testing/selftests/kvm
3641.23742-1-pet...@redhat.com/
[2] https://lore.kernel.org/lkml/20210417140956.GV4440@xz-x1/
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Andrew Jones
Signed-off-by: Peter Xu
---
tools/testing/selftests/kvm/dirty_log_test.c | 62
1 file changed, 51 insertions(+), 11 deleti
/20210420081614.684787-1-pbonz...@redhat.com/
Peter Xu (2):
KVM: selftests: Sync data verify of dirty logging with guest sync
KVM: selftests: Wait for vcpu thread before signal setup
tools/testing/selftests/kvm/dirty_log_test.c | 70 +---
1 file changed, 59 insertions(+), 11 deletions
On Tue, Apr 20, 2021 at 10:37:39AM -0400, Peter Xu wrote:
> On Tue, Apr 20, 2021 at 04:16:14AM -0400, Paolo Bonzini wrote:
> > The main thread could start to send SIG_IPI at any time, even before signal
> > blocked on vcpu thread. Therefore, start the vcpu thread with the sig
_log_test could fail directly
> on receiving a SIGUSR1 without a handler (when vcpu runs far slower than
> main).
>
> Reported-by: Peter Xu
> Cc: sta...@vger.kernel.org
> Signed-off-by: Paolo Bonzini
Yes, indeed better! :)
Reviewed-by: Peter Xu
--
Peter Xu
On Tue, Apr 20, 2021 at 10:07:16AM +0200, Paolo Bonzini wrote:
> On 18/04/21 14:43, Peter Xu wrote:
> > 8<-
> > diff --git a/tools/testing/selftests/kvm/dirty_log_test.c
> > b/tools/testing/selftests/kvm/dirty_log_test.c
> > index 25230e799bc4..d3050d1c2cd0
it
seems still the only place to set the new flag HK_FLAG_MANAGED_IRQ. If one day
we'll finally obsolete isolcpus= we may need to think about where to put it?
When I looked at it, I also noticed that there is no caller that sets
HK_FLAG_SCHED at all. Is it really used anywhere?
Regarding this patch...
On Sat, Apr 17, 2021 at 10:36:01AM -0400, Peter Xu wrote:
> This fixes a bug that can trigger with e.g. "taskset -c 0 ./dirty_log_test" or
> when the testing host is very busy.
>
> A similar previous attempt is done [1] but that is not enough, the reason is
>
on receiving a SIGUSR1 without a handler (when vcpu runs far slower than main).
Signed-off-by: Peter Xu
---
tools/testing/selftests/kvm/dirty_log_test.c | 8
1 file changed, 8 insertions(+)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c
b/tools/testing/selftests/kvm
3641.23742-1-pet...@redhat.com/
[2] https://lore.kernel.org/lkml/20210417140956.GV4440@xz-x1/
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Andrew Jones
Signed-off-by: Peter Xu
---
tools/testing/selftests/kvm/dirty_log_test.c | 60
1 file changed, 50 insertions(+), 10 deleti
this patch:
(1) while :; do taskset -c 1 ./dirty_log_test; done
(2) taskset -c 1 bash -c "while :; do :; done"
Review comments are greatly welcomed.
Thanks,
[1] https://lore.kernel.org/lkml/20210413213641.23742-1-pet...@redhat.com/
Peter Xu (2):
KVM: selftests: Sync data verif
ay. I tested longer yesterday but
haven't updated this patch yet. More below.
On Sat, Apr 17, 2021 at 02:59:48PM +0200, Paolo Bonzini wrote:
> On 13/04/21 23:36, Peter Xu wrote:
> > This patch closes this race by allowing the main thread to give the vcpu
> > thread
> > chan
hat any subsequently armed timers on
> CLOCK_REALTIME and CLOCK_TAI are evaluated with the correct offsets.
>
> Signed-off-by: Marcelo Tosatti
>
> ---
>
> v5:
> - Add missing hrtimer_update_base (Peter Xu).
>
> v4:
>- Drop unused code (Thomas).
>
>
the pte (in 4/9) will do its
> shmem_getpage_gfp(), and that will bring in the swap if user
> did not already do so: so I was wrong to claim more robustness
> the other way, this placement should be fine. I think.
>
> > if (xa_is_value(page)) {
> > error = shmem_swapin_page(inode, index, &page,
> > sgp, gfp, vma, fault_type);
> > --
> > 2.31.1.295.g9ea45b61b8-goog
>
--
Peter Xu
d this specific race condition.
Cc: Andrew Jones
Cc: Paolo Bonzini
Cc: Vitaly Kuznetsov
Cc: Sean Christopherson
Signed-off-by: Peter Xu
---
v2:
- drop one unnecessary check on "!matched"
---
tools/testing/selftests/kvm/dirty_log_test.c | 53 +++-
1 file changed, 52
d this specific race condition.
Cc: Andrew Jones
Cc: Paolo Bonzini
Cc: Vitaly Kuznetsov
Cc: Sean Christopherson
Signed-off-by: Peter Xu
---
tools/testing/selftests/kvm/dirty_log_test.c | 54 +++-
1 file changed, 53 insertions(+), 1 deletion(-)
diff --git a/tools/testing/sel
UFFDIO_CONTINUE ioctl for shmem-backed
> minor faults, though, so userspace doesn't yet have a way to resolve
> such faults.
>
> Signed-off-by: Axel Rasmussen
Everything looks right to me, but it'll be great if Andrea or Hugh will have a
look too.
Acked-by: Peter Xu
--
Peter Xu
;);
> +
> + if (is_src)
> + area_src_alias = area_alias;
> + else
> + area_dst_alias = area_alias;
> +}
It would be nice if shmem_allocate_area() could merge with
hugetlb_allocate_area() somehow, but not that urgent.
Reviewed-by: Peter Xu
--
Peter Xu
argv[] so we actually print out the
> hugetlb file path.
>
> Signed-off-by: Axel Rasmussen
Reviewed-by: Peter Xu
--
Peter Xu
init() at the entry of each test, and clear() after
finishing one test?
> +
> uffdio_register.range.start = (unsigned long) area_dst;
> uffdio_register.range.len = nr_pages * page_size;
> uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
The rest looks good to me. Thanks,
--
Peter Xu
On Mon, Apr 12, 2021 at 09:40:22PM -0700, Axel Rasmussen wrote:
> On Mon, Apr 12, 2021 at 4:17 PM Peter Xu wrote:
> >
> > On Thu, Apr 08, 2021 at 04:43:22PM -0700, Axel Rasmussen wrote:
> > > +/*
> > > + * Install PTEs, to map dst_addr (within dst_vma) to page.
On Mon, Apr 12, 2021 at 05:51:14PM -0700, Hugh Dickins wrote:
> On Mon, 12 Apr 2021, Peter Xu wrote:
> > On Tue, Apr 06, 2021 at 11:14:30PM -0700, Hugh Dickins wrote:
> > > > +static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pm
. Then it'll be further passed into shmem_mcopy_atomic_pte()
now after this patch (as shmem_mfill_zeropage_pte() probably only did one
useful thing, which is to clear src_addr). Not a big deal, though.
All the rest looks sane to me.
Reviewed-by: Peter Xu
I'll wait to look at the selftests since
unsigned long address, unsigned int flags);
> #ifdef CONFIG_USERFAULTFD
> +enum mcopy_atomic_mode;
(I'm not 100% sure, but.. maybe this can be moved even out of ifdef? Then you
can define it once at the top rather than twice?)
Reviewed-by: Peter Xu
--
Peter Xu
WP and MINOR modes are conditionally enabled on specific memory types. This
patch avoids dumping tons of zeros for those cases when the modes are not
supported at all.
Reviewed-by: Axel Rasmussen
Signed-off-by: Peter Xu
---
tools/testing/selftests/vm/userfaultfd.c | 30
latency of
resolving thread. It may not mean an issue with uffd.
Nor have I seen this error triggered in past runs. Even if it
triggers, it'll be drowned out by the rest of the test logs. Remove it.
Reviewed-by: Axel Rasmussen
Signed-off-by: Peter Xu
---
tools/testing/selftests/vm
Introduce err()/_err() and replace all the different ways to fail the program,
mostly "fprintf" and "perror" with tons of exit() calls. Always stop the test
program at any failure.
Reviewed-by: Axel Rasmussen
Signed-off-by: Peter Xu
---
tools/testing/selftests/vm
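A sketch of what such an err()/_err() pair could look like is shown here. This is an illustration under assumptions, not the patch's actual definition: the message layout is made up, `##__VA_ARGS__` is a GNU extension, and checked_div() is a hypothetical caller added only to show usage.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Report an error with errno and the source line, but keep running. */
#define _err(fmt, ...)						\
	do {							\
		int saved_errno = errno;			\
		fprintf(stderr, "ERROR: " fmt, ##__VA_ARGS__);	\
		fprintf(stderr, " (errno=%d, line=%d)\n",	\
			saved_errno, __LINE__);			\
	} while (0)

/* Report an error and stop the test program at the first failure. */
#define err(fmt, ...)				\
	do {					\
		_err(fmt, ##__VA_ARGS__);	\
		exit(1);			\
	} while (0)

/* Hypothetical caller: one line replaces fprintf() + exit(). */
static int checked_div(int a, int b)
{
	if (b == 0)
		err("division by zero (a=%d)", a);
	return a / b;
}
```

Funnelling every failure through one macro keeps the exit policy in a single place, which is what makes "always stop at any failure" easy to enforce.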
Userfaultfd selftest does not need to handle kernel initiated fault. Set user
mode so it can be run even if unprivileged_userfaultfd=0 (which is the default).
Reviewed-by: Axel Rasmussen
Signed-off-by: Peter Xu
---
tools/testing/selftests/vm/userfaultfd.c | 2 +-
1 file changed, 1 insertion
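For reference, opening a userfaultfd in user mode could be sketched as below. uffd_open_user_mode() is a hypothetical helper; the fallback define assumes the uapi value of UFFD_USER_MODE_ONLY, which older installed headers may not provide.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1	/* assumed uapi value for older headers */
#endif

/*
 * Ask for a userfaultfd that only handles faults taken in user mode.
 * Returns the file descriptor, or -1 with errno set.
 */
static int uffd_open_user_mode(void)
{
	return (int)syscall(__NR_userfaultfd,
			    O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
}
```

On kernels that do not know the flag the call is expected to fail, so a caller should check the return value and errno rather than assume success.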
the fault flag - just do it
unconditionally.
Reviewed-by: Axel Rasmussen
Signed-off-by: Peter Xu
---
tools/testing/selftests/vm/userfaultfd.c | 55 +---
1 file changed, 1 insertion(+), 54 deletions(-)
diff --git a/tools/testing/selftests/vm/userfaultfd.c
b/tools/testing
selftest on
fault handling, to use an err() macro instead of either fprintf() or perror()
then another exit() call.
The huge cleanup is done in the last patch. The first 4 patches are some other
standalone cleanups for the same file, so I put them together.
Please review, thanks.
Peter Xu (5
;
put_page(page);
page = NULL;
hindex = index;
}
I think it won't happen for your case since the page should be uptodate already
(the other thread should check and modify the page before CONTINUE), but I'm
still raising it, since if the page was allocated it seems better to still
install the fallocated page (do we need to clear the page and SetUptodate)?
--
Peter Xu
irty(pte))
pte = pte_mkdirty(pte);
pte = clear_pte_bit(pte, __pgprot(PTE_WRITE));
pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
return pte;
}
So arm64 will explicitly set the dirty bit (from the HW dirty bit) when
write-protecting. That seems to show that, at least for arm64, it's perfectly
valid to have a !write && dirty pte.
Thanks,
--
Peter Xu
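The !write && dirty state can be modelled in a few lines of userspace C. This is a toy model only: the bit positions are illustrative assumptions rather than a statement of the arm64 page-table format, and pte_t is reduced to a plain integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions; assumptions for this model only. */
#define PTE_RDONLY	(UINT64_C(1) << 7)	/* hardware read-only */
#define PTE_WRITE	(UINT64_C(1) << 51)	/* writable */
#define PTE_DIRTY	(UINT64_C(1) << 55)	/* software dirty */

typedef uint64_t pte_t;

static bool pte_write(pte_t pte)
{
	return pte & PTE_WRITE;
}

/* Hardware marks a writable pte dirty by clearing the read-only bit. */
static bool pte_hw_dirty(pte_t pte)
{
	return pte_write(pte) && !(pte & PTE_RDONLY);
}

static bool pte_dirty(pte_t pte)
{
	return (pte & PTE_DIRTY) || pte_hw_dirty(pte);
}

/* Write-protect, first saving hardware dirtiness in the software bit. */
static pte_t pte_wrprotect(pte_t pte)
{
	if (pte_hw_dirty(pte))
		pte |= PTE_DIRTY;
	pte &= ~PTE_WRITE;
	pte |= PTE_RDONLY;
	return pte;
}
```

In this model, write-protecting a hardware-dirty pte yields one that is no longer writable but still reports dirty, which is the combination discussed above.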
icts in my tree?
>
> It's true that we haven't tested the hugetlbfs minor faults patch
> extensively *with the shmem one also applied*, but it has had more
> thorough review than the shmem one at this point (e.g. by Mike
> Kravetz), and they're rather separate code paths (I'd be surprised if
> one breaks the other).
Yes, I think the hugetlb part has had more review. IMHO it's a
matter of whether Mike would still like to do a more thorough review, or whether
it seems okay to keep them.
I can repost the selftest series later if needed, once I figure out which is
the suitable base commit. Those selftest patches are definitely not urgent for
this release, so we can wait for the next release.
Thanks,
--
Peter Xu
awkward to
swapin here. Maybe move this chunk to right after pagecache_get_page()
returns? Then no need to touch the rest.
> +
> + if (swapped)
> + return 0;
> +
> if (page)
> hindex = page->index;
> if (page && sgp == SGP_WRITE)
> --
> 2.31.1.295.g9ea45b61b8-goog
>
--
Peter Xu
/git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=mapcount_deshare&id=7c3a31caa34ac6ac4a4ec0559b1307b5edfc0821
[4]
https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=mapcount_deshare&id=599aa62474f51a470408b28fd4365320a5357aca
--
Peter Xu
> } else {
> VM_WARN_ON_ONCE(wp_copy);
> err = shmem_mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
> src_addr, mode, page);
> }
>
> +out:
> return err;
> }
>
> diff --git a/tools/testing/selftests/vm/userfaultfd.c
> b/tools/testing/selftests/vm/userfaultfd.c
> index f6c86b036d0f..d8541a59dae5 100644
> --- a/tools/testing/selftests/vm/userfaultfd.c
> +++ b/tools/testing/selftests/vm/userfaultfd.c
> @@ -485,6 +485,7 @@ static void wp_range(int ufd, __u64 start, __u64 len,
> bool wp)
> static void continue_range(int ufd, __u64 start, __u64 len)
> {
> struct uffdio_continue req;
> + int ret;
>
> req.range.start = start;
> req.range.len = len;
> @@ -493,6 +494,17 @@ static void continue_range(int ufd, __u64 start, __u64
> len)
> if (ioctl(ufd, UFFDIO_CONTINUE, &req))
> err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
> (uint64_t)start);
> +
> + /*
> + * Error handling within the kernel for continue is subtly different
> + * from copy or zeropage, so it may be a source of bugs. Trigger an
> + * error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
> + */
> + req.mapped = 0;
> + ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
> + if (ret >= 0 || req.mapped != -EEXIST)
> + err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d,
> mapped=%" PRId64,
> + ret, req.mapped);
> }
>
> static void *locking_thread(void *arg)
> --
> 2.31.0.208.g409f899ff0-goog
>
--
Peter Xu
and the map? */
> > if (page_mapcount(page) == 1 && page_count(page) > 2)
> > goto keep_locked;
> >
> > in the pre-pinning days.
> >
> > But I really think that there are a number of other commits you're
> > missing too, bec
nt, since we _know_ the page cache is
there.. So I'm thinking maybe you need to handle the continue request in
mfill_atomic_pte() before the VM_SHARED check so as to cover both cases.
--
Peter Xu
set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
/* No need to invalidate - it was non-present before */
update_mmu_cache(dst_vma, dst_addr, dst_pte);
pte_unmap_unlock(dst_pte, ptl);
return 0;
}
Then at the entry of shmem_mcopy_atomic_pte():
if (is_continue) {
page = find_lock_page(mapping, pgoff);
if (!page)
return -EFAULT;
ret = shmem_install_uffd_pte(...,
is_continue && !(dst_vma->vm_flags & VM_SHARED));
unlock_page(page);
if (ret)
put_page(page);
return ret;
}
Do you think this would be cleaner?
--
Peter Xu
atic void continue_range(int ufd, __u64 start, __u64
> len)
> if (ioctl(ufd, UFFDIO_CONTINUE, &req))
> err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
> (uint64_t)start);
> +
> + /*
> + * Error handling within the kernel for continue is subtly different
> + * from copy or zeropage, so it may be a source of bugs. Trigger an
> + * error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
> + */
> + req.mapped = 0;
> + ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
> + if (ret >= 0 || req.mapped != -EEXIST)
> + err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d,
> mapped=%" PRId64,
> + ret, req.mapped);
> }
>
> static void *locking_thread(void *arg)
> --
> 2.31.0.291.g576ba9dcdaf-goog
>
--
Peter Xu
Userfaultfd write-protect mode is supported starting from Linux 5.7.
Acked-by: Mike Rapoport
Signed-off-by: Peter Xu
---
man2/ioctl_userfaultfd.2 | 84 ++--
1 file changed, 81 insertions(+), 3 deletions(-)
diff --git a/man2/ioctl_userfaultfd.2 b/man2
UFFD_FEATURE_THREAD_ID is supported in Linux 4.14.
Acked-by: Mike Rapoport
Signed-off-by: Peter Xu
---
man2/ioctl_userfaultfd.2 | 5 +
1 file changed, 5 insertions(+)
diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 47ae5f473..d4a8375b8 100644
--- a/man2
ly after the whole
hugetlbfs/shmem minor mode reaches the linux master branch.
Please review, thanks.
Peter Xu (4):
userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs
userfaultfd.2: Add write-protect mode
ioctl_userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs
ioctl_userfaultfd.2: Add write-protec
UFFD_FEATURE_THREAD_ID is supported since Linux 4.14.
Acked-by: Mike Rapoport
Signed-off-by: Peter Xu
---
man2/userfaultfd.2 | 13 +
1 file changed, 13 insertions(+)
diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index e7dc9f813..5c41e4816 100644
--- a/man2/userfaultfd.2
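How the thread-ID feature is consumed could be sketched as follows. make_api_request() and fault_thread_id() are hypothetical helper names; the struct fields come from the uapi header <linux/userfaultfd.h>.

```c
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>

/* Build the UFFDIO_API handshake that asks for thread IDs in fault messages. */
static struct uffdio_api make_api_request(void)
{
	struct uffdio_api api;

	memset(&api, 0, sizeof(api));
	api.api = UFFD_API;
	api.features = UFFD_FEATURE_THREAD_ID;
	return api;
}

/* Perform the handshake on an open userfaultfd; returns 0 or -1. */
static int uffd_enable_thread_id(int ufd)
{
	struct uffdio_api api = make_api_request();

	return ioctl(ufd, UFFDIO_API, &api);
}

/*
 * Extract the faulting thread ID from a message read from the fd.
 * Only meaningful when the feature was negotiated; returns 0 for
 * anything that is not a page-fault event.
 */
static __u32 fault_thread_id(const struct uffd_msg *msg)
{
	if (msg->event != UFFD_EVENT_PAGEFAULT)
		return 0;
	return msg->arg.pagefault.feat.ptid;
}
```

A monitor thread would call uffd_enable_thread_id() once after opening the descriptor, then pass each message it reads to fault_thread_id().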
Write-protect mode is supported starting from Linux 5.7.
Acked-by: Mike Rapoport
Signed-off-by: Peter Xu
---
man2/userfaultfd.2 | 108 +++--
1 file changed, 104 insertions(+), 4 deletions(-)
diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index
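A minimal sketch of toggling write protection on a range is given below, assuming the range was already registered with UFFDIO_REGISTER_MODE_WP on an open userfaultfd. make_wp_request() is a hypothetical helper; wp_range() borrows its name from the selftest but returns the ioctl result instead of aborting.

```c
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>

/* Fill in a write-protect request: wp=true protects, wp=false resolves. */
static struct uffdio_writeprotect make_wp_request(__u64 start, __u64 len,
						  bool wp)
{
	struct uffdio_writeprotect req;

	memset(&req, 0, sizeof(req));
	req.range.start = start;
	req.range.len = len;
	req.mode = wp ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
	return req;
}

/* Apply or remove write protection on [start, start + len). */
static int wp_range(int ufd, __u64 start, __u64 len, bool wp)
{
	struct uffdio_writeprotect req = make_wp_request(start, len, wp);

	return ioctl(ufd, UFFDIO_WRITEPROTECT, &req);
}
```

Clearing the mode bit is how a handler resolves a write-protect fault, so the same helper serves both directions.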
On Thu, Mar 25, 2021 at 10:32:20PM +0100, Alejandro Colomar (man-pages) wrote:
> Hi Peter,
>
> On 3/23/21 8:16 PM, Peter Xu wrote:
> > On Tue, Mar 23, 2021 at 07:11:04PM +0100, Alejandro Colomar (man-pages)
> > wrote:
> > > > +.TP
> > > > +.B UFFDIO_
only if both WRITE|SHARED set
for the vma flags. E.g., shmem_mcopy_atomic_pte() of a normal uffdio-copy will
fill in the page cache into pte, however what if this mapping is privately
mapped? IMHO we can't set the write bit, otherwise the process will be writing
to the page cache directly.
However I think that question will be irrelevant to this patch.
Thanks,
--
Peter Xu
ifferent commit ID here:
commit 63c826b1372c4930f89b8a55092699fa7f0d6f4e
Author: Axel Rasmussen
Date: Thu Mar 18 10:20:43 2021 -0400
userfaultfd: support minor fault handling for shmem
Axel, did you fetch the commit ID from your local tree, perhaps? Mine should
have come from hnaz/linux-mm, and I can see Andrew's sign-off there too.
Thanks,
--
Peter Xu
rn -EFAULT;
But I didn't check other places; generally I return -EFAULT if I can't find a
better replacement with a clearer meaning.
I don't think this is really helpful to user apps either, because no user app
would act on this -EFAULT to do anything useful.. how about I drop it too if
you think the description is confusing?
Thanks,
--
Peter Xu
On Tue, Mar 23, 2021 at 07:19:12PM +0100, Alejandro Colomar (man-pages) wrote:
> Hi Peter,
>
> Please see a few more comments below.
>
> Thanks,
>
> Alex
>
> On 3/22/21 11:08 PM, Peter Xu wrote:
> > Write-protect mode is supported starting from Linux 5.7.
On Tue, Mar 23, 2021 at 02:11:29AM +, Matthew Wilcox wrote:
> On Mon, Mar 22, 2021 at 08:48:56PM -0400, Peter Xu wrote:
> > +/* Whether to check page->mapping when zapping */
> > +#define ZAP_FLAG_CHECK_MAPPING BIT(0)
> > +
> > /*
>
On Tue, Mar 23, 2021 at 10:34:45AM +0800, Miaohe Lin wrote:
> Hi:
> On 2021/3/23 8:48, Peter Xu wrote:
> > pte_unmap_same() will always unmap the pte pointer. After the unmap,
> > vmf->pte
> > will not be valid any more. We should clear it.
> >
> > It wa
On Tue, Mar 23, 2021 at 10:27:34AM +0200, Mike Rapoport wrote:
> On Mon, Mar 22, 2021 at 06:08:45PM -0400, Peter Xu wrote:
> > UFFD_FEATURE_THREAD_ID is supported since Linux 4.14.
> >
> > Signed-off-by: Peter Xu
> > ---
> > man2/userfaultfd.2 | 13
uming it's a zero page.
QEMU plans to fix it using pre-faults as UFFDIO_COPY will complicate the live
snapshot framework, but UFFD_FEATURE_WP_UNALLOCATED should be more efficient.
It's just that we still need to keep the old behavior.
I'll see whether I can prepare a patch for it shortly, with some test case too.
Thanks,
--
Peter Xu
On Mon, Mar 22, 2021 at 08:48:49PM -0400, Peter Xu wrote:
> This patchset is based on tag v5.12-rc3-mmots-2021-03-17-22-26. To run the
> selftest, need to apply the two patches to fix minor mode page leak:
>
> https://lore.kernel.org/lkml/20210322175132.36659-1-pet...@redhat.
/userfaultfd.h header files, because it may cause kernel header
update to easily break userspace.
Signed-off-by: Peter Xu
---
tools/testing/selftests/vm/userfaultfd.c | 9 +
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/vm/userfaultfd.c
b/tools/testing
swap
pte too just like a none pte.
Note that we also need to teach UFFDIO_COPY about this special pte across the
code path so that we can safely install a new page at this special pte as long
as we know it's a stall entry.
Signed-off-by: Peter Xu
---
fs/userfaultfd.c | 5 -
mm/hugetlb.c
with
_UFFDIO_WRITEPROTECT too because all existing types now support write
protection mode.
Since vma_can_userfault() will be used elsewhere, move into userfaultfd_k.h.
Signed-off-by: Peter Xu
---
fs/userfaultfd.c | 18 --
include/linux/userfaultfd_k.h| 14 ++
include
ze fetcher.
Signed-off-by: Peter Xu
---
mm/hugetlb.c | 29 +
1 file changed, 25 insertions(+), 4 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 448ef745d5ee..d4acf9d9d087 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5110,7 +5110,7 @@ uns
taken hugetlb fault mutex so that no concurrent page fault would trigger.
The call to hugetlb_vmdelete_list() in hugetlbfs_punch_hole(), however, is not
safe. That's why the former call can pass ZAP_FLAG_DROP_FILE_UFFD_WP,
while the latter cannot.
Signed-off-by: Peter Xu
---
fs
This is to let hugetlbfs be prepared to also recognize swap special ptes just
like uffd-wp special swap ptes.
Signed-off-by: Peter Xu
---
mm/hugetlb.c | 23 +--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fd3e87517e10
This starts from passing cp_flags into hugetlb_change_protection() so hugetlb
will be able to handle MM_CP_UFFD_WP[_RESOLVE] requests.
huge_pte_clear_uffd_wp() is introduced to handle the case where the
UFFDIO_WRITEPROTECT is requested upon migrating huge page entries.
Signed-off-by: Peter Xu
if UFFDIO_COPY_MODE_WP is provided, so that the core mm will
know this page contains valid data and never drop it.
Signed-off-by: Peter Xu
---
include/asm-generic/hugetlb.h | 5 +
include/linux/hugetlb.h | 6 --
mm/hugetlb.c | 22 +-
mm/userfaultfd.c
userfaultfd itself on either UFFDIO_COPY or handling page faults, so that
everything will still work as expected.
Signed-off-by: Peter Xu
---
fs/userfaultfd.c | 15 +++
mm/shmem.c | 13 -
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/fs/userfaultfd.c b
Hook up hugetlbfs_fault() with the capability to handle userfaultfd-wp faults.
We do this slightly earlier than hugetlb_cow() so that we can avoid taking some
extra locks that we definitely don't need.
Signed-off-by: Peter Xu
---
mm/hugetlb.c | 19 +++
1 file changed, 19
It should be handled similarly to other uffd-wp wr-protected ptes: we should
pass it over when the dst_vma has VM_UFFD_WP armed, otherwise drop it.
Signed-off-by: Peter Xu
---
mm/memory.c | 15 ++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/mm/memory.c b/mm
ING, while even_cow==true means a none
zap flag to pass in (though in most cases we have had even_cow==false).
No functional change intended.
Signed-off-by: Peter Xu
---
fs/dax.c | 10 ++
include/linux/mm.h | 4 ++--
mm/khugepaged.c| 3 ++-
mm/memory.c| 15 -
t.
Note that this patch only covers the small pages (pte level) and does not cover
any of the transparent huge pages yet. But this will be a base for thps too.
Signed-off-by: Peter Xu
---
mm/mprotect.c | 48
1 file changed, 48 insertions(+)
diff --git a/mm/mp
table lock.
Signed-off-by: Peter Xu
---
mm/mprotect.c | 10 +-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 6b63e3544b47..51c954afa406 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -296,8 +296,16 @@ static inline unsigned long change_pmd_ra
, or
punching a hole in a shmem file. For the latter, we can only drop the uffd-wp
bit when holding the page lock. It means the unmap_mapping_range() in
shmem_fallocate() still needs to zap without ZAP_FLAG_DROP_FILE_UFFD_WP
because that's still racy with the page faults.
Signed-off-by: Peter
, it'll be very
easy to grep this information by simply grepping the flag.
It'll also make life easier when we want to e.g. pass in zap_flags into the
callers like unmap_mapping_pages() (instead of adding new booleans besides the
even_cows parameter).
Signed-off-by: Peter Xu
---
inc
tries"),
but introduce ZAP_FLAG_SKIP_SWAP flag, which means the opposite of previous
"details" parameter: the caller should explicitly set this to skip swap
entries, otherwise swap entries will always be considered (which is still the
major case here).
Cc: Kirill A. Shutemov
Sig
k: https://lore.kernel.org/lkml/20201126222359.8120-1-pet...@redhat.com/
Link: https://lore.kernel.org/lkml/20201130230603.46187-1-pet...@redhat.com/
Suggested-by: Andrea Arcangeli
Suggested-by: Hugh Dickins
Signed-off-by: Peter Xu
---
arch/x86/include/asm/pgtable.h
-around with the new flag could confuse all the rest of
pages when installing ptes from page cache when there's a cache hit.
Signed-off-by: Peter Xu
---
include/linux/mm.h| 2 +
include/linux/userfaultfd_k.h | 11
mm/memory.c | 103
zap_details and let them
simply be parameters of unmap_mapping_range_tree(), which is inlined.
Signed-off-by: Peter Xu
---
include/linux/mm.h | 2 --
mm/memory.c| 20 ++--
2 files changed, 10 insertions(+), 12 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index
operly (e.g., in do_swap_page()) when we see a special swap pte - we
should never call do_swap_page() upon those ptes, but just to bail out early if
it happens.
Signed-off-by: Peter Xu
---
arch/arm64/kernel/mte.c | 2 +-
fs/proc/task_mmu.c | 14 --
include/linux/swapops.h | 39 +++
f->pte first. Or, alloc_set_pte() will make sure to allocate a new
pte even after calling pte_unmap_same().
Since we'll need to modify vmf->pte, directly pass in vmf into pte_unmap_same()
and then we can also avoid the long parameter list.
Signed-off-by: Peter Xu
---
mm/memory.c | 13 +++
or uffd-wp, that could lead to data loss if without the
dirty bit set.
Note that shmem_mfill_zeropage_pte() will always call shmem_mfill_atomic_pte()
with wp_copy==false because UFFDIO_ZEROPAGE does not support
UFFDIO_COPY_MODE_WP.
Signed-off-by: Peter Xu
---
include/linux/shmem_fs.h | 5 +++--
efault umap only supports anonymous. So to
test it we need to build [3] then [2].
Any comment would be greatly welcomed. Thanks,
[1] https://github.com/xzpeter/linux/tree/uffd-wp-shmem-hugetlbfs
[2] https://github.com/LLNL/umap-apps
[3] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs
Pe
UFFD_FEATURE_THREAD_ID is supported since Linux 4.14.
Signed-off-by: Peter Xu
---
man2/userfaultfd.2 | 13 +
1 file changed, 13 insertions(+)
diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index e7dc9f813..555e37409 100644
--- a/man2/userfaultfd.2
+++ b/man2/userfaultfd.2
Write-protect mode is supported starting from Linux 5.7.
Signed-off-by: Peter Xu
---
man2/userfaultfd.2 | 104 -
1 file changed, 102 insertions(+), 2 deletions(-)
diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index 555e37409..8ad4a71b5 100644
Userfaultfd write-protect mode is supported starting from Linux 5.7.
Signed-off-by: Peter Xu
---
man2/ioctl_userfaultfd.2 | 84 ++--
1 file changed, 81 insertions(+), 3 deletions(-)
diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index
UFFD_FEATURE_THREAD_ID is supported in Linux 4.14.
Signed-off-by: Peter Xu
---
man2/ioctl_userfaultfd.2 | 5 +
1 file changed, 5 insertions(+)
diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 47ae5f473..d4a8375b8 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2
o features missing in current manpage, namely:
(1) Userfaultfd Thread-ID feature
(2) Userfaultfd write protect mode
There's also a 3rd one which was just contributed from Axel - Axel, I think it
would be great if you can add that part too, probably after the whole
hugetlbfs/shmem minor mode re
3.
Thanks for looking, I'll repost shortly.
--
Peter Xu
g for shmem")
> Signed-off-by: Axel Rasmussen
Reviewed-by: Peter Xu
--
Peter Xu
: Mike Kravetz
Cc: Mike Rapoport
Cc: Andrew Morton
Fixes: f2bf15fb0969 ("userfaultfd: add minor fault registration mode")
Signed-off-by: Peter Xu
---
mm/hugetlb.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 408dbc08298a..56b78a206913 10064
tain the same semantic
as UFFDIO_ZEROPAGE so less data copy too (UFFDIO_ZEROPAGE does not support
UFFDIO_COPY_MODE_WP so far). However we need to be careful about mixed use of
these, e.g., I think UFFD_FEATURE_WP_UNALLOCATED at least shouldn't be allowed
with UFFDIO_REGISTER_MODE_MISSING, otherwise the
n bool
> mm/huge_memory.c: rework the function do_huge_pmd_numa_page() slightly
> mm/huge_memory.c: remove redundant PageCompound() check
> mm/huge_memory.c: remove unused macro
> TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG
> mm/huge_memory.c: use helper function migration_entry_to_page()
Reviewed-by: Peter Xu
--
Peter Xu
er in
that case in vma_to_resize() we'll bail out even earlier than line 676 when
checking against the size:
https://elixir.bootlin.com/linux/v5.12-rc3/source/mm/mremap.c#L667
So IIUC we'll still need the change as Hugh suggested previously.
Thanks,
--
Peter Xu
vma->vm_flags & VM_SHARED))
> - return ERR_PTR(-EINVAL);
> -
> if (is_vm_hugetlb_page(vma))
> return ERR_PTR(-EINVAL);
The code change does not seem to match what the commit message says. Did
you perhaps forget to add the checks against VM_DONTEXPAND | VM_PFNMAP? I'm
guessing that (rather than the commit message needing a touch-up) because you
still attached the revert patch, so that check seems to be needed. Thanks,
--
Peter Xu
On Wed, Mar 17, 2021 at 10:18:40AM +0800, Miaohe Lin wrote:
> Hi:
> On 2021/3/17 4:40, Peter Xu wrote:
> > On Tue, Mar 16, 2021 at 08:40:02AM -0400, Miaohe Lin wrote:
> >> +static inline void split_huge_pmd_if_needed(struct vm_area_struct *vma,
>
be use ALIGN/ALIGN_DOWN too against HPAGE_PMD_SIZE?
> + split_huge_pmd_address(vma, address, false, NULL);
> +}
--
Peter Xu
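The alignment helpers suggested in this reply can be sketched in userspace as below. The macros mirror the kernel's ALIGN()/ALIGN_DOWN() for power-of-two sizes, and HPAGE_PMD_SIZE_ASSUMED is a hypothetical stand-in that assumes 2 MiB PMD-sized huge pages; the real value depends on architecture and page size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 2 MiB huge pages, as with 4 KiB base pages on x86-64. */
#define HPAGE_PMD_SIZE_ASSUMED	(UINT64_C(2) << 20)

/* Both helpers require 'a' to be a power of two. */
#define ALIGN_DOWN(x, a)	((uint64_t)(x) & ~((uint64_t)(a) - 1))
#define ALIGN(x, a)		ALIGN_DOWN((uint64_t)(x) + (uint64_t)(a) - 1, (a))

/* True when addr falls strictly inside a huge-page-sized block. */
static bool addr_inside_huge_block(uint64_t addr)
{
	return addr != ALIGN_DOWN(addr, HPAGE_PMD_SIZE_ASSUMED);
}
```

Expressing the boundary test through ALIGN_DOWN() avoids open-coding the mask arithmetic at each call site.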
though
> vma_is_anonymous() will no longer protect it.
>
> Was there an mremap(2) man page update for MREMAP_DONTUNMAP?
> Whether or not there was before, it ought to get one now.
I'm curious whether it's okay to expand MREMAP_DONTUNMAP to PFNMAP too..
E.g. vfio maps device MMIO regions with both VM_DONTEXPAND|VM_PFNMAP, to me it
makes sense to allow the userspace to get such MMIO region remapped/duplicated
somewhere else as long as the size won't change. With the strict check as
above we kill all those possibilities.
Though in that case we'll still need commits like cd544fd1dc92 to protect any
customized ->mremap() when they're not supported.
Thanks,
--
Peter Xu
On Thu, Mar 11, 2021 at 11:35:24AM +, Christoph Hellwig wrote:
> On Wed, Mar 10, 2021 at 03:06:07PM -0500, Peter Xu wrote:
> > On Wed, Mar 10, 2021 at 02:40:11PM -0400, Jason Gunthorpe wrote:
> > > On Wed, Mar 10, 2021 at 11:34:06AM -0700, Alex Williamson wrote:
> > >