On Wed, Jun 11, 2025 at 01:09:32PM +0100, Nikita Kalyazin wrote:
>
>
> On 10/06/2025 23:22, Peter Xu wrote:
> > On Fri, Apr 04, 2025 at 03:43:47PM +, Nikita Kalyazin wrote:
> > > Remove shmem-specific code from UFFDIO_CONTINUE implementation for
> > > non-hu
lly not useful to
non-userfault users, meanwhile we also don't need to hand-cook the vm_fault
struct below just to suit the current fault() interface.
Thanks,
--
Peter Xu
goto out_unlock;
I thought it would fail guest-memfd already on a CONTINUE request, and it
doesn't seem to be touched yet in this series.
I'm not yet sure how the test worked out without hitting things like it.
Most likely I missed something. Some explanation would be welcome.
Thanks,
> vmf->page = folio_file_page(folio, vmf->pgoff);
>
> out_folio:
> --
> 2.47.1
>
--
Peter Xu
On Thu, Mar 13, 2025 at 03:25:16PM +, Nikita Kalyazin wrote:
>
>
> On 12/03/2025 19:32, Peter Xu wrote:
> > On Wed, Mar 12, 2025 at 05:07:25PM +, Nikita Kalyazin wrote:
> > > However if MISSING is not registered, the kernel will auto-populate with a
also prefault by writing
zeros in a loop after mmap().
Thanks,
--
Peter Xu
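The prefault suggested above can be sketched in userspace as follows. This is a minimal sketch, assuming an anonymous private mapping; `demo_map_and_prefault` is a hypothetical name, and since the surrounding context (memory type, registration mode) is truncated here, treat it as an illustration of "write zeros in a loop after mmap()" only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS and mincore() */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map npages of anonymous memory and prefault them
 * by writing a zero into each page right after mmap(). Each write takes
 * a write fault, so every page is populated before the caller moves on
 * (e.g. before registering the range with userfaultfd).
 * Returns the mapping, or NULL on failure.
 */
char *demo_map_and_prefault(size_t npages)
{
	long psize = sysconf(_SC_PAGESIZE);
	size_t len;
	char *p;

	if (psize <= 0 || npages == 0)
		return NULL;
	len = npages * (size_t)psize;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	/* volatile keeps the compiler from dropping the "useless" stores */
	for (size_t off = 0; off < len; off += (size_t)psize)
		((volatile char *)p)[off] = 0;
	return p;
}
```

After the loop, mincore() should report every page of the range as resident.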
ould raise this up here anyway at least as a pure
question.
> + kvm_gmem_hugetlb_filemap_remove_folio(folio);
> + mutex_unlock(&hugetlb_fault_mutex_table[hash]);
> +
> + num_freed++;
> + }
> + folio_batch_release(&fbatch);
> + cond_resched();
> + }
> +
> + return num_freed;
> +}
--
Peter Xu
-be-posted too). I also have a
QEMU branch ready that can boot with it (I didn't yet test more things).
https://github.com/xzpeter/qemu/commits/peter-gmem-v0.2/
For example, besides guest-memfd alone, we definitely also need guest-memfd
to be trappable by userfaultfd, as you are trying to do here, one way
or another.
Thanks,
--
Peter Xu
mmap()ed VAs to the NIC as
buffers (e.g. in recvmsg(), as part of iovec[]), and as long
as the mmap()ed ranges are not registered by KVM memslots, there's no
concern about non-atomic copy.
Thanks,
--
Peter Xu
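To make the "mmap()ed VA as a recvmsg() buffer" case concrete, here is a minimal sketch. It assumes an AF_UNIX socketpair as a stand-in for the NIC path and a plain anonymous mapping as a stand-in for a guest-memfd mapping that is not registered in any KVM memslot; `demo_recv_into_mapping` is hypothetical and nothing here touches KVM.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: receive up to len bytes from sock straight into an
 * mmap()ed buffer by using the mapping's VA as the only iovec[] entry of
 * recvmsg(). The kernel copies into those user VAs like any other buffer.
 * Returns the byte count from recvmsg(), or -1 on error.
 */
ssize_t demo_recv_into_mapping(int sock, void *map, size_t len)
{
	struct iovec iov = { .iov_base = map, .iov_len = len };
	struct msghdr msg = { 0 };

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	return recvmsg(sock, &msg, 0);
}
```

The same call works unchanged whether the iovec points at malloc()ed memory or at an mmap()ed file range, which is the point being made above.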
ing trying to access it will be trapped.
[1]
--
Peter Xu
On Tue, Mar 11, 2025 at 04:56:47PM +, Nikita Kalyazin wrote:
>
>
> On 10/03/2025 19:57, Peter Xu wrote:
> > On Mon, Mar 10, 2025 at 06:12:22PM +, Nikita Kalyazin wrote:
> > >
> > >
> > > On 05/03/2025 20:29, Peter Xu wrote:
> > >
On Mon, Mar 10, 2025 at 06:12:22PM +, Nikita Kalyazin wrote:
>
>
> On 05/03/2025 20:29, Peter Xu wrote:
> > On Wed, Mar 05, 2025 at 11:35:27AM -0800, James Houghton wrote:
> > > I think it might be useful to implement an fs-generic MINOR mode. The
> > > fau
and when folio lock is frequently taken elsewhere too.
It might boil down to how many more FSes would support minor faults, and
whether we would care about such a difference, at least for shmem users. If gmem
is the only one after the existing ones, IIUC there's still the option to implement
it in gmem code. After all, I expect the change to be well under
control (<20 LOCs?)..
--
Peter Xu
rrently it uses a lot of mm functions that are not yet
exported, so AFAIU it will only build if kvm is builtin.
Thanks,
--
Peter Xu
e I used that to allow gmem report huge page supports on faults.
That said, the above only exists in my own tree so far, so I also don't know
whether something like that could be accepted (even if it'll work for you).
Thanks,
--
Peter Xu
m private info.
> inode->i_op = &kvm_gmem_iops;
> inode->i_mapping->a_ops = &kvm_gmem_aops;
> @@ -1097,6 +1178,8 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
>
> return inode;
>
> +free_private:
> + kfree(private);
> out:
> iput(inode);
>
> --
> 2.46.0.598.g6f2099f65c-goog
>
--
Peter Xu
riginal constraints in place.
>
> Fixes: 2e47a445d7b3 ("selftests/mm: run_vmtests.sh: fix hugetlb mem size calculation")
> Signed-off-by: Rafael Aquini
Oops.. thanks!
Reviewed-by: Peter Xu
--
Peter Xu
On Thu, Feb 13, 2025 at 07:52:43AM +, Ackerley Tng wrote:
> Peter Xu writes:
>
> > On Tue, Sep 10, 2024 at 11:43:46PM +, Ackerley Tng wrote:
> >> +static struct folio *kvm_gmem_hugetlb_alloc_folio(struct hstate *h,
> >> +
n the comment too in that path:
move_normal_pud():
/*
* The destination pud shouldn't be established, free_pgtables()
* should have released it.
*/
if (WARN_ON_ONCE(!pud_none(*new_pud)))
return false;
PMD path has similar implications.
Thanks,
--
Peter Xu
> {
> struct kvm_gmem_hugetlb *hgmem;
>
> + /* TODO: Check if even_cows should be 0 or 1 */
> + unmap_mapping_range(inode->i_mapping, 0, LLONG_MAX, 0);
Setting to 0 is ok in both places: even_cows only applies to MAP_PRIVATE,
which gmemfd doesn't support. So feel free to drop the two comment lines.
Thanks,
--
Peter Xu
rotect API to
> userfaultfd ioctl")
> Cc: sta...@vger.kernel.org
Nothing I see wrong:
Reviewed-by: Peter Xu
One trivial thing: some multi-line comments follow the net/ coding
style rather than the mm/ one, but well.. I don't think it's a huge deal.
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#commenting
Thanks again.
--
Peter Xu
esn't have any acks. I don't suppose you would
> be able to do a quick review to calm the nerves??
Heh, I fully trusted you, and I appreciated your help too. I'll need to run
for 1-2 hours, but I'll read it this afternoon.
Side note: from a reliability POV no review is as good as tests, if that was the
concern, but I'll try my best.
Thanks,
--
Peter Xu
subpool() -> unlock_or_release_subpool().
> +
> + spin_lock(&inode->i_lock);
> + inode->i_blocks -= blocks_per_huge_page(h) * num_freed;
> + spin_unlock(&inode->i_lock);
> +}
--
Peter Xu
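The i_blocks update in the quoted hunk is plain sector arithmetic: in the kernel, blocks_per_huge_page() is the huge page size divided by 512, because i_blocks counts 512-byte units. For a 2 MiB huge page that is 2097152 / 512 = 4096 blocks, so freeing 3 such folios drops i_blocks by 12288. A sketch of just the arithmetic, using hypothetical `demo_` names and omitting the inode->i_lock the real code takes:

```c
#include <stdint.h>

/*
 * i_blocks is counted in 512-byte units, so one huge page accounts for
 * huge_page_size / 512 blocks. Arithmetic only; these are not the
 * kernel's hstate helpers.
 */
static inline uint64_t demo_blocks_per_huge_page(uint64_t huge_page_size)
{
	return huge_page_size / 512;
}

/* New i_blocks value after freeing num_freed huge pages (no locking). */
uint64_t demo_i_blocks_after_free(uint64_t i_blocks, uint64_t huge_page_size,
				  uint64_t num_freed)
{
	return i_blocks - demo_blocks_per_huge_page(huge_page_size) * num_freed;
}
```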
hugepage_subpool_put_pages(spool, 1);
> +
> +err_cancel_charge:
> + if (memcg_charge_was_prepared)
> + mem_cgroup_cancel_charge(memcg, pages_per_huge_page(h));
> +
> +err:
> + folio = ERR_PTR(-ENOMEM);
> + goto out;
> +}
--
Peter Xu
> ./tools/testing/selftests/mm/uffd-unit-tests.c:1485:30-31: WARNING: Use ARRAY_SIZE
>
> Fixes: 16a45b57cbf2 ("selftests/mm: add framework for uffd-unit-test")
> Cc: Andrew Morton
> Cc: Shuah Khan
> Cc: Peter Xu
> Cc: linux...@kvack.org
> Cc: linux-kselft...@vger.ke
s all over the
places over cgroup/pool/meminfo/etc.
--
Peter Xu
On Thu, Oct 17, 2024 at 01:47:13PM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 17, 2024 at 10:58:29AM -0400, Peter Xu wrote:
>
> > My question was more towards whether gmemfd could still expose the
> > possibility to be used in VA forms to other modules that may not support
On Wed, Oct 16, 2024 at 08:54:24PM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 16, 2024 at 07:49:31PM -0400, Peter Xu wrote:
> > On Wed, Oct 16, 2024 at 07:51:57PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Oct 16, 2024 at 04:16:17PM -0400, Peter Xu wrote:
> > > >
On Wed, Oct 16, 2024 at 07:51:57PM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 16, 2024 at 04:16:17PM -0400, Peter Xu wrote:
> >
> > Is there chance that when !CoCo will be supported, then external modules
> > (e.g. VFIO) can reuse the old user mappings, just like befor
On Wed, Oct 16, 2024 at 10:45:43AM +0200, David Hildenbrand wrote:
> On 16.10.24 01:42, Ackerley Tng wrote:
> > Peter Xu writes:
> >
> > > On Fri, Oct 11, 2024 at 11:32:11PM +, Ackerley Tng wrote:
> > > > Peter Xu writes:
> > > >
On Fri, Oct 11, 2024 at 11:32:11PM +, Ackerley Tng wrote:
> Peter Xu writes:
>
> > On Tue, Sep 10, 2024 at 11:43:57PM +, Ackerley Tng wrote:
> >> The faultability xarray is stored on the inode since faultability is a
> >> property of the guest_memfd's
-CoCo
context for 1G?
I saw that you also mentioned you have working QEMU prototypes ready in
another email. It'll be great if you can push your kernel/QEMU's latest
tree (including all dependency patches) somewhere so anyone can have a
closer look, or play with it.
Thanks,
--
Peter Xu
On Thu, Apr 11, 2024 at 06:55:44PM +0200, Paolo Bonzini wrote:
> On Mon, Apr 8, 2024 at 3:56 PM Peter Xu wrote:
> > Paolo,
> >
> > I may miss a bunch of details here (as I still remember some change_pte
> > patches previously on the list..), however not sure whether
ked because I remember Andrea used to have a custom tree
maintaining that part:
https://github.com/aagit/aa/commit/c761078df7a77d13ddfaeebe56a0f4bc128b1968
Maybe it can't be enabled for some reason that I overlooked in the current
tree, or we just decided not to?
Thanks,
--
Peter Xu
ORMAL and _CONTINUE for both
> shmem
> + * and anon, and for both shared and private VMAs.
> */
> -static int mcopy_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> - struct vm_area_struct *dst_vma,
> - unsigned long dst_addr, struct page *page,
> - bool newly_allocated, bool wp_copy)
> +int mcopy_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> + struct vm_area_struct *dst_vma,
> + unsigned long dst_addr, struct page *page,
> + bool newly_allocated, bool wp_copy)
> {
> int ret;
> pte_t _dst_pte, *dst_pte;
> --
> 2.31.1.368.gbe11c130af-goog
>
--
Peter Xu
On Tue, Apr 20, 2021 at 06:24:50PM +0200, Paolo Bonzini wrote:
> On 20/04/21 17:32, Peter Xu wrote:
> > On Tue, Apr 20, 2021 at 10:37:39AM -0400, Peter Xu wrote:
> > > On Tue, Apr 20, 2021 at 04:16:14AM -0400, Paolo Bonzini wrote:
> > > > The main thread could sta
on receiving a SIGUSR1 without a handler (when vcpu runs far slower than main).
Signed-off-by: Peter Xu
---
tools/testing/selftests/kvm/dirty_log_test.c | 8
1 file changed, 8 insertions(+)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c
b/tools/testing/selftests/kvm
.org/lkml/20210413213641.23742-1-pet...@redhat.com/
[2] https://lore.kernel.org/lkml/20210417140956.GV4440@xz-x1/
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Andrew Jones
Signed-off-by: Peter Xu
---
tools/testing/selftests/kvm/dirty_log_test.c | 62
1 file changed, 51 inse
.kernel.org/kvm/20210420081614.684787-1-pbonz...@redhat.com/
Peter Xu (2):
KVM: selftests: Sync data verify of dirty logging with guest sync
KVM: selftests: Wait for vcpu thread before signal setup
tools/testing/selftests/kvm/dirty_log_test.c | 70 +---
1 file changed, 59 insertions(+
On Tue, Apr 20, 2021 at 10:37:39AM -0400, Peter Xu wrote:
> On Tue, Apr 20, 2021 at 04:16:14AM -0400, Paolo Bonzini wrote:
> > The main thread could start to send SIG_IPI at any time, even before signal
> > blocked on vcpu thread. Therefore, start the vcpu thread with the sig
_log_test could fail directly
> on receiving a SIGUSR1 without a handler (when vcpu runs far slower than
> main).
>
> Reported-by: Peter Xu
> Cc: sta...@vger.kernel.org
> Signed-off-by: Paolo Bonzini
Yes, indeed better! :)
Reviewed-by: Peter Xu
--
Peter Xu
On Tue, Apr 20, 2021 at 10:07:16AM +0200, Paolo Bonzini wrote:
> On 18/04/21 14:43, Peter Xu wrote:
> > 8<-
> > diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> > index 25230e799bc4..d3050d1c2cd0
puset. However it
seems still the only place to set the new flag HK_FLAG_MANAGED_IRQ. If one day
we'll finally obsolete isolcpus= we may need to think about where to put it?
When I looked at it, I also noticed I see no caller to set HK_FLAG_SCHED at
all. Is it really used anywhere?
Reg
On Sat, Apr 17, 2021 at 10:36:01AM -0400, Peter Xu wrote:
> This fixes a bug that can trigger with e.g. "taskset -c 0 ./dirty_log_test" or
> when the testing host is very busy.
>
> A similar previous attempt was made [1] but that is not enough; the reason is
> stated in
on receiving a SIGUSR1 without a handler (when vcpu runs far slower than main).
Signed-off-by: Peter Xu
---
tools/testing/selftests/kvm/dirty_log_test.c | 8
1 file changed, 8 insertions(+)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c
b/tools/testing/selftests/kvm
.org/lkml/20210413213641.23742-1-pet...@redhat.com/
[2] https://lore.kernel.org/lkml/20210417140956.GV4440@xz-x1/
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Andrew Jones
Signed-off-by: Peter Xu
---
tools/testing/selftests/kvm/dirty_log_test.c | 60
1 file changed, 50 inse
test this patch:
(1) while :; do taskset -c 1 ./dirty_log_test; done
(2) taskset -c 1 bash -c "while :; do :; done"
Review comments are greatly welcomed.
Thanks,
[1] https://lore.kernel.org/lkml/20210413213641.23742-1-pet...@redhat.com/
Peter Xu (2):
KVM: selftests: Sync data
ay. I tested longer yesterday but
haven't updated this patch yet. More below.
On Sat, Apr 17, 2021 at 02:59:48PM +0200, Paolo Bonzini wrote:
> On 13/04/21 23:36, Peter Xu wrote:
> > This patch closes this race by allowing the main thread to give the vcpu
> > thread
> >
that any subsequently armed timers on
> CLOCK_REALTIME and CLOCK_TAI are evaluated with the correct offsets.
>
> Signed-off-by: Marcelo Tosatti
>
> ---
>
> v5:
> - Add missing hrtimer_update_base (Peter Xu).
>
> v4:
>- Drop unused code (Thomas).
>
But I might be slowly
> realizing that the ioctl to add the pte (in 4/9) will do its
> shmem_getpage_gfp(), and that will bring in the swap if user
> did not already do so: so I was wrong to claim more robustness
> the other way, this placement should be fine. I think.
>
> > if (xa_is_value(page)) {
> > error = shmem_swapin_page(inode, index, &page,
> > sgp, gfp, vma, fault_type);
> > --
> > 2.31.1.295.g9ea45b61b8-goog
>
--
Peter Xu
help
avoid this specific race condition.
Cc: Andrew Jones
Cc: Paolo Bonzini
Cc: Vitaly Kuznetsov
Cc: Sean Christopherson
Signed-off-by: Peter Xu
---
v2:
- drop one unnecessary check on "!matched"
---
tools/testing/selftests/kvm/dirty_log_test.c | 53 +++-
1 file chan
help
avoid this specific race condition.
Cc: Andrew Jones
Cc: Paolo Bonzini
Cc: Vitaly Kuznetsov
Cc: Sean Christopherson
Signed-off-by: Peter Xu
---
tools/testing/selftests/kvm/dirty_log_test.c | 54 +++-
1 file changed, 53 insertions(+), 1 deletion(-)
diff --git a/tools/t
the UFFDIO_CONTINUE ioctl for shmem-backed
> minor faults, though, so userspace doesn't yet have a way to resolve
> such faults.
>
> Signed-off-by: Axel Rasmussen
Everything looks right to me, but it would be great if Andrea or Hugh could have a
look too.
Acked-by: Peter Xu
--
Peter Xu
mfd alias failed");
> +
> + if (is_src)
> + area_src_alias = area_alias;
> + else
> + area_dst_alias = area_alias;
> +}
It would be nice if shmem_allocate_area() could merge with
hugetlb_allocate_area() somehow, but not that urgent.
Reviewed-by: Peter Xu
--
Peter Xu
ss in the right argv[] so we actually print out the
> hugetlb file path.
>
> Signed-off-by: Axel Rasmussen
Reviewed-by: Peter Xu
--
Peter Xu
Would it look even nicer to init() at the entry of each test, and clear() after
finishing each test?
> +
> uffdio_register.range.start = (unsigned long) area_dst;
> uffdio_register.range.len = nr_pages * page_size;
> uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
The rest looks good to me. Thanks,
--
Peter Xu
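The per-test init()/clear() structure suggested above could look roughly like this. It is a minimal sketch, assuming one piece of global state stands in for the test areas; every name is hypothetical and not taken from userfaultfd.c.

```c
#include <stddef.h>

/* Hypothetical per-test state that demo_init()/demo_clear() manage. */
int demo_area_ready;

static void demo_init(void)  { demo_area_ready = 1; }
static void demo_clear(void) { demo_area_ready = 0; }

struct demo_test {
	const char *name;
	int (*run)(void);	/* returns 0 on success */
};

/* Consumes the state, so a following test relies on a fresh demo_init(). */
static int demo_test_a(void)
{
	int ok = demo_area_ready;

	demo_area_ready = 0;
	return ok ? 0 : 1;
}

static int demo_test_b(void) { return demo_area_ready ? 0 : 1; }

const struct demo_test demo_tests[] = {
	{ "a", demo_test_a },
	{ "b", demo_test_b },
};

/* Run each test between its own init() and clear(); return failure count. */
int demo_run_all(const struct demo_test *tests, size_t n)
{
	int failures = 0;

	for (size_t i = 0; i < n; i++) {
		demo_init();
		if (tests[i].run())
			failures++;
		demo_clear();
	}
	return failures;
}
```

With a single global setup, test "b" would fail after "a" consumed the state; pairing init()/clear() per test keeps them independent.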
On Mon, Apr 12, 2021 at 09:40:22PM -0700, Axel Rasmussen wrote:
> On Mon, Apr 12, 2021 at 4:17 PM Peter Xu wrote:
> >
> > On Thu, Apr 08, 2021 at 04:43:22PM -0700, Axel Rasmussen wrote:
> > > +/*
> > > + * Install PTEs, to map dst_addr (within dst_vma) to page.
On Mon, Apr 12, 2021 at 05:51:14PM -0700, Hugh Dickins wrote:
> On Mon, 12 Apr 2021, Peter Xu wrote:
> > On Tue, Apr 06, 2021 at 11:14:30PM -0700, Hugh Dickins wrote:
> > > > +static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pm
copy_atomic()... Then it'll be further passed into shmem_mcopy_atomic_pte()
now after this patch (as shmem_mfill_zeropage_pte() probably only did one good
thing, which is to clear src_addr). Not a big deal, though.
All the rest looks sane to me.
Reviewed-by: Peter Xu
I'll wait to lo
unsigned long address, unsigned int flags);
> #ifdef CONFIG_USERFAULTFD
> +enum mcopy_atomic_mode;
(I'm not 100% sure, but.. maybe this can be moved even out of ifdef? Then you
can define it once at the top rather than twice?)
Reviewed-by: Peter Xu
--
Peter Xu
WP and MINOR modes are conditionally enabled on specific memory types. This
patch avoids dumping tons of zeros for those cases when the modes are not
supported at all.
Reviewed-by: Axel Rasmussen
Signed-off-by: Peter Xu
---
tools/testing/selftests/vm/userfaultfd.c | 30
edule latency of
the resolving thread. It does not necessarily indicate an issue with uffd.
Nor have I seen this error triggered in past runs. Even if it
triggers, it'll be drowned in the rest of the test logs. Remove it.
Reviewed-by: Axel Rasmussen
Signed-off-by: Peter Xu
---
tools/testing/se
Introduce err()/_err() and replace all the different ways to fail the program,
mostly "fprintf" and "perror" with tons of exit() calls. Always stop the test
program at any failure.
Reviewed-by: Axel Rasmussen
Signed-off-by: Peter Xu
---
tools/testing/selftests/vm
The userfaultfd selftest does not need to handle kernel-initiated faults. Set user
mode so it can be run even if unprivileged_userfaultfd=0 (which is the default).
Reviewed-by: Axel Rasmussen
Signed-off-by: Peter Xu
---
tools/testing/selftests/vm/userfaultfd.c | 2 +-
1 file changed, 1 insertion
conditionally check the fault flag - just do it
unconditionally.
Reviewed-by: Axel Rasmussen
Signed-off-by: Peter Xu
---
tools/testing/selftests/vm/userfaultfd.c | 55 +---
1 file changed, 1 insertion(+), 54 deletions(-)
diff --git a/tools/testing/selftests/vm/userfaultf
serfaultfd selftest on
fault handling, to use an err() macro instead of either fprintf() or perror()
then another exit() call.
The huge cleanup is done in the last patch. The first 4 patches are some other
standalone cleanups for the same file, so I put them together.
Please review, thanks.
P
clear;
unlock_page(page);
put_page(page);
page = NULL;
hindex = index;
}
I think it won't happen for your case since the page should be uptodate already
(the other thread should check and modify the page before CONTINUE), but still
raise this up, since if the page was allocated it smells better to still
install the fallocated page (do we need to clear the page and SetUptodate)?
--
Peter Xu
t.
*/
if (pte_hw_dirty(pte))
pte = pte_mkdirty(pte);
pte = clear_pte_bit(pte, __pgprot(PTE_WRITE));
pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
return pte;
}
So arm64 will explicitly set the dirty bit (from the HW dirty bit) when
wr-protect. It seems to prove that at least for arm64 it's very valid to have
!write && dirty pte.
Thanks,
--
Peter Xu
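The quoted arm64 pte_wrprotect() folds the hardware dirty state into the software PTE_DIRTY bit before clearing write permission, which is why a !write && dirty PTE is valid there. A toy userspace model of that transition follows; the bit positions are my reading of the arch/arm64 headers and should be treated as assumptions, and the `m_` helpers are hypothetical stand-ins for the kernel's pte_*() functions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of arm64 PTE bits (assumed positions): PTE_RDONLY = bit 7,
 * PTE_WRITE (aka PTE_DBM) = bit 51, software PTE_DIRTY = bit 55.
 */
#define M_PTE_RDONLY (UINT64_C(1) << 7)
#define M_PTE_WRITE  (UINT64_C(1) << 51)
#define M_PTE_DIRTY  (UINT64_C(1) << 55)

/* HW dirty: writable via DBM and the hardware has cleared RDONLY. */
static bool m_pte_hw_dirty(uint64_t pte)
{
	return (pte & M_PTE_WRITE) && !(pte & M_PTE_RDONLY);
}

bool m_pte_dirty(uint64_t pte)
{
	return (pte & M_PTE_DIRTY) || m_pte_hw_dirty(pte);
}

bool m_pte_write(uint64_t pte)
{
	return (pte & M_PTE_WRITE) != 0;
}

/* Simplified model of the quoted pte_wrprotect(): save HW dirty first. */
uint64_t m_pte_wrprotect(uint64_t pte)
{
	if (m_pte_hw_dirty(pte))
		pte |= M_PTE_DIRTY;
	pte &= ~M_PTE_WRITE;
	pte |= M_PTE_RDONLY;
	return pte;
}
```

A writable PTE whose RDONLY bit the hardware already cleared comes out read-only yet still dirty, while a clean writable PTE comes out read-only and clean.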
n your tree, without the shmem
> series? And then I'll resolve any conflicts in my tree?
>
> It's true that we haven't tested the hugetlbfs minor faults patch
> extensively *with the shmem one also applied*, but it has had more
> thorough review than the shmem one at this point (e.g. by Mike
> Kravetz), and they're rather separate code paths (I'd be surprised if
> one breaks the other).
Yes, I think the hugetlb part has probably had more review. IMHO it's a
matter of whether Mike would still like to do a more thorough review, or thinks
they're okay to keep.
I can repost the selftest series later if needed, once I figure out which is
the suitable base commit. Those selftest patches are definitely not urgent for
this release, so we can wait for the next release.
Thanks,
--
Peter Xu
's indeed a bit awkward to
swapin here. Maybe move this chunk to right after pagecache_get_page()
returns? Then no need to touch the rest.
> +
> + if (swapped)
> + return 0;
> +
> if (page)
> hindex = page->index;
> if (page && sgp == SGP_WRITE)
> --
> 2.31.1.295.g9ea45b61b8-goog
>
--
Peter Xu
og/?h=mapcount_deshare
[3]
https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=mapcount_deshare&id=7c3a31caa34ac6ac4a4ec0559b1307b5edfc0821
[4]
https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=mapcount_deshare&id=599aa62474f51a470408b28fd4365320a5357aca
--
Peter Xu
case MCOPY_ATOMIC_CONTINUE:
> - err = -EINVAL;
> - break;
> - }
> } else {
> VM_WARN_ON_ONCE(wp_copy);
> err = shmem_mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
>src_addr, mode, page);
> }
>
> +out:
> return err;
> }
>
> diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
> index f6c86b036d0f..d8541a59dae5 100644
> --- a/tools/testing/selftests/vm/userfaultfd.c
> +++ b/tools/testing/selftests/vm/userfaultfd.c
> @@ -485,6 +485,7 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
> static void continue_range(int ufd, __u64 start, __u64 len)
> {
> struct uffdio_continue req;
> + int ret;
>
> req.range.start = start;
> req.range.len = len;
> @@ -493,6 +494,17 @@ static void continue_range(int ufd, __u64 start, __u64 len)
> if (ioctl(ufd, UFFDIO_CONTINUE, &req))
> err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
> (uint64_t)start);
> +
> + /*
> + * Error handling within the kernel for continue is subtly different
> + * from copy or zeropage, so it may be a source of bugs. Trigger an
> + * error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
> + */
> + req.mapped = 0;
> + ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
> + if (ret >= 0 || req.mapped != -EEXIST)
> + err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d, mapped=%" PRId64,
> + ret, req.mapped);
> }
>
> static void *locking_thread(void *arg)
> --
> 2.31.0.208.g409f899ff0-goog
>
--
Peter Xu
gle mapper, more references than us and the map? */
> > if (page_mapcount(page) == 1 && page_count(page) > 2)
> > goto keep_locked;
> >
> > in the pre-pinning days.
> >
> > But I really think that there are a number of other commit
TINUE is slightly different, since we _know_ the page cache is
there.. So I'm thinking maybe you need to handle the continue request in
mfill_atomic_pte() before the VM_SHARED check so as to cover both cases.
--
Peter Xu
counter_file(page));
page_add_file_rmap(page, false);
set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
/* No need to invalidate - it was non-present before */
update_mmu_cache(dst_vma, dst_addr, dst_pte);
pte_unmap_unlock(dst_pte, ptl);
return 0;
}
Then at the entry of shmem_mcopy_atomic_pte():
if (is_continue) {
page = find_lock_page(mapping, pgoff);
if (!page)
return -EFAULT;
ret = shmem_install_uffd_pte(...,
is_continue && !(dst_vma->vm_flags & VM_SHARED));
unlock_page(page);
if (ret)
put_page(page);
return ret;
}
Do you think this would be cleaner?
--
Peter Xu
t = start;
> req.range.len = len;
> @@ -493,6 +494,17 @@ static void continue_range(int ufd, __u64 start, __u64 len)
> if (ioctl(ufd, UFFDIO_CONTINUE, &req))
> err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
> (uint64_t)start);
> +
> + /*
> + * Error handling within the kernel for continue is subtly different
> + * from copy or zeropage, so it may be a source of bugs. Trigger an
> + * error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
> + */
> + req.mapped = 0;
> + ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
> + if (ret >= 0 || req.mapped != -EEXIST)
> + err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d, mapped=%" PRId64,
> + ret, req.mapped);
> }
>
> static void *locking_thread(void *arg)
> --
> 2.31.0.291.g576ba9dcdaf-goog
>
--
Peter Xu
Userfaultfd write-protect mode is supported starting from Linux 5.7.
Acked-by: Mike Rapoport
Signed-off-by: Peter Xu
---
man2/ioctl_userfaultfd.2 | 84 ++--
1 file changed, 81 insertions(+), 3 deletions(-)
diff --git a/man2/ioctl_userfaultfd.2 b/man2
art too, probably after the whole
hugetlbfs/shmem minor mode reaches the linux master branch.
Please review, thanks.
Peter Xu (4):
userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs
userfaultfd.2: Add write-protect mode
ioctl_userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs
ioctl_userfaultfd.2: A
UFFD_FEATURE_THREAD_ID is supported since Linux 4.14.
Acked-by: Mike Rapoport
Signed-off-by: Peter Xu
---
man2/userfaultfd.2 | 13 +
1 file changed, 13 insertions(+)
diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index e7dc9f813..5c41e4816 100644
--- a/man2/userfaultfd.2
Write-protect mode is supported starting from Linux 5.7.
Acked-by: Mike Rapoport
Signed-off-by: Peter Xu
---
man2/userfaultfd.2 | 108 +++--
1 file changed, 104 insertions(+), 4 deletions(-)
diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index
UFFD_FEATURE_THREAD_ID is supported in Linux 4.14.
Acked-by: Mike Rapoport
Signed-off-by: Peter Xu
---
man2/ioctl_userfaultfd.2 | 5 +
1 file changed, 5 insertions(+)
diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 47ae5f473..d4a8375b8 100644
--- a/man2
On Thu, Mar 25, 2021 at 10:32:20PM +0100, Alejandro Colomar (man-pages) wrote:
> Hi Peter,
>
> On 3/23/21 8:16 PM, Peter Xu wrote:
> > On Tue, Mar 23, 2021 at 07:11:04PM +0100, Alejandro Colomar (man-pages)
> > wrote:
> > > > +.TP
> > > > +.B UFFDIO_
bit for normal uffdio_copy case only if both WRITE|SHARED set
for the vma flags. E.g., shmem_mcopy_atomic_pte() of a normal uffdio-copy will
fill in the page cache into pte, however what if this mapping is privately
mapped? IMHO we can't apply the write bit, otherwise the process will be writing
to the page cache directly.
However I think that question will be irrelevant to this patch.
Thanks,
--
Peter Xu
do have a
different commit ID here:
commit 63c826b1372c4930f89b8a55092699fa7f0d6f4e
Author: Axel Rasmussen
Date: Thu Mar 18 10:20:43 2021 -0400
userfaultfd: support minor fault handling for shmem
Axel, did you fetch the commit ID from your local tree, perhaps? Since I
should have fetched from hnaz/linux-mm and I can see Andrew's sign-off too.
Thanks,
--
Peter Xu
ffdio_writeprotect)))
return -EFAULT;
But I didn't check other places; generally I'd return -EFAULT if I can't find a
more suitable error code with a clearer meaning.
I don't think this is really helpful to user app too because no user app would
start to read this -EFAULT to do anything useful.. how about I drop it too if
you think the description is confusing?
Thanks,
--
Peter Xu
On Tue, Mar 23, 2021 at 07:19:12PM +0100, Alejandro Colomar (man-pages) wrote:
> Hi Peter,
>
> Please see a few more comments below.
>
> Thanks,
>
> Alex
>
> On 3/22/21 11:08 PM, Peter Xu wrote:
> > Write-protect mode is supported starting from Linux 5.7.
On Tue, Mar 23, 2021 at 02:11:29AM +, Matthew Wilcox wrote:
> On Mon, Mar 22, 2021 at 08:48:56PM -0400, Peter Xu wrote:
> > +/* Whether to check page->mapping when zapping */
> > +#define ZAP_FLAG_CHECK_MAPPING BIT(0)
> > +
> > /*
> >
On Tue, Mar 23, 2021 at 10:34:45AM +0800, Miaohe Lin wrote:
> Hi:
> On 2021/3/23 8:48, Peter Xu wrote:
> > pte_unmap_same() will always unmap the pte pointer. After the unmap,
> > vmf->pte
> > will not be valid any more. We should clear it.
> >
> > It wa
On Tue, Mar 23, 2021 at 10:27:34AM +0200, Mike Rapoport wrote:
> On Mon, Mar 22, 2021 at 06:08:45PM -0400, Peter Xu wrote:
> > UFFD_FEATURE_THREAD_ID is supported since Linux 4.14.
> >
> > Signed-off-by: Peter Xu
> > ---
> > man2/userfaultfd.2 | 13
this page it'll skip zeroing it assuming it's a zero page.
QEMU plans to fix it using pre-faults as UFFDIO_COPY will complicate the live
snapshot framework, but UFFD_FEATURE_WP_UNALLOCATED should be more efficient.
It's just that we still need to keep the old behavior.
I'll see whether I can prepare a patch for it shortly, with some test case too.
Thanks,
--
Peter Xu
On Mon, Mar 22, 2021 at 08:48:49PM -0400, Peter Xu wrote:
> This patchset is based on tag v5.12-rc3-mmots-2021-03-17-22-26. To run the
> selftest, need to apply the two patches to fix minor mode page leak:
>
> https://lore.kernel.org/lkml/20210322175132.36659-1-pet...@redhat.
linux/userfaultfd.h header files, because it may cause kernel header
update to easily break userspace.
Signed-off-by: Peter Xu
---
tools/testing/selftests/vm/userfaultfd.c | 9 +
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/vm/userfaultfd.c
b/tools/te
ze fetcher.
Signed-off-by: Peter Xu
---
mm/hugetlb.c | 29 +
1 file changed, 25 insertions(+), 4 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 448ef745d5ee..d4acf9d9d087 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5110,7 +5110,7 @@ uns
ptes, because it
has taken hugetlb fault mutex so that no concurrent page fault would trigger.
While the call to hugetlb_vmdelete_list() in hugetlbfs_punch_hole() is not
safe. That's why the previous call will be with ZAP_FLAG_DROP_FILE_UFFD_WP,
while the latter one won't be able to.
Si
he special swap
pte too just like a none pte.
Note that we also need to teach UFFDIO_COPY about this special pte across the
code path so that we can safely install a new page at this special pte as long
as we know it's a stale entry.
Signed-off-by: Peter Xu
---
fs/userfaultfd.c | 5
with
_UFFDIO_WRITEPROTECT too because all existing types now support write
protection mode.
Since vma_can_userfault() will be used elsewhere, move it into userfaultfd_k.h.
Signed-off-by: Peter Xu
---
fs/userfaultfd.c | 18 --
include/linux/userfaultfd_k.h| 14 ++
in
This is to let hugetlbfs be prepared to also recognize swap special ptes just
like uffd-wp special swap ptes.
Signed-off-by: Peter Xu
---
mm/hugetlb.c | 23 +--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fd3e87517e10
This starts from passing cp_flags into hugetlb_change_protection() so hugetlb
will be able to handle MM_CP_UFFD_WP[_RESOLVE] requests.
huge_pte_clear_uffd_wp() is introduced to handle the case where the
UFFDIO_WRITEPROTECT is requested upon migrating huge page entries.
Signed-off-by: Peter Xu
it even if UFFDIO_COPY_MODE_WP is provided, so that the core mm will
know this page contains valid data and never drop it.
Signed-off-by: Peter Xu
---
include/asm-generic/hugetlb.h | 5 +
include/linux/hugetlb.h | 6 --
mm/hugetlb.c | 22 +-
mm/use
d to also
teach userfaultfd itself on either UFFDIO_COPY or handling page faults, so that
everything will still work as expected.
Signed-off-by: Peter Xu
---
fs/userfaultfd.c | 15 +++
mm/shmem.c | 13 -
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/fs/
Hook up hugetlbfs_fault() with the capability to handle userfaultfd-wp faults.
We do this slightly earlier than hugetlb_cow() so that we can avoid taking some
extra locks that we definitely don't need.
Signed-off-by: Peter Xu
---
mm/hugetlb.c | 19 +++
1 file change