> On Jan 17, 2021, at 11:25 AM, Yu Zhao wrote:
>
> On Sun, Jan 17, 2021 at 02:13:43AM -0800, Nadav Amit wrote:
>>> On Jan 17, 2021, at 1:16 AM, Yu Zhao wrote:
>>>
>>> On Sat, Jan 16, 2021 at 11:32:22PM -0800, Nadav Amit wrote:
> On Jan 16, 2021, at 8:41 PM, Yu Zhao wrote:
>
> On T
On Sun, Jan 17, 2021 at 02:13:43AM -0800, Nadav Amit wrote:
> > On Jan 17, 2021, at 1:16 AM, Yu Zhao wrote:
> >
> > On Sat, Jan 16, 2021 at 11:32:22PM -0800, Nadav Amit wrote:
> >>> On Jan 16, 2021, at 8:41 PM, Yu Zhao wrote:
> >>>
> >>> On Tue, Jan 12, 2021 at 09:43:38PM +, Will Deacon wro
> On Jan 17, 2021, at 1:16 AM, Yu Zhao wrote:
>
> On Sat, Jan 16, 2021 at 11:32:22PM -0800, Nadav Amit wrote:
>>> On Jan 16, 2021, at 8:41 PM, Yu Zhao wrote:
>>>
>>> On Tue, Jan 12, 2021 at 09:43:38PM +, Will Deacon wrote:
On Tue, Jan 12, 2021 at 12:38:34PM -0800, Nadav Amit wrote:
>>>
On Sat, Jan 16, 2021 at 11:32:22PM -0800, Nadav Amit wrote:
> > On Jan 16, 2021, at 8:41 PM, Yu Zhao wrote:
> >
> > On Tue, Jan 12, 2021 at 09:43:38PM +, Will Deacon wrote:
> >> On Tue, Jan 12, 2021 at 12:38:34PM -0800, Nadav Amit wrote:
> On Jan 12, 2021, at 11:56 AM, Yu Zhao wrote:
>
> On Jan 16, 2021, at 8:41 PM, Yu Zhao wrote:
>
> On Tue, Jan 12, 2021 at 09:43:38PM +, Will Deacon wrote:
>> On Tue, Jan 12, 2021 at 12:38:34PM -0800, Nadav Amit wrote:
On Jan 12, 2021, at 11:56 AM, Yu Zhao wrote:
On Tue, Jan 12, 2021 at 11:15:43AM -0800, Nadav Amit wrote:
> I
On Tue, Jan 12, 2021 at 09:43:38PM +, Will Deacon wrote:
> On Tue, Jan 12, 2021 at 12:38:34PM -0800, Nadav Amit wrote:
> > > On Jan 12, 2021, at 11:56 AM, Yu Zhao wrote:
> > > On Tue, Jan 12, 2021 at 11:15:43AM -0800, Nadav Amit wrote:
> > >> I will send an RFC soon for per-table deferred TLB
> On Jan 12, 2021, at 2:29 PM, Nadav Amit wrote:
>
>
>>
>> On Jan 12, 2021, at 1:43 PM, Will Deacon wrote:
>>
>> On Tue, Jan 12, 2021 at 12:38:34PM -0800, Nadav Amit wrote:
On Jan 12, 2021, at 11:56 AM, Yu Zhao wrote:
> On Tue, Jan 12, 2021 at 11:15:43AM -0800, Nadav Amit wrote:
On Tue, Jan 12, 2021 at 02:29:51PM -0800, Nadav Amit wrote:
> > On Jan 12, 2021, at 1:43 PM, Will Deacon wrote:
> >
> > On Tue, Jan 12, 2021 at 12:38:34PM -0800, Nadav Amit wrote:
> >>> On Jan 12, 2021, at 11:56 AM, Yu Zhao wrote:
> >>> On Tue, Jan 12, 2021 at 11:15:43AM -0800, Nadav Amit wrote:
> On Jan 12, 2021, at 1:43 PM, Will Deacon wrote:
>
> On Tue, Jan 12, 2021 at 12:38:34PM -0800, Nadav Amit wrote:
>>> On Jan 12, 2021, at 11:56 AM, Yu Zhao wrote:
>>> On Tue, Jan 12, 2021 at 11:15:43AM -0800, Nadav Amit wrote:
I will send an RFC soon for per-table deferred TLB flushes track
> On Jan 12, 2021, at 11:56 AM, Yu Zhao wrote:
>
> On Tue, Jan 12, 2021 at 11:15:43AM -0800, Nadav Amit wrote:
>>> On Jan 12, 2021, at 11:02 AM, Laurent Dufour
>>> wrote:
>>>
>>> Le 12/01/2021 à 17:57, Peter Zijlstra a écrit :
On Tue, Jan 12, 2021 at 04:47:17PM +0100, Laurent Dufour wrote
On Tue, Jan 12, 2021 at 12:38:34PM -0800, Nadav Amit wrote:
> > On Jan 12, 2021, at 11:56 AM, Yu Zhao wrote:
> >
> > On Tue, Jan 12, 2021 at 11:15:43AM -0800, Nadav Amit wrote:
> >>> On Jan 12, 2021, at 11:02 AM, Laurent Dufour
> >>> wrote:
> >>>
> >>> Le 12/01/2021 à 17:57, Peter Zijlstra a é
On Tue, Jan 12, 2021 at 12:38:34PM -0800, Nadav Amit wrote:
> > On Jan 12, 2021, at 11:56 AM, Yu Zhao wrote:
> > On Tue, Jan 12, 2021 at 11:15:43AM -0800, Nadav Amit wrote:
> >> I will send an RFC soon for per-table deferred TLB flushes tracking.
> >> The basic idea is to save a generation in the
On Tue, Jan 12, 2021 at 11:15:43AM -0800, Nadav Amit wrote:
> > On Jan 12, 2021, at 11:02 AM, Laurent Dufour
> > wrote:
> >
> > Le 12/01/2021 à 17:57, Peter Zijlstra a écrit :
> >> On Tue, Jan 12, 2021 at 04:47:17PM +0100, Laurent Dufour wrote:
> >>> Le 12/01/2021 à 12:43, Vinayak Menon a écrit
> On Jan 12, 2021, at 11:02 AM, Laurent Dufour
> wrote:
>
> Le 12/01/2021 à 17:57, Peter Zijlstra a écrit :
>> On Tue, Jan 12, 2021 at 04:47:17PM +0100, Laurent Dufour wrote:
>>> Le 12/01/2021 à 12:43, Vinayak Menon a écrit :
Possibility of race against other PTE modifiers
1) For
Le 12/01/2021 à 17:57, Peter Zijlstra a écrit :
On Tue, Jan 12, 2021 at 04:47:17PM +0100, Laurent Dufour wrote:
Le 12/01/2021 à 12:43, Vinayak Menon a écrit :
Possibility of race against other PTE modifiers
1) Fork - We have seen a case of SPF racing with fork marking PTEs RO and that
is des
On Tue, Jan 12, 2021 at 04:47:17PM +0100, Laurent Dufour wrote:
> Le 12/01/2021 à 12:43, Vinayak Menon a écrit :
> > Possibility of race against other PTE modifiers
> >
> > 1) Fork - We have seen a case of SPF racing with fork marking PTEs RO and
> > that
> > is described and fixed here https://
On Tue, Jan 05, 2021 at 01:03:48PM -0500, Andrea Arcangeli wrote:
> On Tue, Jan 05, 2021 at 04:37:27PM +0100, Peter Zijlstra wrote:
> > (your other email clarified this point; the COW needs to copy while
> > holding the PTL and we need TLBI under PTL if we're to change this)
>
> The COW doesn't ne
Le 12/01/2021 à 12:43, Vinayak Menon a écrit :
On 1/5/2021 9:07 PM, Peter Zijlstra wrote:
On Mon, Dec 21, 2020 at 08:16:11PM -0800, Linus Torvalds wrote:
So I think the basic rule is that "if you hold mmap_sem for writing,
you're always safe". And that really should be considered the
"default"
On 1/5/2021 9:07 PM, Peter Zijlstra wrote:
On Mon, Dec 21, 2020 at 08:16:11PM -0800, Linus Torvalds wrote:
So I think the basic rule is that "if you hold mmap_sem for writing,
you're always safe". And that really should be considered the
"default" locking.
ANY time you make a modification to t
On Tue, Jan 05, 2021 at 04:37:27PM +0100, Peter Zijlstra wrote:
> (your other email clarified this point; the COW needs to copy while
> holding the PTL and we need TLBI under PTL if we're to change this)
The COW doesn't need to hold the PT lock, the TLBI broadcast doesn't
need to be delivered unde
On Mon, Dec 21, 2020 at 08:16:11PM -0800, Linus Torvalds wrote:
> So I think the basic rule is that "if you hold mmap_sem for writing,
> you're always safe". And that really should be considered the
> "default" locking.
>
> ANY time you make a modification to the VM layer, you should basically
>
On Thu, Dec 24, 2020 at 01:49:45PM -0500, Andrea Arcangeli wrote:
> Without the above, can't the CPU decrement the tlb_flush_pending while
> the IPI to ack didn't arrive yet in csd_lock_wait?
Ehm: csd_lock_wait has smp_acquire__after_ctrl_dep() so the write side
looks ok after all sorry.
On Wed, Dec 23, 2020 at 09:18:09PM -0800, Nadav Amit wrote:
> I am not trying to be argumentative, and I did not think through about an
> alternative solution. It sounds to me that your proposed solution is correct
> and would probably be eventually (slightly) more efficient than anything
> that I
> On Dec 23, 2020, at 8:01 PM, Andrea Arcangeli wrote:
>
>> On Wed, Dec 23, 2020 at 07:09:10PM -0800, Nadav Amit wrote:
>>> Perhaps holding some small bitmap based on part of the deferred flushed
>>> pages (e.g., bits 12-17 of the address or some other kind of a single
>>> hash-function bloom-fil
> On Dec 23, 2020, at 7:34 PM, Yu Zhao wrote:
>
> On Wed, Dec 23, 2020 at 07:09:10PM -0800, Nadav Amit wrote:
>>> On Dec 23, 2020, at 6:00 PM, Andrea Arcangeli wrote:
>>>
>>> On Wed, Dec 23, 2020 at 05:21:43PM -0800, Andy Lutomirski wrote:
I don’t love this as a long term fix. AFAICT we ca
> On Wed, Dec 23, 2020 at 07:09:10PM -0800, Nadav Amit wrote:
> > I think there are other cases in which Andy’s concern is relevant
> > (MADV_PAGEOUT).
I didn't try to figure how it would help MADV_PAGEOUT. MADV_PAGEOUT
sounds cool feature, but maybe it'd need a way to flush the
invalidates out an
On Wed, Dec 23, 2020 at 07:09:10PM -0800, Nadav Amit wrote:
> > On Dec 23, 2020, at 6:00 PM, Andrea Arcangeli wrote:
> >
> > On Wed, Dec 23, 2020 at 05:21:43PM -0800, Andy Lutomirski wrote:
> >> I don’t love this as a long term fix. AFAICT we can have
> >> mm_tlb_flush_pending set for quite a wh
On Wed, Dec 23, 2020 at 09:00:26PM -0500, Andrea Arcangeli wrote:
> One other near zero cost improvement easy to add if this would be "if
> (vma->vm_flags & (VM_SOFTDIRTY|VM_UFFD_WP))" and it could be made
The next worry then is if UFFDIO_WRITEPROTECT is very large then there
would be a flood of w
> On Dec 23, 2020, at 7:09 PM, Nadav Amit wrote:
>
>> On Dec 23, 2020, at 6:00 PM, Andrea Arcangeli wrote:
>>
>> On Wed, Dec 23, 2020 at 05:21:43PM -0800, Andy Lutomirski wrote:
>>> I don’t love this as a long term fix. AFAICT we can have
>>> mm_tlb_flush_pending set for quite a while — mprote
> On Dec 23, 2020, at 6:00 PM, Andrea Arcangeli wrote:
>
> On Wed, Dec 23, 2020 at 05:21:43PM -0800, Andy Lutomirski wrote:
>> I don’t love this as a long term fix. AFAICT we can have
>> mm_tlb_flush_pending set for quite a while — mprotect seems like it can wait
>> in IO while splitting a huge
On Wed, Dec 23, 2020 at 05:21:43PM -0800, Andy Lutomirski wrote:
> I don’t love this as a long term fix. AFAICT we can have mm_tlb_flush_pending
> set for quite a while — mprotect seems like it can wait in IO while splitting
> a huge page, for example. That gives us a window in which every write
> On Dec 23, 2020, at 2:29 PM, Yu Zhao wrote:
>
>
> I was hesitant to suggest the following because it isn't that straight
> forward. But since you seem to be less concerned with the complexity,
> I'll just bring it on the table -- it would take care of both ufd and
> clear_refs_write, wouldn
Hello Linus,
On Wed, Dec 23, 2020 at 03:39:53PM -0800, Linus Torvalds wrote:
> On Wed, Dec 23, 2020 at 1:39 PM Andrea Arcangeli wrote:
> >
> > On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote:
> > > Thanks for the details.
> >
> > I hope we can find a way put the page_mapcount back where t
On Wed, Dec 23, 2020 at 02:45:59PM -0800, Nadav Amit wrote:
> I think it may be reasonable.
Whatever solution used, there will be 2 users of it: uffd-wp will use
whatever technique used by clear_refs_write to avoid the
mmap_write_lock.
My favorite is Yu's patch and not the group lock anymore. The
On Wed, Dec 23, 2020 at 1:39 PM Andrea Arcangeli wrote:
>
> On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote:
> > Thanks for the details.
>
> I hope we can find a way put the page_mapcount back where there's a
> page_count right now.
I really don't think that's ever going to happen - at le
On Wed, Dec 23, 2020 at 03:29:51PM -0700, Yu Zhao wrote:
> I was hesitant to suggest the following because it isn't that straight
> forward. But since you seem to be less concerned with the complexity,
> I'll just bring it on the table -- it would take care of both ufd and
> clear_refs_write, would
> On Dec 23, 2020, at 2:05 PM, Andrea Arcangeli wrote:
>
> On Tue, Dec 22, 2020 at 04:40:32AM -0800, Nadav Amit wrote:
>>> On Dec 21, 2020, at 1:24 PM, Yu Zhao wrote:
>>>
>>> On Mon, Dec 21, 2020 at 12:26:22PM -0800, Linus Torvalds wrote:
On Mon, Dec 21, 2020 at 12:23 PM Nadav Amit wrote:
On Wed, Dec 23, 2020 at 04:39:00PM -0500, Andrea Arcangeli wrote:
> On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote:
> > Thanks for the details.
>
> I hope we can find a way put the page_mapcount back where there's a
> page_count right now.
>
> If you're so worried about having to maintai
On Tue, Dec 22, 2020 at 04:40:32AM -0800, Nadav Amit wrote:
> > On Dec 21, 2020, at 1:24 PM, Yu Zhao wrote:
> >
> > On Mon, Dec 21, 2020 at 12:26:22PM -0800, Linus Torvalds wrote:
> >> On Mon, Dec 21, 2020 at 12:23 PM Nadav Amit wrote:
> >>> Using mmap_write_lock() was my initial fix and there w
On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote:
> Thanks for the details.
I hope we can find a way put the page_mapcount back where there's a
page_count right now.
If you're so worried about having to maintain a all defined well
documented (or to be documented even better if you ACK it)
On Wed, Dec 23, 2020 at 10:52:35AM -0500, Peter Xu wrote:
> On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote:
> > In your patch, do we need to take wrprotect_rwsem in
> > handle_userfault() as well? Otherwise, it seems userspace would have
> > to synchronize between its wrprotect ioctl and f
On Wed, Dec 23, 2020 at 12:12:23PM -0700, Yu Zhao wrote:
> On Wed, Dec 23, 2020 at 11:24:16AM -0500, Peter Xu wrote:
> > On Wed, Dec 23, 2020 at 03:06:30AM -0700, Yu Zhao wrote:
> > > On Wed, Dec 23, 2020 at 01:44:42AM -0800, Linus Torvalds wrote:
> > > > On Tue, Dec 22, 2020 at 4:01 PM Linus Torva
On Wed, Dec 23, 2020 at 11:24:16AM -0500, Peter Xu wrote:
> On Wed, Dec 23, 2020 at 03:06:30AM -0700, Yu Zhao wrote:
> > On Wed, Dec 23, 2020 at 01:44:42AM -0800, Linus Torvalds wrote:
> > > On Tue, Dec 22, 2020 at 4:01 PM Linus Torvalds
> > > wrote:
> > > >
> > > > The more I look at the mprotect
> On Dec 23, 2020, at 8:23 AM, Will Deacon wrote:
>
> On Tue, Dec 22, 2020 at 11:20:21AM -0800, Nadav Amit wrote:
>>> On Dec 22, 2020, at 10:30 AM, Yu Zhao wrote:
>>>
>>> On Tue, Dec 22, 2020 at 04:40:32AM -0800, Nadav Amit wrote:
> On Dec 21, 2020, at 1:24 PM, Yu Zhao wrote:
>
>
On Wed, Dec 23, 2020 at 01:51:59PM -0500, Andrea Arcangeli wrote:
> NOTE: about the above comment, that mprotect takes
> mmap_read_lock. Your above code change in the commit above, still has
write
Correction to avoid any confusion.
On Wed, Dec 23, 2020 at 11:24:16AM -0500, Peter Xu wrote:
> I think this is not against Linus's example - where cpu2 does not have tlb
> cached so it sees RO while cpu3 does have tlb cached so cpu3 can still modify
> it. So IMHO there's no problem here.
>
> But I do think in step 2 here we overlo
On Wed, Dec 23, 2020 at 03:06:30AM -0700, Yu Zhao wrote:
> On Wed, Dec 23, 2020 at 01:44:42AM -0800, Linus Torvalds wrote:
> > On Tue, Dec 22, 2020 at 4:01 PM Linus Torvalds
> > wrote:
> > >
> > > The more I look at the mprotect code, the less I like it. We seem to
> > > be much better about the T
On Tue, Dec 22, 2020 at 11:20:21AM -0800, Nadav Amit wrote:
> > On Dec 22, 2020, at 10:30 AM, Yu Zhao wrote:
> >
> > On Tue, Dec 22, 2020 at 04:40:32AM -0800, Nadav Amit wrote:
> >>> On Dec 21, 2020, at 1:24 PM, Yu Zhao wrote:
> >>>
> >>> On Mon, Dec 21, 2020 at 12:26:22PM -0800, Linus Torvalds
On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote:
> In your patch, do we need to take wrprotect_rwsem in
> handle_userfault() as well? Otherwise, it seems userspace would have
> to synchronize between its wrprotect ioctl and fault handler? i.e.,
> the fault hander needs to be aware that the
On Wed, Dec 23, 2020 at 01:44:42AM -0800, Linus Torvalds wrote:
> On Tue, Dec 22, 2020 at 4:01 PM Linus Torvalds
> wrote:
> >
> > The more I look at the mprotect code, the less I like it. We seem to
> > be much better about the TLB flushes in other places (looking at
> > mremap, for example). The
On Tue, Dec 22, 2020 at 4:01 PM Linus Torvalds
wrote:
>
> The more I look at the mprotect code, the less I like it. We seem to
> be much better about the TLB flushes in other places (looking at
> mremap, for example). The mprotect code seems to be very laissez-faire
> about the TLB flushing.
No,
On Tue, Dec 22, 2020 at 09:56:11PM -0500, Andrea Arcangeli wrote:
> On Tue, Dec 22, 2020 at 04:39:46PM -0700, Yu Zhao wrote:
> > We are talking about non-COW anon pages here -- they can't be mapped
> > more than once. So why not just identify them by checking
> > page_mapcount == 1 and then uncondi
On Tue, Dec 22, 2020 at 05:23:39PM -0700, Yu Zhao wrote:
> and 2) people are spearheading multiple efforts to reduce the mmap_lock
> contention, which hopefully would make ufd users suffer less soon.
In my view UFFD is an already deployed working solution that
eliminates the mmap_lock_write conten
On Tue, Dec 22, 2020 at 04:39:46PM -0700, Yu Zhao wrote:
> We are talking about non-COW anon pages here -- they can't be mapped
> more than once. So why not just identify them by checking
> page_mapcount == 1 and then unconditionally reuse them? (This is
> probably where I've missed things.)
The p
On Tue, Dec 22, 2020 at 3:50 PM Linus Torvalds
wrote:
>
> The rule is that the TLB flush has to be done before the page table
> lock is released.
I take that back. I guess it's ok as long as the mmap_sem is held for
writing. Then the TLB flush can be delayed until just before releasing
the mmap_s
On Tue, Dec 22, 2020 at 04:01:45PM -0800, Linus Torvalds wrote:
> On Tue, Dec 22, 2020 at 3:50 PM Linus Torvalds
> wrote:
> >
> > See zap_pte_range() for an example of doing it right, even in the
> > presence of complexities (ie that has an example of both flushing the
> > TLB, and doing the actua
On Tue, Dec 22, 2020 at 3:50 PM Linus Torvalds
wrote:
>
> See zap_pte_range() for an example of doing it right, even in the
> presence of complexities (ie that has an example of both flushing the
> TLB, and doing the actual "free the pages after flush", and it does
> the two cases separately).
Th
On Tue, Dec 22, 2020 at 3:39 PM Yu Zhao wrote:
>
> 2) is the false positive because of what we do, and it's causing the
> memory corruption because do_wp_page() tries to make copies of pages
> that seem to be RO but may have stale RW tlb entries pending flush.
Yeah, that's definitely a different
On Tue, Dec 22, 2020 at 05:02:37PM -0500, Andrea Arcangeli wrote:
> On Tue, Dec 22, 2020 at 02:14:41PM -0700, Yu Zhao wrote:
> > This works but I don't prefer this option because 1) this is new
> > way of making pte_wrprotect safe and 2) it's specific to ufd and
> > can't be applied to clear_soft_d
On Tue, Dec 22, 2020 at 02:14:41PM -0700, Yu Zhao wrote:
> This works but I don't prefer this option because 1) this is new
> way of making pte_wrprotect safe and 2) it's specific to ufd and
> can't be applied to clear_soft_dirty() which has no "catcher". No
I didn't look into clear_soft_dirty iss
On Tue, Dec 22, 2020 at 12:58:18PM -0800, Nadav Amit wrote:
> I had somewhat similar ideas - saving in each page-struct the generation,
> which would allow to: (1) extend pte_same() to detect interim changes
> that were reverted (RO->RW->RO) and (2) per-PTE pending flushes.
What don't you feel saf
On Tue, Dec 22, 2020 at 12:19:49PM -0800, Nadav Amit wrote:
> Perhaps any change to PTE in a page-table should increase a page-table
> generation that we would save in the page-table page-struct (next to the
The current rule is that by the time in the page fault we find a
pte/hugepmd in certain st
On Tue, Dec 22, 2020 at 02:31:52PM -0500, Andrea Arcangeli wrote:
> On Mon, Dec 21, 2020 at 07:19:35PM -0800, Andy Lutomirski wrote:
> > instance of this (with mmap_sem held for write) in x86:
> > mark_screen_rdonly(). Dare I ask how broken this is? We could likely
>
> That one is buggy despite
> On Dec 22, 2020, at 11:31 AM, Andrea Arcangeli wrote:
>
> From 4ace4d1b53f5cb3b22a5c2dc33facc4150b112d6 Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli
> Date: Tue, 22 Dec 2020 14:30:16 -0500
> Subject: [PATCH 1/1] mm: userfaultfd: avoid leaving stale TLB after
> userfaultfd_writeprotect()
>
> On Dec 22, 2020, at 12:34 PM, Andy Lutomirski wrote:
>
> On Sat, Dec 19, 2020 at 2:06 PM Nadav Amit wrote:
>>> [ I have in mind another solution, such as keeping in each page-table a
>>> “table-generation” which is the mm-generation at the time of the change,
>>> and only flush if “table-gener
On Sat, Dec 19, 2020 at 2:06 PM Nadav Amit wrote:
> > [ I have in mind another solution, such as keeping in each page-table a
> > “table-generation” which is the mm-generation at the time of the change,
> > and only flush if “table-generation”==“mm-generation”, but it requires
> > some thought on
On Tue, Dec 22, 2020 at 08:15:53PM +, Matthew Wilcox wrote:
> On Tue, Dec 22, 2020 at 02:31:52PM -0500, Andrea Arcangeli wrote:
> > My previous suggestion to use a mutex to serialize
> > userfaultfd_writeprotect with a mutex will still work, but we can run
> > as many wrprotect and un-wrprotect
> On Dec 22, 2020, at 11:44 AM, Andrea Arcangeli wrote:
>
> On Mon, Dec 21, 2020 at 02:55:12PM -0800, Nadav Amit wrote:
>> wouldn’t mmap_write_downgrade() be executed before mprotect_fixup() (so
>
> I assume you mean "in" mprotect_fixup, after change_protection.
>
> If you would downgrade the m
On Mon, Dec 21, 2020 at 8:16 PM Linus Torvalds
wrote:
>
> On Mon, Dec 21, 2020 at 7:19 PM Andy Lutomirski wrote:
> >
> > Ugh, this is unpleasantly complicated.
>
> What I *should* have said is that *because* userfaultfd is changing
> the VM layout, it should always act as if it had to take the mm
On Tue, Dec 22, 2020 at 02:31:52PM -0500, Andrea Arcangeli wrote:
> My previous suggestion to use a mutex to serialize
> userfaultfd_writeprotect with a mutex will still work, but we can run
> as many wrprotect and un-wrprotect as we want in parallel, as long as
> they're not simultaneous, we can d
On Mon, Dec 21, 2020 at 02:55:12PM -0800, Nadav Amit wrote:
> wouldn’t mmap_write_downgrade() be executed before mprotect_fixup() (so
I assume you mean "in" mprotect_fixup, after change_protection.
If you would downgrade the mmap_lock to read there, then it'd severely
slowdown the non contention
On Mon, Dec 21, 2020 at 07:19:35PM -0800, Andy Lutomirski wrote:
> instance of this (with mmap_sem held for write) in x86:
> mark_screen_rdonly(). Dare I ask how broken this is? We could likely
That one is buggy despite it takes the mmap_write_lock... inverting
the last two lines would fix it th
> On Dec 22, 2020, at 10:30 AM, Yu Zhao wrote:
>
> On Tue, Dec 22, 2020 at 04:40:32AM -0800, Nadav Amit wrote:
>>> On Dec 21, 2020, at 1:24 PM, Yu Zhao wrote:
>>>
>>> On Mon, Dec 21, 2020 at 12:26:22PM -0800, Linus Torvalds wrote:
On Mon, Dec 21, 2020 at 12:23 PM Nadav Amit wrote:
> U
On Tue, Dec 22, 2020 at 04:40:32AM -0800, Nadav Amit wrote:
> > On Dec 21, 2020, at 1:24 PM, Yu Zhao wrote:
> >
> > On Mon, Dec 21, 2020 at 12:26:22PM -0800, Linus Torvalds wrote:
> >> On Mon, Dec 21, 2020 at 12:23 PM Nadav Amit wrote:
> >>> Using mmap_write_lock() was my initial fix and there w
> On Dec 21, 2020, at 1:24 PM, Yu Zhao wrote:
>
> On Mon, Dec 21, 2020 at 12:26:22PM -0800, Linus Torvalds wrote:
>> On Mon, Dec 21, 2020 at 12:23 PM Nadav Amit wrote:
>>> Using mmap_write_lock() was my initial fix and there was a strong pushback
>>> on this approach due to its potential impact
> On Dec 21, 2020, at 7:19 PM, Andy Lutomirski wrote:
>
> On Mon, Dec 21, 2020 at 3:22 PM Linus Torvalds
> wrote:
>> On Mon, Dec 21, 2020 at 2:30 PM Peter Xu wrote:
>>> AFAIU mprotect() is the only one who modifies the pte using the mmap write
>>> lock. NUMA balancing is also using read mmap l
On Mon, Dec 21, 2020 at 7:19 PM Andy Lutomirski wrote:
>
> Ugh, this is unpleasantly complicated.
I probably should have phrased it differently, because the case you
quote is actually a good one:
> I will admit that any API that
> takes an address and more-or-less-blindly marks it RO makes me q
On Mon, Dec 21, 2020 at 3:22 PM Linus Torvalds
wrote:
>
> On Mon, Dec 21, 2020 at 2:30 PM Peter Xu wrote:
> >
> > AFAIU mprotect() is the only one who modifies the pte using the mmap write
> > lock. NUMA balancing is also using read mmap lock when changing pte
> > protections, while my understan
On Mon, Dec 21, 2020 at 04:11:01PM -0800, Linus Torvalds wrote:
> On Mon, Dec 21, 2020 at 4:00 PM Yu Zhao wrote:
> >
> > My first instinct is to be conservative and revert 09854ba94c6a ("mm:
> > do_wp_page() simplification") so people are less likely to come back
> > and complain about performance
On Mon, Dec 21, 2020 at 4:00 PM Yu Zhao wrote:
>
> My first instinct is to be conservative and revert 09854ba94c6a ("mm:
> do_wp_page() simplification") so people are less likely to come back
> and complain about performance issues from holding mmap lock for
> write when clearing pte_write.
Well,
On Mon, Dec 21, 2020 at 03:33:30PM -0800, Linus Torvalds wrote:
> On Mon, Dec 21, 2020 at 3:12 PM Yu Zhao wrote:
> >
> > I can't say I disagree with you but the man has made the call and I
> > think we should just move on.
>
> "The man" can always be convinced by numbers.
>
> So if somebody come
> On Dec 21, 2020, at 3:30 PM, Linus Torvalds
> wrote:
>
> On Mon, Dec 21, 2020 at 2:55 PM Nadav Amit wrote:
>> So as an alternative solution, I can do copying under the PTL after
>> flushing, which seems to solve the problem.
>
> ...
> Note that the "Re-validate under PTL" code in cow_user_pa
On Mon, Dec 21, 2020 at 2:55 PM Nadav Amit wrote:
>
> So as an alternative solution, I can do copying under the PTL after
> flushing, which seems to solve the problem.
I think that's a valid model, but note that we do the "re-check ptl"
in a (*completely(* different part than we do the actual PTE
On Mon, Dec 21, 2020 at 3:12 PM Yu Zhao wrote:
>
> I can't say I disagree with you but the man has made the call and I
> think we should just move on.
"The man" can always be convinced by numbers.
So if somebody comes up with an alternate patch, and explains it, and
shows that it is better - go
On Mon, Dec 21, 2020 at 2:30 PM Peter Xu wrote:
>
> AFAIU mprotect() is the only one who modifies the pte using the mmap write
> lock. NUMA balancing is also using read mmap lock when changing pte
> protections, while my understanding is mprotect() used write lock only because
> it manipulates th
On Mon, Dec 21, 2020 at 1:55 PM Peter Xu wrote:
>
> Frankly speaking I don't know why it's always safe to do data copy without the
> pgtable lock in wp_page_copy(), since I don't know what guaranteed us from
> data
> changing on the original page due to any reason.
So the reason it should be saf
On Mon, Dec 21, 2020 at 05:30:41PM -0500, Peter Xu wrote:
> On Mon, Dec 21, 2020 at 01:49:55PM -0800, Nadav Amit wrote:
> > BTW: In general, I think that you are right, and that changing of PTEs
> > should not require taking mmap_lock for write. However, I am not sure
> > cow_user_page() is not the
> On Dec 21, 2020, at 2:30 PM, Peter Xu wrote:
>
> On Mon, Dec 21, 2020 at 01:49:55PM -0800, Nadav Amit wrote:
>> BTW: In general, I think that you are right, and that changing of PTEs
>> should not require taking mmap_lock for write. However, I am not sure
>> cow_user_page() is not the only one
On Mon, Dec 21, 2020 at 01:49:55PM -0800, Nadav Amit wrote:
> BTW: In general, I think that you are right, and that changing of PTEs
> should not require taking mmap_lock for write. However, I am not sure
> cow_user_page() is not the only one that poses a problem and whether a more
> systematic sol
On Mon, Dec 21, 2020 at 12:23:06PM -0800, Nadav Amit wrote:
> 2. Copy the page in cow_user_page() while holding the PTL and after flushing
>has been done. I am not sure if there are potential problems with
>special-pages (2 flushes might be necessary for special pages).
This seems to be a
> On Dec 21, 2020, at 1:24 PM, Yu Zhao wrote:
>
> On Mon, Dec 21, 2020 at 12:26:22PM -0800, Linus Torvalds wrote:
>> On Mon, Dec 21, 2020 at 12:23 PM Nadav Amit wrote:
>>> Using mmap_write_lock() was my initial fix and there was a strong pushback
>>> on this approach due to its potential impact
On Mon, Dec 21, 2020 at 12:26:22PM -0800, Linus Torvalds wrote:
> On Mon, Dec 21, 2020 at 12:23 PM Nadav Amit wrote:
> >
> > Using mmap_write_lock() was my initial fix and there was a strong pushback
> > on this approach due to its potential impact on performance.
>
> From whom?
>
> Somebody who
On Mon, Dec 21, 2020 at 12:23 PM Nadav Amit wrote:
>
> Using mmap_write_lock() was my initial fix and there was a strong pushback
> on this approach due to its potential impact on performance.
>From whom?
Somebody who doesn't understand that correctness is more important
than performance? And th
On Mon, Dec 21, 2020 at 12:21 PM Yu Zhao wrote:
>
> Well, unfortunately we have places that use optimizations like
>
> inc_tlb_flush_pending()
> lock page table
> pte_wrprotect
> flush_tlb_range()
> dec_tlb_flush_pending()
>
> which complicate things.
My point is, none of that ma
> On Dec 21, 2020, at 11:55 AM, Linus Torvalds
> wrote:
>
> On Mon, Dec 21, 2020 at 11:16 AM Yu Zhao wrote:
>> Nadav Amit found memory corruptions when running userfaultfd test above.
>> It seems to me the problem is related to commit 09854ba94c6a ("mm:
>> do_wp_page() simplification"). Can you
On Mon, Dec 21, 2020 at 11:55:02AM -0800, Linus Torvalds wrote:
> On Mon, Dec 21, 2020 at 11:16 AM Yu Zhao wrote:
> >
> > Nadav Amit found memory corruptions when running userfaultfd test above.
> > It seems to me the problem is related to commit 09854ba94c6a ("mm:
> > do_wp_page() simplification"
On Mon, Dec 21, 2020 at 11:16 AM Yu Zhao wrote:
>
> Nadav Amit found memory corruptions when running userfaultfd test above.
> It seems to me the problem is related to commit 09854ba94c6a ("mm:
> do_wp_page() simplification"). Can you please take a look? Thanks.
>
> TL;DR: it may not safe to make
On Mon, Dec 21, 2020 at 10:31:57AM -0800, Nadav Amit wrote:
> > On Dec 21, 2020, at 9:27 AM, Peter Xu wrote:
> >
> > Hi, Nadav,
> >
> > On Sun, Dec 20, 2020 at 12:06:38AM -0800, Nadav Amit wrote:
> >
> > [...]
> >
> >> So to correct myself, I think that what I really encountered was actually
>
On Mon, Dec 21, 2020 at 10:31:57AM -0800, Nadav Amit wrote:
> > On Dec 21, 2020, at 9:27 AM, Peter Xu wrote:
> >
> > Hi, Nadav,
> >
> > On Sun, Dec 20, 2020 at 12:06:38AM -0800, Nadav Amit wrote:
> >
> > [...]
> >
> >> So to correct myself, I think that what I really encountered was actually
>
> On Dec 21, 2020, at 9:27 AM, Peter Xu wrote:
>
> Hi, Nadav,
>
> On Sun, Dec 20, 2020 at 12:06:38AM -0800, Nadav Amit wrote:
>
> [...]
>
>> So to correct myself, I think that what I really encountered was actually
>> during MM_CP_UFFD_WP_RESOLVE (i.e., when the protection is removed). The
>>
1 - 100 of 120 matches
Mail list logo