Re: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out

2016-09-13 Thread Andrea Arcangeli
Hello, On Tue, Sep 13, 2016 at 04:53:49PM +0800, Huang, Ying wrote: > I am glad to discuss my final goal, that is, swapping out/in the full > THP without splitting. Why I want to do that is copied as below, I think that is a fine objective. It wasn't implemented initially just to keep things

Re: [PATCH] mm, thp: fix leaking mapped pte in __collapse_huge_page_swapin()

2016-09-07 Thread Andrea Arcangeli
ing mapped ptes. > > Signed-off-by: Ebru Akagunduz <ebru.akagun...@gmail.com> > Suggested-by: Andrea Arcangeli <aarca...@redhat.com> > --- > mm/khugepaged.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) Reviewed-by: Andrea Arcangeli <aarca...@redhat.com>

Re: mm: use-after-free in collapse_huge_page

2016-09-07 Thread Andrea Arcangeli
> > --- > mm/khugepaged.c | 15 --- > 1 file changed, 8 insertions(+), 7 deletions(-) Reviewed-by: Andrea Arcangeli <aarca...@redhat.com>

Re: mm: use-after-free in collapse_huge_page

2016-09-07 Thread Andrea Arcangeli
lock was dropped. > > [1] > http://lkml.kernel.org/r/cact4y+z3gigbvhca9krjfcjx0g70v_nrhbwkbu+ygoesbdk...@mail.gmail.com > > Signed-off-by: Kirill A. Shutemov > Reported-by: Dmitry Vyukov > --- > mm/khugepaged.c | 15 ++++--- > 1 file changed, 8 insertions(+), 7 deletions(-) Reviewed-by: Andrea Arcangeli

Re: mm: use-after-free in collapse_huge_page

2016-08-29 Thread Andrea Arcangeli
Hello Kirill, On Mon, Aug 29, 2016 at 03:42:33PM +0300, Kirill A. Shutemov wrote: > @@ -898,13 +899,13 @@ static bool __collapse_huge_page_swapin(struct > mm_struct *mm, > /* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */ > if (ret & VM_FAULT_RETRY) { >

Re: [PATCH 00/34] Move LRU page reclaim from zones to nodes v9

2016-08-19 Thread Andrea Arcangeli
On Fri, Aug 19, 2016 at 03:53:59PM +0100, Mel Gorman wrote: > Compaction is not the same as LRU management. Sure but compaction is invoked by reclaim and if reclaim is node-wide, it makes more sense if compaction would be node-wide as well. Otherwise what you compact? Just the higher zone, or

Re: [PATCH 00/34] Move LRU page reclaim from zones to nodes v9

2016-08-19 Thread Andrea Arcangeli
On Fri, Aug 19, 2016 at 03:23:20PM +0200, Vlastimil Babka wrote: > What's that? Never heard of this before, but sounds scary :) I thought > that zone_reclaim itself was rather discouraged nowadays, not a big > candidate for further improvement.,, It's some fix that I tried to push upstream but

Re: [PATCH 00/34] Move LRU page reclaim from zones to nodes v9

2016-08-19 Thread Andrea Arcangeli
Hello Mel, On Fri, Jul 08, 2016 at 10:34:36AM +0100, Mel Gorman wrote: > Minor changes this time > > Changelog since v8 > This is the latest version of a series that moves LRUs from the zones to I'm afraid this is a bit incomplete... I had troubles in rebasing the compaction-enabled

Re: [PATCH 0/7] userfaultfd: add support for shared memory

2016-08-04 Thread Andrea Arcangeli
Hi Mike, On Thu, Aug 04, 2016 at 11:14:11AM +0300, Mike Rapoport wrote: > These patches enable userfaultfd support for shared memory mappings. The > VMAs backed with shmem/tmpfs can be registered with userfaultfd which > allows management of page faults in these areas by userland. > > This patch

Re: [PATCHv1, RFC 00/33] ext4: support of huge pages

2016-07-27 Thread Andrea Arcangeli
Hello, On Wed, Jul 27, 2016 at 01:33:35PM +0300, Kirill A. Shutemov wrote: > I guess you can get work 64k blocks with 4k pages if you *always* allocate > order-4 pages for page cache of the filesystem. But I don't think it's > sustainable. It's significant pressure on buddy allocator and

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-03 Thread Andrea Arcangeli
Hello Michal, CC'ed Hugh, On Fri, Jun 03, 2016 at 04:46:00PM +0200, Michal Hocko wrote: > What do you think about the external dependencies mentioned above. Do > you think this is a sufficient argument wrt. occasional higher > latencies? It's a tradeoff and both latencies would be short and

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-03 Thread Andrea Arcangeli
On Thu, Jun 02, 2016 at 02:21:10PM +0200, Michal Hocko wrote: > Testing with the patch makes some sense as well, but I would like to > hear from Andrea whether the approach is good because I am wondering why > he hasn't done that before - it feels so much simpler than the current > code. The

Re: v4.6 kernel BUG at mm/rmap.c:1101!

2016-05-24 Thread Andrea Arcangeli
On Tue, May 24, 2016 at 11:12:23AM +0300, Mika Westerberg wrote: > Hmm, the kernel shipped with Fedora 23 has that enabled: > > lahna % grep CONFIG_DEBUG_VM /boot/config-4.4.9-300.fc23.x86_64 > CONFIG_DEBUG_VM=y > # CONFIG_DEBUG_VM_VMACACHE is not set > # CONFIG_DEBUG_VM_RB is not set Yes, it

Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem

2016-05-23 Thread Andrea Arcangeli
On Tue, May 24, 2016 at 12:49:42AM +0300, Kirill A. Shutemov wrote: > That's what we do now and that's not enough. > > We would need to serialize against pmd_lock() during normal page-fault > path (and other pte manipulation), which we don't do now if pmd points to > page table. Yes, mmap_sem

Re: v4.6 kernel BUG at mm/rmap.c:1101!

2016-05-23 Thread Andrea Arcangeli
> > Note that we use address only in CONFIG_DEBUG_VM=y case and the bug is not > visible on production kernels with the option disabled. > > diff --git a/mm/rmap.c b/mm/rmap.c > index 8a839935b18c..0ea5d9071b32 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1098,6 +1098,8 @@ void page_move_anon_rmap(struct page *page, > > VM_BUG_ON_PAGE(!PageLocked(page), page); > VM_BUG_ON_VMA(!anon_vma, vma); > + if (IS_ENABLED(CONFIG_DEBUG_VM) && PageTransHuge(page)) > + address &= HPAGE_PMD_MASK; > VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page); > > anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON; Reviewed-by: Andrea Arcangeli <aarca...@redhat.com> Just sent a patch doing the exact same thing just embedded in the VM_BUG_ON_PAGE, either version is fine with me.
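
The masking step in the quoted patch (`address &= HPAGE_PMD_MASK`) rounds a faulting address down to the start of the huge page that contains it, so the index check compares against the head of the page rather than an offset inside it. Below is a minimal user-space sketch of that arithmetic, not kernel code. It assumes 2 MiB PMD-sized huge pages (the usual x86-64 value), and the `SKETCH_*` constants and `align_to_huge_page()` are hypothetical names standing in for the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: a PMD-mapped huge page covers 2 MiB (x86-64).
 * Hypothetical stand-ins for the kernel's HPAGE_PMD_* constants. */
#define SKETCH_HPAGE_SHIFT 21
#define SKETCH_HPAGE_SIZE  ((uintptr_t)1 << SKETCH_HPAGE_SHIFT)
#define SKETCH_HPAGE_MASK  (~(SKETCH_HPAGE_SIZE - 1))

/* The mask trick only works when the size is a power of two. */
static_assert((SKETCH_HPAGE_SIZE & (SKETCH_HPAGE_SIZE - 1)) == 0,
              "huge page size must be a power of two");

/* Round an address down to the start of the huge page containing it. */
static uintptr_t align_to_huge_page(uintptr_t address)
{
    return address & SKETCH_HPAGE_MASK;
}
```

Under these assumptions, any address inside the same 2 MiB region maps to the same aligned base, which is why a fault in the middle of a huge page no longer trips the index comparison.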

Re: v4.6 kernel BUG at mm/rmap.c:1101!

2016-05-23 Thread Andrea Arcangeli
test this to shut off the false positive? >From 4db87e3e44837a0b038e58eaa3fea29db84723ec Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli <aarca...@redhat.com> Date: Mon, 23 May 2016 17:03:57 +0200 Subject: [PATCH 1/1] mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap() If the page_move_anon_rm

Re: v4.6 kernel BUG at mm/rmap.c:1101!

2016-05-23 Thread Andrea Arcangeli
test this to shut off the false positive? >From 4db87e3e44837a0b038e58eaa3fea29db84723ec Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli Date: Mon, 23 May 2016 17:03:57 +0200 Subject: [PATCH 1/1] mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap() If the page_move_anon_rmap() is refiling

Re: [PATCH 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create()

2016-05-16 Thread Andrea Arcangeli
ome latency in the footprint reduction in the future non-cooperative usage). Reviewed-by: Andrea Arcangeli <aarca...@redhat.com> > +static inline bool userfaultfd_get_mm(struct userfaultfd_ctx *ctx) > +{ > + return atomic_inc_not_zero(&ctx->mm->mm_users); > +} Nice cl

Re: [PATCH 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create()

2016-05-16 Thread Andrea Arcangeli
ome latency in the footprint reduction in the future non-cooperative usage). Reviewed-by: Andrea Arcangeli > +static inline bool userfaultfd_get_mm(struct userfaultfd_ctx *ctx) > +{ > + return atomic_inc_not_zero(&ctx->mm->mm_users); > +} Nice cleanup, but wouldn't it be more ge
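
The quoted `userfaultfd_get_mm()` helper depends on "increment unless zero" semantics: a reference is taken only while the counter is still live, so a user that arrives after the last reference was dropped cannot resurrect the object. Below is a minimal sketch of that pattern using C11 atomics in user space. It is not the kernel's `atomic_inc_not_zero()` implementation, and `sketch_inc_not_zero()` is a hypothetical name chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Take a reference only if the counter is still non-zero.
 * Returns true if the count was incremented, false if it had
 * already dropped to zero. */
static bool sketch_inc_not_zero(atomic_int *count)
{
    assert(count != NULL);

    int old = atomic_load(count);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value,
         * so the loop re-checks for zero before retrying. */
        if (atomic_compare_exchange_weak(count, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller would treat a `false` return as "the object is already going away" and back off instead of touching it.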

[PATCH 1/1] mm: thp: calculate the mapcount correctly for THP pages during WP faults

2016-05-12 Thread Andrea Arcangeli
ion. Reviewed-by: "Kirill A. Shutemov" <kir...@shutemov.name> Signed-off-by: Andrea Arcangeli <aarca...@redhat.com> --- include/linux/mm.h | 9 +++ include/linux/swap.h | 6 ++--- mm/huge_memory.c | 71 +--- mm/memory

[PATCH 1/1] mm: thp: calculate the mapcount correctly for THP pages during WP faults

2016-05-12 Thread Andrea Arcangeli
ion. Reviewed-by: "Kirill A. Shutemov" Signed-off-by: Andrea Arcangeli --- include/linux/mm.h | 9 +++ include/linux/swap.h | 6 ++--- mm/huge_memory.c | 71 +--- mm/memory.c | 22 ++-- mm/swapfile.c

Re: [Question] Missing data after DMA read transfer - mm issue with transparent huge page?

2016-05-12 Thread Andrea Arcangeli
Hello Nicolas, On Thu, May 12, 2016 at 05:31:52PM +0200, Nicolas Morey-Chaisemartin wrote: > > > Le 05/12/2016 à 03:52 PM, Jerome Glisse a écrit : > > On Thu, May 12, 2016 at 03:30:24PM +0200, Nicolas Morey-Chaisemartin wrote: > >> Le 05/12/2016 à 11:36 AM, Jerome Glisse a écrit : > >>> On Thu,

[PATCH 1/1] mm: thp: calculate the mapcount correctly for THP pages during WP faults

2016-05-10 Thread Andrea Arcangeli
used only once now, while with the previous code reuse_swap_page(page++) would have called page_mapcount on page+1 and it would have increased page twice instead of just once. Reviewed-by: "Kirill A. Shutemov" <kir...@shutemov.name> Signed-off-by: Andrea Arcangeli <aarca.

[PATCH 1/1] mm: thp: calculate the mapcount correctly for THP pages during WP faults

2016-05-10 Thread Andrea Arcangeli
used only once now, while with the previous code reuse_swap_page(page++) would have called page_mapcount on page+1 and it would have increased page twice instead of just once. Reviewed-by: "Kirill A. Shutemov" Signed-off-by: Andrea Arcangeli --- include/linux/mm.h | 9 +++

[PATCH 0/3] mm: thp: mapcount updates

2016-05-06 Thread Andrea Arcangeli
anyway. Andrea Arcangeli (3): mm: thp: calculate the mapcount correctly for THP pages during WP faults mm: thp: microoptimize compound_mapcount() mm: thp: split_huge_pmd_address() comment improvement include/linux/mm.h | 12 +++-- include/linux/swap.h | 8 +++--- mm

[PATCH 1/3] mm: thp: calculate the mapcount correctly for THP pages during WP faults

2016-05-06 Thread Andrea Arcangeli
ill A. Shutemov" <kir...@shutemov.name> Signed-off-by: Andrea Arcangeli <aarca...@redhat.com> --- include/linux/mm.h | 9 +++ include/linux/swap.h | 8 --- mm/huge_memory.c | 67 +--- mm/memory.c | 22 ++

[PATCH 1/3] mm: thp: calculate the mapcount correctly for THP pages during WP faults

2016-05-06 Thread Andrea Arcangeli
Shutemov" Signed-off-by: Andrea Arcangeli --- include/linux/mm.h | 9 +++ include/linux/swap.h | 8 --- mm/huge_memory.c | 67 +--- mm/memory.c | 22 ++--- mm/swapfile.c| 13 +- 5 files cha

[PATCH 2/3] mm: thp: microoptimize compound_mapcount()

2016-05-06 Thread Andrea Arcangeli
compound_mapcount() is only called after PageCompound() has already been checked by the caller, so there's no point to check it again. Gcc may optimize it away too because it's inline but this will remove the runtime check for sure and it'll add an assert instead. Signed-off-by: Andrea

[PATCH 3/3] mm: thp: split_huge_pmd_address() comment improvement

2016-05-06 Thread Andrea Arcangeli
Comment is partly wrong, this improves it by including the case of split_huge_pmd_address() called by try_to_unmap_one if TTU_SPLIT_HUGE_PMD is set. Signed-off-by: Andrea Arcangeli <aarca...@redhat.com> --- mm/huge_memory.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff

[PATCH 3/3] mm: thp: split_huge_pmd_address() comment improvement

2016-05-06 Thread Andrea Arcangeli
Comment is partly wrong, this improves it by including the case of split_huge_pmd_address() called by try_to_unmap_one if TTU_SPLIT_HUGE_PMD is set. Signed-off-by: Andrea Arcangeli --- mm/huge_memory.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/huge_memory.c b

Re: [PATCH v2] ksm: fix conflict between mmput and scan_get_next_rmap_item

2016-05-06 Thread Andrea Arcangeli
> + spin_unlock(&ksm_mmlist_lock); > > /* Repeat until we've completed scanning the whole list */ > slot = ksm_scan.mm_slot; Reviewed-by: Andrea Arcangeli <aarca...@redhat.com> While the above patch is correct, I would however prefer if you could update it to keep relea

Re: [PATCH v2] ksm: fix conflict between mmput and scan_get_next_rmap_item

2016-05-06 Thread Andrea Arcangeli
> + spin_unlock(&ksm_mmlist_lock); > > /* Repeat until we've completed scanning the whole list */ > slot = ksm_scan.mm_slot; Reviewed-by: Andrea Arcangeli While the above patch is correct, I would however prefer if you could update it to keep releasing the ksm_mmli

Re: [PATCH] ksm: fix conflict between mmput and scan_get_next_rmap_item

2016-05-05 Thread Andrea Arcangeli
Hello Zhou, Great catch. On Thu, May 05, 2016 at 08:42:56PM +0800, Zhou Chengming wrote: > remove_trailing_rmap_items(slot, ksm_scan.rmap_list); > + up_read(&mm->mmap_sem); > > spin_lock(&ksm_mmlist_lock); > ksm_scan.mm_slot = list_entry(slot->mm_list.next, > @@ -1666,16 +1667,12

Re: [BUG] vfio device assignment regression with THP ref counting redesign

2016-05-05 Thread Andrea Arcangeli
On Thu, May 05, 2016 at 06:11:10PM +0300, Kirill A. Shutemov wrote: > Hm. How total_mapcount equal to NULL wouldn't lead to NULL-pointer > dereference inside page_trans_huge_mapcount()? Sorry for the confusion, this was still work in progress and then I've seen the email from Alex and I sent the

Re: [BUG] vfio device assignment regression with THP ref counting redesign

2016-05-05 Thread Andrea Arcangeli
On Thu, May 05, 2016 at 04:39:24PM +0200, Andrea Arcangeli wrote: > I'm currently testing this: I must have been testing an earlier version, this below has better chance not to oops. There's a reason I didn't attempt a proper submit yet.. this is just for testing until we're sure this ok. I a

Re: [BUG] vfio device assignment regression with THP ref counting redesign

2016-05-05 Thread Andrea Arcangeli
Hello Alex, On Wed, May 04, 2016 at 07:19:27PM -0600, Alex Williamson wrote: > On Mon, 2 May 2016 20:03:07 +0200 > Andrea Arcangeli <aarca...@redhat.com> wrote: > > > On Mon, May 02, 2016 at 07:00:42PM +0300, Kirill A. Shutemov wrote: > > > Agreed. I just didn't

Re: [BUG] vfio device assignment regression with THP ref counting redesign

2016-05-05 Thread Andrea Arcangeli
Hello Alex, On Wed, May 04, 2016 at 07:19:27PM -0600, Alex Williamson wrote: > On Mon, 2 May 2016 20:03:07 +0200 > Andrea Arcangeli wrote: > > > On Mon, May 02, 2016 at 07:00:42PM +0300, Kirill A. Shutemov wrote: > > > Agreed. I just didn't see the two-refcounts sol

Re: GUP guarantees wrt to userspace mappings

2016-05-02 Thread Andrea Arcangeli
On Mon, May 02, 2016 at 07:12:52PM +0300, Kirill A. Shutemov wrote: > Any reason why mmu_notifier is not an option? No way to trigger an hardware re-tried secondary MMU fault as result of PCI DMA memory access, and expensive to do an MMU notifier invalidate if it requires waiting for the DMA to

Re: GUP guarantees wrt to userspace mappings

2016-05-02 Thread Andrea Arcangeli
On Mon, May 02, 2016 at 05:22:49PM +0200, Jerome Glisse wrote: > I think this is still fine as it means that device will read only and thus > you can migrate to different page (ie the guest is not expecting to read back > anything written by the device and device writing to the page would be

Re: GUP guarantees wrt to userspace mappings

2016-05-02 Thread Andrea Arcangeli
On Mon, May 02, 2016 at 06:00:13PM +0300, Kirill A. Shutemov wrote: > Switching to non-fast GUP would help :-P If we had a race in khugepaged or ksmd against gup_fast O_DIRECT we'd get flood of bugreports of data corruption with KVM run with cache=direct. Just wanted to reassure there's no race,

Re: GUP guarantees wrt to userspace mappings redesign

2016-05-02 Thread Andrea Arcangeli
On Mon, May 02, 2016 at 03:14:02PM +0300, Kirill A. Shutemov wrote: > Quick look around: > > - I don't see any check page_count() around __replace_page() in uprobes, >so it can easily replace pinned page. > > - KSM has the page_count() check, there's still race wrt GUP_fast: it can >

Re: [BUG] vfio device assignment regression with THP ref counting redesign

2016-05-02 Thread Andrea Arcangeli
On Mon, May 02, 2016 at 07:00:42PM +0300, Kirill A. Shutemov wrote: > Sounds correct, but code is going to be ugly :-/ Now if a page is not shared in the parent, it is already in the local anon_vma. The only thing we could lose here is a pmd split in the child caused by swapping and then parent

Re: [BUG] vfio device assignment regression with THP ref counting redesign

2016-05-02 Thread Andrea Arcangeli
On Mon, May 02, 2016 at 01:41:19PM +0300, Kirill A. Shutemov wrote: > I don't think this would work correctly. Let's check one of callers: > > static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma, > unsigned long address, pte_t *page_table, pmd_t *pmd, >

Re: [BUG] vfio device assignment regression with THP ref counting redesign

2016-04-29 Thread Andrea Arcangeli
00 2001 From: Andrea Arcangeli <aarca...@redhat.com> Date: Fri, 29 Apr 2016 01:05:06 +0200 Subject: [PATCH 1/1] mm: thp: calculate the mapcount correctly for THP pages during WP faults This will provide full accuracy to the mapcount calculation in the write protect faults, so page pinning w

Re: [BUG] vfio device assignment regression with THP ref counting redesign

2016-04-29 Thread Andrea Arcangeli
00 2001 From: Andrea Arcangeli Date: Fri, 29 Apr 2016 01:05:06 +0200 Subject: [PATCH 1/1] mm: thp: calculate the mapcount correctly for THP pages during WP faults This will provide full accuracy to the mapcount calculation in the write protect faults, so page pinning will not get broken by false p

Re: [BUG] vfio device assignment regression with THP ref counting redesign

2016-04-28 Thread Andrea Arcangeli
k vs 4MB. The problem of course is when we really need a COW, we'll waste an additional 32k, but then it doesn't matter that much as we'd be forced to load 4MB of cache anyway in such case. There's room for optimizations but even the simple below patch would be ok for now. >From 09e3d1ff10b49fb9c3a

Re: [BUG] vfio device assignment regression with THP ref counting redesign

2016-04-28 Thread Andrea Arcangeli
k vs 4MB. The problem of course is when we really need a COW, we'll waste an additional 32k, but then it doesn't matter that much as we'd be forced to load 4MB of cache anyway in such case. There's room for optimizations but even the simple below patch would be ok for now. >From 09e3d1ff10b49fb9c3ab77f

Re: [PATCH 1/1] mm: thp: kvm: fix memory corruption in KVM with THP enabled

2016-04-27 Thread Andrea Arcangeli
On Wed, Apr 27, 2016 at 05:57:30PM +0200, Andrea Arcangeli wrote: > couldn't do a fix as cleaner as this one for 4.6. ehm "cleaner then" If you've suggestions for a better name than PageTransCompoundMap I can respin a new patch though, I considered "CanMap" but I opte

Re: [PATCH 1/1] mm: thp: kvm: fix memory corruption in KVM with THP enabled

2016-04-27 Thread Andrea Arcangeli
On Wed, Apr 27, 2016 at 06:18:34PM +0300, Kirill A. Shutemov wrote: > Okay, I see. > > But do we really want to make PageTransCompoundMap() visible beyond KVM > code? It looks like too KVM-specific. Any other secondary MMU notifier manager (KVM is just one of the many MMU notifier users) will

Re: [PATCHv7 00/29] THP-enabled tmpfs/shmem using compound pages

2016-04-27 Thread Andrea Arcangeli
Hello Andres, On Tue, Apr 19, 2016 at 10:07:29AM -0700, Andres Lagar-Cavilla wrote: > Andrea, we provide the, ahem, adjustments to > transparent_hugepage_adjust. Rest assured we aggressively use mmu > notifiers with no further changes required. Did you notice I just fixed a THP related bug in

Re: [PATCH 1/1] mm: thp: kvm: fix memory corruption in KVM with THP enabled

2016-04-27 Thread Andrea Arcangeli
On Wed, Apr 27, 2016 at 04:50:30PM +0300, Kirill A. Shutemov wrote: > I know nothing about kvm. How do you protect against pmd splitting between > get_user_pages() and the check? get_user_pages_fast() runs fully lockless and unpins the page right away (we need a get_user_pages_fast without the

[PATCH 1/1] mm: thp: kvm: fix memory corruption in KVM with THP enabled

2016-04-27 Thread Andrea Arcangeli
), KVM would map the whole compound page into the shadow pagetables, despite regular faults or userfaults (like UFFDIO_COPY) may map regular pages into the primary MMU as result of the pte faults, leading to the guest mode and userland mode going out of sync and not working on the same memory at all time

Re: [PATCH 0/5] userfaultfd: extension for non cooperative uffd usage

2016-04-22 Thread Andrea Arcangeli
Hello Pavel and Mike, On Wed, Apr 20, 2016 at 12:44:48PM +0300, Pavel Emelyanov wrote: > On 03/20/2016 03:42 PM, Mike Rapoport wrote: > > Hi, > > > > This set is to address the issues that appear in userfaultfd usage > > scenarios when the task monitoring the uffd and the mm-owner do not > >

Re: [PATCHv7 00/29] THP-enabled tmpfs/shmem using compound pages

2016-04-19 Thread Andrea Arcangeli
Hello, On Mon, Apr 18, 2016 at 03:55:44PM -0700, Shi, Yang wrote: > Hi Kirill, > > Finally, I got some time to look into and try yours and Hugh's patches, > got two problems. One thing that come to mind to test is this: qemu with -machine accel=kvm -mem-path=/dev/shm/,share=on . The THP

Re: mm: BUG in khugepaged_scan_mm_slot

2016-04-04 Thread Andrea Arcangeli
Hello, On Mon, Apr 04, 2016 at 03:06:25PM +0300, Kirill A. Shutemov wrote: > On Mon, Apr 04, 2016 at 02:03:54PM +0200, Vlastimil Babka wrote: > > [+CC Andrea] > > > > On 04/02/2016 11:48 AM, Dmitry Vyukov wrote: > > >Hello, > > > > > >The following program triggers a BUG in

Re: + x86-add-support-for-pud-sized-transparent-hugepages-checkpatch-fixes.patch added to -mm tree

2016-03-09 Thread Andrea Arcangeli
> #endif > > Or perhaps better, centralise the non-SMP definitions: > > arch/x86/include/asm/pgtable-2level.h | 6 -- > arch/x86/include/asm/pgtable-3level.h | 7 +-- > arch/x86/include/asm/pgtable.h | 5 + > arch/x86/include/asm/pgtable_64.h | 18 ++ > 4 files changed, 8 insertions(+), 28 deletions(-) Reviewed-by: Andrea Arcangeli <aarca...@redhat.com>

Re: + x86-add-support-for-pud-sized-transparent-hugepages-checkpatch-fixes.patch added to -mm tree

2016-03-09 Thread Andrea Arcangeli
Hello everyone, On Fri, Mar 04, 2016 at 03:30:18PM -0500, Matthew Wilcox wrote: > On Wed, Feb 03, 2016 at 08:48:35AM +0100, Ingo Molnar wrote: > > > @@ -111,8 +111,10 @@ static inline pud_t native_pudp_get_and_ > > > #ifdef CONFIG_SMP > > > return native_make_pud(xchg(&xp->pud, 0)); > > > #else >

Re: fs: uninterruptible hang in handle_userfault

2016-03-03 Thread Andrea Arcangeli
Hello, On Thu, Mar 03, 2016 at 08:46:41AM +0100, Sedat Dilek wrote: > One technical question: > How do I get the latest Linux version shipped userfaultfd first? > ( Maybe there exist more elegant ways I do. Always open to improve my > Git knowledge. ) Perhaps there are cleaner ways, I would do

Re: [PATCH] eventfd: document lockless access in eventfd_poll

2016-03-02 Thread Andrea Arcangeli
unlock ctx->qwh.lock > + * lock ctx->wqh.lock (in poll_wait) > + * __add_wait_queue > + * unlock ctx->wqh.lock > + * eventfd_poll returns 0 > + */ > + count = READ_ONCE(ctx->count); > > if (count > 0) > events |= POLLIN; Reviewed-by: Andrea Arcangeli <aarca...@redhat.com>

Re: [PATCH 1/1] mm: thp: Redefine default THP defrag behaviour disable it by default

2016-03-02 Thread Andrea Arcangeli
On Fri, Feb 26, 2016 at 01:32:53PM +0300, Kirill A. Shutemov wrote: > Could you elaborate on problems with rmap? I have looked into this deeply > yet. > > Do you see anything what would prevent following basic scheme: > > - Identify series of small pages as candidate for collapsing into >a

Re: fs: uninterruptible hang in handle_userfault

2016-03-02 Thread Andrea Arcangeli
think some more about this and come up with > solutions how to avoid these kinds of "very late user space accesses" > cleanly, I think that would be great. Agreed. Thanks, Andrea From 03f7e43aab4e4b6f02599f4e4675581f691e Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli <

Re: fs: uninterruptible hang in handle_userfault

2016-03-02 Thread Andrea Arcangeli
Hello, On Wed, Mar 02, 2016 at 12:48:46AM +0000, Al Viro wrote: > On Tue, Mar 01, 2016 at 12:06:49PM -0800, Linus Torvalds wrote: > > > So the only access we really care about is the child tid-pointer > > clearing one, and that always happens after PF_EXITING has been set > > afaik. > > > > No
