Re: [RFC PATCH 7/8] hugetlb: add update_and_free_page_no_sleep for irq context

2021-03-24 Thread Mike Kravetz
On 3/19/21 6:18 PM, Hillf Danton wrote: > On Fri, 19 Mar 2021 15:42:08 -0700 Mike Kravetz wrote: >> + >> +if (!can_sleep && free_page_may_sleep(h, page)) { >> +/* >> + * Send page freeing to workqueue >> + * >>

Re: [RFC PATCH 7/8] hugetlb: add update_and_free_page_no_sleep for irq context

2021-03-24 Thread Mike Kravetz
On 3/24/21 1:43 AM, Michal Hocko wrote: > On Tue 23-03-21 11:51:04, Mike Kravetz wrote: >> On 3/22/21 11:10 AM, Roman Gushchin wrote: >>> On Mon, Mar 22, 2021 at 10:42:23AM -0700, Mike Kravetz wrote: >>>> Cc: Roman, Christoph >>>> >>>> On 3/22/

Re: [RFC PATCH 2/8] hugetlb: recompute min_count when dropping hugetlb_lock

2021-03-24 Thread Mike Kravetz
On 3/24/21 1:36 AM, Michal Hocko wrote: > On Tue 23-03-21 16:18:08, Mike Kravetz wrote: > [...] >> Here is another thought. >> In patch 5 you suggest removing all pages from hugetlb with the lock >> held, and adding them to a list. Then, drop the lock and free all >>

Re: [RFC PATCH 5/8] hugetlb: change free_pool_huge_page to remove_pool_huge_page

2021-03-24 Thread Mike Kravetz
On 3/24/21 1:40 AM, Michal Hocko wrote: > On Tue 23-03-21 18:03:07, Mike Kravetz wrote: > [...] >> Since you brought up cgroups ... what is your opinion on lock hold time >> in hugetlb_cgroup_css_offline? We could potentially be calling >> hugetlb_cgroup_move_parent for

Re: [RFC PATCH 5/8] hugetlb: change free_pool_huge_page to remove_pool_huge_page

2021-03-23 Thread Mike Kravetz
On 3/23/21 12:57 AM, Michal Hocko wrote: > On Mon 22-03-21 16:28:07, Mike Kravetz wrote: >> On 3/22/21 7:31 AM, Michal Hocko wrote: >>> On Fri 19-03-21 15:42:06, Mike Kravetz wrote: >>> [...] >>>> @@ -2090,9 +2084,15 @@ static void return_unused_surplus_page

Re: [RFC PATCH 2/8] hugetlb: recompute min_count when dropping hugetlb_lock

2021-03-23 Thread Mike Kravetz
load), or to duplicate the load and do it >> again later (reaching a different result). >> >> Similarly, the compiler is allowed to byte-wise load the variable in any >> random order and re-assemble. >> >> If any of that is a problem, you have to use READ_ONCE(). > > Thanks for the confirmation! > Here is another thought. In patch 5 you suggest removing all pages from hugetlb with the lock held, and adding them to a list. Then, drop the lock and free all pages on the list. If we do this, then the value computed here (min_count) can not change while we are looping. So, this patch would be unnecessary. That is another argument in favor of batching the frees. Unless there is something wrong in my thinking, I am going to take that approach and drop this patch. -- Mike Kravetz
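The batching idea in the message above (unlink everything while the lock is held, then drop the lock and free from a private list) can be sketched in userspace C. This is only an analogy under stated assumptions: a pthread mutex stands in for hugetlb_lock, and `struct pool`, `struct item`, `pool_add` and `pool_shrink` are hypothetical names, not kernel interfaces or the code from the patch series.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins: 'item' plays the role of a huge page,
 * 'pool' the role of an hstate protected by a single lock. */
struct item {
	struct item *next;
};

struct pool {
	pthread_mutex_t lock;
	struct item *free_list;
	unsigned long nr_items;
};

/* Add one item to the pool; returns 0 on success, -1 on allocation failure. */
static int pool_add(struct pool *p)
{
	struct item *it = malloc(sizeof(*it));

	if (!it)
		return -1;
	pthread_mutex_lock(&p->lock);
	it->next = p->free_list;
	p->free_list = it;
	p->nr_items++;
	pthread_mutex_unlock(&p->lock);
	return 0;
}

/* Shrink the pool to at most min_count items; returns the number freed.
 * Phase 1 only unlinks items under the lock, so min_count is compared
 * against counters that cannot change during the loop. Phase 2 does the
 * (potentially slow) freeing with the lock dropped. */
static unsigned long pool_shrink(struct pool *p, unsigned long min_count)
{
	struct item *batch = NULL;
	unsigned long freed = 0;

	assert(p != NULL);

	pthread_mutex_lock(&p->lock);
	while (p->nr_items > min_count && p->free_list) {
		struct item *it = p->free_list;

		p->free_list = it->next;
		p->nr_items--;
		it->next = batch;	/* move to the private batch list */
		batch = it;
	}
	pthread_mutex_unlock(&p->lock);

	while (batch) {
		struct item *it = batch;

		batch = it->next;
		free(it);
		freed++;
	}
	return freed;
}
```

Because the lock is never dropped inside the first loop, there is nothing to recompute per iteration, which is the reason given above for dropping the min_count patch.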

Re: [RFC PATCH 7/8] hugetlb: add update_and_free_page_no_sleep for irq context

2021-03-23 Thread Mike Kravetz
On 3/22/21 11:10 AM, Roman Gushchin wrote: > On Mon, Mar 22, 2021 at 10:42:23AM -0700, Mike Kravetz wrote: >> Cc: Roman, Christoph >> >> On 3/22/21 1:41 AM, Peter Zijlstra wrote: >>> On Fri, Mar 19, 2021 at 03:42:08PM -0700, Mike Kravetz wrote: >>>> Th

Re: [RFC PATCH 5/8] hugetlb: change free_pool_huge_page to remove_pool_huge_page

2021-03-22 Thread Mike Kravetz
On 3/22/21 7:31 AM, Michal Hocko wrote: > On Fri 19-03-21 15:42:06, Mike Kravetz wrote: > [...] >> @@ -2090,9 +2084,15 @@ static void return_unused_surplus_pages(struct hstate >> *h, >> while (nr_pages--) { >> h->resv_huge_pages--;

Re: [RFC PATCH 2/8] hugetlb: recompute min_count when dropping hugetlb_lock

2021-03-22 Thread Mike Kravetz
On 3/22/21 7:07 AM, Michal Hocko wrote: > On Fri 19-03-21 15:42:03, Mike Kravetz wrote: >> The routine set_max_huge_pages reduces the number of hugetlb_pages, >> by calling free_pool_huge_page in a loop. It does this as long as >> persistent_huge_pages() is above a calcu

Re: [PATCH] userfaultfd/hugetlbfs: Fix minor fault page leak

2021-03-22 Thread Mike Kravetz
t /proc/meminfo. > > Cc: Axel Rasmussen > Cc: Andrea Arcangeli > Cc: Mike Kravetz > Cc: Mike Rapoport > Cc: Andrew Morton > Fixes: f2bf15fb0969 ("userfaultfd: add minor fault registration mode") > Signed-off-by: Peter Xu > --- > mm/hugetlb.c | 1

Re: [RFC PATCH 7/8] hugetlb: add update_and_free_page_no_sleep for irq context

2021-03-22 Thread Mike Kravetz
Cc: Roman, Christoph On 3/22/21 1:41 AM, Peter Zijlstra wrote: > On Fri, Mar 19, 2021 at 03:42:08PM -0700, Mike Kravetz wrote: >> The locks acquired in free_huge_page are irq safe. However, in certain >> circumstances the routine update_and_free_page could sleep. Since >>

Re: [RFC PATCH 3/8] hugetlb: create remove_hugetlb_page() to separate functionality

2021-03-22 Thread Mike Kravetz
On 3/22/21 7:15 AM, Michal Hocko wrote: > On Fri 19-03-21 15:42:04, Mike Kravetz wrote: >> The new remove_hugetlb_page() routine is designed to remove a hugetlb >> page from hugetlbfs processing. It will remove the page from the active >> or free list, update global counters

Re: [RFC PATCH 1/8] hugetlb: add per-hstate mutex to synchronize user adjustments

2021-03-22 Thread Mike Kravetz
On 3/22/21 6:59 AM, Michal Hocko wrote: > On Fri 19-03-21 15:42:02, Mike Kravetz wrote: >> The number of hugetlb pages can be adjusted by writing to the >> sysfs/proc files nr_hugepages, nr_hugepages_mempolicy or >> nr_overcommit_hugepages. There is nothing to prev

Re: [RFC PATCH 6/8] hugetlb: make free_huge_page irq safe

2021-03-21 Thread Mike Kravetz
On 3/19/21 3:42 PM, Mike Kravetz wrote: > Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in > non-task context") was added to address the issue of free_huge_page > being called from irq context. That commit hands off free_huge_page > processing to a

[RFC PATCH 4/8] hugetlb: call update_and_free_page without hugetlb_lock

2021-03-19 Thread Mike Kravetz
page to reduce long hold times. The ugly unlock/lock cycle in free_pool_huge_page will be removed in a subsequent patch which restructures free_pool_huge_page. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 21 + 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a

[RFC PATCH 5/8] hugetlb: change free_pool_huge_page to remove_pool_huge_page

2021-03-19 Thread Mike Kravetz
allocators. The hugetlb_lock is dropped before freeing to these allocators which results in shorter lock hold times. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 53 +--- 1 file changed, 30 insertions(+), 23 deletions(-) diff --git a/mm/hugetlb.c b

[RFC PATCH 6/8] hugetlb: make free_huge_page irq safe

2021-03-19 Thread Mike Kravetz
check and workqueue handoff. [1] https://lore.kernel.org/linux-mm/f1c03b05bc43a...@google.com/ Signed-off-by: Mike Kravetz --- mm/hugetlb.c| 206 mm/hugetlb_cgroup.c | 10 ++- 2 files changed, 100 insertions(+), 116 deletions(-)

[RFC PATCH 0/8] make hugetlb put_page safe for all calling contexts

2021-03-19 Thread Mike Kravetz
ogle.com/ [2] http://lkml.kernel.org/r/20210311021321.127500-1-mike.krav...@oracle.com Mike Kravetz (8): hugetlb: add per-hstate mutex to synchronize user adjustments hugetlb: recompute min_count when dropping hugetlb_lock hugetlb: create remove_hugetlb_page() to separate functionality huget

[RFC PATCH 7/8] hugetlb: add update_and_free_page_no_sleep for irq context

2021-03-19 Thread Mike Kravetz
allocator. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 12 +- mm/hugetlb.c| 86 +++-- 2 files changed, 94 insertions(+), 4 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index f42d44050548..a81ca39c06be 10

[RFC PATCH 8/8] hugetlb: track hugetlb pages allocated via cma_alloc

2021-03-19 Thread Mike Kravetz
flag HPageCma to indicate the page was allocated via cma_alloc. This flag can be used so that only gigantic pages allocated via cma_alloc will have deferred freeing. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 7 +++ mm/hugetlb.c| 18 ++ 2 files

[RFC PATCH 1/8] hugetlb: add per-hstate mutex to synchronize user adjustments

2021-03-19 Thread Mike Kravetz
occurrence is running at a time. Specifically, alloc_pool_huge_page uses a hstate specific variable without any synchronization. Add a mutex to the hstate and use it to only allow one hugetlb page adjustment at a time. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 1 + mm/hugetlb.c

[RFC PATCH 2/8] hugetlb: recompute min_count when dropping hugetlb_lock

2021-03-19 Thread Mike Kravetz
can drop hugetlb_lock. If the lock is dropped, counters could change and the calculated min_count value may no longer be valid. The routine try_to_free_low has the same issue. Recalculate min_count in each loop iteration as hugetlb_lock may have been dropped. Signed-off-by: Mike Kravetz --- mm

[RFC PATCH 3/8] hugetlb: create remove_hugetlb_page() to separate functionality

2021-03-19 Thread Mike Kravetz
call, the 'page' can be treated as a normal compound page or a collection of base size pages. remove_hugetlb_page is to be called with the hugetlb_lock held. Creating this routine and separating functionality is in preparation for restructuring code to reduce lock hold times. Signed-

Re: [PATCH v2 5/5] mm/hugetlb: avoid calculating fault_mutex_hash in truncate_op case

2021-03-16 Thread Mike Kravetz
On 3/15/21 11:49 PM, Miaohe Lin wrote: > On 2021/3/16 11:07, Mike Kravetz wrote: >> On 3/15/21 7:27 PM, Miaohe Lin wrote: >>> The fault_mutex hashing overhead can be avoided in truncate_op case >>> because page faults can not race with truncation in this routine.

Re: [PATCH v2 5/5] mm/hugetlb: avoid calculating fault_mutex_hash in truncate_op case

2021-03-15 Thread Mike Kravetz
On 3/15/21 7:27 PM, Miaohe Lin wrote: > The fault_mutex hashing overhead can be avoided in truncate_op case > because page faults can not race with truncation in this routine. So > calculate hash for fault_mutex only in !truncate_op case to save some cpu > cycles. > > Reviewe

Re: [PATCH v2] hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings

2021-03-15 Thread Mike Kravetz
On 3/12/21 7:11 PM, Miaohe Lin wrote: > On 2021/3/13 3:09, Mike Kravetz wrote: >> On 3/1/21 4:05 AM, Miaohe Lin wrote: >>> The current implementation of hugetlb_cgroup for shared mappings could have >>> different behavior. Consider the following two scenarios:

Re: [PATCH 5/5] mm/hugetlb: avoid calculating fault_mutex_hash in truncate_op case

2021-03-13 Thread Mike Kravetz
On 3/12/21 6:49 PM, Miaohe Lin wrote: > Hi: > On 2021/3/13 4:03, Mike Kravetz wrote: >> On 3/8/21 3:28 AM, Miaohe Lin wrote: >>> The fault_mutex hashing overhead can be avoided in truncate_op case because >>> page faults can not race with truncation in this rout

Re: [PATCH 5/5] mm/hugetlb: avoid calculating fault_mutex_hash in truncate_op case

2021-03-12 Thread Mike Kravetz
sh = 0; Do we need to initialize hash here? I would not bring this up normally, but the purpose of the patch is to save cpu cycles. -- Mike Kravetz > > index = page->index; > - hash = hug

Re: [PATCH 4/5] mm/hugetlb: simplify the code when alloc_huge_page() failed in hugetlb_no_page()

2021-03-12 Thread Mike Kravetz
xisting code made that very clear. Would have been even more clear with an unlikely modifier. In any case, the lengthy comment above this code makes it clear why the check is there. Code changes are fine. Reviewed-by: Mike Kravetz -- Mike Kravetz >

Re: [PATCH 3/5] hugetlb_cgroup: remove unnecessary VM_BUG_ON_PAGE in hugetlb_cgroup_migrate()

2021-03-12 Thread Mike Kravetz
> --- > mm/hugetlb_cgroup.c | 1 - > 1 file changed, 1 deletion(-) Reviewed-by: Mike Kravetz -- Mike Kravetz > > diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c > index 8668ba87cfe4..3dde6ddf0170 100644 > --- a/mm/hugetlb_cgroup.c > +++ b/mm/hugetlb_cgroup.c > @@ -785,

Re: [PATCH v2] hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings

2021-03-12 Thread Mike Kravetz
-- > mm/hugetlb.c | 42 ++ > mm/hugetlb_cgroup.c| 11 +++-- > 3 files changed, 60 insertions(+), 8 deletions(-) Just a few minor nits below, all in comments. It is not required, but would be nice to update these. Code lo

Re: [PATCH v18 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

2021-03-12 Thread Mike Kravetz
On 3/12/21 12:15 AM, Michal Hocko wrote: > On Thu 11-03-21 14:53:08, Mike Kravetz wrote: >> On 3/11/21 9:59 AM, Mike Kravetz wrote: >>> On 3/11/21 4:17 AM, Michal Hocko wrote: >>>>> Yeah per cpu preempt counting shouldn't be noticeable but I have to

Re: [PATCH 0/3] Add support for free vmemmap pages of HugeTLB for arm64

2021-03-12 Thread Mike Kravetz
going wrong. > Are you specifying 'hugetlb_free_vmemmap=on' on the kernel command line? This feature is only enabled if you 'opt in' via the command line option. -- Mike Kravetz

Re: [PATCH v18 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

2021-03-11 Thread Mike Kravetz
On 3/11/21 9:59 AM, Mike Kravetz wrote: > On 3/11/21 4:17 AM, Michal Hocko wrote: >>> Yeah per cpu preempt counting shouldn't be noticeable but I have to >>> confess I haven't benchmarked it. >> >> But all this seems moot now >> http://lkml.kern

Re: [PATCH v18 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

2021-03-11 Thread Mike Kravetz
ge requests must be sent to a workqueue. Any ideas on how to address this? -- Mike Kravetz

Re: [PATCH] hugetlb: select PREEMPT_COUNT if HUGETLB_PAGE for in_atomic use

2021-03-11 Thread Mike Kravetz
>> >> The code really doesn't look _that_ complicated. > > Fair enough. As I've said I am not a great fan of this patch either > but it is a quick fix for a likely long term problem. If reworking the > hugetlb locking is preferable then be it. Thank you Michal and Peter. This patch was mostly about starting a discussion, as this topic came up in a couple different places. I included the 'train wreck' of how we got here just for a bit of history. I'll start working on a proper fix. -- Mike Kravetz

[PATCH] hugetlb: select PREEMPT_COUNT if HUGETLB_PAGE for in_atomic use

2021-03-10 Thread Mike Kravetz
[2] https://lore.kernel.org/linux-mm/yejji9oawhuza...@dhcp22.suse.cz/ [3] https://lore.kernel.org/linux-mm/ydzaawk41k4gd...@dhcp22.suse.cz/ Suggested-by: Michal Hocko Signed-off-by: Mike Kravetz --- fs/Kconfig | 1 + mm/hugetlb.c | 10 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff

Re: [PATCH v18 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

2021-03-10 Thread Mike Kravetz
On 3/10/21 1:49 PM, Paul E. McKenney wrote: > On Wed, Mar 10, 2021 at 10:11:22PM +0100, Michal Hocko wrote: >> On Wed 10-03-21 10:56:08, Mike Kravetz wrote: >>> On 3/10/21 7:19 AM, Michal Hocko wrote: >>>> On Mon 08-03-21 18:28:02, Muchun Song wrote: >>>

Re: [RFC PATCH 0/3] hugetlb: add demote/split page functionality

2021-03-10 Thread Mike Kravetz
may sound crazy, but I think it may be the long term goal. -- Mike Kravetz

Re: [RFC PATCH 0/3] hugetlb: add demote/split page functionality

2021-03-10 Thread Mike Kravetz
On 3/10/21 8:23 AM, Michal Hocko wrote: > On Mon 08-03-21 16:18:52, Mike Kravetz wrote: > [...] >> Converting larger to smaller hugetlb pages can be accomplished today by >> first freeing the larger page to the buddy allocator and then allocating >> the smaller pages.

Re: [PATCH] mm/hugetlb: Fix build with !ARCH_WANT_HUGE_PMD_SHARE

2021-03-10 Thread Mike Kravetz
xactly sure how this is supposed to be handled. > Cc: Andrew Morton > Cc: Mike Kravetz > Cc: Axel Rasmussen > Reported-by: Naresh Kamboju > Tested-by: Naresh Kamboju > Signed-off-by: Peter Xu > --- > mm/hugetlb.c | 8 +--- > 1 file changed, 5 insertions(+), 3 deleti

Re: [PATCH v18 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

2021-03-10 Thread Mike Kravetz
on configurations. I'll put together a separate patch where we can discuss the merits of making the change from !in_task to in_atomic, and what work remains in this put_page area. -- Mike Kravetz

Re: [RFC PATCH 0/3] hugetlb: add demote/split page functionality

2021-03-09 Thread Mike Kravetz
On 3/9/21 9:50 AM, David Hildenbrand wrote: > On 09.03.21 18:11, Mike Kravetz wrote: >> On 3/9/21 1:01 AM, David Hildenbrand wrote: >>> On 09.03.21 01:18, Mike Kravetz wrote: >>>> To address these issues, introduce the concept of hugetlb page demotion. >>>

Re: [RFC PATCH 0/3] hugetlb: add demote/split page functionality

2021-03-09 Thread Mike Kravetz
On 3/9/21 1:01 AM, David Hildenbrand wrote: > On 09.03.21 01:18, Mike Kravetz wrote: >> To address these issues, introduce the concept of hugetlb page demotion. >> Demotion provides a means of 'in place' splitting a hugetlb page to >> pages of a smaller size. For

[RFC PATCH 1/3] hugetlb: add demote hugetlb page sysfs interfaces

2021-03-08 Thread Mike Kravetz
number of hugetlb pages to an appropriate number of demote_size pages. This patch does not provide full demote functionality. It only provides the sysfs interfaces and uses existing code to free pages to the buddy allocator if demote_size == PAGESIZE. Signed-off-by: Mike Kravetz --- include

[RFC PATCH 3/3] hugetlb: add hugetlb demote page support

2021-03-08 Thread Mike Kravetz
Demote page functionality will split a huge page into a number of huge pages of a smaller size. For example, on x86 a 1GB huge page can be demoted into 512 2M huge pages. Demotion is done 'in place' by simply splitting the huge page. Signed-off-by: Mike Kravetz --- mm/huge
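The 1GB to 512 x 2M figure in the description above is plain size division. A minimal sketch of that arithmetic follows; `demote_count` is a hypothetical helper written for illustration, not a function from the patch, and the zero return for invalid size pairs is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical helper: how many pages of dst_size result from splitting
 * one page of src_size 'in place'. Returns 0 when the sizes cannot form
 * a valid demotion (zero target, target larger than source, or a source
 * that is not an exact multiple of the target). */
static unsigned long demote_count(unsigned long long src_size,
				  unsigned long long dst_size)
{
	if (dst_size == 0 || src_size < dst_size || src_size % dst_size != 0)
		return 0;
	return (unsigned long)(src_size / dst_size);
}
```

For example, `demote_count(1ULL << 30, 2ULL << 20)` evaluates to 512, matching the x86 case quoted above.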

[RFC PATCH 0/3] hugetlb: add demote/split page functionality

2021-03-08 Thread Mike Kravetz
erved huge pages. Therefore, when a value is written to the sysfs demote file that value is only the maximum number of pages which will be demoted. It is possible fewer will actually be demoted. If demote_size is PAGESIZE, demote will simply free pages to the buddy allocator. Mike Kravetz (3): hu

[RFC PATCH 2/3] hugetlb: add HPageCma flag and code to free non-gigantic pages in CMA

2021-03-08 Thread Mike Kravetz
appropriate action. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 7 +++ mm/hugetlb.c| 27 +-- 2 files changed, 32 insertions(+), 2 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 5e9d6c8ab411..b4ec2daea5aa

Re: [PATCH] mm/hugetlb: suppress wrong warning info when alloc gigantic page

2021-03-04 Thread Mike Kravetz
On 3/4/21 1:35 AM, David Hildenbrand wrote: > On 19.02.21 20:14, Mike Kravetz wrote: >> On 2/19/21 4:39 AM, Chen Wandun wrote: >>> If hugetlb_cma is enabled, it will skip boot time allocation >>> when allocating gigantic page, that doesn't mean allocation >

Re: [PATCH v17 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

2021-03-03 Thread Mike Kravetz
(h, page); > @@ -1447,7 +1486,7 @@ void free_huge_page(struct page *page) > /* >* Defer freeing if in non-task context to avoid hugetlb_lock deadlock. >*/ > - if (!in_task()) { > + if (!in_atomic()) { That should be "if (in_atomic())" instead of "

Re: [PATCH] mm/hugetlb: use some helper functions to cleanup code

2021-03-02 Thread Mike Kravetz
These are all straight forward substitutions of open coded calculations with the appropriate helper routine. Reviewed-by: Mike Kravetz > --- > fs/hugetlbfs/inode.c | 2 +- > mm/hugetlb.c | 6 +++--- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff

Re: [PATCH v2] mm/hugetlb: remove redundant reservation check condition in alloc_huge_page()

2021-03-02 Thread Mike Kravetz
. Therefore, !vma_resv_map(vma) is redundant in the > expression: > map_chg || avoid_reserve || !vma_resv_map(vma); > Remove the redundant check. > > [Thanks Mike Kravetz for reshaping this commit message!] > > Signed-off-by: Miaohe Lin Thanks, Reviewed-by: Mike Kr

Re: possible deadlock in sk_clone_lock

2021-03-02 Thread Mike Kravetz
On 3/2/21 6:29 AM, Michal Hocko wrote: > On Tue 02-03-21 06:11:51, Shakeel Butt wrote: >> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko wrote: >>> >>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote: >>>> On 3/1/21 9:23 AM, Michal Hocko wrote: >>>

Re: possible deadlock in sk_clone_lock

2021-03-01 Thread Mike Kravetz
tlb_lock irq safe would not help. Again, I may be missing something. Note that we also are considering doing more with the hugetlb lock dropped in this path in the 'free vmemmap of hugetlb pages' series. Since we need to do some work that could block in this path, it seems like we really need to use a workqueue. It is too bad that there is not an interface to identify all the cases where interrupts are disabled. -- Mike Kravetz

Re: possible deadlock in sk_clone_lock

2021-02-26 Thread Mike Kravetz
there was the suggestion to change the !in_task to in_atomic. I need to do some research on the subtle differences between in_task, in_atomic, etc. TBH, I 'thought' !in_task would prevent the issue reported here. But, that obviously is not the case. -- Mike Kravetz

Re: [PATCH v3 2/2] mm: Make alloc_contig_range handle in-use hugetlb pages

2021-02-25 Thread Mike Kravetz
++ > mm/vmscan.c | 5 +++-- > 4 files changed, 34 insertions(+), 9 deletions(-) Thanks, Changes look good. I like the simple retry one time for pages which may go from free to in use. Reviewed-by: Mike Kravetz BTW, This series will need to be rebased on lat

Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages

2021-02-25 Thread Mike Kravetz
nge or commit 2c7452a075d4. So, when start_isolate_page_range goes to allocate another gigantic page it will never notice/operate on the existing gigantic page. Again, this is confusing and I might be missing something. In any case, I agree that gigantic pages are tricky and we should leave them out of the discussion for now. We can rethink this later if necessary. -- Mike Kravetz

Re: [PATCH v3 1/2] mm: Make alloc_contig_range handle free hugetlb pages

2021-02-25 Thread Mike Kravetz
em and alloc_contig_range. > > Signed-off-by: Oscar Salvador Thanks Oscar, I spent a bunch of time looking for possible race issues. Thankfully, the recent code from Muchun dealing with free lists helps. In addition, all the hugetlb accounting looks good. Reviewed-by: Mike Krave

Re: [PATCH v7 1/6] userfaultfd: add minor fault registration mode

2021-02-25 Thread Mike Kravetz
On 2/25/21 9:49 AM, Axel Rasmussen wrote: > On Wed, Feb 24, 2021 at 4:26 PM Mike Kravetz wrote: >> >> On 2/18/21 4:48 PM, Axel Rasmussen wrote: >> >>> @@ -401,8 +398,10 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, >>> unsigned long rea

Re: [PATCH v7 2/6] userfaultfd: disable huge PMD sharing for MINOR registered VMAs

2021-02-24 Thread Mike Kravetz
it can check and potentially update the page's > contents. > > Huge PMD sharing would prevent these faults from occurring for > suitably aligned areas, so disable it upon UFFD registration. > > Reviewed-by: Peter Xu > Signed-off-by: Axel Rasmussen Thanks, Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH v7 1/6] userfaultfd: add minor fault registration mode

2021-02-24 Thread Mike Kravetz
goto out; > + } > } > > /* > I'm good with the hugetlb.c changes. Since this is nearly identical to the other handle_userfault() in this routine, it might be good to create a common wrapper. But, that is not required. -- Mike Kravetz

[PATCH] hugetlb: document the new location of page subpool pointer

2021-02-23 Thread Mike Kravetz
Expand comments, no functional change. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index cccd1aab69dd..c0467a7a1fe0 100644 --- a/include/linux/hugetlb.h +++ b/include/linux

Re: [PATCH] hugetlb: fix uninitialized subpool pointer

2021-02-23 Thread Mike Kravetz
On 2/23/21 3:21 PM, Mike Kravetz wrote: > On 2/23/21 2:58 PM, Oscar Salvador wrote: >> On 2021-02-23 23:55, Mike Kravetz wrote: >>> Yes, that is the more common case where the once active hugetlb page >>> will be simply added to the free list via enqueue_huge_page().

Re: [RFC] linux-next panic in hugepage_subpool_put_pages()

2021-02-23 Thread Mike Kravetz
On 2/23/21 3:58 PM, Andrew Morton wrote: > On Tue, 23 Feb 2021 10:06:12 -0800 Mike Kravetz > wrote: > >> On 2/23/21 6:57 AM, Gerald Schaefer wrote: >>> Hi, >>> >>> LTP triggered a panic on s390 in hugepage_subpool_put_pages() with >>> linux-nex

Re: [PATCH] hugetlb: fix uninitialized subpool pointer

2021-02-23 Thread Mike Kravetz
On 2/23/21 2:58 PM, Oscar Salvador wrote: > On 2021-02-23 23:55, Mike Kravetz wrote: >> Yes, that is the more common case where the once active hugetlb page >> will be simply added to the free list via enqueue_huge_page(). This >> path does not go through prep_new_huge_pa

Re: [PATCH] hugetlb: fix uninitialized subpool pointer

2021-02-23 Thread Mike Kravetz
On 2/23/21 2:45 PM, Oscar Salvador wrote: > On Tue, Feb 23, 2021 at 01:55:44PM -0800, Mike Kravetz wrote: >> Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages() >> with linux-next 5.12.0-20210222. >> Call trace: >> hugepage_subpool_

[PATCH] hugetlb: fix uninitialized subpool pointer

2021-02-23 Thread Mike Kravetz
ter in prep_new_huge_page(). Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags") Reported-by: Gerald Schaefer Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c232cb67dda2..7ae

Re: [RFC] linux-next panic in hugepage_subpool_put_pages()

2021-02-23 Thread Mike Kravetz
set_max_huge_pages to __free_huge_page is actually how the code puts newly allocated pages on its internal free list. I will do a bit more verification and put together a patch (it should be simple). -- Mike Kravetz

Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

2021-02-22 Thread Mike Kravetz
; > - rc = 0; > + rc = update_and_free_page(h, head); > + if (rc) > + h->max_huge_pages++; Since update_and_free_page failed, the number of surplus pages was incremented. Surplus pages are the number of pages greater than max_

Re: [PATCH 1/2] mm: Make alloc_contig_range handle free hugetlb pages

2021-02-19 Thread Mike Kravetz
Yes, something like this should work. I'll let Oscar work out the details. One thing to note is that you also need to check for old_page not on the free list here. It could have been allocated and in use. In addition, make sure to check the new flag HPageFreed to ensure page is on free

Re: [PATCH RFC] mm/madvise: introduce MADV_POPULATE to prefault/prealloc memory

2021-02-19 Thread Mike Kravetz
"issue" is that the reservation happens on mmap(). mbind() runs > afterwards. Preallocation saves you from that. > > I suspect something similar will happen with anonymous memory with mbind() > even if we reserved swap space. Did not test yet, though. > Sorry for jumping in late ... hugetlb keyword just hit my mail filters :) Yes, it is true that hugetlb reservations are not NUMA aware. So, even if pages are reserved at mmap time one could still SIGBUS if a fault is restricted to a node with insufficient pages. I looked into this some years ago, and there really is not a good way to make hugetlb reservations NUMA aware. Preallocation, or on-demand populating as proposed here, is a way around the issue. -- Mike Kravetz

Re: [PATCH] mm/hugetlb: suppress wrong warning info when alloc gigantic page

2021-02-19 Thread Mike Kravetz
s ignored. IMO, it makes sense to log a warning if ignoring a user specified parameter. The user should not be attempting boot time allocation and CMA reservation for 1G pages. I do not think we should drop the warning as it tells the user they have specified two incompatible allocatio

Re: [PATCH v2 1/2] mm: Make alloc_contig_range handle free hugetlb pages

2021-02-18 Thread Mike Kravetz
hen pages are > freed > + * instead of enqueued again. > + */ > + spin_lock(&hugetlb_lock); > + h->surplus_huge_pages++; > + h->surplus_huge_pages_node[nid]++;

Re: [RFC PATCH 5/5] mm proc/task_mmu.c: add hugetlb specific routine for clear_refs

2021-02-18 Thread Mike Kravetz
On 2/17/21 12:25 PM, Peter Xu wrote: > On Wed, Feb 10, 2021 at 04:03:22PM -0800, Mike Kravetz wrote: >> There is no hugetlb specific routine for clearing soft dirty and >> other references. The 'default' routines would only clear the >> VM_SOFTDIRTY flag in

Re: [RFC PATCH 3/5] mm proc/task_mmu.c: add soft dirty pte checks for hugetlb

2021-02-18 Thread Mike Kravetz
On 2/17/21 11:35 AM, Peter Xu wrote: > On Wed, Feb 10, 2021 at 04:03:20PM -0800, Mike Kravetz wrote: >> Pagemap was only using the vma flag PM_SOFT_DIRTY for hugetlb vmas. >> This is insufficient. Check the individual pte entries. >> >> Signed-off-by: Mike Kravetz >

Re: [RFC PATCH 2/5] hugetlb: enhance hugetlb fault processing to support soft dirty

2021-02-18 Thread Mike Kravetz
On 2/17/21 11:32 AM, Peter Xu wrote: > On Wed, Feb 10, 2021 at 04:03:19PM -0800, Mike Kravetz wrote: >> hugetlb fault processing code would COW all write faults where the >> pte was not writable. Soft dirty will write protect ptes as part >> of its tracking mechanism.

Re: [RFC PATCH 1/5] hugetlb: add hugetlb helpers for soft dirty support

2021-02-18 Thread Mike Kravetz
On 2/17/21 8:24 AM, Peter Xu wrote: > On Wed, Feb 10, 2021 at 04:03:18PM -0800, Mike Kravetz wrote: >> Add interfaces to set and clear soft dirty in hugetlb ptes. Make >> hugetlb interfaces needed for /proc clear_refs available outside >> hugetlb.c. >> >> arch/

Re: [PATCH v3 1/4] hugetlb: Pass vma into huge_pte_alloc() and huge_pmd_share()

2021-02-18 Thread Mike Kravetz
On 2/18/21 2:27 PM, Peter Xu wrote: > On Thu, Feb 18, 2021 at 02:13:52PM -0800, Mike Kravetz wrote: >> On 2/18/21 1:54 PM, Peter Xu wrote: >>> It is a preparation work to be able to behave differently in the per >>> architecture huge_pte_alloc() according to different

Re: [PATCH 05/14] KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs

2021-02-18 Thread Mike Kravetz
not so > sure > about that after rereading the code, yet again. I have not followed this thread, but HugeTLB hit my mail filter and I can help with this question. No, PageTransCompoundMap() will not detect HugeTLB. hugetlb pages do not use the compound_mapcount_ptr field. So, that final check/return in PageTransCompoundMap() will always be false. -- Mike Kravetz

Re: [PATCH v3 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp

2021-02-18 Thread Mike Kravetz
inux/hugetlb.h | 3 +++ > mm/hugetlb.c| 51 + > 3 files changed, 58 insertions(+) Thanks, Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH v3 1/4] hugetlb: Pass vma into huge_pte_alloc() and huge_pmd_share()

2021-02-18 Thread Mike Kravetz
> > pte_t *huge_pte_alloc(struct mm_struct *mm, > +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, > unsigned long addr, unsigned long sz) > { > pgd_t *pgd; Didn't kernel test robot report this build error on the first patch series? -- Mike Kravetz

Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

2021-02-18 Thread Mike Kravetz
On 2/18/21 9:34 AM, Mike Kravetz wrote: > On 2/18/21 9:25 AM, Jason Gunthorpe wrote: >> On Thu, Feb 18, 2021 at 02:45:54PM +, Matthew Wilcox wrote: >>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote: >>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kr

Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

2021-02-18 Thread Mike Kravetz
Matthew Wilcox wrote: >>>>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote: >>>>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz >>>>>> wrote: >>>>>>> page structs are not guaranteed to be contiguous for

Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

2021-02-18 Thread Mike Kravetz
On 2/18/21 9:25 AM, Jason Gunthorpe wrote: > On Thu, Feb 18, 2021 at 02:45:54PM +, Matthew Wilcox wrote: >> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote: >>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz >>> wrote: >>>> page structs

Re: [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp

2021-02-17 Thread Mike Kravetz
for (address = start; address < end; address += PUD_SIZE) { > + unsigned long tmp = address; > + > + ptep = huge_pte_offset(mm, address, sz); > + if (!ptep) > + continue; > + ptl = huge_pte_lock(h, m

Re: [PATCH v2 2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled

2021-02-17 Thread Mike Kravetz
> mm/hugetlb.c | 20 ++++++-- > 4 files changed, 26 insertions(+), 8 deletions(-) Thanks, Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [External] Re: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

2021-02-17 Thread Mike Kravetz
On 2/17/21 12:13 AM, Michal Hocko wrote: > On Tue 16-02-21 11:44:34, Mike Kravetz wrote: > [...] >> If we are not going to do the allocations under the lock, then we will need >> to either preallocate or take the workqueue approach. > > We can still drop the lock temporar

Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

2021-02-17 Thread Mike Kravetz
On 2/17/21 11:02 AM, Andrew Morton wrote: > On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz > wrote: > >> page structs are not guaranteed to be contiguous for gigantic pages. The >> routine update_and_free_page can encounter a gigantic page, yet it assumes >> page

[PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption

2021-02-17 Thread Mike Kravetz
an Signed-off-by: Mike Kravetz Cc: --- mm/hugetlb.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 4bdb58ab14cb..94e9fa803294 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1312,14 +1312,16 @@ static inline void destroy_compound_gig

[PATCH 2/2] hugetlb: fix copy_huge_page_from_user contig page struct assumption

2021-02-17 Thread Mike Kravetz
CONFIG_SPARSEMEM and !CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where the gigantic page will be allocated. Fixes: 8fb5debc5fcd ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support") Signed-off-by: Mike Kravetz Cc: --- mm/memory.c | 10 +++
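
The contiguity assumption discussed in this pair of patches can be illustrated with a small userspace model. This is only a sketch, not kernel code: `model_pfn_to_page`, `nth_subpage`, the tiny section size, and the reversed section layout are all hypothetical, chosen so that the page structs of two adjacent sections are deliberately not adjacent in memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a sparse memory map (not kernel code). */
#define PAGES_PER_SECTION 4UL   /* tiny on purpose; real sections are far larger */
#define NR_SECTIONS       2UL

struct page {
    unsigned long pfn;
};

/* One backing array, with the two sections stored in reverse order so that
 * the page structs for pfn 3 and pfn 4 are not neighbours in memory. */
static struct page backing[NR_SECTIONS * PAGES_PER_SECTION];
static struct page *const section_start[NR_SECTIONS] = {
    &backing[PAGES_PER_SECTION],    /* section 0 lives in the upper half */
    &backing[0],                    /* section 1 lives in the lower half */
};

/* Look up the page struct through the section table, never by pointer math. */
static struct page *model_pfn_to_page(unsigned long pfn)
{
    assert(pfn < NR_SECTIONS * PAGES_PER_SECTION);
    return section_start[pfn / PAGES_PER_SECTION] + (pfn % PAGES_PER_SECTION);
}

/* Record each pfn in its page struct so lookups can be checked. */
static void init_model_memmap(void)
{
    for (unsigned long pfn = 0; pfn < NR_SECTIONS * PAGES_PER_SECTION; pfn++)
        model_pfn_to_page(pfn)->pfn = pfn;
}

/* Safe way to reach the i-th subpage of a large page starting at base_pfn:
 * recompute from the pfn instead of assuming "base_page + i" is valid. */
static struct page *nth_subpage(unsigned long base_pfn, unsigned long i)
{
    return model_pfn_to_page(base_pfn + i);
}
```

In this model, `model_pfn_to_page(0) + PAGES_PER_SECTION` does not point at the page struct for pfn 4, which is the kind of mistake a "page + i" walk over a gigantic page would make once it crosses a section boundary.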

Re: [RFC PATCH] mm, oom: introduce vm.sacrifice_hugepage_on_oom

2021-02-16 Thread Mike Kravetz
caused a DoS scenario as Michal suggested. However, this is an 'opt in' feature. So, I would not expect anyone who carefully plans the size of their hugetlb pool to enable such a feature. If there is a use case where hugetlb pages are used in a non-essential application, this might be of use. -- Mike Kravetz

Re: [External] Re: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

2021-02-16 Thread Mike Kravetz
s may not be too bad in the case of freeing a single page, but would become more complex when doing bulk freeing. After a little thought, the workqueue approach may even end up simpler. However, I would suggest a very simple workqueue implementation with non-blocking allocations. If we cannot quickly get vmemmap pages, put the page back on the hugetlb free list and treat it as a surplus page. -- Mike Kravetz
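
The fallback suggested in this message might look roughly like the following userspace sketch. Every name here (`hstate_model`, `try_alloc_vmemmap_nowait`, `free_one_deferred`) is hypothetical, and the "allocation" is only a counter standing in for an attempt that is not allowed to wait; the actual patch series may do this quite differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the pool counters involved (not the kernel's hstate). */
struct hstate_model {
    unsigned long free_huge_pages;     /* pages sitting on the hugetlb free list */
    unsigned long surplus_huge_pages;  /* pages accounted as surplus */
    unsigned long freed_to_buddy;      /* pages actually handed back */
};

/* Stand-in for a non-blocking allocation: succeeds only if enough vmemmap
 * pages are immediately available, and never waits for more. */
static bool try_alloc_vmemmap_nowait(unsigned long *available, unsigned long needed)
{
    assert(needed > 0);
    if (*available < needed)
        return false;
    *available -= needed;
    return true;
}

/* Deferred (workqueue-style) handling of one huge page that is being freed.
 * Returns true if the page was really freed, false if it was put back. */
static bool free_one_deferred(struct hstate_model *h,
                              unsigned long *available, unsigned long needed)
{
    if (try_alloc_vmemmap_nowait(available, needed)) {
        h->freed_to_buddy++;
        return true;
    }

    /* Could not quickly get vmemmap pages: return the page to the free list
     * and account it as surplus so a later pass can retry. */
    h->free_huge_pages++;
    h->surplus_huge_pages++;
    return false;
}
```

The point of the sketch is that the failure path only touches counters and never blocks, which is what keeps the worker simple.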

Re: [PATCH 2/2] mm/hugetlb: refactor subpage recording

2021-02-13 Thread Mike Kravetz
On 1/26/21 6:10 PM, Jason Gunthorpe wrote: > On Tue, Jan 26, 2021 at 05:58:53PM -0800, Mike Kravetz wrote: > >> As pointed out by Joao, you can also see the differences in pfn_to_page >> for CONFIG_SPARSE_VMEMMAP and CONFIG_SPARSEMEM. The only time we might >

Re: [PATCH v2 2/2] mm/hugetlb: refactor subpage recording

2021-02-13 Thread Mike Kravetz
On 2/13/21 7:44 AM, Zi Yan wrote: > On 11 Feb 2021, at 18:44, Mike Kravetz wrote: > >> On 2/11/21 12:47 PM, Zi Yan wrote: >>> On 28 Jan 2021, at 16:53, Mike Kravetz wrote: >>>> On 1/28/21 10:26 AM, Joao Martins wrote: >>>>> For a given hugepage back

Re: [PATCH] mm/hugetlb: remove redundant reservation check condition in alloc_huge_page()

2021-02-12 Thread Mike Kravetz
map_chg || avoid_reserve || !vma_resv_map(vma); Remove the redundant check. -- Mike Kravetz > > Signed-off-by: Miaohe Lin > --- > mm/hugetlb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 4f2c92ddbca4..36c3646fa55f 10

Re: [PATCH] mm/hugetlb: optimize the surplus state transfer code in move_hugetlb_state()

2021-02-12 Thread Mike Kravetz
g to comment that the usual case is migrating to another node and old_nid != new_nid. However, this really is workload and system configuration dependent. In any case, the quick check is worth potentially saving a lock/unlock cycle. Reviewed-by: Mike Kravetz -- Mike Kravetz > > dif
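
As a rough illustration of why the quick check can save a lock/unlock cycle, here is a userspace sketch. It assumes the check is a same-node comparison made before the lock is taken; `pool_model`, `model_lock`, and `move_surplus_state` are hypothetical names, and the lock is only a counter.

```c
#include <assert.h>

#define MAX_NODES 4

/* Hypothetical model: per-node surplus counts plus a counter that records
 * how many times the (simulated) pool lock was taken. */
struct pool_model {
    unsigned long surplus[MAX_NODES];
    unsigned long lock_cycles;
};

static void model_lock(struct pool_model *p)   { p->lock_cycles++; }
static void model_unlock(struct pool_model *p) { (void)p; }

/* Transfer one unit of surplus state from old_nid to new_nid. */
static void move_surplus_state(struct pool_model *p, int old_nid, int new_nid)
{
    assert(old_nid >= 0 && old_nid < MAX_NODES);
    assert(new_nid >= 0 && new_nid < MAX_NODES);

    /* Quick check: nothing to transfer within the same node, so skip the
     * lock entirely. */
    if (old_nid == new_nid)
        return;

    model_lock(p);
    if (p->surplus[old_nid]) {
        p->surplus[old_nid]--;
        p->surplus[new_nid]++;
    }
    model_unlock(p);
}
```

How often the early return is taken depends on workload and system configuration, as the message notes; the check itself costs a single comparison.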

Re: [PATCH v5 04/10] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp

2021-02-12 Thread Mike Kravetz
On 2/12/21 1:18 PM, Peter Xu wrote: > On Fri, Feb 12, 2021 at 10:11:39AM -0800, Mike Kravetz wrote: >> On 2/10/21 1:21 PM, Axel Rasmussen wrote: >>> From: Peter Xu >>> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h >>> index b820078

Re: [PATCH v5 02/10] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled

2021-02-12 Thread Mike Kravetz
On 2/12/21 12:47 PM, Axel Rasmussen wrote: > On Fri, Feb 12, 2021 at 12:40 PM Peter Xu wrote: >> >> On Thu, Feb 11, 2021 at 04:19:55PM -0800, Mike Kravetz wrote: >>> want_pmd_share() is currently just a check for >>> CONFIG_ARCH_WANT_HUGE_PMD_SHARE. >>>

Re: [PATCH v5 05/10] userfaultfd: add minor fault registration mode

2021-02-12 Thread Mike Kravetz
n the kernel code. > + ret = handle_userfault(&vmf, VM_UFFD_MINOR); > + i_mmap_lock_read(mapping); > + mutex_lock(&hugetlb_fault_mutex_table[hash]); > + goto out; > + } > } > > /* > -- Mike Kravetz

Re: [PATCH v5 04/10] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp

2021-02-12 Thread Mike Kravetz
linux/mmu_notifier.h > +++ b/include/linux/mmu_notifier.h > @@ -51,6 +51,7 @@ enum mmu_notifier_event { > MMU_NOTIFY_SOFT_DIRTY, > MMU_NOTIFY_RELEASE, > MMU_NOTIFY_MIGRATE, > + MMU_NOTIFY_HUGETLB_UNSHARE, I don't claim to know much about mmu notifiers. Current
