Re: [External] Re: [PATCH v12 04/13] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page

2021-01-13 Thread Mike Kravetz
ions. - If we see this being used for more general purpose remapping, then we should go with a more general purpose implementation. Again, just my opinion. -- Mike Kravetz

Re: [PATCH v12 03/13] mm: Introduce VM_WARN_ON_PAGE macro

2021-01-13 Thread Mike Kravetz
e/need for this macro in the following patch. Looks like Oscar has already done that, and free_bootmem_page will now use VM_BUG_ON_PAGE. So, this patch can be dropped. -- Mike Kravetz

Re: [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags

2021-01-13 Thread Mike Kravetz
On 1/13/21 6:45 AM, Matthew Wilcox wrote: > On Mon, Jan 11, 2021 at 01:01:50PM -0800, Mike Kravetz wrote: >> +if (hpage_spool(page)) { > > Could this be named hpage_subpool(), please? > Of course! -- Mike Kravetz

Re: [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags

2021-01-13 Thread Mike Kravetz
On 1/13/21 5:54 AM, Oscar Salvador wrote: > On Mon, Jan 11, 2021 at 01:01:50PM -0800, Mike Kravetz wrote: >> As hugetlbfs evolved, state information about hugetlb pages was added. >> One 'convenient' way of doing this was to use available fields in tail >> pages. Ove
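
For illustration, a minimal userspace sketch of the idea the series pursues: keep hugetlb-specific state as named bit flags in one otherwise-unused field of the head page instead of scattering it across tail-page fields. The toy struct page, the HPG_* names, and the helpers below are assumptions for the example, not the kernel definitions.

#include <stdio.h>
#include <stdbool.h>

/* Illustrative flag bits kept in the head page's private field. */
enum hugetlb_page_flags {
	HPG_restore_reserve = 0,
	HPG_migratable,
	HPG_temporary,
};

struct page {				/* toy stand-in for struct page */
	unsigned long private;
};

static bool hpage_test_flag(const struct page *head, int flag)
{
	return head->private & (1UL << flag);
}

static void hpage_set_flag(struct page *head, int flag)
{
	head->private |= 1UL << flag;
}

int main(void)
{
	struct page head = { .private = 0 };

	hpage_set_flag(&head, HPG_migratable);
	printf("migratable=%d temporary=%d\n",
	       hpage_test_flag(&head, HPG_migratable),
	       hpage_test_flag(&head, HPG_temporary));
	return 0;
}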

Re: [PATCH v3 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one

2021-01-12 Thread Mike Kravetz
of that code. Free huge pages have a ref count of 0, and this code is checking for ref count of 1. That is a long way of saying that I still agree with this patch. The only modification/suggestion would be enhancing the commit message as suggested by Michal. -- Mike Kravetz
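
To make the refcount argument concrete, a hedged sketch with a toy page structure (page_count() and worth_migrating() here are stand-ins, not the kernel routines): a free huge page sits at refcount 0, so a page seen with a refcount of exactly 1 on the migration path holds only the isolation reference and has no users to preserve.

#include <stdbool.h>
#include <stdio.h>

struct page { int _refcount; };		/* toy stand-in for struct page */

static int page_count(const struct page *p) { return p->_refcount; }

/* Refcount of 1 == only the isolation reference: skip the copy. */
static bool worth_migrating(const struct page *hpage)
{
	return page_count(hpage) != 1;
}

int main(void)
{
	struct page isolated_only = { ._refcount = 1 };
	struct page mapped = { ._refcount = 3 };

	printf("%d %d\n", worth_migrating(&isolated_only),
	       worth_migrating(&mapped));
	return 0;
}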

Re: [PATCH v3 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page

2021-01-11 Thread Mike Kravetz
f fail. However, we are very unlikely to ever exercise this code. Adding an optimization that is unlikely to be exercised is certainly questionable. Memory offline is the only code that could benefit from this optimization. As someone with more memory offline user experience, what is your opinion Mic

Re: [PATCH v3 3/6] mm: hugetlb: fix a race between freeing and dissolving the page

2021-01-11 Thread Mike Kravetz
nel.org > --- > mm/hugetlb.c | 26 ++ > 1 file changed, 26 insertions(+) Thanks, It is unfortunate that we have to add more huge page state information to fix this issue. However, I believe we have explored all other options. Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [RFC PATCH 0/2] userfaultfd: handle minor faults, add UFFDIO_CONTINUE

2021-01-11 Thread Mike Kravetz
associated with those regions are shared by all the mappings. Suitably aligned means 'on a 1GB boundary' and 1GB in size. When pmds are shared, your mappings will never see a 'minor fault'. This is because the PMD (page table entries) is shared. -- Mike Kravetz
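
A small sketch of the alignment rule described above, under the assumption that PUD_SIZE is 1GB (x86_64); the helper is illustrative and not the kernel's own pmd-sharing checks.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PUD_SIZE ((uint64_t)1 << 30)	/* 1GB, assumed for the example */

/* A hugetlbfs mapping can only share PMD (page table) pages for the
 * portion that covers a fully aligned PUD-sized region.  Where PMDs
 * are shared, the page table entries already exist, so those ranges
 * never take a userfaultfd 'minor fault'.
 */
static bool range_can_share_pmd(uint64_t start, uint64_t end)
{
	uint64_t aligned_start = (start + PUD_SIZE - 1) & ~(PUD_SIZE - 1);
	uint64_t aligned_end = end & ~(PUD_SIZE - 1);

	return aligned_start < aligned_end;
}

int main(void)
{
	printf("%d\n", range_can_share_pmd(0, PUD_SIZE));	/* 1: aligned 1GB */
	printf("%d\n", range_can_share_pmd(4096, PUD_SIZE));	/* 0: misaligned  */
	return 0;
}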

Re: [PATCH v3 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page

2021-01-11 Thread Mike Kravetz
; --- > fs/hugetlbfs/inode.c| 3 ++- > include/linux/hugetlb.h | 2 ++ > mm/hugetlb.c| 2 +- > 3 files changed, 5 insertions(+), 2 deletions(-) Thanks. Although page_huge_active is declared in page-flags.h, I much prefer the declaration of set_page_huge_active to be in

Re: [RFC PATCH 0/2] userfaultfd: handle minor faults, add UFFDIO_CONTINUE

2021-01-11 Thread Mike Kravetz
On 1/11/21 3:08 PM, Peter Xu wrote: > On Mon, Jan 11, 2021 at 02:42:48PM -0800, Mike Kravetz wrote: >> On 1/7/21 11:04 AM, Axel Rasmussen wrote: >>> Overview >>> >>> >>> This series adds a new userfaultfd registration mode, >>

[RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags

2021-01-11 Thread Mike Kravetz
patches. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 11 +++--- include/linux/hugetlb.h | 2 ++ mm/hugetlb.c| 80 +++-- 3 files changed, 67 insertions(+), 26 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index

[RFC PATCH 3/3] hugetlb: convert PageHugeTemporary() to HPageTempSurplus

2021-01-11 Thread Mike Kravetz
Use new hugetlb specific flag HPageTempSurplus to replace the PageHugeTemporary() interfaces. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 38 +- 1 file changed, 9 insertions(+), 29 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 34ce82f4823c

[RFC PATCH 2/3] hugetlb: convert page_huge_active() to HPageMigratable flag

2021-01-11 Thread Mike Kravetz
, it can not be static. Therefore, a new set of hugetlb page flag macros is added for non-static flag functions. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h| 17 +++ include/linux/page-flags.h | 6 mm/hugetlb.c | 60

[RFC PATCH 0/3] create hugetlb flags to consolidate state

2021-01-11 Thread Mike Kravetz
feature is enabled. [1] https://lore.kernel.org/r/20210106084739.63318-1-songmuc...@bytedance.com [2] https://lore.kernel.org/r/20210110124017.86750-4-songmuc...@bytedance.com Mike Kravetz (3): hugetlb: use page.private for hugetlb specific page flags hugetlb: convert page_huge_active

Re: [PATCH v2 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active

2021-01-08 Thread Mike Kravetz
ger a BUG_ON which is in the > page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the > VM_BUG_ON_PAGE. > > Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()") > Signed-off-by: Muchun Song > Reviewed-by: Mike Kravetz > --- > mm

Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page

2021-01-07 Thread Mike Kravetz
touch > the page->mapping. So we need to check PageHuge(). > The check and llist_add() should be protected by > hugetlb_lock. But we cannot do that. Right? If dissolve > happens after it is linked to the list. We also should > remove it from the list (hpage_freelist). It seems to make > the thing more complex. You are correct. This is also an issue/potential problem with this race. It seems that adding the state information might be the least complex way to address the issue. -- Mike Kravetz

Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page

2021-01-07 Thread Mike Kravetz
On 1/7/21 12:40 AM, Michal Hocko wrote: > On Wed 06-01-21 12:58:29, Mike Kravetz wrote: >> On 1/6/21 8:56 AM, Michal Hocko wrote: >>> On Wed 06-01-21 16:47:36, Muchun Song wrote: >>>> There is a race condition between __free_huge_page() >>>> and

Re: [PATCH 3/6] hugetlb: add free page reporting support

2021-01-07 Thread Mike Kravetz
to_nid(page); > + > + list_move(&page->lru, &h->hugepage_freelists[nid]); > +} The above routines move pages between the free and active lists without any update to free page counts. How does that work? Will the number of entries on the free list get out of sync with the free_huge_pages counters? -- Mike Kravetz
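
A toy model of the bookkeeping concern raised here, with invented names: whatever moves a page on or off a free list needs the matching counter updates, otherwise the list length and the free_huge_pages counters drift apart.

#include <stdio.h>

struct hstate_model {			/* invented, not the kernel hstate */
	unsigned long free_huge_pages;
	unsigned long free_huge_pages_node[2];
	unsigned long free_list_len[2];	/* stand-in for the per-node free list */
};

static void dequeue_for_reporting(struct hstate_model *h, int nid)
{
	/* Moving a page off the free list without the two counter
	 * updates below is exactly how the list and the counters
	 * would get out of sync. */
	h->free_list_len[nid]--;
	h->free_huge_pages--;
	h->free_huge_pages_node[nid]--;
}

int main(void)
{
	struct hstate_model h = { 4, { 2, 2 }, { 2, 2 } };

	dequeue_for_reporting(&h, 0);
	printf("free=%lu node0_list=%lu\n",
	       h.free_huge_pages, h.free_list_len[0]);
	return 0;
}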

Re: [PATCH] mm/hugetlb: Fix potential double free in hugetlb_register_node() error path

2021-01-07 Thread Mike Kravetz
. > > Fixes: a3437870160c ("hugetlb: new sysfs interface") > Signed-off-by: Miaohe Lin > Cc: > --- > mm/hugetlb.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) Thanks, this is a potential issue that should be fixed. Reviewed-by: Mike Kravetz This

Re: [PATCH] mm/hugetlb: Fix potential missing huge page size info

2021-01-07 Thread Mike Kravetz
-off-by: Miaohe Lin > Cc: > --- > mm/hugetlb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Thanks! Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page

2021-01-06 Thread Mike Kravetz
On 1/6/21 12:02 PM, Michal Hocko wrote: > On Wed 06-01-21 11:30:25, Mike Kravetz wrote: >> On 1/6/21 8:35 AM, Michal Hocko wrote: >>> On Wed 06-01-21 16:47:35, Muchun Song wrote: >>>> Because we only can isolate a active page via isolate_huge_page() >>>>

Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page

2021-01-06 Thread Mike Kravetz
d_gigantic_page(struct page *page, >> unsigned int order) { } >> #endif >> >> +/* >> + * Because we reuse the mapping field of some tail page structs, we should >> + * reset those mapping to initial value before @

Re: [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page

2021-01-06 Thread Mike Kravetz
tion could race with the page fault and the page could be migrated before being added to the page table of the faulting task. This was an issue when hugetlb_no_page called set_page_huge_active right after allocating and clearing the huge page. Commit cb6acd01e2e4 moved the set_page_huge_active call to after adding the page to the page table to address this issue. -- Mike Kravetz
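
A simplified, compilable sketch of that ordering; the *_stub helpers are placeholders rather than kernel functions, and the body only illustrates why the 'active' marking has to follow the page table insertion.

struct page { int dummy; };
struct fault_ctx { int dummy; };

static void clear_huge_page_stub(struct page *p) { (void)p; }
static int add_to_page_table_stub(struct fault_ctx *c, struct page *p)
{
	(void)c; (void)p;
	return 0;
}
static void set_page_huge_active_stub(struct page *p) { (void)p; }

int hugetlb_no_page_sketch(struct fault_ctx *ctx, struct page *page)
{
	clear_huge_page_stub(page);

	/* Wrong order (before cb6acd01e2e4): marking the page active
	 * here lets it be isolated and migrated before the PTE below
	 * is installed for the faulting task. */

	if (add_to_page_table_stub(ctx, page))
		return -1;

	/* Correct order: only now is the page visible to migration. */
	set_page_huge_active_stub(page);
	return 0;
}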

Re: [External] Re: [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page

2021-01-05 Thread Mike Kravetz
On 1/4/21 7:46 PM, Muchun Song wrote: > On Tue, Jan 5, 2021 at 11:14 AM Muchun Song wrote: >> >> On Tue, Jan 5, 2021 at 9:33 AM Mike Kravetz wrote: >>> >>> On 1/3/21 10:58 PM, Muchun Song wrote: >>>> When dissolve_free_huge_page() races with __free

Re: [External] Re: [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page

2021-01-05 Thread Mike Kravetz
On 1/4/21 6:55 PM, Muchun Song wrote: > On Tue, Jan 5, 2021 at 8:02 AM Mike Kravetz wrote: >> >> On 1/3/21 10:58 PM, Muchun Song wrote: >>> There is a race condition between __free_huge_page() >>> and dissolve_free_huge_page(). >>> >>> CPU0:

Re: [External] Re: [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page

2021-01-05 Thread Mike Kravetz
On 1/4/21 6:44 PM, Muchun Song wrote: > On Tue, Jan 5, 2021 at 6:40 AM Mike Kravetz wrote: >> >> On 1/3/21 10:58 PM, Muchun Song wrote: >>> Because we only can isolate a active page via isolate_huge_page() >>> and hugetlbfs_fallocate() forget to mark it

Re: [PATCH 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active

2021-01-04 Thread Mike Kravetz
ged, 1 deletion(-) Thanks! Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH 5/6] mm: hugetlb: fix a race between isolating and freeing page

2021-01-04 Thread Mike Kravetz
the buddy allocator. > > Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle > hugepage") > Signed-off-by: Muchun Song > --- > mm/hugetlb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) Thanks! Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page

2021-01-04 Thread Mike Kravetz
queue. Is it acceptable to keep retrying in that case? In addition, the 'Free some vmemmap' series may slow the free_huge_page path even more. In these worst case scenarios, I am not sure we want to just spin retrying. -- Mike Kravetz > > Signed-off-by: Muchun Song > --- >

Re: [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page

2021-01-04 Thread Mike Kravetz
in_unlock(&hugetlb_lock) > + * spin_lock(&hugetlb_lock) > + * enqueue_huge_page(page) > + * // It is wrong, the page is already freed > + * spin_unlock(&hugetlb_lock) > + * > + * The race window is bet

Re: [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page

2021-01-04 Thread Mike Kravetz
); > + putback_active_hugepage(page); I'm curious why you used putback_active_hugepage() here instead of simply calling set_page_huge_active() before the put_page()? When the page was allocated, it was placed on the active list (alloc_huge_page). Therefore, the hug

Re: [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one

2021-01-04 Thread Mike Kravetz
+) Thanks! Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

2020-12-23 Thread Mike Kravetz
avior. Correct? >> On some systems, hugetlb pages are a precious resource and the sysadmin >> carefully configures the number needed by applications. Removing a hugetlb >> page (even for a very short period of time) could cause serious application >> failure. > > That's true, especially for 1G pages. Any suggestions? > Let the hugepage allocator be aware of this situation and retry? I would hate to add that complexity to the allocator. This question is likely based on my lack of understanding of virtio-balloon usage and this reporting mechanism. But, why do the hugetlb pages have to be 'temporarily' allocated for reporting purposes? -- Mike Kravetz

Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

2020-12-22 Thread Mike Kravetz
an be allocated from the buddy is (MAX_ORDER - 1). So, the check should be '>='. -- Mike Kravetz
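
Spelled out as a one-function example, with MAX_ORDER set to a typical value purely for illustration:

#define MAX_ORDER 11	/* illustrative value, not taken from a real config */

/* The buddy allocator serves orders 0 .. MAX_ORDER - 1, so a
 * "cannot have come from the buddy" (gigantic) test must use '>='.
 * A '>' check would wrongly treat order == MAX_ORDER as a buddy order.
 */
static inline int order_is_gigantic(unsigned int order)
{
	return order >= MAX_ORDER;
}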

Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

2020-12-22 Thread Mike Kravetz
= HUGEPAGE_REPORTING_CAPACITY; > + > + /* update budget to reflect call to report function */ > + budget--; > + > + /* reacquire zone lock and resume processing */ > + spin_lock_irq(_lock); > + > + /* flush reported pages from the sg list */ > + hugepage_reporting_drain(prdev, h, sgl, > + HUGEPAGE_REPORTING_CAPACITY, !ret); > + > + /* > + * Reset next to first entry, the old next isn't valid > + * since we dropped the lock to report the pages > + */ > + next = list_first_entry(list, struct page, lru); > + > + /* exit on error */ > + if (ret) > + break; > + } > + > + /* Rotate any leftover pages to the head of the freelist */ > + if (&next->lru != list && !list_is_first(&next->lru, list)) > + list_rotate_to_front(&next->lru, list); > + > + spin_unlock_irq(_lock); > + > + return ret; > +} -- Mike Kravetz

Re: [External] Re: [PATCH v10 03/11] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page

2020-12-21 Thread Mike Kravetz
t with reuse addr and all subsequent pages in the range are mapped to reuse addr. I know it is not very generic or flexible. But, it might be easier to understand than the adjustments (+- PAGE_SIZE) currently being made in the code. Just a thought. -- Mike Kravetz

Re: [PATCH v10 02/11] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

2020-12-21 Thread Mike Kravetz
r_page_bootmem_info is enabled if HUGETLB_PAGE_FREE_VMEMMAP > is defined. > > Signed-off-by: Muchun Song > --- > arch/x86/mm/init_64.c | 2 +- > fs/Kconfig| 18 ++ > 2 files changed, 19 insertions(+), 1 deletion(-) Thanks for updating, Acked-by: Mike Kravetz -- Mike Kravetz

[PATCH 1/2] mm/hugetlb: change hugetlb_reserve_pages() to type bool

2020-12-21 Thread Mike Kravetz
currently assume a zero return code indicates success. Change the callers to look for true to indicate success. No functional change, only code cleanup. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 4 ++-- include/linux/hugetlb.h | 2 +- mm/hugetlb.c| 37
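
A hedged caller-side sketch of the convention change (the _sketch names are placeholders): success is now reported as true rather than zero, so callers test the negated return value.

#include <stdbool.h>

/* Stand-in for the real hugetlb_reserve_pages(); always 'succeeds' here. */
static bool hugetlb_reserve_pages_sketch(long from, long to)
{
	(void)from; (void)to;
	return true;			/* true now means success */
}

int hugetlbfs_caller_sketch(void)
{
	/* Before: if (hugetlb_reserve_pages(...)) goto out_err;   (0 == ok)
	 * After:  a false return is the error case.               (true == ok) */
	if (!hugetlb_reserve_pages_sketch(0, 1))
		return -1;
	return 0;
}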

[PATCH 2/2] hugetlbfs: remove special hugetlbfs_set_page_dirty()

2020-12-21 Thread Mike Kravetz
routine hugetlbfs_set_page_dirty with __set_page_dirty_no_writeback as it addresses both of these issues. Suggested-by: Matthew Wilcox Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 13 + 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs

Re: [PATCH v9 05/11] mm/hugetlb: Allocate the vmemmap pages associated with each HugeTLB page

2020-12-16 Thread Mike Kravetz
* vmemmap pages successfully, then we can free > + * a HugeTLB page. > + */ > + goto retry; > + } > + list_add_tail(&page->lru, list); > + } > +} > + -- Mike Kravetz

Re: [PATCH v9 04/11] mm/hugetlb: Defer freeing of HugeTLB pages

2020-12-16 Thread Mike Kravetz
e GFP_ATOMIC to allocate the vmemmap pages. > > Signed-off-by: Muchun Song It is unfortunate we need to add this complexity, but I can not think of another way. One small comment (no required change) below. Reviewed-by: Mike Kravetz > --- > m

Re: [PATCH v9 03/11] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page

2020-12-16 Thread Mike Kravetz
On 12/16/20 2:25 PM, Oscar Salvador wrote: > On Wed, Dec 16, 2020 at 02:08:30PM -0800, Mike Kravetz wrote: >>> + * vmemmap_rmap_walk - walk vmemmap page table >>> + >>> +static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr, >>> +

Re: [PATCH v9 03/11] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page

2020-12-16 Thread Mike Kravetz
the name implies this routine will reuse vmemmap pages. Perhaps, it makes more sense to rename as 'vmemmap_remap_free'? It will first remap, then free vmemmap. But, then I looked at the code above and perhaps you are using the word '_reuse' because the page before the range will be reused? The vmemmap p

Re: [PATCH v9 02/11] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

2020-12-15 Thread Mike Kravetz
On 12/15/20 5:03 PM, Mike Kravetz wrote: > On 12/13/20 7:45 AM, Muchun Song wrote: >> diff --git a/fs/Kconfig b/fs/Kconfig >> index 976e8b9033c4..4c3a9c614983 100644 >> --- a/fs/Kconfig >> +++ b/fs/Kconfig >> @@ -245,6 +245,21 @@ config HUGETLBFS >> config

Re: [PATCH v9 02/11] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

2020-12-15 Thread Mike Kravetz
an be saved for each 1GB HugeTLB page. When a HugeTLB page is allocated or freed, the vmemmap array representing the range associated with the page will need to be remapped. When a page is allocated, vmemmap pages are freed after remapping. When a page
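
For scale, the arithmetic behind the savings being quoted, under the usual x86_64 assumptions (4KB base pages, 64-byte struct page); these constants are assumptions for the example, not values taken from the patch.

#include <stdio.h>

int main(void)
{
	const unsigned long base_page = 4096;		/* assumed 4KB pages   */
	const unsigned long struct_page_size = 64;	/* assumed struct page */
	const unsigned long huge_1g = 1UL << 30;

	unsigned long nr_struct_pages = huge_1g / base_page;		   /* 262144 */
	unsigned long vmemmap_bytes = nr_struct_pages * struct_page_size; /* 16MB   */
	unsigned long vmemmap_pages = vmemmap_bytes / base_page;	   /* 4096   */

	/* The series keeps a small number of vmemmap pages per HugeTLB
	 * page and remaps the rest to a reused page, so most of this
	 * can be returned while the 1GB page exists. */
	printf("vmemmap for one 1GB page: %lu pages (%lu KB)\n",
	       vmemmap_pages, vmemmap_bytes / 1024);
	return 0;
}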

Re: [PATCH] mm/hugetlb: fix deadlock in hugetlb_cow error path

2020-12-15 Thread Mike Kravetz
On 12/14/20 5:06 PM, Mike Kravetz wrote: > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index d029d938d26d..8713f8ef0f4c 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -4106,10 +4106,30 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, > struct v

[PATCH] mm/hugetlb: fix deadlock in hugetlb_cow error path

2020-12-14 Thread Mike Kravetz
.com Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Cc: Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d029d938d26d.

Re: [PATCH v7 00/15] Free some vmemmap pages of hugetlb page

2020-12-03 Thread Mike Kravetz
As previously mentioned, I feel qualified to review the hugetlb changes and some other closely related changes. However, this patch set is touching quite a few areas and I do not feel qualified to make authoritative statements about them all. I too hope others will take a look. -- Mike Kravetz

[PATCH] hugetlb_cgroup: fix offline of hugetlb cgroup with reservations

2020-12-03 Thread Mike Kravetz
ss_offline was noticed. The hstate index is not reinitialized each time through the do-while loop. Fix this as well. Fixes: 1adc4d419aa2 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations") Cc: Reported-by: Adrian Moreno Tested-by: Adrian Moreno Signed-off-by: M

Re: [RFC PATCH 01/13] fs/userfaultfd: fix wrong error code on WP & !VM_MAYWRITE

2020-12-01 Thread Mike Kravetz
ugetlbfs: allow > registration of ranges containing huge pages"). > > Fix it. > > Cc: Mike Kravetz > Cc: Jens Axboe > Cc: Andrea Arcangeli > Cc: Peter Xu > Cc: Alexander Viro > Cc: io-ur...@vger.kernel.org > Cc: linux-fsde...@vger.kernel.org > Cc: linu

Re: [PATCH v5 00/21] Free some vmemmap pages of hugetlb page

2020-11-23 Thread Mike Kravetz
On 11/22/20 11:38 PM, Michal Hocko wrote: > On Fri 20-11-20 09:45:12, Mike Kravetz wrote: >> On 11/20/20 1:43 AM, David Hildenbrand wrote: > [...] >>>>> To keep things easy, maybe simply never allow to free these hugetlb pages >>>>> again for now? If they

Re: [PATCH v5 00/21] Free some vmemmap pages of hugetlb page

2020-11-20 Thread Mike Kravetz
is would eliminate a bunch of the complex code doing page table manipulation. It does not address the issue of struct page pages going away which is being discussed here, but it could be a way to simplify the first version of this code. If this is going to be an 'opt in' feature as previously suggested, then eliminating the PMD/huge page vmemmap mapping may be acceptable. My guess is that sysadmins would only 'opt in' if they expect most of system memory to be used by hugetlb pages. We certainly have database and virtualization use cases where this is true. -- Mike Kravetz

Re: [PATCH v4 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-19 Thread Mike Kravetz
changed, 85 insertions(+) Thanks for the cleanup. Oscar made some other comments. I only have one additional minor comment below. With those minor cleanups, Acked-by: Mike Kravetz > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c ... > +int vmemmap_pgtable_prealloc(struct hstate *h, struc

Re: [External] Re: [PATCH v4 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-19 Thread Mike Kravetz
one before removing the pages of struct pages. This keeps everything 'consistent' as things are remapped. If you want to use one of the 'pages of struct pages' for the new pte page, then there will be a period of time when things are inconsistent. Before setting up the mapping, some code could potentially access that page of struct pages. I tend to agree that allocating a new page is the safest thing to do here. Or, perhaps someone can think of a way to make this safe. -- Mike Kravetz

Re: [PATCH v2] mm: hugetlb: fix type of delta parameter and related local variables in gather_surplus_pages()

2020-11-19 Thread Mike Kravetz
7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) Thank you, Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH] mm,hugetlb: Remove unneded initialization

2020-11-19 Thread Mike Kravetz
ile changed, 2 deletions(-) Thanks, Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH v4 04/21] mm/hugetlb: Introduce nr_free_vmemmap_pages in the struct hstate

2020-11-18 Thread Mike Kravetz
[tail of the quoted ASCII diagram showing the vmemmap (struct page) layout for a 2MB HugeTLB page] -- Mike Kravetz

Re: [PATCH v4 04/21] mm/hugetlb: Introduce nr_free_vmemmap_pages in the struct hstate

2020-11-18 Thread Mike Kravetz
0; > > should not be needed. > Actually, we do not initialize other values like resv_huge_pages > or surplus_huge_pages. > > If that is the case, the "else" could go. > > Mike? Correct. Those assignments have been in the code for a very long time. > The changes itself look good to me. > I think that putting all the vemmap stuff into hugetlb-vmemmap.* was > the right choice. Agree! -- Mike Kravetz

Re: [PATCH v4 03/21] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

2020-11-18 Thread Mike Kravetz
def_bool HUGETLB_PAGE > + depends on X86 > + depends on SPARSEMEM_VMEMMAP > + depends on HAVE_BOOTMEM_INFO_NODE > + help > + When using SPARSEMEM_VMEMMAP, the system can save up some memory Should that read, When using HUGETLB_PAGE_FREE_VMEMMAP, ... as the help message is for this config option. -- Mike Kravetz

Re: [PATCH] mm: hugetlb: fix type of delta parameter in gather_surplus_pages()

2020-11-18 Thread Mike Kravetz
; __must_hold(&hugetlb_lock) > { > struct list_head surplus_list; Thank you for noticing the type difference. However, if the parameter delta is changed to long then we should also change the local variables in gather_surplus_pages that are used with delta. Specifically, the local va

Re: [External] Re: [PATCH v3 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-12 Thread Mike Kravetz
On 11/12/20 4:35 PM, Mike Kravetz wrote: > On 11/10/20 7:41 PM, Muchun Song wrote: >> On Wed, Nov 11, 2020 at 8:47 AM Mike Kravetz wrote: >>> >>> On 11/8/20 6:10 AM, Muchun Song wrote: >>> I am reading the code incorrectly it does not appear page->l

Re: [External] Re: [PATCH v3 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-12 Thread Mike Kravetz
On 11/10/20 7:41 PM, Muchun Song wrote: > On Wed, Nov 11, 2020 at 8:47 AM Mike Kravetz wrote: >> >> On 11/8/20 6:10 AM, Muchun Song wrote: >> I am reading the code incorrectly it does not appear page->lru (of the huge >> page) is being used for

Re: [PATCH v3 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-10 Thread Mike Kravetz
r(page, HUGETLB_PAGE_DTOR); > set_hugetlb_cgroup(page, NULL); When I saw that comment in previous patch series, I assumed page->lru was being used to store preallocated pages and pages to free. However, unless I am reading the code incorrectly it does not appear page->lru (of the huge page) is being used for this purpose. Is that correct? If it is correct, would using page->lru of the huge page make this code simpler? I am just missing the reason why you are using page_huge_pte(page)->lru -- Mike Kravetz

Re: [PATCH v3 03/21] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

2020-11-10 Thread Mike Kravetz
On 11/10/20 11:50 AM, Matthew Wilcox wrote: > On Tue, Nov 10, 2020 at 11:31:31AM -0800, Mike Kravetz wrote: >> On 11/9/20 5:52 AM, Oscar Salvador wrote: >>> On Sun, Nov 08, 2020 at 10:10:55PM +0800, Muchun Song wrote: > > I don't like config options. I like boot opt

Re: [External] Re: [PATCH v3 04/21] mm/hugetlb: Introduce nr_free_vmemmap_pages in the struct hstate

2020-11-10 Thread Mike Kravetz
> > Hi Mike, what's your opinion? I would be happy to see this in a separate file. As Oscar mentions, the hugetlb.c file/code is already somewhat difficult to read and understand. -- Mike Kravetz

Re: [PATCH v3 03/21] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

2020-11-10 Thread Mike Kravetz
is might be a > trade-off between saving up memory and increasing the cost > of certain operations on allocation/free path. > That is why I mentioned it there. Yes, this is somewhat a trade-off. As a config option, this is something that would likely be decided by distros. I almost hate to suggest this, but is it something that an end user would want to decide? Is this something that perhaps should be a boot/kernel command line option? -- Mike Kravetz

Re: [PATCH v3 00/21] Free some vmemmap pages of hugetlb page

2020-11-10 Thread Mike Kravetz
ool. I remember the use case pointed out in commit 099730d67417. It says, "I have a hugetlbfs user which is never explicitly allocating huge pages with 'nr_hugepages'. They only set 'nr_overcommit_hugepages' and then let the pages be allocated from the buddy allocator at fault time." In this case, I suspect they were using 'page fault' allocation for initialization much like someone using /proc/sys/vm/nr_hugepages. So, the overhead may not be as noticeable. -- Mike Kravetz

Re: [PATCH 0/4] hugetlbfs: use hinode_rwsem for pmd sharing synchronization

2020-11-05 Thread Mike Kravetz
On 11/2/20 4:28 PM, Mike Kravetz wrote: > The RFC series reverted all patches where i_mmap_rwsem was used for > pmd sharing synchronization, and then added code to use hinode_rwsem. > This series ends up with the same code in the end, but is structured > as follows: > > - Rev

[PATCH] hugetlbfs: fix anon huge page migration race

2020-11-05 Thread Mike Kravetz
/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/ Reported-by: Qian Cai Suggested-by: Hugh Dickins Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Cc: Signed-off-by: Mike Kravetz --- mm/hugetlb.c| 90 +++

[PATCH 2/4] hugetlbfs: add hinode_rwsem to hugetlb specific inode

2020-11-02 Thread Mike Kravetz
ensure proper locking are also added. Use of the new semaphore and supporting routines will be provided in a later patch. Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Cc: Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 12 inc

[PATCH 1/4] Revert hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race

2020-11-02 Thread Mike Kravetz
sem to address page fault/truncate race") Cc: Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 28 mm/hugetlb.c | 23 --- 2 files changed, 20 insertions(+), 31 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetl

[PATCH 4/4] huegtlbfs: handle page fault/truncate races

2020-11-02 Thread Mike Kravetz
c: Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 34 -- mm/hugetlb.c | 40 ++-- 2 files changed, 58 insertions(+), 16 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index bc9979382a1e..6b

[PATCH 3/4] hugetlbfs: use hinode_rwsem for pmd sharing synchronization

2020-11-02 Thread Mike Kravetz
d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Cc: Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 31 +-- include/linux/fs.h | 15 include/linux/hugetlb.h | 8 -- mm/hugetlb.c| 188 +++

[PATCH 0/4] hugetlbfs: use hinode_rwsem for pmd sharing synchronization

2020-11-02 Thread Mike Kravetz
is that this will be easier to review. Mike Kravetz (4): Revert hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race hugetlbfs: add hinode_rwsem to hugetlb specific inode hugetlbfs: use hinode_rwsem for pmd sharing synchronization huegtlbfs: handle page fault/truncate races fs/hugetlbfs/inode.c

Re: [External] Re: [PATCH v2 07/19] mm/hugetlb: Free the vmemmap pages associated with each hugetlb page

2020-10-29 Thread Mike Kravetz
On 10/28/20 11:13 PM, Muchun Song wrote: > On Thu, Oct 29, 2020 at 7:42 AM Mike Kravetz wrote: >> >> On 10/26/20 7:51 AM, Muchun Song wrote: >>> + >>> +static inline spinlock_t *vmemmap_pmd_lockptr(pmd_t *pmd) >>> +{ >>> + static DEFINE_S

Re: [PATCH v2 05/19] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-10-28 Thread Mike Kravetz
(page, HUGETLB_PAGE_DTOR); > set_hugetlb_cgroup(page, NULL); > @@ -1783,6 +1892,14 @@ static struct page *alloc_fresh_huge_page(struct > hstate *h, > if (!page) > return NULL; > > + if (vmemmap_pgtable_prealloc(h, page)) { > + if (hstate_is_gigantic(h)) > + free_gigantic_page(page, huge_page_order(h)); > + else > + put_page(page); > + return NULL; > + } > + It seems a bit strange that we will fail a huge page allocation if vmemmap_pgtable_prealloc fails. Not sure, but it almost seems like we should allow the allocation and log a warning? It is somewhat unfortunate that we need to allocate a page to free pages. > if (hstate_is_gigantic(h)) > prep_compound_gigantic_page(page, huge_page_order(h)); > prep_new_huge_page(h, page, page_to_nid(page)); > -- Mike Kravetz
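
A hedged sketch of the alternative being floated (the _sketch helpers are invented, not the series' code): on a page-table preallocation failure, keep the huge page, skip the vmemmap optimization for it, and warn, instead of failing the allocation outright.

#include <stdbool.h>
#include <stdio.h>

struct hpage { bool vmemmap_optimized; };	/* toy stand-in */

/* Placeholder for the preallocation that can fail under memory pressure. */
static bool vmemmap_pgtable_prealloc_sketch(struct hpage *page)
{
	(void)page;
	return false;				/* pretend it failed */
}

static int alloc_fresh_huge_page_sketch(struct hpage *page)
{
	if (!vmemmap_pgtable_prealloc_sketch(page)) {
		/* Alternative discussed above: keep the page, just do
		 * not free its vmemmap, and leave a hint in the log. */
		fprintf(stderr, "hugetlb: vmemmap pgtable prealloc failed, "
				"page left unoptimized\n");
		page->vmemmap_optimized = false;
		return 0;
	}
	page->vmemmap_optimized = true;
	return 0;
}

int main(void)
{
	struct hpage p = { false };

	return alloc_fresh_huge_page_sketch(&p);
}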

Re: [RFC PATCH 0/3] Allocate memmap from hotadded memory (per device)

2020-10-28 Thread Mike Kravetz
mentioned. > More eyes on that series would be appreciated. That series will dynamically free and allocate memmap pages as hugetlb pages are allocated or freed. I haven't looked through this series, but my first thought is that we would need to ensure those allocs/frees are directed to t

Re: [External] Re: [PATCH v2 05/19] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-10-28 Thread Mike Kravetz
On 10/28/20 12:26 AM, Muchun Song wrote: > On Wed, Oct 28, 2020 at 8:33 AM Mike Kravetz wrote: >> On 10/26/20 7:51 AM, Muchun Song wrote: >> >> I see the following routines follow the pattern for vmemmap manipulation >> in dax. > > Did you mean move those

Re: [PATCH v2 07/19] mm/hugetlb: Free the vmemmap pages associated with each hugetlb page

2020-10-28 Thread Mike Kravetz
gd_none(*pgd)) > + return NULL; > + p4d = p4d_offset(pgd, addr); > + if (p4d_none(*p4d)) > + return NULL; > + pud = pud_offset(p4d, addr); > + > + WARN_ON_ONCE(pud_bad(*pud)); > + if (pud_none(*pud) || pud_bad(*pud)) > + return NULL; > + pmd = pmd_offset(pud, addr); > + > + return pmd; > +} That routine is not really hugetlb specific. Perhaps we could move it to sparse-vmemmap.c? Or elsewhere? -- Mike Kravetz

Re: [PATCH v2 04/19] mm/hugetlb: Introduce nr_free_vmemmap_pages in the struct hstate

2020-10-27 Thread Mike Kravetz
; 2 files changed, 38 insertions(+) Patch looks fine with updated commit message. Acked-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH] mm/hugetable.c: align some prints

2020-10-26 Thread Mike Kravetz
es from day 1 of their existence? I would prefer not to touch them in case someone is depending on the current format. -- Mike Kravetz > --- > drivers/base/node.c | 4 ++-- > mm/hugetlb.c| 6 +++--- > 2 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/

[RFC PATCH v2 3/4] hugetlbfs: use hinode_rwsem for pmd sharing synchronization

2020-10-26 Thread Mike Kravetz
the semaphore if pmd sharing is possible. Also, add lockdep_assert calls to huge_pmd_share/unshare to help catch callers not using the proper locking. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 11 +- include/linux/hugetlb.h | 8 ++-- mm/hugetlb.c| 83

[RFC PATCH v2 1/4] hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync

2020-10-26 Thread Mike Kravetz
s per hugetlb calculation") commit 87bf91d39bb5 ("hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race") commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Signed-off-by

[RFC PATCH v2 2/4] hugetlbfs: add hinode_rwsem to hugetlb specific inode

2020-10-26 Thread Mike Kravetz
is possible in an attempt to minimize performance impacts. In addition, routines which can be used with lockdep to help with proper locking are also added. Use of the new semaphore and supporting routines will be provided in a subsequent patch. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c

[RFC PATCH v2 4/4] huegtlbfs: handle page fault/truncate races

2020-10-26 Thread Mike Kravetz
as necessary. File truncation (remove_inode_hugepages) needs to handle page mapping changes that could have happened before locking the page. This could happen if page was added to page cache and later backed out in fault processing. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 34

[RFC PATCH v2 0/4] hugetlbfs: introduce hinode_rwsem for pmd sharing synchronization

2020-10-26 Thread Mike Kravetz
hinode_lock_write() helper. - Split out addition of hinode_rwsem and helper routines to a separate patch. [1] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/ Mike Kravetz (4): hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync hugetlbfs: add

Re: [PATCH rfc 0/2] mm: cma: make cma_release() non-blocking

2020-10-22 Thread Mike Kravetz
it could be used by the hugetlb code to make it simpler. -- Mike Kravetz

Re: [PATCH rfc 0/2] mm: cma: make cma_release() non-blocking

2020-10-22 Thread Mike Kravetz
On 10/21/20 7:33 PM, Roman Gushchin wrote: > On Wed, Oct 21, 2020 at 05:15:53PM -0700, Mike Kravetz wrote: >> On 10/16/20 3:52 PM, Roman Gushchin wrote: >>> This small patchset makes cma_release() non-blocking and simplifies >>> the code in hugetlbfs, where previously

Re: [PATCH rfc 0/2] mm: cma: make cma_release() non-blocking

2020-10-21 Thread Mike Kravetz
he bitmap in cma_clear_bitmap and could block. However, I do not see why cma->lock has to be a mutex. I may be missing something, but I do not see any code protected by the mutex doing anything that could sleep? Could we simply change that mutex to a spinlock? -- Mike Kravetz
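
A userspace model of the spinlock suggestion, using pthreads purely for illustration; the struct and function names are invented and this is not the kernel's cma_clear_bitmap().

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* If nothing under cma->lock can sleep, a spinlock around the bitmap
 * update is enough and the release path never blocks. */
struct cma_model {
	pthread_spinlock_t lock;	/* was a mutex in the code under discussion */
	uint64_t bitmap;
};

static void cma_clear_bitmap_model(struct cma_model *cma,
				   unsigned int bit, unsigned int count)
{
	pthread_spin_lock(&cma->lock);
	while (count--)
		cma->bitmap &= ~((uint64_t)1 << bit++);
	pthread_spin_unlock(&cma->lock);
}

int main(void)
{
	struct cma_model cma = { .bitmap = 0xff };

	pthread_spin_init(&cma.lock, PTHREAD_PROCESS_PRIVATE);
	cma_clear_bitmap_model(&cma, 2, 3);		/* clears bits 2..4 */
	printf("bitmap = 0x%llx\n", (unsigned long long)cma.bitmap); /* 0xe3 */
	pthread_spin_destroy(&cma.lock);
	return 0;
}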

[PATCH] hugetlb_cgroup: fix reservation accounting

2020-10-21 Thread Mike Kravetz
") Cc: Reported-by: Michal Privoznik Co-developed-by: David Hildenbrand Signed-off-by: David Hildenbrand Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 67fc6383995b.

Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2020-10-21 Thread Mike Kravetz
> On 21.10.20 15:11, David Hildenbrand wrote: >> On 21.10.20 14:57, Michal Privoznik wrote: >>> On 10/21/20 5:35 AM, Mike Kravetz wrote: >>>> On 10/20/20 6:38 AM, David Hildenbrand wrote: >>>> >>>> It would be good if Mina (at least) would lo

Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2020-10-20 Thread Mike Kravetz
g seen with the qemu use case. I'm still doing more testing and code inspection to look for other issues. From 861bcd7d0443f18a5fed3c3ddc5f1c71e78c4ef4 Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Tue, 20 Oct 2020 20:21:42 -0700 Subject: [PATCH] hugetlb_cgroup: fix reservation accounting S

Re: [RFC PATCH 2/3] hugetlbfs: introduce hinode_rwsem for pmd sharing synchronization

2020-10-15 Thread Mike Kravetz
On 10/15/20 4:05 PM, HORIGUCHI NAOYA(堀口 直也) wrote: > On Tue, Oct 13, 2020 at 04:10:59PM -0700, Mike Kravetz wrote: >> Due to pmd sharing, the huge PTE pointer returned by huge_pte_alloc >> may not be valid. This can happen if a call to huge_pmd_unshare for >> the same p

Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2020-10-15 Thread Mike Kravetz
On 10/14/20 11:31 AM, Mike Kravetz wrote: > On 10/14/20 11:18 AM, David Hildenbrand wrote: > > FWIW - I ran libhugetlbfs tests which do a bunch of hole punching > with (and without) hugetlb controller enabled and did not see this issue. > I took a closer look aft

Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2020-10-14 Thread Mike Kravetz
ks for having a look. I started poking around myself but, > being new to cgroup code, I even failed to understand why that code gets > triggered though the hugetlb controller isn't even enabled. > > I assume you at least have to make sure that there is > a page populated (MMAP_POPULATE, or rea

[RFC PATCH 2/3] hugetlbfs: introduce hinode_rwsem for pmd sharing synchronization

2020-10-13 Thread Mike Kravetz
is not taken if the caller knows the target can not possibly be part of a shared pmd. lockdep_assert calls are added to huge_pmd_share and huge_pmd_unshare to help catch callers not using the proper locking. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 8 include/linux/hugetlb.h | 66

[RFC PATCH 3/3] huegtlbfs: handle page fault/truncate races

2020-10-13 Thread Mike Kravetz
as necessary. File truncation (remove_inode_hugepages) needs to handle page mapping changes that could have happened before locking the page. This could happen if page was added to page cache and later backed out in fault processing. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 34

[RFC PATCH 0/3] hugetlbfs: introduce hinode_rwsem for pmd sharing synchronization

2020-10-13 Thread Mike Kravetz
slate' approach seemed best but I am open to whatever would be easiest to review. [1] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/ Mike Kravetz (3): hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync hugetlbfs: introduce hinode_rwsem for pmd

[RFC PATCH 1/3] hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync

2020-10-13 Thread Mike Kravetz
s per hugetlb calculation") commit 87bf91d39bb5 ("hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race") commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Signed-off-by

Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

2020-10-12 Thread Mike Kravetz
On 10/12/20 6:59 PM, Xing Zhengjun wrote: > > > On 10/13/2020 1:40 AM, Mike Kravetz wrote: >> On 10/11/20 10:29 PM, Xing Zhengjun wrote: >>> Hi Mike, >>> >>> I re-test it in v5.9-rc8, the regression still existed. It is almost >>>
