Re: [PATCH 08/10] mm/hugetlb: return non-isolated page in the loop instead of break and check

2020-08-10 Thread Mike Kravetz
> Signed-off-by: Wei Yang I agree with Baoquan He in that this is more of a style change. Certainly there is the potential to avoid an extra check and that is always good. The real value in this patch (IMO) is removal of the stale comment. Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH 09/10] mm/hugetlb: narrow the hugetlb_lock protection area during preparing huge page

2020-08-10 Thread Mike Kravetz
On 8/7/20 2:12 AM, Wei Yang wrote: > set_hugetlb_cgroup_[rsvd] just manipulate page local data, which is not > necessary to be protected by hugetlb_lock. > > Let's take this out. > > Signed-off-by: Wei Yang Thanks! Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH 10/10] mm/hugetlb: not necessary to abuse temporary page to workaround the nasty free_huge_page

2020-08-10 Thread Mike Kravetz
ol resize which itself could cause surplus to exceed overcommit. IMO both approaches are valid. - Advantage of temporary page is that it can not cause surplus to exceed overcommit. Disadvantage is as mentioned in the comment 'abuse of temporary page'. - Advantage of this patch is that it uses existing counters. Disadvantage is that it can momentarily cause surplus to exceed overcommit. Unless someone has a strong opinion, I prefer the changes in this patch. -- Mike Kravetz

Re: [PATCH v2 4/4] mm/hugetl.c: warn out if expected count of huge pages adjustment is not achieved

2020-08-10 Thread Mike Kravetz
Cc: Michal On 8/10/20 7:11 PM, Baoquan He wrote: > Hi Mike, > > On 07/23/20 at 11:21am, Mike Kravetz wrote: >> On 7/23/20 2:11 AM, Baoquan He wrote: > ... >>>> But is kernel expected to warn for all such situations where the user >>>> requested r

Re: [RFC PATCH 00/24] mm/hugetlb: Free some vmemmap pages of hugetlb page

2020-09-29 Thread Mike Kravetz
tlb pages. https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg02407.html More care/coordination would be needed to support double mapping with this new option. However, this series provides a boot option to disable freeing of unneeded page structs. -- Mike Kravetz

Re: [RFC PATCH 01/24] mm/memory_hotplug: Move bootmem info registration API to bootmem_info.c

2020-09-29 Thread Mike Kravetz
On 9/15/20 5:59 AM, Muchun Song wrote: > Move bootmem info registration common API to individual bootmem_info.c > for later patch use. > > Signed-off-by: Muchun Song This is just code movement. Acked-by: Mike Kravetz -- Mike Kravetz > --- > arch/x86/mm/init_64.c

Re: [RFC PATCH 02/24] mm/memory_hotplug: Move {get,put}_page_bootmem() to bootmem_info.c

2020-09-29 Thread Mike Kravetz
On 9/15/20 5:59 AM, Muchun Song wrote: > In the later patch, we will use {get,put}_page_bootmem() to initialize > the page for vmemmap or free vmemmap page to buddy. So move them out of > CONFIG_MEMORY_HOTPLUG_SPARSE. > > Signed-off-by: Muchun Song More code movement. Acked-b

Re: [RFC PATCH 03/24] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

2020-09-29 Thread Mike Kravetz
that support it, say Y here. > + > + If unsure, say N. > + I could be wrong, but I believe the convention is to introduce the config option at the same time as the code which depends on the option is introduced. Therefore, it might be better to combine with the next patch. Also, it looks like most of your development was done on x86. Should this option be limited to x86 only for now? -- Mike Kravetz

Re: [v5] mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged

2020-09-30 Thread Mike Kravetz
ree_kbytes is not updated to 11334 because user defined value 9 is preferred # cat /proc/sys/vm/min_free_kbytes 90112 -- Mike Kravetz
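The log line in this snippet describes a simple precedence rule between a recalculated value and a user-set one. A minimal sketch of that rule follows, assuming the larger user-defined value is kept and the recalculated value is applied only when it is larger. The function name `pick_min_free_kbytes` is hypothetical and is not taken from the kernel source; the message text is reconstructed from the truncated snippet.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not kernel code: choose the value that ends up in
 * min_free_kbytes when a recalculated value competes with a user-set one.
 * Assumption: the recalculated value only wins if it is larger.
 */
int pick_min_free_kbytes(int calculated, int user_defined)
{
    assert(calculated >= 0 && user_defined >= 0);

    if (calculated > user_defined)
        return calculated;

    /* Wording follows the message quoted in the snippet above. */
    printf("min_free_kbytes is not updated to %d because user defined value %d is preferred\n",
           calculated, user_defined);
    return user_defined;
}
```

With the values visible in the snippet, a recalculated 11334 against a user setting of 90112 leaves 90112 in place, which matches the `cat /proc/sys/vm/min_free_kbytes` output shown.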

Re: [v5] mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged

2020-09-30 Thread Mike Kravetz
On 9/30/20 1:47 PM, Vijay Balakrishna wrote: > On 9/30/2020 11:20 AM, Mike Kravetz wrote: >> On 9/29/20 9:49 AM, Vijay Balakrishna wrote: >> >> Sorry for jumping in so late. Should we use this as an opportunity to >> also fix up the messages logged when (re)calcu

Re: [RFC PATCH 05/24] mm/hugetlb: Introduce nr_free_vmemmap_pages in the struct hstate

2020-09-30 Thread Mike Kravetz
at, but hopefully avoids some confusion. -- Mike Kravetz > So we introduce a new nr_free_vmemmap_pages field in the hstate to > indicate how many vmemmap pages associated with a hugetlb page that we > can free to buddy system. > > Signed-off-by: Muchun Song > --- > inc

Re: [RFC V2] mm/vmstat: Add events for HugeTLB migration

2020-10-01 Thread Mike Kravetz
migration statistics. While here, this updates current trace event > 'mm_migrate_pages' to accommodate now available HugeTLB based statistics. > > Cc: Daniel Jordan > Cc: Zi Yan > Cc: John Hubbard > Cc: Mike Kravetz > Cc: Oscar Salvador > Cc: Andrew Morton > Cc: linux

Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2020-10-14 Thread Mike Kravetz
ks for having a look. I started poking around myself but, > being new to cgroup code, I even failed to understand why that code gets > triggered though the hugetlb controller isn't even enabled. > > I assume you at least have to make sure that there is > a page populated (MMAP_POPULATE, or rea

Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2020-10-15 Thread Mike Kravetz
On 10/14/20 11:31 AM, Mike Kravetz wrote: > On 10/14/20 11:18 AM, David Hildenbrand wrote: > > FWIW - I ran libhugetlbfs tests which do a bunch of hole punching > with (and without) hugetlb controller enabled and did not see this issue. > I took a closer look aft

Re: [RFC PATCH 2/3] hugetlbfs: introduce hinode_rwsem for pmd sharing synchronization

2020-10-15 Thread Mike Kravetz
On 10/15/20 4:05 PM, HORIGUCHI NAOYA(堀口 直也) wrote: > On Tue, Oct 13, 2020 at 04:10:59PM -0700, Mike Kravetz wrote: >> Due to pmd sharing, the huge PTE pointer returned by huge_pte_alloc >> may not be valid. This can happen if a call to huge_pmd_unshare for >> the same p

Re: [RFC PATCH 00/24] mm/hugetlb: Free some vmemmap pages of hugetlb page

2020-10-07 Thread Mike Kravetz
On 9/29/20 2:58 PM, Mike Kravetz wrote: > On 9/15/20 5:59 AM, Muchun Song wrote: >> Hi all, >> >> This patch series will free some vmemmap pages(struct page structures) >> associated with each hugetlbpage when preallocated to save memory. > ... >> T

Re: [PATCH v3 7/8] mm/mempolicy: use a standard migration target allocation callback

2020-10-08 Thread Mike Kravetz
> But my belief (best confirmed by you running your tests with a > suitably placed BUG_ON or WARN_ON) is that you'll never find a > PageAnon in a vma_shareable() area, so will never need try_to_unmap() > to unshare a pagetable in the PageAnon case, so won't need i_mmap_rwsem > for

Re: [PATCH v3 7/8] mm/mempolicy: use a standard migration target allocation callback

2020-10-09 Thread Mike Kravetz
On 10/8/20 10:50 PM, Hugh Dickins wrote: > On Thu, 8 Oct 2020, Mike Kravetz wrote: >> On 10/7/20 8:21 PM, Hugh Dickins wrote: >>> >>> Mike, j'accuse... your 5.7 commit c0d0381ade79 ("hugetlbfs: >>> use i_mmap_rwsem for more pmd sharing synchronization"

Re: [PATCH v3 7/8] mm/mempolicy: use a standard migration target allocation callback

2020-10-09 Thread Mike Kravetz
On 10/9/20 3:23 PM, Hugh Dickins wrote: > On Fri, 9 Oct 2020, Mike Kravetz wrote: >> On 10/8/20 10:50 PM, Hugh Dickins wrote: >>> >>> It's a problem I've faced before in tmpfs, keeping a hold on the >>> mapping while page lock is dropped. Quite awkward: ig

Re: [RFC PATCH] cma: make number of CMA areas dynamic, remove CONFIG_CMA_AREAS

2020-09-16 Thread Mike Kravetz
On 9/15/20 9:32 PM, Christoph Hellwig wrote: > On Wed, Sep 02, 2020 at 08:02:04PM -0700, Mike Kravetz wrote: >> --- a/arch/arm/mm/dma-mapping.c >> +++ b/arch/arm/mm/dma-mapping.c >> @@ -383,25 +383,34 @@ postcore_initcall(atomic_pool_init); >> struct dma_contig_early_re

Re: [PATCH] cma: make number of CMA areas dynamic, remove CONFIG_CMA_AREAS

2020-09-16 Thread Mike Kravetz
On 9/16/20 2:14 AM, Song Bao Hua (Barry Song) wrote: >>> -Original Message- >>> From: Mike Kravetz [mailto:mike.krav...@oracle.com] >>> Sent: Wednesday, September 16, 2020 8:57 AM >>> To: linux...@kvack.org; linux-kernel@vger.kernel.org; >>>

Re: [drm/mgag200] 913ec479bb: vm-scalability.throughput 26.2% improvement

2020-08-27 Thread Mike Kravetz
n Xing and improved performance by 20 something percent. That seems in line with this report/improvement. Perhaps the tooling is not always accurate in determining the commit which causes the performance changes? Perhaps I am misreading information in the reports? -- Mike Kravetz

Re: [External] Re: [PATCH] mm/hugetlb: Fix a race between hugetlb sysctl handlers

2020-08-27 Thread Mike Kravetz
lb ctl_table entries are defined and initialized. This is not something you introduced. The unnecessary assignments are in the existing code. However, there is no need to carry them forward. -- Mike Kravetz

Re: [PATCH 10/10] mm/hugetlb: not necessary to abuse temporary page to workaround the nasty free_huge_page

2020-08-11 Thread Mike Kravetz
sted' it by allocating an 'extra' page and freeing it via this method in alloc_surplus_huge_page. From 864c5f8ef4900c95ca3f6f2363a85f3cb25e793e Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Tue, 11 Aug 2020 12:45:41 -0700 Subject: [PATCH] hugetlb: optimize race error return in alloc_s

Re: [PATCH v2 4/4] mm/hugetl.c: warn out if expected count of huge pages adjustment is not achieved

2020-08-11 Thread Mike Kravetz
of this, let's just leave things as they are and not add the message. It is pretty clear that a user needs to read the value after writing to determine if all pages were allocated. The log message would add little benefit to the end user. -- Mike Kravetz
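The point made here, that a user must read the value back after writing it to learn whether every requested page was allocated, can be sketched in user space as below. This is an illustration under stated assumptions, not a tool from the thread: `read_count` and `set_and_verify` are hypothetical names, and the caller supplies the path (for example `/proc/sys/vm/nr_hugepages`, which requires root to write).

```c
#include <assert.h>
#include <stdio.h>

/* Read a single decimal count from path. Returns 0 on success, -1 on error. */
int read_count(const char *path, long *count)
{
    assert(path != NULL && count != NULL);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%ld", count) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Write the requested count, then read the file back.
 * Returns the shortfall (0 if the request was fully met), or -1 on I/O error.
 */
long set_and_verify(const char *path, long requested)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%ld\n", requested);
    if (fclose(f) != 0)   /* a failed write may only surface at close */
        return -1;

    long actual = 0;
    if (read_count(path, &actual) != 0)
        return -1;
    return actual >= requested ? 0 : requested - actual;
}
```

A non-zero return tells the caller how many pages are missing, which is the information the proposed log message would have duplicated.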

Re: [PATCH 10/10] mm/hugetlb: not necessary to abuse temporary page to workaround the nasty free_huge_page

2020-08-11 Thread Mike Kravetz
On 8/11/20 4:19 PM, Wei Yang wrote: > On Tue, Aug 11, 2020 at 02:43:28PM -0700, Mike Kravetz wrote: >> Subject: [PATCH] hugetlb: optimize race error return in >> alloc_surplus_huge_page >> >> The routine alloc_surplus_huge_page() could race with a pool >>

Re: [RFC PATCH] cma: make number of CMA areas dynamic, remove CONFIG_CMA_AREAS

2020-09-08 Thread Mike Kravetz
On 9/3/20 6:58 PM, Song Bao Hua (Barry Song) wrote: > >> -Original Message- >> From: Mike Kravetz [mailto:mike.krav...@oracle.com] >> Sent: Thursday, September 3, 2020 3:02 PM >> To: linux...@kvack.org; linux-kernel@vger.kernel.org; >> linux-arm-ke

Re: [Patch v3 7/7] mm/hugetlb: take the free hpage during the iteration directly

2020-08-31 Thread Mike Kravetz
break; Previously, when encountering a PageHWPoison(page) the loop would continue and check the next page in the list. It now breaks the loop and returns NULL. Is this not a change in behavior? Perhaps you want to change that break to a continue. Or, restructure in some other wa
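The behavioural difference raised in this review comment can be shown with a small stand-alone sketch. It assumes a plain array in place of the kernel's free list, and `struct sketch_page` with its `hwpoison` flag is a hypothetical stand-in for `struct page` and `PageHWPoison()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a free huge page; not the kernel's struct page. */
struct sketch_page {
    bool hwpoison;
};

/* Skips poisoned entries and keeps scanning. Returns an index, or -1 if none. */
int first_usable_continue(const struct sketch_page *pages, size_t n)
{
    assert(pages != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (pages[i].hwpoison)
            continue;           /* look at the next page in the list */
        return (int)i;
    }
    return -1;
}

/* Stops at the first poisoned entry, even if usable pages follow it. */
int first_usable_break(const struct sketch_page *pages, size_t n)
{
    assert(pages != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (pages[i].hwpoison)
            break;              /* gives up on the rest of the list */
        return (int)i;
    }
    return -1;
}
```

For a list whose first entry is poisoned and whose second is healthy, the `continue` variant finds the second entry while the `break` variant reports that nothing is available.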

Re: [Patch v4 7/7] mm/hugetlb: take the free hpage during the iteration directly

2020-09-01 Thread Mike Kravetz
> Signed-off-by: Wei Yang Thank you! Reviewed-by: Mike Kravetz -- Mike Kravetz

[PATCH] hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem

2020-08-03 Thread Mike Kravetz
ppropriate locking. Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Cc: Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 4 1 file changed, 4 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 590111ea6975..0f6716422a53 100644 --- a/m

Re: [PATCH] hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem

2020-08-03 Thread Mike Kravetz
On 8/3/20 3:52 PM, Matthew Wilcox wrote: > On Mon, Aug 03, 2020 at 03:43:35PM -0700, Mike Kravetz wrote: >> Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing >> synchronization") requires callers of huge_pte_alloc to hold i_mmap_rwsem >> in at le

Re: [PATCH] hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem

2020-08-03 Thread Mike Kravetz
On 8/3/20 4:00 PM, Mike Kravetz wrote: > On 8/3/20 3:52 PM, Matthew Wilcox wrote: >> On Mon, Aug 03, 2020 at 03:43:35PM -0700, Mike Kravetz wrote: >>> Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing >>> synchronization") requi

Re: [PATCH v7 0/3] make dma_alloc_coherent NUMA-aware by per-NUMA CMA

2020-08-21 Thread Mike Kravetz
asn't too worried because of the limited hugetlb use case. However, this series is adding another user of per-node CMA areas. With more users, should we try to sync up the number of CMA areas and the number of nodes? Or, perhaps I am worrying about nothing? -- Mike Kravetz

Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

2020-08-21 Thread Mike Kravetz
On 8/21/20 1:39 AM, Xing Zhengjun wrote: > > > On 6/26/2020 5:33 AM, Mike Kravetz wrote: >> On 6/22/20 3:01 PM, Mike Kravetz wrote: >>> On 6/21/20 5:55 PM, kernel test robot wrote: >>>> Greeting, >>>> >>>> FYI, we noticed a -33.4%

Re: [PATCH v7 0/3] make dma_alloc_coherent NUMA-aware by per-NUMA CMA

2020-08-21 Thread Mike Kravetz
On 8/21/20 1:47 PM, Song Bao Hua (Barry Song) wrote: > > >> -Original Message- >> From: Song Bao Hua (Barry Song) >> Sent: Saturday, August 22, 2020 7:27 AM >> To: 'Mike Kravetz' ; h...@lst.de; >> m.szyprow...@samsung.com; robin.mur...@arm.com;

Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

2020-08-21 Thread Mike Kravetz
On 8/21/20 2:02 PM, Mike Kravetz wrote: > Would you be willing to test this series on top of 34ae204f1851? I will need > to rebase the series to take the changes made by 34ae204f1851 into account. Actually, the series in this thread will apply/run cleanly on top of 34ae204f1851. N

[PATCH] hugetlb: add lockdep check for i_mmap_rwsem held in huge_pmd_share

2020-09-11 Thread Mike Kravetz
have been included with commit 34ae204f1851 ("hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem"). Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 81

[PATCH] cma: make number of CMA areas dynamic, remove CONFIG_CMA_AREAS

2020-09-15 Thread Mike Kravetz
before normal memory allocators, so use the memblock allocator. Acked-by: Roman Gushchin Signed-off-by: Mike Kravetz --- rfc->v1 - Made minor changes suggested by Song Bao Hua (Barry Song) - Removed check for late calls to cma_init_reserved_mem that was part of RFC. - Added ACK f

Re: [PATCH v2 04/19] mm/hugetlb: Introduce nr_free_vmemmap_pages in the struct hstate

2020-10-27 Thread Mike Kravetz
> 2 files changed, 38 insertions(+) Patch looks fine with updated commit message. Acked-by: Mike Kravetz -- Mike Kravetz

[PATCH 1/2] mm/hugetlb: change hugetlb_reserve_pages() to type bool

2020-12-21 Thread Mike Kravetz
currently assume a zero return code indicates success. Change the callers to look for true to indicate success. No functional change, only code cleanup. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 4 ++-- include/linux/hugetlb.h | 2 +- mm/hugetlb.c| 37

[PATCH 2/2] hugetlbfs: remove special hugetlbfs_set_page_dirty()

2020-12-21 Thread Mike Kravetz
routine hugetlbfs_set_page_dirty with __set_page_dirty_no_writeback as it addresses both of these issues. Suggested-by: Matthew Wilcox Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 13 + 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs

Re: [PATCH v10 02/11] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

2020-12-21 Thread Mike Kravetz
r_page_bootmem_info is enabled if HUGETLB_PAGE_FREE_VMEMMAP > is defined. > > Signed-off-by: Muchun Song > --- > arch/x86/mm/init_64.c | 2 +- > fs/Kconfig| 18 ++ > 2 files changed, 19 insertions(+), 1 deletion(-) Thanks for updating, Acked-by: Mike Kravetz -- Mike Kravetz

Re: [External] Re: [PATCH v10 03/11] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page

2020-12-21 Thread Mike Kravetz
t with reuse addr and all subsequent pages in the range are mapped to reuse addr. I know it is not very generic or flexible. But, it might be easier to understand than the adjustments (+- PAGE_SIZE) currently being made in the code. Just a thought. -- Mike Kravetz

Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

2020-12-22 Thread Mike Kravetz
= HUGEPAGE_REPORTING_CAPACITY; > + > + /* update budget to reflect call to report function */ > + budget--; > + > + /* reacquire zone lock and resume processing */ > + spin_lock_irq(_lock); > + > + /* flush reported pages from the sg list */ > + hugepage_reporting_drain(prdev, h, sgl, > + HUGEPAGE_REPORTING_CAPACITY, !ret); > + > + /* > + * Reset next to first entry, the old next isn't valid > + * since we dropped the lock to report the pages > + */ > + next = list_first_entry(list, struct page, lru); > + > + /* exit on error */ > + if (ret) > + break; > + } > + > + /* Rotate any leftover pages to the head of the freelist */ > + if (>lru != list && !list_is_first(>lru, list)) > + list_rotate_to_front(>lru, list); > + > + spin_unlock_irq(_lock); > + > + return ret; > +} -- Mike Kravetz

Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

2020-12-22 Thread Mike Kravetz
an be allocated from the buddy is (MAX_ORDER - 1). So, the check should be '>='. -- Mike Kravetz
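The off-by-one behind this comment is easy to see in isolation. The sketch below assumes a value of 11 for the constant, a common default for `MAX_ORDER` in kernels of this period; the constant `SKETCH_MAX_ORDER` and both function names are hypothetical and only illustrate the comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for illustration; the real MAX_ORDER is configuration dependent. */
#define SKETCH_MAX_ORDER 11

/* The largest order the buddy allocator can hand out is SKETCH_MAX_ORDER - 1. */
bool buddy_can_allocate(unsigned int order)
{
    return order < SKETCH_MAX_ORDER;
}

/* Correct test for "too large for the buddy allocator": note '>=' rather than '>'. */
bool exceeds_buddy_limit(unsigned int order)
{
    return order >= SKETCH_MAX_ORDER;
}

/* Self-check of the boundary the review comment is about. */
void sketch_check_boundary(void)
{
    assert(buddy_can_allocate(SKETCH_MAX_ORDER - 1));
    assert(exceeds_buddy_limit(SKETCH_MAX_ORDER));
}
```

A check written with `>` would treat an order equal to the constant as allocatable, which is exactly the case the `>=` form excludes.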

Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

2020-12-23 Thread Mike Kravetz
avior. Correct? >> On some systems, hugetlb pages are a precious resource and the sysadmin >> carefully configures the number needed by applications. Removing a hugetlb >> page (even for a very short period of time) could cause serious application >> failure. > > That's true, especially for 1G pages. Any suggestions? > Let the hugepage allocator be aware of this situation and retry ? I would hate to add that complexity to the allocator. This question is likely based on my lack of understanding of virtio-balloon usage and this reporting mechanism. But, why do the hugetlb pages have to be 'temporarily' allocated for reporting purposes? -- Mike Kravetz

Re: [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one

2021-01-04 Thread Mike Kravetz
+) Thanks! Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page

2021-01-04 Thread Mike Kravetz
); > + putback_active_hugepage(page); I'm curious why you used putback_active_hugepage() here instead of simply calling set_page_huge_active() before the put_page()? When the page was allocated, it was placed on the active list (alloc_huge_page). Therefore, the hug

Re: [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page

2021-01-04 Thread Mike Kravetz
in_unlock(&hugetlb_lock) > + * spin_lock(&hugetlb_lock) > + * enqueue_huge_page(page) > + * // It is wrong, the page is already freed > + * spin_unlock(&hugetlb_lock) > + * > + * The race window is bet

Re: [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page

2021-01-04 Thread Mike Kravetz
queue. Is it acceptable to keep retrying in that case? In addition, the 'Free some vmemmap' series may slow the free_huge_page path even more. In these worst case scenarios, I am not sure we want to just spin retrying. -- Mike Kravetz > > Signed-off-by: Muchun Song > --- >

Re: [PATCH 5/6] mm: hugetlb: fix a race between isolating and freeing page

2021-01-04 Thread Mike Kravetz
the buddy allocator. > > Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle > hugepage") > Signed-off-by: Muchun Song > --- > mm/hugetlb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) Thanks! Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active

2021-01-04 Thread Mike Kravetz
ged, 1 deletion(-) Thanks! Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [External] Re: [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page

2021-01-05 Thread Mike Kravetz
On 1/4/21 6:44 PM, Muchun Song wrote: > On Tue, Jan 5, 2021 at 6:40 AM Mike Kravetz wrote: >> >> On 1/3/21 10:58 PM, Muchun Song wrote: >>> Because we only can isolate a active page via isolate_huge_page() >>> and hugetlbfs_fallocate() forget to mark it

Re: [External] Re: [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page

2021-01-05 Thread Mike Kravetz
On 1/4/21 6:55 PM, Muchun Song wrote: > On Tue, Jan 5, 2021 at 8:02 AM Mike Kravetz wrote: >> >> On 1/3/21 10:58 PM, Muchun Song wrote: >>> There is a race condition between __free_huge_page() >>> and dissolve_free_huge_page(). >>> >>> CPU0:

Re: [External] Re: [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page

2021-01-05 Thread Mike Kravetz
On 1/4/21 7:46 PM, Muchun Song wrote: > On Tue, Jan 5, 2021 at 11:14 AM Muchun Song wrote: >> >> On Tue, Jan 5, 2021 at 9:33 AM Mike Kravetz wrote: >>> >>> On 1/3/21 10:58 PM, Muchun Song wrote: >>>> When dissolve_free_huge_page() races with __free

[PATCH] mm/hugetlb: fix deadlock in hugetlb_cow error path

2020-12-14 Thread Mike Kravetz
.com Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Cc: Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d029d938d26d.

Re: [PATCH] mm/hugetlb: fix deadlock in hugetlb_cow error path

2020-12-15 Thread Mike Kravetz
On 12/14/20 5:06 PM, Mike Kravetz wrote: > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index d029d938d26d..8713f8ef0f4c 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -4106,10 +4106,30 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, > struct v

Re: [PATCH v9 02/11] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

2020-12-15 Thread Mike Kravetz
an be saved for each 1GB HugeTLB page. When a HugeTLB page is allocated or freed, the vmemmap array representing the range associated with the page will need to be remapped. When a page is allocated, vmemmap pages are freed after remapping. When a page
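To make the scale of the vmemmap array concrete, the arithmetic can be sketched as follows. The sketch only computes how many base pages of `struct page` metadata describe one huge page; it makes no claim about how many of those the series actually frees. The 64-byte `struct page` size and 4 KiB base page size used in the usage note are assumptions typical of x86-64, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of base pages occupied by the struct page array that describes
 * one huge page. All three sizes are in bytes and supplied by the caller.
 */
size_t vmemmap_pages_per_hugepage(size_t hugepage_size,
                                  size_t base_page_size,
                                  size_t struct_page_size)
{
    assert(base_page_size > 0);
    assert(hugepage_size % base_page_size == 0);

    /* One struct page per base page covered by the huge page. */
    size_t nr_struct_pages = hugepage_size / base_page_size;
    size_t vmemmap_bytes = nr_struct_pages * struct_page_size;

    /* Round up to whole base pages. */
    return (vmemmap_bytes + base_page_size - 1) / base_page_size;
}
```

Under those assumptions a 2 MiB huge page is described by 8 vmemmap pages and a 1 GiB huge page by 4096, which is why the potential saving per 1GB page is so much larger.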

Re: [PATCH v9 02/11] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

2020-12-15 Thread Mike Kravetz
On 12/15/20 5:03 PM, Mike Kravetz wrote: > On 12/13/20 7:45 AM, Muchun Song wrote: >> diff --git a/fs/Kconfig b/fs/Kconfig >> index 976e8b9033c4..4c3a9c614983 100644 >> --- a/fs/Kconfig >> +++ b/fs/Kconfig >> @@ -245,6 +245,21 @@ config HUGETLBFS >> config

Re: [PATCH v9 03/11] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page

2020-12-16 Thread Mike Kravetz
the name implies this routine will reuse vmemmap pages. Perhaps, it makes more sense to rename as 'vmemmap_remap_free'? It will first remap, then free vmemmap. But, then I looked at the code above and perhaps you are using the word '_reuse' because the page before the range will be reused? The vmemmap p

Re: [PATCH v9 03/11] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page

2020-12-16 Thread Mike Kravetz
On 12/16/20 2:25 PM, Oscar Salvador wrote: > On Wed, Dec 16, 2020 at 02:08:30PM -0800, Mike Kravetz wrote: >>> + * vmemmap_rmap_walk - walk vmemmap page table >>> + >>> +static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr, >>> +

Re: [PATCH v9 04/11] mm/hugetlb: Defer freeing of HugeTLB pages

2020-12-16 Thread Mike Kravetz
e GFP_ATOMIC to allocate the vmemmap pages. > > Signed-off-by: Muchun Song It is unfortunate we need to add this complexity, but I can not think of another way. One small comment (no required change) below. Reviewed-by: Mike Kravetz > --- > m

Re: [PATCH v9 05/11] mm/hugetlb: Allocate the vmemmap pages associated with each HugeTLB page

2020-12-16 Thread Mike Kravetz
* vmemmap pages successfully, then we can free > + * a HugeTLB page. > + */ > + goto retry; > + } > + list_add_tail(>lru, list); > + } > +} > + -- Mike Kravetz

Re: [External] Re: [PATCH v3 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-12 Thread Mike Kravetz
On 11/10/20 7:41 PM, Muchun Song wrote: > On Wed, Nov 11, 2020 at 8:47 AM Mike Kravetz wrote: >> >> On 11/8/20 6:10 AM, Muchun Song wrote: >> I am reading the code incorrectly it does not appear page->lru (of the huge >> page) is being used for

Re: [PATCH] mm: hugetlb: fix type of delta parameter in gather_surplus_pages()

2020-11-18 Thread Mike Kravetz
> __must_hold(&hugetlb_lock) > { > struct list_head surplus_list; Thank you for noticing the type difference. However, if the parameter delta is changed to long then we should also change the local variables in gather_surplus_pages that are used with delta. Specifically, the local va

Re: [PATCH v4 03/21] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

2020-11-18 Thread Mike Kravetz
def_bool HUGETLB_PAGE > + depends on X86 > + depends on SPARSEMEM_VMEMMAP > + depends on HAVE_BOOTMEM_INFO_NODE > + help > + When using SPARSEMEM_VMEMMAP, the system can save up some memory Should that read, When using HUGETLB_PAGE_FREE_VMEMMAP, ... as the help message is for this config option. -- Mike Kravetz

Re: [PATCH v4 04/21] mm/hugetlb: Introduce nr_free_vmemmap_pages in the struct hstate

2020-11-18 Thread Mike Kravetz
0; > > should not be needed. > Actually, we do not initialize other values like resv_huge_pages > or surplus_huge_pages. > > If that is the case, the "else" could go. > > Mike? Correct. Those assignments have been in the code for a very long time. > The changes themselves look good to me. > I think that putting all the vmemmap stuff into hugetlb-vmemmap.* was > the right choice. Agree! -- Mike Kravetz

Re: [PATCH v4 04/21] mm/hugetlb: Introduce nr_free_vmemmap_pages in the struct hstate

2020-11-18 Thread Mike Kravetz
> + * [ASCII diagram of the vmemmap page mapping for a 2M HugeTLB page; layout lost in extraction] -- Mike Kravetz

Re: [PATCH] mm,hugetlb: Remove unneded initialization

2020-11-19 Thread Mike Kravetz
ile changed, 2 deletions(-) Thanks, Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH v2] mm: hugetlb: fix type of delta parameter and related local variables in gather_surplus_pages()

2020-11-19 Thread Mike Kravetz
7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) Thank you, Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [External] Re: [PATCH v4 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-19 Thread Mike Kravetz
one before removing the pages of struct pages. This keeps everything 'consistent' as things are remapped. If you want to use one of the 'pages of struct pages' for the new pte page, then there will be a period of time when things are inconsistent. Before setting up the mapping, some code could potentially access that pages of struct pages. I tend to agree that allocating a new page is the safest thing to do here. Or, perhaps someone can think of a way to make this safe. -- Mike Kravetz

Re: [PATCH v4 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-19 Thread Mike Kravetz
changed, 85 insertions(+) Thanks for the cleanup. Oscar made some other comments. I only have one additional minor comment below. With those minor cleanups, Acked-by: Mike Kravetz > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c ... > +int vmemmap_pgtable_prealloc(struct hstate *h, struc

Re: [External] Re: [PATCH v3 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-12 Thread Mike Kravetz
On 11/12/20 4:35 PM, Mike Kravetz wrote: > On 11/10/20 7:41 PM, Muchun Song wrote: >> On Wed, Nov 11, 2020 at 8:47 AM Mike Kravetz wrote: >>> >>> On 11/8/20 6:10 AM, Muchun Song wrote: >>> I am reading the code incorrectly it does not appear page->l

Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2020-10-20 Thread Mike Kravetz
g seen with the qemu use case. I'm still doing more testing and code inspection to look for other issues. From 861bcd7d0443f18a5fed3c3ddc5f1c71e78c4ef4 Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Tue, 20 Oct 2020 20:21:42 -0700 Subject: [PATCH] hugetlb_cgroup: fix reservation accounting S

Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2020-10-21 Thread Mike Kravetz
> On 21.10.20 15:11, David Hildenbrand wrote: >> On 21.10.20 14:57, Michal Privoznik wrote: >>> On 10/21/20 5:35 AM, Mike Kravetz wrote: >>>> On 10/20/20 6:38 AM, David Hildenbrand wrote: >>>> >>>> It would be good if Mina (at least) would lo

[PATCH] hugetlb_cgroup: fix reservation accounting

2020-10-21 Thread Mike Kravetz
") Cc: Reported-by: Michal Privoznik Co-developed-by: David Hildenbrand Signed-off-by: David Hildenbrand Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 67fc6383995b.

Re: [PATCH rfc 0/2] mm: cma: make cma_release() non-blocking

2020-10-21 Thread Mike Kravetz
he bitmap in cma_clear_bitmap and could block. However, I do not see why cma->lock has to be a mutex. I may be missing something, but I do not see any code protected by the mutex doing anything that could sleep? Could we simply change that mutex to a spinlock? -- Mike Kravetz

Re: [PATCH rfc 0/2] mm: cma: make cma_release() non-blocking

2020-10-22 Thread Mike Kravetz
On 10/21/20 7:33 PM, Roman Gushchin wrote: > On Wed, Oct 21, 2020 at 05:15:53PM -0700, Mike Kravetz wrote: >> On 10/16/20 3:52 PM, Roman Gushchin wrote: >>> This small patchset makes cma_release() non-blocking and simplifies >>> the code in hugetlbfs, where previously

Re: [PATCH rfc 0/2] mm: cma: make cma_release() non-blocking

2020-10-22 Thread Mike Kravetz
it could be used by the hugetlb code to make it simpler. -- Mike Kravetz

[RFC PATCH v2 0/4] hugetlbfs: introduce hinode_rwsem for pmd sharing synchronization

2020-10-26 Thread Mike Kravetz
hinode_lock_write() helper. - Split out addition of hinode_rwsem and helper routines to a separate patch. [1] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/ Mike Kravetz (4): hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync hugetlbfs: add

[RFC PATCH v2 4/4] huegtlbfs: handle page fault/truncate races

2020-10-26 Thread Mike Kravetz
as necessary. File truncation (remove_inode_hugepages) needs to handle page mapping changes that could have happened before locking the page. This could happen if page was added to page cache and later backed out in fault processing. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 34

[RFC PATCH v2 1/4] hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync

2020-10-26 Thread Mike Kravetz
s per hugetlb calculation") commit 87bf91d39bb5 ("hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race") commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Signed-off-by

[RFC PATCH v2 2/4] hugetlbfs: add hinode_rwsem to hugetlb specific inode

2020-10-26 Thread Mike Kravetz
is possible in an attempt to minimize performance impacts. In addition, routines which can be used with lockdep to help with proper locking are also added. Use of the new semaphore and supporting routines will be provided in a subsequent patch. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c

[RFC PATCH v2 3/4] hugetlbfs: use hinode_rwsem for pmd sharing synchronization

2020-10-26 Thread Mike Kravetz
the semaphore if pmd sharing is possible. Also, add lockdep_assert calls to huge_pmd_share/unshare to help catch callers not using the proper locking. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 11 +- include/linux/hugetlb.h | 8 ++-- mm/hugetlb.c| 83

Re: [PATCH] mm/hugetable.c: align some prints

2020-10-26 Thread Mike Kravetz
es from day 1 of their existence? I would prefer not to touch them in case someone is depending on the current format. -- Mike Kravetz > --- > drivers/base/node.c | 4 ++-- > mm/hugetlb.c| 6 +++--- > 2 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/

Re: [External] Re: [PATCH v2 07/19] mm/hugetlb: Free the vmemmap pages associated with each hugetlb page

2020-10-29 Thread Mike Kravetz
On 10/28/20 11:13 PM, Muchun Song wrote: > On Thu, Oct 29, 2020 at 7:42 AM Mike Kravetz wrote: >> >> On 10/26/20 7:51 AM, Muchun Song wrote: >>> + >>> +static inline spinlock_t *vmemmap_pmd_lockptr(pmd_t *pmd) >>> +{ >>> + static DEFINE_S

Re: [PATCH v5 00/21] Free some vmemmap pages of hugetlb page

2020-11-20 Thread Mike Kravetz
is would eliminate a bunch of the complex code doing page table manipulation. It does not address the issue of struct page pages going away which is being discussed here, but it could be a way to simplify the first version of this code. If this is going to be an 'opt in' feature as previously suggested, then eliminating the PMD/huge page vmemmap mapping may be acceptable. My guess is that sysadmins would only 'opt in' if they expect most of system memory to be used by hugetlb pages. We certainly have database and virtualization use cases where this is true. -- Mike Kravetz

Re: [PATCH v3 00/21] Free some vmemmap pages of hugetlb page

2020-11-10 Thread Mike Kravetz
ool. I remember the use case pointed out in commit 099730d67417. It says, "I have a hugetlbfs user which is never explicitly allocating huge pages with 'nr_hugepages'. They only set 'nr_overcommit_hugepages' and then let the pages be allocated from the buddy allocator at fault time." In this case, I suspect they were using 'page fault' allocation for initialization much like someone using /proc/sys/vm/nr_hugepages. So, the overhead may not be as noticeable. -- Mike Kravetz

Re: [PATCH v3 03/21] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

2020-11-10 Thread Mike Kravetz
is might be a > trade-off between saving up memory and increasing the cost > of certain operations on allocation/free path. > That is why I mentioned it there. Yes, this is somewhat a trade-off. As a config option, this is something that would likely be decided by distros. I almost hate to suggest this, but is it something that an end user would want to decide? Is this something that perhaps should be a boot/kernel command line option? -- Mike Kravetz

Re: [External] Re: [PATCH v3 04/21] mm/hugetlb: Introduce nr_free_vmemmap_pages in the struct hstate

2020-11-10 Thread Mike Kravetz
> > Hi Mike, what's your opinion? I would be happy to see this in a separate file. As Oscar mentions, the hugetlb.c file/code is already somewhat difficult to read and understand. -- Mike Kravetz

Re: [PATCH v3 03/21] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

2020-11-10 Thread Mike Kravetz
On 11/10/20 11:50 AM, Matthew Wilcox wrote: > On Tue, Nov 10, 2020 at 11:31:31AM -0800, Mike Kravetz wrote: >> On 11/9/20 5:52 AM, Oscar Salvador wrote: >>> On Sun, Nov 08, 2020 at 10:10:55PM +0800, Muchun Song wrote: > > I don't like config options. I like boot opt

Re: [PATCH v3 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-10 Thread Mike Kravetz
r(page, HUGETLB_PAGE_DTOR); > set_hugetlb_cgroup(page, NULL); When I saw that comment in previous patch series, I assumed page->lru was being used to store preallocated pages and pages to free. However, unless I am reading the code incorrectly it does not appear page->lru (of the huge page) is being used for this purpose. Is that correct? If it is correct, would using page->lru of the huge page make this code simpler? I am just missing the reason why you are using page_huge_pte(page)->lru -- Mike Kravetz

Re: [PATCH v5 00/21] Free some vmemmap pages of hugetlb page

2020-11-23 Thread Mike Kravetz
On 11/22/20 11:38 PM, Michal Hocko wrote: > On Fri 20-11-20 09:45:12, Mike Kravetz wrote: >> On 11/20/20 1:43 AM, David Hildenbrand wrote: > [...] >>>>> To keep things easy, maybe simply never allow to free these hugetlb pages >>>>> again for now? If they

Re: [PATCH v7 00/15] Free some vmemmap pages of hugetlb page

2020-12-03 Thread Mike Kravetz
As previously mentioned, I feel qualified to review the hugetlb changes and some other closely related changes. However, this patch set is touching quite a few areas and I do not feel qualified to make authoritative statements about them all. I too hope others will take a look. -- Mike Kravetz

Re: [RFC PATCH 01/13] fs/userfaultfd: fix wrong error code on WP & !VM_MAYWRITE

2020-12-01 Thread Mike Kravetz
ugetlbfs: allow > registration of ranges containing huge pages"). > > Fix it. > > Cc: Mike Kravetz > Cc: Jens Axboe > Cc: Andrea Arcangeli > Cc: Peter Xu > Cc: Alexander Viro > Cc: io-ur...@vger.kernel.org > Cc: linux-fsde...@vger.kernel.org > Cc: linu

[PATCH] hugetlb_cgroup: fix offline of hugetlb cgroup with reservations

2020-12-03 Thread Mike Kravetz
ss_offline was noticed. The hstate index is not reinitialized each time through the do-while loop. Fix this as well. Fixes: 1adc4d419aa2 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations") Cc: Reported-by: Adrian Moreno Tested-by: Adrian Moreno Signed-off-by: M
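The "index is not reinitialized each time through the do-while loop" problem mentioned in this snippet can be illustrated with a small user-space model. This is only a sketch of the general bug pattern, not the kernel code: `NR_SIZES`, `visits_buggy()` and `visits_fixed()` are hypothetical names invented for the example.

```c
#include <assert.h>

#define NR_SIZES 3 /* hypothetical number of huge page sizes */

/*
 * Buggy pattern: idx is set once, before the retry loop. After the
 * first pass idx == NR_SIZES, so later passes skip the inner loop.
 */
static int visits_buggy(int passes)
{
	int visits = 0;
	int idx = 0;

	do {
		for (; idx < NR_SIZES; idx++)
			visits++;
	} while (--passes > 0);

	return visits;
}

/*
 * Fixed pattern: idx is reset at the top of every pass, so each
 * pass walks all NR_SIZES entries again.
 */
static int visits_fixed(int passes)
{
	int visits = 0;
	int idx;

	do {
		for (idx = 0; idx < NR_SIZES; idx++)
			visits++;
	} while (--passes > 0);

	return visits;
}
```

With two passes, the buggy variant visits the entries only during the first pass, while the fixed variant visits them on both.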

[PATCH] hugetlbfs: fix anon huge page migration race

2020-11-05 Thread Mike Kravetz
/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/ Reported-by: Qian Cai Suggested-by: Hugh Dickins Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Cc: Signed-off-by: Mike Kravetz --- mm/hugetlb.c| 90 +++

Re: [PATCH 0/4] hugetlbfs: use hinode_rwsem for pmd sharing synchronization

2020-11-05 Thread Mike Kravetz
On 11/2/20 4:28 PM, Mike Kravetz wrote: > The RFC series reverted all patches where i_mmap_rwsem was used for > pmd sharing synchronization, and then added code to use hinode_rwsem. > This series ends up with the same code in the end, but is structured > as follows: > > - Rev

Re: [PATCH v3 0/6] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-09-03 Thread Mike Kravetz
ges separately. If not a standalone patch, at least the first patch of the series. This new code will be exercised even if cgroup reservation accounting is not enabled, so it is very important that no subtle regressions be introduced. -- Mike Kravetz

Re: [PATCH v2] mm/hugetlb: avoid looping to the same hugepage if !pages and !vmas

2019-09-03 Thread Mike Kravetz
i += pages_per_huge_page(h); > + spin_unlock(ptl); > + continue; > + } > + > same_page: > if (pages) { > pages[i] = mem_map_offset(page, pfn_offset); > With a comment added to the code, Reviewed-by: Mike Kravetz -- Mike Kravetz
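The idea behind the quoted hunk, as far as the subject line and the visible lines suggest, is that when the caller asks for neither pages nor vmas, the loop can advance by a whole huge page instead of one base page at a time. The following is a simplified user-space model of that idea, not the kernel function: `PAGES_PER_HUGE_PAGE`, `walk_iterations()` and the `want_pages` flag are assumptions made up for illustration.

```c
#include <assert.h>

/* Assumed for the example: 2 MB huge pages built from 4 KB base pages. */
#define PAGES_PER_HUGE_PAGE 512UL

/*
 * Count how many loop iterations are needed to cover `remainder`
 * base pages, starting at the beginning of a huge page.
 * want_pages models "the caller passed a pages (or vmas) array".
 */
static unsigned long walk_iterations(unsigned long remainder, int want_pages)
{
	unsigned long iterations = 0;
	unsigned long pfn_offset = 0; /* base-page offset inside the huge page */

	while (remainder) {
		iterations++;

		/* Fast path: nothing to record, so skip the whole huge page. */
		if (!want_pages && pfn_offset == 0 &&
		    remainder >= PAGES_PER_HUGE_PAGE) {
			remainder -= PAGES_PER_HUGE_PAGE;
			continue;
		}

		/* Slow path: handle one base page per iteration. */
		remainder--;
		pfn_offset = (pfn_offset + 1) % PAGES_PER_HUGE_PAGE;
	}

	return iterations;
}
```

In this model, covering two full huge pages takes 1024 iterations when per-page output is wanted and only 2 when it is not.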
