Re: [PATCH v3 0/6] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-09-03 Thread Mike Kravetz
On 9/3/19 10:57 AM, Mike Kravetz wrote: > On 8/29/19 12:18 AM, Michal Hocko wrote: >> [Cc cgroups maintainers] >> >> On Wed 28-08-19 10:58:00, Mina Almasry wrote: >>> On Wed, Aug 28, 2019 at 4:23 AM Michal Hocko wrote: >>>> >>>> On Mon 26-0

[PATCH] hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS

2019-08-07 Thread Mike Kravetz
the page table lock and check for huge_pte_none before returning an error. This is the same check that must be made further in the code even if page allocation is successful. Reported-by: Li Wang Fixes: 290408d4a250 ("hugetlb: hugepage migration core") Signed-off-by: Mike Kravetz Tested-b

Re: [PATCH] hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS

2019-08-07 Thread Mike Kravetz
ptep))) goto backout; -- Mike Kravetz

Re: [PATCH] hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS

2019-08-08 Thread Mike Kravetz
On 8/8/19 12:47 AM, Michal Hocko wrote: > On Thu 08-08-19 09:46:07, Michal Hocko wrote: >> On Wed 07-08-19 17:05:33, Mike Kravetz wrote: >>> Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS >>> in the kernel-v5.2.3 testing. This is caused by a ra

Re: [RFC PATCH v2 4/5] hugetlb_cgroup: Add accounting for shared mappings

2019-08-16 Thread Mike Kravetz
On 8/15/19 4:04 PM, Mina Almasry wrote: > On Wed, Aug 14, 2019 at 9:46 AM Mike Kravetz wrote: >> >> On 8/13/19 4:54 PM, Mike Kravetz wrote: >>> On 8/8/19 4:13 PM, Mina Almasry wrote: >>>> For shared mappings, the pointer to the hugetlb_cgroup to uncharg

Re: [RFC PATCH v2 4/5] hugetlb_cgroup: Add accounting for shared mappings

2019-08-16 Thread Mike Kravetz
On 8/15/19 4:08 PM, Mina Almasry wrote: > On Tue, Aug 13, 2019 at 4:54 PM Mike Kravetz wrote: >>> mm/hugetlb.c | 208 +-- >>> 1 file changed, 170 insertions(+), 38 deletions(-) >>> >>> diff --git

Re: mmotm 2019-05-29-20-52 uploaded

2019-05-30 Thread Mike Kravetz
t would seem to be related to commit 3e2c19f9bef7e > * mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch -- Mike Kravetz

Re: [PATCH -mm] mm, swap: Fix bad swap file entry warning

2019-05-31 Thread Mike Kravetz
the swap devices that may cause warning messages. > > Fixes: 6a946753dbe6 ("mm/swap_state.c: simplify total_swapcache_pages() with > get_swap_device()") > Signed-off-by: "Huang, Ying" Thank you, this eliminates the messages for me: Tested-by: Mike Kravetz -- Mike Kravetz

[RFC PATCH 3/3] hugetlbfs: don't retry when pool page allocations start to fail

2019-07-24 Thread Mike Kravetz
will still succeed if there is memory available, but it will not try as hard to free up memory. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 87 ++-- 1 file changed, 77 insertions(+), 10 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index

[RFC PATCH 2/3] mm, compaction: use MIN_COMPACT_COSTLY_PRIORITY everywhere for costly orders

2019-07-24 Thread Mike Kravetz
. Signed-off-by: Mike Kravetz --- mm/compaction.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 952dc2fb24e5..325b746068d1 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2294,9 +2294,15 @@ static enum

[RFC PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection

2019-07-24 Thread Mike Kravetz
From: Hillf Danton Address the issue of should_continue_reclaim continuing true too often for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned. This could happen during hugetlb page allocation causing stalls for minutes or hours. Restructure code so that false will be returned in

[RFC PATCH 0/3] fix hugetlb page allocation stalls

2019-07-24 Thread Mike Kravetz
ton (1): mm, reclaim: make should_continue_reclaim perform dryrun detection Mike Kravetz (2): mm, compaction: use MIN_COMPACT_COSTLY_PRIORITY everywhere for costly orders hugetlbfs: don't retry when pool page allocations start to fail mm/compaction.c | 18 +++--- mm/hugetlb.c

Re: [PATCH] mm/rmap.c: remove set but not used variable 'cstart'

2019-07-24 Thread Mike Kravetz
cdb07bdea28e ("mm/rmap.c: remove redundant variable cend") It appears Commit 0f10851ea475 ("mm/mmu_notifier: avoid double notification when it is useless") is what removed the use of cstart and cend. And, they should have been removed then. > Reported-by: Hulk Robot > Sig

Re: [RFC PATCH 00/16] 1GB THP support on x86_64

2020-09-03 Thread Mike Kravetz
ages to the hugetlb pool and then using them within applications? Or, are you dynamically allocating them at fault time (hugetlb overcommit/surplus)? Latency time for use of such pages includes: - Putting together 1G contiguous - Clearing 1G memory In the 'allocation at fault time' mode you incur both costs at fault time. If using pages from the pool, your only cost at fault time is clearing the page. -- Mike Kravetz

Re: [PATCH] mm/hugetlb: Fix a race between hugetlb sysctl handlers

2020-08-24 Thread Mike Kravetz
le the race', I think it might be acceptable to just put a big semaphore around it. -- Mike Kravetz

Re: [PATCH v2 0/4] mm/hugetlb: Small cleanup and improvement

2020-08-24 Thread Mike Kravetz
Hello Andrew, Unless someone objects, can you add patches 1-3 of this series to your tree. They have been reviewed and are fairly simple cleanups. -- Mike Kravetz On 7/22/20 8:22 PM, Baoquan He wrote: > v1 is here: > https://lore.kernel.org/linux-mm/20200720062623.13135-1-...@redh

Re: [External] Re: [PATCH] mm/hugetlb: Fix a race between hugetlb sysctl handlers

2020-08-25 Thread Mike Kravetz
On 8/24/20 8:01 PM, Muchun Song wrote: > On Tue, Aug 25, 2020 at 5:21 AM Mike Kravetz wrote: >> >> I too am looking at this now and do not completely understand the race. >> It could be that: >> >> hugetlb_sysctl_handler_common >> ... >> table-

Re: [Patch v4 5/7] mm/hugetlb: a page from buddy is not on any list

2020-09-02 Thread Mike Kravetz
On 9/2/20 3:49 AM, Vlastimil Babka wrote: > On 9/1/20 3:46 AM, Wei Yang wrote: >> The page allocated from buddy is not on any list, so just use list_add() >> is enough. >> >> Signed-off-by: Wei Yang >> Reviewed-by: Baoquan He >> Reviewed-by: Mike Kravet

[RFC PATCH] cma: make number of CMA areas dynamic, remove CONFIG_CMA_AREAS

2020-09-02 Thread Mike Kravetz
before normal memory allocators, so use the memblock allocator. Signed-off-by: Mike Kravetz --- arch/arm/mm/dma-mapping.c | 29 --- arch/mips/configs/cu1000-neo_defconfig | 1 - arch/mips/configs/cu1830-neo_defconfig | 1 - include/linux/cma.h

Re: [PATCH v2] mm/hugetlb: Fix a race between hugetlb sysctl handlers

2020-08-28 Thread Mike Kravetz
wframe+0x44/0xa9 > > Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes") > Signed-off-by: Muchun Song Thank you! Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [Patch v2 2/7] mm/hugetlb: remove VM_BUG_ON(!nrg) in get_file_region_entry_from_cache()

2020-08-28 Thread Mike Kravetz
On 8/27/20 8:32 PM, Wei Yang wrote: > We are sure to get a valid file_region, otherwise the > VM_BUG_ON(resv->region_cache_count <= 0) at the very beginning would be > triggered. > > Let's remove the redundant one. > > Signed-off-by: Wei Yang Thank you. Reviewed-

Re: [Patch v2 6/7] mm/hugetlb: return non-isolated page in the loop instead of break and check

2020-08-28 Thread Mike Kravetz
> Signed-off-by: Wei Yang > Reviewed-by: Mike Kravetz Commit bbe88753bd42 (mm/hugetlb: make hugetlb migration callback CMA aware) in v5.9-rc2 modified dequeue_huge_page_node_exact. This patch will need to be updated to take those changes into account. -- Mike Kravetz

Re: [PATCH v2] mm/hugetlb: Fix a race between hugetlb sysctl handlers

2020-08-28 Thread Mike Kravetz
can modify table->data in the global data structure without any synchronization. Worse yet, that value is local to their stacks. That was the root cause of the issue addressed by Muchun's patch. Does that analysis make sense? Or, are we missing something? -- Mike Kravetz
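The race described above can be illustrated with a small user-space sketch. It assumes a stripped-down stand-in for the kernel's ctl_table (the names table_model, parse_into and handle_with_private_copy are hypothetical, not kernel APIs) and shows one way to avoid the problem: point a private copy of the table at the stack variable so the shared table is never written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for the kernel's struct ctl_table. */
struct table_model {
    void *data;
    size_t maxlen;
};

/* Stand-in for a generic parser that stores its result through table->data. */
static void parse_into(const struct table_model *table, unsigned long value)
{
    assert(table->data != NULL && table->maxlen == sizeof(unsigned long));
    *(unsigned long *)table->data = value;
}

/*
 * The racy pattern would set shared->data = &tmp here, so two concurrent
 * callers could each see the other's stack address in the shared table.
 * This version modifies only a private copy of the table.
 */
static unsigned long handle_with_private_copy(const struct table_model *shared,
                                              unsigned long value)
{
    unsigned long tmp = 0;
    struct table_model dup = *shared;   /* private copy, lives on this stack */

    dup.data = &tmp;
    dup.maxlen = sizeof(tmp);
    parse_into(&dup, value);
    return tmp;
}
```

Because the shared table is passed as const and only the copy is touched, no lock is needed around the pointer update in this sketch; whether that matches the final form of the patch under review is not shown in this snippet.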

Re: [PATCH 00/10] mm/hugetlb: code refine and simplification

2020-08-07 Thread Mike Kravetz
> nasty free_huge_page > > mm/hugetlb.c | 101 ++- > 1 file changed, 44 insertions(+), 57 deletions(-) Thanks Wei Yang! I'll take a look at these next week. -- Mike Kravetz

Re: KASAN: null-ptr-deref Read in PageHuge

2020-09-18 Thread Mike Kravetz
4002c8 RCX: 00440329 > RDX: RSI: 4000 RDI: 20001000 > RBP: 006ca018 R08: R09: > R10: 0003 R11: 0246 R12: 00401b30 > R13: 00401bc0 R14: R15:

Re: [PATCH V3 6/8] mm: and drivers core: Convert hugetlb_report_node_meminfo to sysfs_emit

2020-09-18 Thread Mike Kravetz
On 9/16/20 1:40 PM, Joe Perches wrote: > Convert the unbound sprintf in hugetlb_report_node_meminfo to use > sysfs_emit_at so that no possible overrun of a PAGE_SIZE buf can occur. > > Signed-off-by: Joe Perches Acked-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH v2 07/19] mm/hugetlb: Free the vmemmap pages associated with each hugetlb page

2020-10-28 Thread Mike Kravetz
gd_none(*pgd)) > + return NULL; > + p4d = p4d_offset(pgd, addr); > + if (p4d_none(*p4d)) > + return NULL; > + pud = pud_offset(p4d, addr); > + > + WARN_ON_ONCE(pud_bad(*pud)); > + if (pud_none(*pud) || pud_bad(*pud)) > + return NULL; > + pmd = pmd_offset(pud, addr); > + > + return pmd; > +} That routine is not really hugetlb specific. Perhaps we could move it to sparse-vmemmap.c? Or elsewhere? -- Mike Kravetz

Re: [External] Re: [PATCH v2 05/19] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-10-28 Thread Mike Kravetz
On 10/28/20 12:26 AM, Muchun Song wrote: > On Wed, Oct 28, 2020 at 8:33 AM Mike Kravetz wrote: >> On 10/26/20 7:51 AM, Muchun Song wrote: >> >> I see the following routines follow the pattern for vmemmap manipulation >> in dax. > > Did you mean move those

Re: [RFC PATCH 0/3] Allocate memmap from hotadded memory (per device)

2020-10-28 Thread Mike Kravetz
mentioned. > More eyes on that series would be appreciated. That series will dynamically free and allocate memmap pages as hugetlb pages are allocated or freed. I haven't looked through this series, but my first thought is that we would need to ensure those allocs/frees are directed to t

Re: [PATCH v2 05/19] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-10-28 Thread Mike Kravetz
(page, HUGETLB_PAGE_DTOR); > set_hugetlb_cgroup(page, NULL); > @@ -1783,6 +1892,14 @@ static struct page *alloc_fresh_huge_page(struct > hstate *h, > if (!page) > return NULL; > > + if (vmemmap_pgtable_prealloc(h, page)) { > + if (hstate_is_gigantic(h)) > + free_gigantic_page(page, huge_page_order(h)); > + else > + put_page(page); > + return NULL; > + } > + It seems a bit strange that we will fail a huge page allocation if vmemmap_pgtable_prealloc fails. Not sure, but it almost seems like we should allow the allocation and log a warning? It is somewhat unfortunate that we need to allocate a page to free pages. > if (hstate_is_gigantic(h)) > prep_compound_gigantic_page(page, huge_page_order(h)); > prep_new_huge_page(h, page, page_to_nid(page)); > -- Mike Kravetz

Re: [RFC] mm/vmstat: Add events for HugeTLB migration

2020-09-28 Thread Mike Kravetz
e)) is_hugetlb = true; else is_thp = true; Although, the compiler may be able to optimize. I did not check. > + > nr_subpages = thp_nr_pages(page); > + if (is_hugetlb) > + nr_subpages = > pages_per_huge_page(page_hstate(page)); Can we just use compound_order() here for all cases? -- Mike Kravetz
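The suggestion to use compound_order() for all cases rests on the fact that a compound page of order n spans 2^n base pages, whether it is a THP or a hugetlb page. A minimal user-space sketch of that arithmetic follows; nr_subpages_from_order is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Number of base pages covered by a compound page of the given order. */
static unsigned long nr_subpages_from_order(unsigned int order)
{
    assert(order < sizeof(unsigned long) * 8);  /* keep the shift defined */
    return 1UL << order;
}
```

Assuming 4 KB base pages, order 9 corresponds to a 2 MB page (512 subpages) and order 18 to a 1 GB page (262144 subpages), so a single order-based computation would cover both branches quoted above.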

Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

2020-10-12 Thread Mike Kravetz
different code. The performance issues discovered here will be taken into account with the new code. However, as previously mentioned additional synchronization is required for functional correctness. As a result, there will be some regression in this code. -

Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

2020-10-12 Thread Mike Kravetz
On 10/12/20 6:59 PM, Xing Zhengjun wrote: > > > On 10/13/2020 1:40 AM, Mike Kravetz wrote: >> On 10/11/20 10:29 PM, Xing Zhengjun wrote: >>> Hi Mike, >>> >>> I re-test it in v5.9-rc8, the regression still existed. It is almost >>>

[RFC PATCH 0/3] hugetlbfs: introduce hinode_rwsem for pmd sharing synchronization

2020-10-13 Thread Mike Kravetz
slate' approach seemed best but I am open to whatever would be easiest to review. [1] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/ Mike Kravetz (3): hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync hugetlbfs: introduce hinode_rwsem for pmd

[RFC PATCH 1/3] hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync

2020-10-13 Thread Mike Kravetz
s per hugetlb calculation") commit 87bf91d39bb5 ("hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race") commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Signed-off-by

[RFC PATCH 3/3] huegtlbfs: handle page fault/truncate races

2020-10-13 Thread Mike Kravetz
as necessary. File truncation (remove_inode_hugepages) needs to handle page mapping changes that could have happened before locking the page. This could happen if page was added to page cache and later backed out in fault processing. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 34

[RFC PATCH 2/3] hugetlbfs: introduce hinode_rwsem for pmd sharing synchronization

2020-10-13 Thread Mike Kravetz
is not taken if the caller knows the target can not possibly be part of a shared pmd. lockdep_assert calls are added to huge_pmd_share and huge_pmd_unshare to help catch callers not using the proper locking. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 8 include/linux/hugetlb.h | 66

[PATCH 0/4] hugetlbfs: use hinode_rwsem for pmd sharing synchronization

2020-11-02 Thread Mike Kravetz
is that this will be easier to review. Mike Kravetz (4): Revert hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race hugetlbfs: add hinode_rwsem to hugetlb specific inode hugetlbfs: use hinode_rwsem for pmd sharing synchronization huegtlbfs: handle page fault/truncate races fs/hugetlbfs/inode.c

[PATCH 3/4] hugetlbfs: use hinode_rwsem for pmd sharing synchronization

2020-11-02 Thread Mike Kravetz
d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Cc: Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 31 +-- include/linux/fs.h | 15 include/linux/hugetlb.h | 8 -- mm/hugetlb.c| 188 +++

[PATCH 2/4] hugetlbfs: add hinode_rwsem to hugetlb specific inode

2020-11-02 Thread Mike Kravetz
ensure proper locking are also added. Use of the new semaphore and supporting routines will be provided in a later patch. Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Cc: Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 12 inc

[PATCH 1/4] Revert hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race

2020-11-02 Thread Mike Kravetz
sem to address page fault/truncate race") Cc: Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 28 mm/hugetlb.c | 23 --- 2 files changed, 20 insertions(+), 31 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetl

[PATCH 4/4] huegtlbfs: handle page fault/truncate races

2020-11-02 Thread Mike Kravetz
c: Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 34 -- mm/hugetlb.c | 40 ++-- 2 files changed, 58 insertions(+), 16 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index bc9979382a1e..6b

Re: [PATCH v3 3/5] hugetlb: only set HPageMigratable for migratable hstates

2021-01-29 Thread Mike Kravetz
On 1/28/21 2:15 PM, Andrew Morton wrote: > On Thu, 28 Jan 2021 14:00:29 -0800 Mike Kravetz > wrote: >> >> Michal suggested that comments describing synchronization be added for each >> flag. Since I did 'one patch per flag', that would be an update to each >> pa

Re: [PATCH v2 2/2] mm/hugetlb: refactor subpage recording

2021-01-28 Thread Mike Kravetz
+- > 1 file changed, 28 insertions(+), 21 deletions(-) Thanks for updating this. Reviewed-by: Mike Kravetz I think there still is an open general question about whether we can always assume page structs are contiguous for really big pages. That is outside

Re: [PATCH v3 3/5] hugetlb: only set HPageMigratable for migratable hstates

2021-01-28 Thread Mike Kravetz
On 1/28/21 1:37 PM, Andrew Morton wrote: > On Thu, 28 Jan 2021 06:52:21 +0100 Oscar Salvador wrote: > >> On Wed, Jan 27, 2021 at 03:36:41PM -0800, Mike Kravetz wrote: >>> Yes, this patch is somewhat optional. It should be a minor improvement >>> in cases wh

Re: [External] Re: [PATCH v13 05/12] mm: hugetlb: allocate the vmemmap pages associated with each HugeTLB page

2021-02-01 Thread Mike Kravetz
olding 'big chunks' of memory for a specific purpose and dumping them when needed. They were not doing this with hugetlb pages, but nothing would surprise me. In this series, vmemmap freeing is 'opt in' at boot time. I would expect the use cases that want to opt in rarely if ever free/dissolve hugetlb pages. But, I could be wrong. -- Mike Kravetz

Re: [PATCH 4/4] hugetlb: Do early cow when page pinned on src mm

2021-02-03 Thread Mike Kravetz
spin_unlock(src_ptl); > + spin_unlock(dst_ptl); > + prealloc = alloc_huge_page(vma, addr, > 0); One quick que

Re: [PATCH v3 3/5] hugetlb: only set HPageMigratable for migratable hstates

2021-02-03 Thread Mike Kravetz
On 2/1/21 3:49 AM, Michal Hocko wrote: > On Fri 29-01-21 10:46:15, Mike Kravetz wrote: >> On 1/28/21 2:15 PM, Andrew Morton wrote: >>> On Thu, 28 Jan 2021 14:00:29 -0800 Mike Kravetz >>> wrote: >>>> >>>> Michal suggested that comments des

Re: [External] [PATCH v2 2/5] hugetlb: convert page_huge_active() HPageMigratable flag

2021-02-03 Thread Mike Kravetz
On 2/2/21 11:42 PM, Muchun Song wrote: > On Wed, Jan 20, 2021 at 9:33 AM Mike Kravetz wrote: >> >> Signed-off-by: Mike Kravetz > > Hi Mike, > > I found that you may forget to remove set_page_huge_active() > from include/linux/hugetlb.h. > > diff --git a/inc

Re: [PATCH 1/4] hugetlb: Dedup the code to add a new file_region

2021-02-03 Thread Mike Kravetz
- > 1 file changed, 27 insertions(+), 24 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 18f6ee317900..d2859c2aecc9 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c Thanks, that is a pretty straight forward change. A cleanup with no functiona

Re: [PATCH 2/4] hugetlg: Break earlier in add_reservation_in_range() when we can

2021-02-03 Thread Mike Kravetz
*/ > - if (rg->from > t) > + if (rg->from >= t) > break; > > /* Add an entry for last_accounted_offset -> rg->from, and > Changing any of this code makes me nervous. However, I agree with your analysis. The change makes the code match the comment WRT the [from, to) nature of regions. Reviewed-by: Mike Kravetz -- Mike Kravetz
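Why `>=` is the right comparison for half-open regions can be seen in a short sketch. The struct and helpers below are hypothetical stand-ins, not the kernel's file_region code; they assume only that a region covers [from, to) and the range being added is [f, t).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reservation region covering [from, to). */
struct region {
    long from;
    long to;
};

/* True if rg begins at or after t, the exclusive end of the range [f, t). */
static bool region_starts_past(const struct region *rg, long t)
{
    assert(rg->from < rg->to);
    return rg->from >= t;
}

/* True if [rg->from, rg->to) and [f, t) share at least one offset. */
static bool region_overlaps(const struct region *rg, long f, long t)
{
    assert(f < t);
    return rg->from < t && rg->to > f;
}
```

A region that starts exactly at t shares no offset with [f, t), so the loop can stop there; with the old `>` test such a region would have been examined even though it cannot overlap.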

Re: [PATCH 1/2] mm/hugetlb: grab head page refcount once per group of subpages

2021-01-26 Thread Mike Kravetz
+++--- > 3 files changed, 29 insertions(+), 22 deletions(-) Thanks. Nice straight forward improvement. Reviewed-by: Mike Kravetz -- Mike Kravetz > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index a5d618d08506..0d793486822b 100644 > --- a/include/li

Re: [PATCH 2/2] mm/hugetlb: refactor subpage recording

2021-01-26 Thread Mike Kravetz
so large that we do not guarantee that page++ pointer * arithmetic will work across the entire page. We need something more * specialized. */ static void __copy_gigantic_page(struct page *dst, struct page *src, int nr_pages) -- Mike Kravetz > +

Re: [PATCH] mm: hugetlb: fix missing put_page in gather_surplus_pages()

2021-01-26 Thread Mike Kravetz
lb page on the free list with a count of 1. There is no check in the enqueue code. When we dequeue the page, set_page_refcounted() is used to set the count to 1 without looking at the current value. And, all the other VM_DEBUG macros are off so we mostly do not notice the bug. Thanks again, Reviewed-by: Mike Kravetz -- Mike Kravetz > } > free: >

Re: [PATCH v3 5/5] hugetlb: convert PageHugeFreed to HPageFreed flag

2021-01-27 Thread Mike Kravetz
On 1/27/21 2:41 AM, Michal Hocko wrote: > On Fri 22-01-21 11:52:31, Mike Kravetz wrote: >> Use new hugetlb specific HPageFreed flag to replace the >> PageHugeFreed interfaces. >> >> Signed-off-by: Mike Kravetz >> Reviewed-by: Oscar Salvador >> Reviewed

Re: [PATCH v3 1/5] hugetlb: use page.private for hugetlb specific page flags

2021-01-27 Thread Mike Kravetz
On 1/27/21 2:20 AM, Michal Hocko wrote: > [sorry for jumping in late] > > On Fri 22-01-21 11:52:27, Mike Kravetz wrote: >> As hugetlbfs evolved, state information about hugetlb pages was added. >> One 'convenient' way of doing this was to use available fields in tail >>

Re: [PATCH v3 2/5] hugetlb: convert page_huge_active() HPageMigratable flag

2021-01-27 Thread Mike Kravetz
On 1/27/21 2:25 AM, Michal Hocko wrote: > On Fri 22-01-21 11:52:28, Mike Kravetz wrote: >> Use the new hugetlb page specific flag HPageMigratable to replace the >> page_huge_active interfaces. By its name, page_huge_active implied >> that a huge page was on the

Re: [PATCH v3 3/5] hugetlb: only set HPageMigratable for migratable hstates

2021-01-27 Thread Mike Kravetz
On 1/27/21 2:35 AM, Michal Hocko wrote: > On Fri 22-01-21 11:52:29, Mike Kravetz wrote: >> The HP_Migratable flag indicates a page is a candidate for migration. >> Only set the flag if the page's hstate supports migration. This allows >> the migration paths to detect non-mig

Re: [PATCH 2/2] mm/hugetlb: refactor subpage recording

2021-01-26 Thread Mike Kravetz
On 1/26/21 11:21 AM, Joao Martins wrote: > On 1/26/21 6:08 PM, Mike Kravetz wrote: >> On 1/25/21 12:57 PM, Joao Martins wrote: >>> >>> +static void record_subpages_vmas(struct page *page, struct vm_area_struct >>> *vma, >>> +

Re: [External] Re: [PATCH v13 05/12] mm: hugetlb: allocate the vmemmap pages associated with each HugeTLB page

2021-01-28 Thread Mike Kravetz
t fails, we use part of the hugepage to >remap. I honestly am not sure about this. This would only happen for pages in NORMAL. The only time using part of the huge page for vmemmap would help is if we are trying to dissolve huge pages to free up memory for other uses. > What's your opinion about this? Should we take this approach? I think trying to solve all the issues that could happen as the result of not being able to dissolve a hugetlb page has made this extremely complex. I know this is something we need to address/solve. We do not want to add more unexpected behavior in corner cases. However, I can not help but think about similar issues today. For example, if a huge page is in use in ZONE_MOVABLE or CMA there is no guarantee that it can be migrated today. Correct? We may need to allocate another huge page for the target of the migration, and there is no guarantee we can do that. -- Mike Kravetz

Re: [PATCH 2/2] mm/hugetlb: refactor subpage recording

2021-01-26 Thread Mike Kravetz
On 1/26/21 4:07 PM, Jason Gunthorpe wrote: > On Tue, Jan 26, 2021 at 01:21:46PM -0800, Mike Kravetz wrote: >> On 1/26/21 11:21 AM, Joao Martins wrote: >>> On 1/26/21 6:08 PM, Mike Kravetz wrote: >>>> On 1/25/21 12:57 PM, Joao Martins wrote: >>>>> >

Re: [PATCH] mm/hugetlb: Simplify the calculation of variables

2021-01-26 Thread Mike Kravetz
> 1 file changed, 1 insertion(+), 2 deletions(-) Thanks, Reviewed-by: Mike Kravetz -- Mike Kravetz > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index cbf32d2..5e6a6e7 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -3367,8 +3367,7 @@ static unsigned in

Re: [PATCH] mm/hugetlb: Fix use after free when subpool max_hpages accounting is not enabled

2021-01-26 Thread Mike Kravetz
by: Miaohe Lin > --- > mm/hugetlb.c | 16 +--- > 1 file changed, 13 insertions(+), 3 deletions(-) Thanks, Reviewed-by: Mike Kravetz -- Mike Kravetz > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 777bc0e45bf3..53ea65d1c5ab 100644 > --- a/mm/hugetlb.c >

Re: [PATCH] mm/hugetlb: remove duplicate codes of setting compound_nr

2021-02-02 Thread Mike Kravetz
set_compound_order(page, 0); > - page[1].compound_nr = 0; I may be reading the code wrong, but set_compound_order(page, 0) will set page[1].compound_nr to the value of 1. That is different than the explicit setting to 0 in the existing code. If that is correct, then you should say
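The difference pointed out above can be made concrete with a simplified model. It assumes, as the reply suggests, that set_compound_order() stores the order and derives compound_nr as 1 << order; the struct and function below are hypothetical user-space stand-ins, not the kernel definitions.

```c
#include <assert.h>

/* Hypothetical model of the two first-tail-page fields under discussion. */
struct tail_page_model {
    unsigned int compound_order;
    unsigned int compound_nr;
};

/* Models the assumed behavior: compound_nr is derived from the order. */
static void model_set_compound_order(struct tail_page_model *tail,
                                     unsigned int order)
{
    assert(order < 32);             /* keep the shift well defined */
    tail->compound_order = order;
    tail->compound_nr = 1U << order;
}
```

Under that assumption, calling the helper with order 0 leaves compound_nr at 1, whereas the removed line set it to 0 explicitly, which is why the reply asks for the behavior change to be stated.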

Re: [PATCH v2] mm/hugetlb: remove redundant check in preparing and destroying gigantic page

2021-02-02 Thread Mike Kravetz
b gigantic page' being <= 1 order, so this change makes sense. Thanks, Reviewed-by: Mike Kravetz -- Mike Kravetz >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index a3e4fa2c5e94..dac5db569ccb 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -1219

Re: [PATCH v4 4/9] hugetlb: region_chg provides only cache entry

2019-09-16 Thread Mike Kravetz
e with > region_del exists. > > Signed-off-by: Mina Almasry Thanks. I like this modification as it does simplify the code and could be added as a general cleanup independent of the other changes. Reviewed-by: Mike Kravetz -- Mike Kravetz > --- > mm/hugetlb.c | 63 +---

Re: [PATCH v4 5/9] hugetlb: remove duplicated code

2019-09-16 Thread Mike Kravetz
n_add, and I want to make that change in one place > only. It should improve maintainability anyway on its own. > > Signed-off-by: Mina Almasry Like the previous patch, this is a good improvement independent of the rest of the series. Thanks! Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH v4 6/9] hugetlb: disable region_add file_region coalescing

2019-09-16 Thread Mike Kravetz
ere done in the region_chg call, and it was relatively easy to do in existing code when region_chg would only need one additional region at most. I'm thinking that we may have to make region_chg allocate the worst case number of regions (t - f)/2, OR change the code such that region_add could return an error. -- Mike Kravetz
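One way to read the (t - f)/2 worst case: if regions are not coalesced, adding [f, t) needs one new region per uncovered gap, and the number of gaps is largest when covered and uncovered offsets alternate. The sketch below is an interpretation under that assumption, using a hypothetical per-offset coverage array rather than the kernel's region list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * covered[i] says whether offset f + i already lies inside an existing
 * region. Returns the number of maximal uncovered runs, i.e. how many
 * new regions would be needed to fill [f, f + len) without coalescing.
 */
static size_t count_gaps(const bool *covered, size_t len)
{
    size_t gaps = 0;
    bool in_gap = false;

    assert(covered != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (!covered[i] && !in_gap)
            gaps++;                 /* a new uncovered run starts here */
        in_gap = !covered[i];
    }
    return gaps;
}
```

For a range of 8 offsets with every other offset already covered, this yields 4 gaps, matching (t - f)/2; a fully uncovered range needs only one region and a fully covered one needs none.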

[PATCH] hugetlbfs: hugetlb_fault_mutex_hash cleanup

2019-09-18 Thread Mike Kravetz
, remove it from the definition and all callers. No functional change. Reported-by: Nathan Chancellor Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 4 ++-- include/linux/hugetlb.h | 2 +- mm/hugetlb.c| 10 +- mm/userfaultfd.c| 2 +- 4 files changed, 9

Re: [PATCH 5/5] hugetlbfs: Limit wait time when trying to share huge PMD

2019-09-11 Thread Mike Kravetz
you describe above. I have never looked at/for delays in these environments around pmd sharing (page faults), but that does not mean they do not exist. I will try to get the DB group to give me access to one of their large environments for analysis. We may want to consider making the timeout value and disable threshold user configurable. -- Mike Kravetz

Re: [PATCH 5/5] hugetlbfs: Limit wait time when trying to share huge PMD

2019-09-11 Thread Mike Kravetz
one already knows. At one time, I thought it was safe to acquire the semaphore in read mode for huge_pmd_share, but write mode for huge_pmd_unshare. See commit b43a99900559. This was reverted along with another patch for other reasons. If we change from write to read mode, this may have significant impact on the stalls. -- Mike Kravetz

Re: kernel BUG at mm/hugetlb.c:LINE!

2020-05-18 Thread Mike Kravetz
y to overlayfs. IMO - This BUG/report revealed two issues. First is the BUG by mmap'ing a hugetlbfs file on overlayfs. The other is that core mmap code will skip any filesystem specific get_unmapped area routine if on a union/overlay. My patch fixes both, but if we go with a whitelist approach and don't allow hugetlbfs I think we still need to address the filesystem specific get_unmapped area issue. That is easy enough to do by adding a routine to overlayfs which calls the routine for the underlying fs. -- Mike Kravetz

Re: kernel BUG at mm/hugetlb.c:LINE!

2020-05-27 Thread Mike Kravetz
On 5/22/20 3:05 AM, Miklos Szeredi wrote: > On Wed, May 20, 2020 at 10:27:15AM -0700, Mike Kravetz wrote: > >> I am fairly confident it is all about checking limits and alignment. The >> filesystem knows if it can/should align to base or huge page size. DAX has >> som

Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area()

2020-05-28 Thread Mike Kravetz
if your patch is applied to the wrong git tree, please drop us a note to help > improve the system. BTW, we also suggest to use '--base' option to specify the > base tree in git format-patch, please see > https://stackoverflow.com/a/37406982] > > url: > https://github.com/0day

Re: [PATCH v1] mm: hwpoison: disable memory error handling on 1GB hugepage

2018-01-29 Thread Mike Kravetz
he 1st madvise() event. > > Do pgd size pages work properly? Adding Anshuman and Aneesh as they added pgd support for power. And, this patch will disable that as well IIUC. This patch makes sense for x86. My only concern/question is for other archs which may have huge page sizes defi

[PATCH 2/3] mm: memfd: split out memfd for use by multiple filesystems

2018-01-29 Thread Mike Kravetz
-or- hugetlbfs, split out the required memfd code to separate files. These files are not used until a subsequent patch which deletes duplicate code in the original files and enables their use. Signed-off-by: Mike Kravetz --- include/linux/memfd.h | 16 +++ mm/memfd.c| 341

[PATCH 3/3] mm: memfd: remove memfd code from shmem files and use new memfd files

2018-01-29 Thread Mike Kravetz
Remove memfd and file sealing routines from shmem.c, and enable the use of the new files (memfd.c and memfd.h). A new config option MEMFD_CREATE is defined that is enabled if TMPFS -or- HUGETLBFS is enabled. Signed-off-by: Mike Kravetz --- fs/Kconfig | 3 + fs/fcntl.c

[PATCH 1/3] mm: hugetlbfs: move HUGETLBFS_I outside #ifdef CONFIG_HUGETLBFS

2018-01-29 Thread Mike Kravetz
HUGETLBFS_I will be referenced (but not used) in code outside #ifdef CONFIG_HUGETLBFS. Move the definition to prevent compiler errors. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 27 --- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git

[PATCH 0/3] restructure memfd code

2018-01-29 Thread Mike Kravetz
this was sent as a RFC, one comment suggested combining patches 2 and 3 so that we would not have 'new unused' files between patches. If this is desired, I can make the change. For me, it is easier to read as separate patches. Mike Kravetz (3): mm: hugetlbfs: move HUGETLBFS_I outside #ifdef

Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage

2018-01-30 Thread Mike Kravetz
entry properly works, and > + * - other mm code walking over page table is aware of pud-aligned > + *hwpoison entries. > + */ > + if (huge_page_size(page_hstate(head)) > PMD_SIZE) { > + action_result(pfn, MF_MSG_NON_PMD_HUGE, MF_IGNORED

[PATCH v2 3/3] mm: memfd: remove memfd code from shmem files and use new memfd files

2018-01-30 Thread Mike Kravetz
Remove memfd and file sealing routines from shmem.c, and enable the use of the new files (memfd.c and memfd.h). A new config option MEMFD_CREATE is defined that is enabled if TMPFS -or- HUGETLBFS is enabled. Signed-off-by: Mike Kravetz --- fs/Kconfig | 3 + fs/fcntl.c

Re: [PATCH 3/3] mm: memfd: remove memfd code from shmem files and use new memfd files

2018-01-30 Thread Mike Kravetz
applied to the wrong git tree, please drop us a note to > help improve the system] > > url: > https://github.com/0day-ci/linux/commits/Mike-Kravetz/restructure-memfd-code/20180131-023405 > base: git://git.cmpxchg.org/linux-mmotm.git master > reproduce: > # apt-

[PATCH v2 2/3] mm: memfd: split out memfd for use by multiple filesystems

2018-01-30 Thread Mike Kravetz
-off-by: Mike Kravetz --- include/linux/memfd.h | 16 +++ mm/memfd.c| 342 ++ 2 files changed, 358 insertions(+) create mode 100644 include/linux/memfd.h create mode 100644 mm/memfd.c diff --git a/include/linux/memfd.h b/include/linux

[PATCH v2 1/3] mm: hugetlbfs: move HUGETLBFS_I outside #ifdef CONFIG_HUGETLBFS

2018-01-30 Thread Mike Kravetz
HUGETLBFS_I will be referenced (but not used) in code outside #ifdef CONFIG_HUGETLBFS. Move the definition to prevent compiler errors. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 27 --- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git

Re: [RFC] mm/migrate: Add new migration reason MR_HUGETLB

2018-01-30 Thread Mike Kravetz
han MAX_ORDER contiguous pages?". Not sure that we should be adding to the current alloc_contig_range interface until we decide it is something which will be useful long term. -- Mike Kravetz

[PATCH v2 0/3] restructure memfd code

2018-01-30 Thread Mike Kravetz
this was sent as a RFC, one comment suggested combining patches 2 and 3 so that we would not have 'new unused' files between patches. If this is desired, I can make the change. For me, it is easier to read as separate patches. v2: - Fixed sparse warnings inherited from existing code Mike Kravetz (3

Re: [patch v2] mremap.2: Add description of old_size == 0 functionality

2017-09-25 Thread Mike Kravetz
On 09/20/2017 12:25 AM, Michael Kerrisk (man-pages) wrote: > Hello Mike, > > On 09/19/2017 11:42 PM, Mike Kravetz wrote: >> v2: Fix incorrect wording noticed by Jann Horn. >> Remove deprecated and memfd_create discussion as suggested >> by Florian Weimer. >&g

[PATCH 0/1] mm:hugetlbfs: Fix hwpoison reserve accounting

2017-10-19 Thread Mike Kravetz
pages to zero, the poisoned page will be counted as 'surplus'. I was thinking about keeping at least a bad page count (if not a list) to avoid user confusion. It may be overkill as I have not given too much thought to this issue. Anyone else have thoughts here? Mike Kravetz (1): mm:hugetlbfs

[PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting

2017-10-19 Thread Mike Kravetz
epage in unrecoverable memory error") Cc: Naoya Horiguchi Cc: Michal Hocko Cc: Aneesh Kumar Cc: Anshuman Khandual Cc: Andrew Morton Cc: Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlb

Re: [PATCH V3] selftests/vm: Add tests validating mremap mirror functionality

2017-10-19 Thread Mike Kravetz
empt to mirror private anon mapping will fail. > > Suggested-by: Mike Kravetz > Signed-off-by: Anshuman Khandual The tests themselves look fine. However, they are pretty simple and could very easily be combined into one 'mremap_mirror.c' file. I would prefer that they be combine

Re: [PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting

2017-10-20 Thread Mike Kravetz
On 10/19/2017 07:30 PM, Naoya Horiguchi wrote: > On Thu, Oct 19, 2017 at 04:00:07PM -0700, Mike Kravetz wrote: > > Thank you for addressing this. The patch itself looks good to me, but > the reported issue (negative reserve count) doesn't reproduce in my trial > with v4.14-rc5, so

Re: PROBLEM: Remapping hugepages mappings causes kernel to return EINVAL

2017-10-20 Thread Mike Kravetz
> sorry for the inconvenience. > > On 2017-10-08 18:47 Mike Kravetz wrote: >> You are correct. That check in function vma_to_resize() will prevent >> mremap from growing or relocating hugetlb backed mappings. This check >> existed in the 2.6.0 linux kernel, so this restriction

Re: [PATCH v3 0/9] memfd: add sealing to hugetlb-backed memory

2017-11-14 Thread Mike Kravetz
outstanding issue is sorting out the config option dependencies. Although, IMO this is not a strict requirement for this series. I have addressed this issue in a follow on series: http://lkml.kernel.org/r/20171109014109.21077-1-mike.krav...@oracle.com -- Mike Kravetz On 11/07/2017 04:27 AM, Marc-André

Re: [PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting

2017-10-23 Thread Mike Kravetz
On 10/23/2017 12:32 AM, Naoya Horiguchi wrote: > On Fri, Oct 20, 2017 at 10:49:46AM -0700, Mike Kravetz wrote: >> On 10/19/2017 07:30 PM, Naoya Horiguchi wrote: >>> On Thu, Oct 19, 2017 at 04:00:07PM -0700, Mike Kravetz wrote: >>> >>> Thank you for addressi

Re: PROBLEM: Remapping hugepages mappings causes kernel to return EINVAL

2017-10-23 Thread Mike Kravetz
viding a flag to mmap in > order to make hugepages work correctly. Well, at least this has a built-in fallback mechanism. When using hugetlb(fs) pages, you would need to handle the case where mremap fails due to a lack of configured huge pages. I assume your allocator will be for somewhat general application usage. Yet, for the greatest reliability, the user/admin will need to know at boot time how many huge pages will be needed and set that up. -- Mike Kravetz
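The point about handling an mremap failure can be illustrated with a minimal userspace sketch. It is not code from this thread: the helper name grow_mapping is hypothetical, and it uses an ordinary private anonymous mapping so that it runs without any huge pages configured. For a hugetlb-backed region one would presumably add MAP_HUGETLB to the fallback mmap call, which can itself fail when too few huge pages are reserved.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (an illustration, not from the thread): grow a
 * private anonymous mapping from old_len to new_len bytes.  Returns the
 * new address, or MAP_FAILED if both mremap() and the fallback fail.
 */
static void *grow_mapping(void *old, size_t old_len, size_t new_len)
{
	assert(new_len > old_len);

	/* First choice: let the kernel resize or relocate the mapping. */
	void *p = mremap(old, old_len, new_len, MREMAP_MAYMOVE);
	if (p != MAP_FAILED)
		return p;

	/* Fallback: mremap() was refused, so map a fresh region and copy. */
	p = mmap(NULL, new_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return MAP_FAILED;	/* the caller still owns the old mapping */

	memcpy(p, old, old_len);
	munmap(old, old_len);
	return p;
}
```

On failure the old mapping is left untouched, so a caller can keep using it or report the error. An allocator built this way still depends on enough memory (or enough reserved huge pages) being available when the fallback runs.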

Re: [RFC] mmap(MAP_CONTIG)

2017-10-24 Thread Mike Kravetz
On 10/23/2017 03:10 PM, Dave Hansen wrote: > On 10/03/2017 04:56 PM, Mike Kravetz wrote: >> mmap(MAP_CONTIG) would have the following semantics: >> - The entire mapping (length size) would be backed by physically contiguous >> pages. >> - If 'length' phys

Re: [Question] Should direct reclaim time be bounded?

2019-07-10 Thread Mike Kravetz
On 7/7/19 10:19 PM, Hillf Danton wrote: > On Mon, 01 Jul 2019 20:15:51 -0700 Mike Kravetz wrote: >> On 7/1/19 1:59 AM, Mel Gorman wrote: >>> >>> I think it would be reasonable to have should_continue_reclaim allow an >>> exit if scanning at higher priori

Re: [Question] Should direct reclaim time be bounded?

2019-07-10 Thread Mike Kravetz
On 7/10/19 12:44 PM, Michal Hocko wrote: > On Wed 10-07-19 11:42:40, Mike Kravetz wrote: > [...] >> As Michal suggested, I'm going to do some testing to see what impact >> dropping the __GFP_RETRY_MAYFAIL flag for these huge page allocations >> will have on the number of pa

Re: [Question] Should direct reclaim time be bounded?

2019-07-12 Thread Mike Kravetz
On 7/11/19 10:47 PM, Hillf Danton wrote: > > On Thu, 11 Jul 2019 02:42:56 +0800 Mike Kravetz wrote: >> >> It is quite easy to hit the condition where: >> nr_reclaimed == 0 && nr_scanned == 0 is true, but we skip the previous test >> > Then skipping ch

Re: [PATCH] mm/hugetlb.c: check the failure case for find_vma

2019-07-25 Thread Mike Kravetz
e routines (or their callers) it has been verified that address is within a vma. In addition, mmap_sem is held so that vmas can not change. Therefore, there should be no way for find_vma to return NULL here. Please let me know if there is something I have overlooked. Otherwise, there is no
