Re: [PATCH v3 3/8] mm/hugetlb: unify migration callbacks

2020-06-24 Thread Mike Kravetz
> Signed-off-by: Joonsoo Kim Thanks for consolidating these. Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH v2 10/15] docs: hugetlbpage.rst: fix some warnings

2020-06-23 Thread Mike Kravetz
> parameter is preceded by an invalid hugepagesz parameter, it will > be ignored. > -default_hugepagesz - Specify the default huge page size. This parameter can > +default_hugepagesz > + pecify the default huge page size. This parameter can Oops, should be 'Spec

Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

2020-06-22 Thread Mike Kravetz
case of PMD sharing. I'm afraid a regression is unavoidable in that case. I'll put together a patch. -- Mike Kravetz

Re: [PATCH v4 1/2] hugetlb: use f_mode & FMODE_HUGETLBFS to identify hugetlbfs files

2020-06-15 Thread Mike Kravetz
On 6/15/20 12:53 AM, Miklos Szeredi wrote: > On Sat, Jun 13, 2020 at 9:12 PM Mike Kravetz wrote: >> On 6/12/20 11:53 PM, Amir Goldstein wrote: >>> >>> The simplest thing for you to do in order to shush syzbot is what procfs >>> does: >>> /* >

Re: [PATCH v4 1/2] hugetlb: use f_mode & FMODE_HUGETLBFS to identify hugetlbfs files

2020-06-15 Thread Mike Kravetz
is? My apologies!!! I reviewed my testing and found that it was incorrectly writing to the lower filesystem. Writing to any file in the union will fail. -- Mike Kravetz

Re: [PATCH v4 1/2] hugetlb: use f_mode & FMODE_HUGETLBFS to identify hugetlbfs files

2020-06-13 Thread Mike Kravetz
> So you may only take that option if you do not care about the combination > of hugetlbfs with any of the above. > > overlayfs support of mmap is not as good as one might hope. > overlayfs.rst says: > "If a file residing on a lower layer is opened for read-only and then > memory mapped with MAP_SHARED, then subsequent changes to > the file are not reflected in the memory mapping." > > So if I were you, I wouldn't go trying to fix overlayfs-hugetlb interop... Thanks again, I'll look at something as simple as s_stack_depth. -- Mike Kravetz

Re: [PATCH v4 1/2] hugetlb: use f_mode & FMODE_HUGETLBFS to identify hugetlbfs files

2020-06-12 Thread Mike Kravetz
On 6/11/20 6:58 PM, Al Viro wrote: > On Thu, Jun 11, 2020 at 05:46:43PM -0700, Mike Kravetz wrote: >> The routine is_file_hugepages() checks f_op == hugetlbfs_file_operations >> to determine if the file resides in hugetlbfs. This is problematic when >> the file is on a union
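The difference between the two identification strategies in this thread can be illustrated with a small userspace model. This is only a sketch: every `toy_` name, the flag value, and the wrapper function are hypothetical stand-ins, not the kernel's definitions. It shows why comparing the operations table fails for a stacked (union/overlay) file, while a mode flag that is propagated to the stacked file still identifies it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these are the kernel's definitions. */
#define TOY_FMODE_HUGETLBFS 0x1u   /* hypothetical flag bit */

struct toy_file_operations { int id; };

struct toy_file {
    const struct toy_file_operations *f_op;
    unsigned int f_mode;
};

static const struct toy_file_operations toy_hugetlbfs_fops = { 1 };
static const struct toy_file_operations toy_overlay_fops = { 2 };

/* f_op-style check: relies on the identity of the operations table. */
static bool toy_is_hugepages_by_fop(const struct toy_file *f)
{
    return f->f_op == &toy_hugetlbfs_fops;
}

/* Flag-style check: relies on a bit carried in f_mode. */
static bool toy_is_hugepages_by_mode(const struct toy_file *f)
{
    return (f->f_mode & TOY_FMODE_HUGETLBFS) != 0;
}

/*
 * Model of a stacked file: it has the overlay's operations table, but
 * (by assumption in this sketch) copies the flag from the real file.
 */
static struct toy_file toy_overlay_wrap(const struct toy_file *real)
{
    struct toy_file upper = {
        &toy_overlay_fops,
        real->f_mode & TOY_FMODE_HUGETLBFS,
    };
    return upper;
}
```

In this model a file wrapped by `toy_overlay_wrap()` is missed by the `f_op` comparison but still recognized by the flag test.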

[PATCH v4 2/2] ovl: call underlying get_unmapped_area() routine. propogate FMODE_HUGETLBFS

2020-06-11 Thread Mike Kravetz
t in the BUG as shown in [1]. [1] https://lore.kernel.org/linux-mm/b4684e05a2968...@google.com/ Reported-by: syzbot+d6ec23007e951dadf...@syzkaller.appspotmail.com Signed-off-by: Miklos Szeredi Signed-off-by: Mike Kravetz --- fs/overlayfs/file.c | 21 + 1 file cha

[PATCH v4 1/2] hugetlb: use f_mode & FMODE_HUGETLBFS to identify hugetlbfs files

2020-06-11 Thread Mike Kravetz
FS in overlayfs. Suggested-by: Al Viro Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 7 +++ fs/io_uring.c | 2 +- include/linux/fs.h | 3 +++ include/linux/hugetlb.h | 10 -- include/linux/shm.h | 5 - ipc/shm.c

Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area()

2020-06-10 Thread Mike Kravetz
w! I knew adding a file op for this was overkill and was looking for other suggestions. -- Mike Kravetz

Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area()

2020-06-10 Thread Mike Kravetz
On 6/4/20 2:16 AM, Miklos Szeredi wrote: > On Thu, May 28, 2020 at 11:01 PM Mike Kravetz wrote: >> >> Well yuck! get_unmapped_area is not part of mm_struct if !CONFIG_MMU. >> >> Miklos, would adding '#ifdef CONFIG_MMU' around the overlayfs code be too >>

Re: [PATCH 2/2] mm: hugetlb: fix the name of hugetlb CMA

2020-06-03 Thread Mike Kravetz
different CMA areas. > > Cc: Roman Gushchin > Signed-off-by: Barry Song Thank you Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH 1/2] mm: cma: fix the name of CMA areas

2020-06-03 Thread Mike Kravetz
-ENOMEM if users set name parameter as NULL. > > Cc: Roman Gushchin > Signed-off-by: Barry Song Thank you Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area()

2020-05-28 Thread Mike Kravetz
patch is applied to the wrong git tree, please drop us a note to help > improve the system. BTW, we also suggest to use '--base' option to specify the > base tree in git format-patch, please see > https://stackoverflow.com/a/37406982] > > url: > https://githu

Re: kernel BUG at mm/hugetlb.c:LINE!

2020-05-27 Thread Mike Kravetz
On 5/22/20 3:05 AM, Miklos Szeredi wrote: > On Wed, May 20, 2020 at 10:27:15AM -0700, Mike Kravetz wrote: > >> I am fairly confident it is all about checking limits and alignment. The >> filesystem knows if it can/should align to base or huge page size. DAX has >> som

Re: [PATCH 06/11] mm/hugetlb: do not modify user provided gfp_mask

2020-05-21 Thread Mike Kravetz
gs as is done in the existing code does not bother me too much, but that is just my opinion. Adding __gfp_mask for modifications is fine with me if others think it is a good thing. Does dequeue_huge_page_vma() need to be modified so that it will set ac.__gfp_mask before calling dequeue_huge_page_nodemask

Re: [PATCH 05/11] mm/hugetlb: make hugetlb migration target allocation APIs CMA aware

2020-05-21 Thread Mike Kravetz
callee side > to get better result. > > Signed-off-by: Joonsoo Kim Thank you! Avoiding CMA works much better with this new skip_cma field. Acked-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH 04/11] mm/hugetlb: unify hugetlb migration callback function

2020-05-21 Thread Mike Kravetz
alloc_huge_page_nodemask() calling sequences. However, it appears that node (preferred_nid) is always set to something other than NUMA_NO_NODE in those callers. It obviously makes sense to add the field to guarantee no changes to functionality while making the conversions. However, it is not

Re: [PATCH 03/11] mm/hugetlb: introduce alloc_control structure to simplify migration target allocation APIs

2020-05-21 Thread Mike Kravetz
compound_head(page)), > - preferred_nid, nodemask); > + if (PageHuge(page)) { > + struct hstate *h = page_hstate(page); I assume the removal of compound_head(page) was intentional? Just asking because PageHuge will look at head page while page_hstate will not. So, if passed a non-head page things could go bad. -- Mike Kravetz

Re: [PATCH 02/11] mm/migrate: move migration helper from .h to .c

2020-05-21 Thread Mike Kravetz
On 5/17/20 6:20 PM, js1...@gmail.com wrote: > From: Joonsoo Kim > > It's not performance sensitive function. Move it to .c. > This is a preparation step for future change. > > Signed-off-by: Joonsoo Kim Agreed, this is not performance sensitive and can be moved.

Re: kernel BUG at mm/hugetlb.c:LINE!

2020-05-20 Thread Mike Kravetz
On 5/20/20 4:20 AM, Miklos Szeredi wrote: > On Tue, May 19, 2020 at 2:35 AM Mike Kravetz wrote: >> >> On 5/18/20 4:41 PM, Colin Walters wrote: >>> >>> On Tue, May 12, 2020, at 11:04 AM, Miklos Szeredi wrote: >>> >>>>> However, in this

Re: kernel BUG at mm/hugetlb.c:LINE!

2020-05-18 Thread Mike Kravetz
dding whitelist capability to overlayfs. IMO - This BUG/report revealed two issues. First is the BUG by mmap'ing a hugetlbfs file on overlayfs. The other is that core mmap code will skip any filesystem specific get_unmapped area routine if on a union/overlay. My patch fixes both, but if we go with a whitelist approach and don't allow hugetlbfs I think we still need to address the filesystem specific get_unmapped area issue. That is easy enough to do by adding a routine to overlayfs which calls the routine for the underlying fs. -- Mike Kravetz
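The second issue described above, where a stacked filesystem without its own routine hides the underlying filesystem's get_unmapped_area logic, can be sketched in a userspace model. All `toy_` names, the 2 MB alignment, and the dispatch function are assumptions made for illustration; this is not the overlayfs or mm code.

```c
#include <stddef.h>

#define TOY_HPAGE_SIZE (2UL << 20)   /* assumed 2 MB huge page size */

struct toy_file;
typedef unsigned long (*toy_gua_fn)(const struct toy_file *f,
                                    unsigned long addr, unsigned long len);

struct toy_file {
    toy_gua_fn get_unmapped_area;  /* NULL: no filesystem-specific routine */
    const struct toy_file *real;   /* underlying file if stacked, else NULL */
};

/* Generic fallback: accepts the hint address unchanged. */
static unsigned long toy_generic_gua(const struct toy_file *f,
                                     unsigned long addr, unsigned long len)
{
    (void)f; (void)len;
    return addr;
}

/* hugetlbfs-like routine: rounds the hint up to huge page alignment. */
static unsigned long toy_huge_gua(const struct toy_file *f,
                                  unsigned long addr, unsigned long len)
{
    (void)f; (void)len;
    return (addr + TOY_HPAGE_SIZE - 1) & ~(TOY_HPAGE_SIZE - 1);
}

/* Dispatch: use the file's own routine if it has one, else the generic one. */
static unsigned long toy_pick_area(const struct toy_file *f,
                                   unsigned long addr, unsigned long len)
{
    toy_gua_fn fn = f->get_unmapped_area ? f->get_unmapped_area
                                         : toy_generic_gua;
    return fn(f, addr, len);
}

/* Overlay-like routine: simply defers to the underlying file. */
static unsigned long toy_overlay_gua(const struct toy_file *f,
                                     unsigned long addr, unsigned long len)
{
    return toy_pick_area(f->real, addr, len);
}
```

A stacked file with a NULL routine gets the unaligned hint back, while one that installs `toy_overlay_gua` gets the same aligned address as the underlying file.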

Re: kernel BUG at mm/hugetlb.c:LINE!

2020-05-18 Thread Mike Kravetz
On 5/18/20 4:12 AM, Miklos Szeredi wrote: > On Sat, May 16, 2020 at 12:15 AM Mike Kravetz wrote: >> Any suggestions on how to move forward? It seems like there may be the >> need for a real_file() routine? I see a d_real dentry_op was added to >> deal with this issue fo

Re: [PATCH v5] hugetlbfs: Get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs

2020-05-17 Thread Mike Kravetz
apped_area. >> +*/ >> + if (mm->get_unmapped_area == arch_get_unmapped_area_topdown) >> + return hugetlb_get_unmapped_area_topdown(file, addr, len, >> + pgoff, flags); >> + return hugetlb_get_unmapped_area_botto

Re: kernel BUG at mm/hugetlb.c:LINE!

2020-05-15 Thread Mike Kravetz
On 5/12/20 11:11 AM, Mike Kravetz wrote: > On 5/12/20 8:04 AM, Miklos Szeredi wrote: >> On Tue, Apr 7, 2020 at 12:06 AM Mike Kravetz wrote: >>> On 4/5/20 8:06 PM, syzbot wrote: >>> >>> The routine is_file_hugepages() is just comparing the file ops to huget

Re: [PATCH v5] hugetlbfs: Get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs

2020-05-15 Thread Mike Kravetz
hat we call the bottomup routine in this default case. In reality, this does not impact powerpc as that architecture has its own hugetlb_get_unmapped_area routine. Because of this, I suggest we add a comment above this code and switch the if/else order. For example,

+       /*
+        * Use mm->get_unmapped_area value as a hint to use topdown routine.
+        * If architectures have special needs, they should define their own
+        * version of hugetlb_get_unmapped_area.
+        */
+       if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
+               return hugetlb_get_unmapped_area_topdown(file, addr, len,
+                               pgoff, flags);
+       return hugetlb_get_unmapped_area_bottomup(file, addr, len,
+                       pgoff, flags);

Thoughts? -- Mike Kravetz
> }
> #endif
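The suggested ordering (test for the topdown routine first, fall through to bottomup otherwise) can be modeled outside the kernel. The sketch below is hypothetical: the `toy_` functions and the enum are illustrative names, and only the comparison-then-default structure mirrors the suggestion in the message.

```c
/* Toy userspace model of the selection logic; not kernel code. */
enum toy_dir { TOY_BOTTOMUP = 0, TOY_TOPDOWN = 1 };

typedef int (*toy_layout_fn)(void);

struct toy_mm {
    toy_layout_fn get_unmapped_area;  /* layout routine chosen for this mm */
};

/* Stand-ins for architecture layout routines; bodies differ on purpose. */
static int toy_arch_topdown(void)  { return 1; }
static int toy_arch_bottomup(void) { return 2; }
static int toy_arch_custom(void)   { return 3; }

/*
 * Use the mm's routine as a hint: only an exact match with the topdown
 * routine selects topdown; anything else falls through to bottomup.
 */
static enum toy_dir toy_hugetlb_direction(const struct toy_mm *mm)
{
    if (mm->get_unmapped_area == toy_arch_topdown)
        return TOY_TOPDOWN;
    return TOY_BOTTOMUP;
}
```

With this ordering, an mm whose routine is neither of the two generic ones (here `toy_arch_custom`) takes the bottomup default.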

Re: stable-rc 5.4: libhugetlbfs fallocate_stress.sh: Unable to handle kernel paging request at virtual address ffff00006772f000

2020-05-14 Thread Mike Kravetz
, but that is pretty straight forward. I'm guessing this may not reproduce easily. To help reproduce, you could change the #define FALLOCATE_ITERATIONS 10 in .../libhugetlbfs/tests/fallocate_stress.c to a larger number to force the stress test to run longer. -- Mike Kravetz

Re: kernel BUG at mm/hugetlb.c:LINE!

2020-05-12 Thread Mike Kravetz
On 5/12/20 8:04 AM, Miklos Szeredi wrote: > On Tue, Apr 7, 2020 at 12:06 AM Mike Kravetz wrote: >> On 4/5/20 8:06 PM, syzbot wrote: >> >> The routine is_file_hugepages() is just comparing the file ops to hugetlbfs: >> >> if (file->

Re: [PATCH V3 3/3] mm/hugetlb: Define a generic fallback for arch_clear_hugepage_flags()

2020-05-11 Thread Mike Kravetz
> Cc: Palmer Dabbelt > Cc: Heiko Carstens > Cc: Vasily Gorbik > Cc: Christian Borntraeger > Cc: Yoshinori Sato > Cc: Rich Felker > Cc: "David S. Miller" > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: Borislav Petkov > Cc: "H. Peter Anvin"

Re: [PATCH V3 2/3] mm/hugetlb: Define a generic fallback for is_hugepage_only_range()

2020-05-11 Thread Mike Kravetz
On 5/10/20 8:14 PM, Anshuman Khandual wrote: > On 05/09/2020 03:52 AM, Mike Kravetz wrote: >> On 5/7/20 8:07 PM, Anshuman Khandual wrote: >> >> Did you try building without CONFIG_HUGETLB_PAGE defined? I'm guessing > > Yes I did for multiple platforms (s39

Re: [PATCH V3 1/3] arm64/mm: Drop __HAVE_ARCH_HUGE_PTEP_GET

2020-05-11 Thread Mike Kravetz
On 5/10/20 9:02 PM, Anshuman Khandual wrote: > On 05/09/2020 03:39 AM, Mike Kravetz wrote: >> On 5/7/20 8:07 PM, Anshuman Khandual wrote: >> I know you made this change in response to Will's comment. And, since >> changes were made to consistently use READ_ONCE in arm

Re: [PATCH V3 2/3] mm/hugetlb: Define a generic fallback for is_hugepage_only_range()

2020-05-08 Thread Mike Kravetz
> Cc: Palmer Dabbelt > Cc: Heiko Carstens > Cc: Vasily Gorbik > Cc: Christian Borntraeger > Cc: Yoshinori Sato > Cc: Rich Felker > Cc: "David S. Miller" > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: Borislav Petkov > Cc: "H. Peter Anvin" >

Re: [PATCH V3 1/3] arm64/mm: Drop __HAVE_ARCH_HUGE_PTEP_GET

2020-05-08 Thread Mike Kravetz
s not used before. Could this possibly introduce inconsistencies in their use of READ_ONCE? To be honest, I am not very good at identifying any possible issues this could cause. However, it does seem possible. Will was nervous about dropping this from arm64. I'm just a little nervous about adding it to other architectures. -- Mike Kravetz

[PATCH v4 4/4] hugetlbfs: clean up command line processing

2020-04-28 Thread Mike Kravetz
. However the bootmem allocator required for gigantic allocations is not available at this time. Signed-off-by: Mike Kravetz Acked-by: Gerald Schaefer [s390] Acked-by: Will Deacon Tested-by: Sandipan Das --- .../admin-guide/kernel-parameters.txt | 40 +++-- Documentation/admi

[PATCH v4 2/4] hugetlbfs: move hugepagesz= parsing to arch independent code

2020-04-28 Thread Mike Kravetz
ed by some architectures to set up ALL huge pages sizes. Signed-off-by: Mike Kravetz Acked-by: Mina Almasry Reviewed-by: Peter Xu Acked-by: Gerald Schaefer [s390] Acked-by: Will Deacon --- arch/arm64/mm/hugetlbpage.c | 15 --- arch/powerpc/mm/hugetlbpage.c | 15 ---

[PATCH v4 3/4] hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate

2020-04-28 Thread Mike Kravetz
t routine processing "hugepagesz=". After this, calls to size_to_hstate() in arch specific code can be removed and hugetlb_add_hstate can be called without worrying about warning messages. Signed-off-by: Mike Kravetz Acked-by: Mina Almasry Acked-by: Gerald Schaefer [s390] Acked-by: Will De

[PATCH v4 0/4] Clean up hugetlb boot command line processing

2020-04-28 Thread Mike Kravetz
an arch independent routine. - Clean up command line processing to follow desired semantics and document those semantics. [1] https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpe...@huawei.com Mike Kravetz (4): hugetlbfs: add arch_hugetlb_valid_size hugetlbfs: move hugepagesz= parsi

[PATCH v4 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-04-28 Thread Mike Kravetz
"hugepagesz=" in arch specific code to a common routine in arch independent code. Signed-off-by: Mike Kravetz Acked-by: Gerald Schaefer [s390] Acked-by: Will Deacon --- arch/arm64/mm/hugetlbpage.c | 17 + arch/powerpc/mm/hugetlbpage.c | 20 +--- arc

Re: [PATCH] hugetlbfs: add O_TMPFILE support

2019-10-22 Thread Mike Kravetz
On 10/22/19 12:09 AM, Piotr Sarna wrote: > On 10/21/19 7:17 PM, Mike Kravetz wrote: >> On 10/15/19 4:37 PM, Mike Kravetz wrote: >>> On 10/15/19 3:50 AM, Michal Hocko wrote: >>>> On Tue 15-10-19 11:01:12, Piotr Sarna wrote: >>>>> With hugetlbfs, a co

Re: [PATCH v6 5/9] hugetlb: disable region_add file_region coalescing

2019-10-21 Thread Mike Kravetz
resv->region_cache_count++; > - goto retry_locked; > } I know that I suggested allocating the worst case number of entries, but this is going to be too much of a hit for existing hugetlbfs users. It is not uncommon for DBs to have shared areas in excess of 1TB mapped by hugetlbfs. With this new scheme, the above while loop will allocate over a half million file region entries and end up only using one. I think we need to step back and come up with a different approach. Let me give it some more thought before throwing out ideas that may waste more of your time. Sorry. -- Mike Kravetz
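For a sense of scale, the "half million" figure is consistent with simple arithmetic under two assumptions that are not stated in the message: a 2 MB huge page size, and a worst case of one file region entry per huge page. A minimal sketch (the function name is hypothetical):

```c
#include <stdint.h>

/*
 * Number of huge pages needed to cover a mapping of the given size.
 * Assumes the size is a multiple of the huge page size.
 */
static uint64_t toy_huge_pages_in(uint64_t bytes, uint64_t huge_page_size)
{
    return bytes / huge_page_size;
}
```

For a 1 TB mapping and 2 MB pages this gives 2^40 / 2^21 = 524288 pages, so a mapping "in excess of 1TB" would exceed half a million entries under those assumptions.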

Re: [PATCH] hugetlbfs: add O_TMPFILE support

2019-10-21 Thread Mike Kravetz
On 10/15/19 4:37 PM, Mike Kravetz wrote: > On 10/15/19 3:50 AM, Michal Hocko wrote: >> On Tue 15-10-19 11:01:12, Piotr Sarna wrote: >>> With hugetlbfs, a common pattern for mapping anonymous huge pages >>> is to create a temporary file first. >> >> Really?

Re: [PATCH] hugetlb: Add nohugepages parameter to prevent hugepages creation

2019-10-18 Thread Mike Kravetz
om kdump > perspective. The trick part is exactly preventing the sysctl to get applied > heh > Please do let us know if this can be done in tooling. I am not opposed to the approach taken in your v2 patch as it essentially uses the hugepages_supported() functionality that exists today. However, it seems that other distros have ways around this issue. As such, I would prefer if the issue was addressed in the tooling. -- Mike Kravetz

Re: [PATCH] hugetlbfs: fix error handling in init_hugetlbfs_fs()

2019-10-17 Thread Mike Kravetz
Sorry for noise, left off David On 10/17/19 5:08 PM, Mike Kravetz wrote: > Cc: David > On 10/17/19 3:38 AM, Chengguang Xu wrote: >> In order to avoid using incorrect mnt, we should set >> mnt to NULL when we get error from mount_one_hugetlbfs(). >> >> Signed-off-by

Re: [PATCH] hugetlbfs: fix error handling in init_hugetlbfs_fs()

2019-10-17 Thread Mike Kravetz
x hstate. It now does that for the '0' hstate, and 0 is not always equal to default_hstate_idx. David was that intentional or an oversight? I can fix up, just wanted to make sure there was not some reason for the change. -- Mike Kravetz

Re: [PATCH V2] mm/page_alloc: Add alloc_contig_pages()

2019-10-16 Thread Mike Kravetz
ns in parallel. The new interface is pretty straight forward, but the idea was to stress the underlying code. In fact, it did identify issues with isolation which were corrected. I exercised this new interface in the same way and am happy to report that no issues were detected. -- Mike Kravetz

Re: [PATCH] hugetlb: Fix clang compilation warning

2019-10-16 Thread Mike Kravetz
/ext4/ialloc.o > > Fix the warning adding parentheses around the sizeof(u32) expression. > > Cc: Mike Kravetz > Signed-off-by: Vincenzo Frascino Thanks, However, this is already addressed in Andrew's tree. https://ozlabs.org/~akpm/mmotm/broken-out/hugetlbfs-hugetlb_fault_mutex_hash-cleanup.patch -- Mike Kravetz

Re: [PATCH] hugetlbfs: add O_TMPFILE support

2019-10-15 Thread Mike Kravetz
s implemented. So, that is why it does not make (more) use of that option. The implementation looks to be straight forward. However, I really do not want to add more functionality to hugetlbfs unless there is a specific use case that needs it. -- Mike Kravetz

Re: [PATCH v1] hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()

2019-10-15 Thread Mike Kravetz
.@dhcp22.suse.cz > > Reported-by: Michal Hocko > Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to > zones until online") # visible after d0dc12e86b319 > Cc: sta...@vger.kernel.org # v4.13+ > Cc: Anshuman Khandual > Cc: Mike Kravetz >

Re: [PATCH] hugetlb: Add nohugepages parameter to prevent hugepages creation

2019-10-14 Thread Mike Kravetz
ified. Perhaps just a level of naming indirection. This would use the existing code to prevent all hugetlb usage. It seems like there may be some discussion about 'the right' way to do kdump. I can't add to that discussion, but if such an option as nohugepages is needed, I can help. -- Mike Kravetz

Re: [PATCH v5 0/7] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-10-14 Thread Mike Kravetz
On 10/11/19 1:41 PM, Mina Almasry wrote: > On Fri, Oct 11, 2019 at 12:10 PM Mina Almasry wrote: >> >> On Mon, Sep 23, 2019 at 10:47 AM Mike Kravetz >> wrote: >>> >>> On 9/19/19 3:24 PM, Mina Almasry wrote: >> >> Mike, note your suggestion a

Re: [PATCH -next] userfaultfd: remove set but not used variable 'h'

2019-10-09 Thread Mike Kravetz
On 10/9/19 8:30 PM, Wei Yang wrote: > On Wed, Oct 09, 2019 at 07:25:18PM -0700, Mike Kravetz wrote: >> On 10/9/19 6:23 PM, Wei Yang wrote: >>> On Wed, Oct 09, 2019 at 05:45:57PM -0700, Mike Kravetz wrote: >>>> On 10/9/19 5:27 AM, YueHaibing wrote: >>>

Re: [PATCH -next] userfaultfd: remove set but not used variable 'h'

2019-10-09 Thread Mike Kravetz
On 10/9/19 6:23 PM, Wei Yang wrote: > On Wed, Oct 09, 2019 at 05:45:57PM -0700, Mike Kravetz wrote: >> On 10/9/19 5:27 AM, YueHaibing wrote: >>> Fixes gcc '-Wunused-but-set-variable' warning: >>> >>> mm/userfaultfd.c: In function '__mcopy_at

Re: [PATCH -next] userfaultfd: remove set but not used variable 'h'

2019-10-09 Thread Mike Kravetz
> > It is not used since commit 78911d0e18ac ("userfaultfd: use vma_pagesize > for all huge page size calculation") > Thanks! That should have been removed with the recent cleanups. > Signed-off-by: YueHaibing Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [rfc] mm, hugetlb: allow hugepage allocations to excessively reclaim

2019-10-07 Thread Mike Kravetz
hat if b39d0ee2632d went forward there should be an exception for __GFP_RETRY_MAYFAIL requests. [1] https://lkml.kernel.org/r/3468b605-a3a9-6978-9699-57c52a90b...@oracle.com -- Mike Kravetz

Re: [PATCH] mm, hugetlb: allow hugepage allocations to excessively reclaim

2019-10-07 Thread Mike Kravetz
b39d0ee2632d to cause regressions and noticeable behavior changes. My quick/limited testing in [1] was insufficient. It was also mentioned that if something like b39d0ee2632d went forward, I would like exemptions for __GFP_RETRY_MAYFAIL requests as in this patch. > > [mho...@suse.com: rewo

Re: [PATCH v5 0/7] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-09-27 Thread Mike Kravetz
On 9/27/19 3:51 PM, Mina Almasry wrote: > On Fri, Sep 27, 2019 at 2:59 PM Mike Kravetz wrote: >> >> On 9/26/19 5:55 PM, Mina Almasry wrote: >>> Provided we keep the existing controller untouched, should the new >>> controller track: >>> >>> 1.

Re: [PATCH v5 0/7] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-09-27 Thread Mike Kravetz
> fits all. > > I think the only sticking point left is whether an added controller > can support both cgroup-v2 and cgroup-v1. If I could get confirmation > on that I'll provide a patchset. Sorry, but I can not provide cgroup expertise. -- Mike Kravetz

Re: [PATCH v5 4/7] hugetlb: disable region_add file_region coalescing

2019-09-27 Thread Mike Kravetz
ress -= regions_needed; Consider this example,
- region_chg(1,2) adds_in_progress = 1 cache entries 1
- region_chg(3,4) adds_in_progress = 2 cache entries 2
- region_chg(5,6) adds_in_progress = 3 cache entries 3
At this point, no region descriptors are in the map because only region_chg has been called.
- region_chg(0,6) adds_in_progress = 4 cache entries 4
Is that correct so far? Then the following sequence happens,
- region_add(1,2) adds_in_progress = 3 cache entries 3
- region_add(3,4) adds_in_progress = 2 cache entries 2
- region_add(5,6) adds_in_progress = 1 cache entries 1
list of region descriptors is: [1->2] [3->4] [5->6]
- region_add(0,6) This is going to require 3 cache entries but only one is in the cache. I think we are going to BUG in get_file_region_entry_from_cache() the second time it is called from add_reservation_in_range(). I stopped looking at the code here as things will need to change if this is a real issue. -- Mike Kravetz
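The mismatch in the example above can be checked with a small model that counts how many new entries a range would need given the regions already in the map. This is a sketch under assumptions: without coalescing, each uncovered gap inside [f, t) costs one entry, and the map is sorted and non-overlapping. The struct and function names are hypothetical and are not the mm/hugetlb.c implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy region descriptor covering [from, to). */
struct toy_region { long from, to; };

/*
 * Count the uncovered gaps of [f, t) with respect to a sorted,
 * non-overlapping map. Each gap is assumed to need one cache entry.
 */
static int toy_entries_needed(const struct toy_region *map, size_t n,
                              long f, long t)
{
    int needed = 0;
    long cur = f;                 /* start of the not-yet-examined part */

    assert(f <= t);
    for (size_t i = 0; i < n && cur < t; i++) {
        if (map[i].to <= cur)
            continue;             /* region lies entirely before cur */
        if (map[i].from >= t)
            break;                /* region lies entirely after the range */
        if (map[i].from > cur)
            needed++;             /* gap [cur, map[i].from) */
        cur = map[i].to;
    }
    if (cur < t)
        needed++;                 /* trailing gap [cur, t) */
    return needed;
}
```

In this model, the range (0,6) needs one entry when the map is empty, which matches the single increment at region_chg(0,6) time, but three entries once [1->2] [3->4] [5->6] are present, which is more than the one entry left in the cache.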

Re: [PATCH v5 0/7] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-09-26 Thread Mike Kravetz
ion of reservations and allocations? If a combined controller will work for new use cases, that would be my preference. Of course, I have not prototyped such a controller so there may be issues when we get into the details. For a reservation only or combined controller, the region_* changes proposed by Mina would be used. -- Mike Kravetz

Re: [PATCH v5 0/7] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-09-23 Thread Mike Kravetz
On 9/23/19 12:18 PM, Mina Almasry wrote: > On Mon, Sep 23, 2019 at 10:47 AM Mike Kravetz wrote: >> >> On 9/19/19 3:24 PM, Mina Almasry wrote: >>> Patch series implements hugetlb_cgroup reservation usage and limits, which >>> track hugetlb reservations rath

Re: [PATCH v5 0/7] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-09-23 Thread Mike Kravetz
sers. I really would like to get feedback from anyone that knows how the existing hugetlb cgroup controller may be used today. Comments from Aneesh would be very welcome to know if reservations were considered in development of the existing code. -- Mike Kravetz

[PATCH] hugetlbfs: hugetlb_fault_mutex_hash cleanup

2019-09-18 Thread Mike Kravetz
longer used. So, remove it from the definition and all callers. No functional change. Reported-by: Nathan Chancellor Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 4 ++-- include/linux/hugetlb.h | 2 +- mm/hugetlb.c| 10 +- mm/userfaultfd.c| 2 +- 4 f

Re: [PATCH v4 6/9] hugetlb: disable region_add file_region coalescing

2019-09-16 Thread Mike Kravetz
ere done in the region_chg call, and it was relatively easy to do in existing code when region_chg would only need one additional region at most. I'm thinking that we may have to make region_chg allocate the worst case number of regions (t - f)/2, OR change to the code such that region_add could return an error. -- Mike Kravetz

Re: [PATCH v4 5/9] hugetlb: remove duplicated code

2019-09-16 Thread Mike Kravetz
n_add, and I want to make that change in one place > only. It should improve maintainability anyway on its own. > > Signed-off-by: Mina Almasry Like the previous patch, this is a good improvement independent of the rest of the series. Thanks! Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH v4 4/9] hugetlb: region_chg provides only cache entry

2019-09-16 Thread Mike Kravetz
h > region_del exists. > > Signed-off-by: Mina Almasry Thanks. I like this modification as it does simplify the code and could be added as a general cleanup independent of the other changes. Reviewed-by: Mike Kravetz -- Mike Kravetz > --- > mm/hugetlb.c | 63 +---

Re: [PATCH 5/5] hugetlbfs: Limit wait time when trying to share huge PMD

2019-09-12 Thread Mike Kravetz
tes the long stalls? If so, can you try the simple change of taking the semaphore in read mode in huge_pmd_share. -- Mike Kravetz

Re: [PATCH 5/5] hugetlbfs: Limit wait time when trying to share huge PMD

2019-09-11 Thread Mike Kravetz
to ask the question in case someone already knows. At one time, I thought it was safe to acquire the semaphore in read mode for huge_pmd_share, but write mode for huge_pmd_unshare. See commit b43a99900559. This was reverted along with another patch for other reasons. If we change from write to read mode, this may have significant impact on the stalls. -- Mike Kravetz

Re: [PATCH 5/5] hugetlbfs: Limit wait time when trying to share huge PMD

2019-09-11 Thread Mike Kravetz
igger than you describe above. I have never looked at/for delays in these environments around pmd sharing (page faults), but that does not mean they do not exist. I will try to get the DB group to give me access to one of their large environments for analysis. We may want to consider making the timeout value and disable threshold user configurable. -- Mike Kravetz

Re: [rfc 3/4] mm, page_alloc: avoid expensive reclaim when compaction may not succeed

2019-09-05 Thread Mike Kravetz
Patch 3 in that series causes allocations to fail sooner in the case of COMPACT_DEFERRED: http://lkml.kernel.org/r/20190806014744.15446-4-mike.krav...@oracle.com hugetlb allocations have the __GFP_RETRY_MAYFAIL flag set. They are willing to retry and wait and callers are aware of this. Even though my limited testing did not show regressions caused by this patch, I would prefer if the quick exit did not apply to __GFP_RETRY_MAYFAIL requests. -- Mike Kravetz

Re: [PATCH v3 0/6] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-09-03 Thread Mike Kravetz
On 9/3/19 10:57 AM, Mike Kravetz wrote: > On 8/29/19 12:18 AM, Michal Hocko wrote: >> [Cc cgroups maintainers] >> >> On Wed 28-08-19 10:58:00, Mina Almasry wrote: >>> On Wed, Aug 28, 2019 at 4:23 AM Michal Hocko wrote: >>>> >>>> On Mon 26-0

Re: [PATCH v2] mm/hugetlb: avoid looping to the same hugepage if !pages and !vmas

2019-09-03 Thread Mike Kravetz
> + i += pages_per_huge_page(h); > + spin_unlock(ptl); > + continue; > + } > + > same_page: > if (pages) { > pages[i] = mem_map_offset(page, pfn_offset); > With a comment added to the code, Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH v3 0/6] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-09-03 Thread Mike Kravetz
_* changes separately. If not a standalone patch, at least the first patch of the series. This new code will be exercised even if cgroup reservation accounting is not enabled, so it is very important that no subtle regressions be introduced. -- Mike Kravetz

Re: [RFC PATCH v2 4/5] hugetlb_cgroup: Add accounting for shared mappings

2019-08-16 Thread Mike Kravetz
On 8/15/19 4:08 PM, Mina Almasry wrote: > On Tue, Aug 13, 2019 at 4:54 PM Mike Kravetz wrote: >>> mm/hugetlb.c | 208 +-- >>> 1 file changed, 170 insertions(+), 38 deletions(-) >>> >>> diff --git

Re: [RFC PATCH v2 4/5] hugetlb_cgroup: Add accounting for shared mappings

2019-08-16 Thread Mike Kravetz
On 8/15/19 4:04 PM, Mina Almasry wrote: > On Wed, Aug 14, 2019 at 9:46 AM Mike Kravetz wrote: >> >> On 8/13/19 4:54 PM, Mike Kravetz wrote: >>> On 8/8/19 4:13 PM, Mina Almasry wrote: >>>> For shared mappings, the pointer to the hugetlb_cgroup to uncharge li

Re: [RFC PATCH v2 4/5] hugetlb_cgroup: Add accounting for shared mappings

2019-08-14 Thread Mike Kravetz
On 8/13/19 4:54 PM, Mike Kravetz wrote: > On 8/8/19 4:13 PM, Mina Almasry wrote: >> For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives >> in the resv_map entries, in file_region->reservation_counter. >> >> When a file_region entry is added to t

Re: [RFC PATCH v2 4/5] hugetlb_cgroup: Add accounting for shared mappings

2019-08-13 Thread Mike Kravetz
> + if (!dry_run) { > + list_del(&rg->link); > + kfree(rg); Is it possible that the region struct we are deleting pointed to a reservation_counter? Perhaps even for another cgroup? Just concerned with the way regions are coalesced that we may be deleting counters. -- Mike Kravetz

Re: [RFC PATCH v2 0/5] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-08-13 Thread Mike Kravetz
On 8/10/19 3:01 PM, Mina Almasry wrote: > On Sat, Aug 10, 2019 at 11:58 AM Mike Kravetz wrote: >> >> On 8/9/19 12:42 PM, Mina Almasry wrote: >>> On Fri, Aug 9, 2019 at 10:54 AM Mike Kravetz >>> wrote: >>>> On 8/8/19 4:13 PM, Mina Almasry wrote: >>

Re: [RFC PATCH v2 0/5] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-08-10 Thread Mike Kravetz
On 8/9/19 12:42 PM, Mina Almasry wrote: > On Fri, Aug 9, 2019 at 10:54 AM Mike Kravetz wrote: >> On 8/8/19 4:13 PM, Mina Almasry wrote: >>> Problem: >>> Currently tasks attempting to allocate more hugetlb memory than is >>> available get >>> a f

Re: [RFC PATCH] hugetlbfs: Add hugetlb_cgroup reservation limits

2019-08-09 Thread Mike Kravetz
On 8/9/19 1:57 PM, Mina Almasry wrote: > On Fri, Aug 9, 2019 at 1:39 PM Mike Kravetz wrote: >> >> On 8/9/19 11:05 AM, Mina Almasry wrote: >>> On Fri, Aug 9, 2019 at 4:27 AM Michal Koutný wrote: >>>>> Alternatives considered: >>>>> [...] >

Re: [RFC PATCH] hugetlbfs: Add hugetlb_cgroup reservation limits

2019-08-09 Thread Mike Kravetz
hose 7 > pages, and will SIGBUS you when you try to access the remaining 2 > pages. So the problem persists. Folks would still like to know they > are crossing the limits on mmap time. If you got the failure at mmap time in the MAP_POPULATE case would this be useful? Just thinking that would be a relatively simple change. -- Mike Kravetz

Re: [RFC PATCH v2 0/5] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-08-09 Thread Mike Kravetz
ntents of the page cache to the resv_map to determine how many reservations were actually consumed. I did not look closely enough to determine whether the code drops reservation usage counts as pages are added to shared mappings. -- Mike Kravetz

Re: [PATCH] hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS

2019-08-08 Thread Mike Kravetz
On 8/8/19 12:47 AM, Michal Hocko wrote: > On Thu 08-08-19 09:46:07, Michal Hocko wrote: >> On Wed 07-08-19 17:05:33, Mike Kravetz wrote: >>> Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS >>> in the kernel-v5.2.3 testing. This is caused by a ra

Re: [PATCH] hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS

2019-08-07 Thread Mike Kravetz
ptep))) goto backout; -- Mike Kravetz

[PATCH] hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS

2019-08-07 Thread Mike Kravetz
page table lock and check for huge_pte_none before returning an error. This is the same check that must be made further in the code even if page allocation is successful. Reported-by: Li Wang Fixes: 290408d4a250 ("hugetlb: hugepage migration core") Signed-off-by: Mike Kravetz Tested-b

[PATCH v2 2/4] mm, reclaim: cleanup should_continue_reclaim()

2019-08-05 Thread Mike Kravetz
as been scanned" with nr_scanned == 0 didn't really work. Signed-off-by: Vlastimil Babka Acked-by: Mike Kravetz Signed-off-by: Mike Kravetz --- Commit message reformatted to avoid line wrap. mm/vmscan.c | 43 ++- 1 file changed, 14 insertions(

[PATCH v2 3/4] mm, compaction: raise compaction priority after it withdrawns

2019-08-05 Thread Mike Kravetz
From: Vlastimil Babka Mike Kravetz reports that "hugetlb allocations could stall for minutes or hours when should_compact_retry() would return true more often then it should. Specifically, this was in the case where compact_result was COMPACT_DEFERRED and COMPACT_PARTIAL_SKIPPED and no pro

[PATCH v2 4/4] hugetlbfs: don't retry when pool page allocations start to fail

2019-08-05 Thread Mike Kravetz
will still succeed if there is memory available, but it will not try as hard to free up memory. Signed-off-by: Mike Kravetz --- v2 - Removed __GFP_NORETRY from bit mask allocations and added more comments. OK to pass NULL to NODEMASK_FREE. mm/hugetlb.c | 89

[PATCH v2 1/4] mm, reclaim: make should_continue_reclaim perform dryrun detection

2019-08-05 Thread Mike Kravetz
as we could. Cc: Mike Kravetz Cc: Mel Gorman Cc: Michal Hocko Cc: Vlastimil Babka Cc: Johannes Weiner Signed-off-by: Hillf Danton Tested-by: Mike Kravetz Acked-by: Mel Gorman Acked-by: Vlastimil Babka Signed-off-by: Mike Kravetz --- v2 - Updated commit message and added SOB. mm/vmscan.c

[PATCH v2 0/4] address hugetlb page allocation stalls

2019-08-05 Thread Mike Kravetz
n (1): mm, reclaim: make should_continue_reclaim perform dryrun detection Mike Kravetz (1): hugetlbfs: don't retry when pool page allocations start to fail Vlastimil Babka (2): mm, reclaim: cleanup should_continue_reclaim() mm, compaction: raise compaction priority after it withdrawns

Re: [PATCH 3/3] hugetlbfs: don't retry when pool page allocations start to fail

2019-08-05 Thread Mike Kravetz
On 8/5/19 2:28 AM, Vlastimil Babka wrote: > On 8/3/19 12:39 AM, Mike Kravetz wrote: >> When allocating hugetlbfs pool pages via /proc/sys/vm/nr_hugepages, >> the pages will be interleaved between all nodes of the system. If >> nodes are not equal, it is quite possible fo

Re: [PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection

2019-08-05 Thread Mike Kravetz
On 8/5/19 3:57 AM, Vlastimil Babka wrote: > On 8/5/19 10:42 AM, Vlastimil Babka wrote: >> On 8/3/19 12:39 AM, Mike Kravetz wrote: >>> From: Hillf Danton >>> >>> Address the issue of should_continue_reclaim continuing true too often >>> for __GFP_

Re: [PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection

2019-08-05 Thread Mike Kravetz
On 8/5/19 1:42 AM, Vlastimil Babka wrote: > On 8/3/19 12:39 AM, Mike Kravetz wrote: >> From: Hillf Danton >> >> Address the issue of should_continue_reclaim continuing true too often >> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned. >> This

[PATCH 0/3] address hugetlb page allocation stalls

2019-08-02 Thread Mike Kravetz
] http://lkml.kernel.org/r/d38a095e-dc39-7e82-bb76-2c9247929...@oracle.com [2] http://lkml.kernel.org/r/20190724175014.9935-1-mike.krav...@oracle.com Hillf Danton (1): mm, reclaim: make should_continue_reclaim perform dryrun detection Mike Kravetz (1): hugetlbfs: don't retry when pool page

[PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection

2019-08-02 Thread Mike Kravetz
are not enough inactive lru pages left to satisfy the costly allocation. We can give up reclaiming pages too if we see dryrun occur, with the certainty of plenty of inactive pages. IOW with dryrun detected, we are sure we have reclaimed as many pages as we could. Cc: Mike Kravetz Cc: Mel Gorman

[PATCH 3/3] hugetlbfs: don't retry when pool page allocations start to fail

2019-08-02 Thread Mike Kravetz
will still succeed if there is memory available, but it will not try as hard to free up memory. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 86 ++-- 1 file changed, 76 insertions(+), 10 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index

[PATCH 2/3] mm, compaction: raise compaction priority after it withdrawns

2019-08-02 Thread Mike Kravetz
From: Vlastimil Babka Mike Kravetz reports that "hugetlb allocations could stall for minutes or hours when should_compact_retry() would return true more often then it should. Specifically, this was in the case where compact_result was COMPACT_DEFERRED and COMPACT_PARTIAL_SKIPPED and no pro

Re: [RFC PATCH 2/3] mm, compaction: use MIN_COMPACT_COSTLY_PRIORITY everywhere for costly orders

2019-08-02 Thread Mike Kravetz
On 8/2/19 5:05 AM, Vlastimil Babka wrote: > > On 8/1/19 10:33 PM, Mike Kravetz wrote: >> On 8/1/19 6:01 AM, Vlastimil Babka wrote: >>> Could you try testing the patch below instead? It should hopefully >>> eliminate the stalls. If it makes hugepage allocation give u

Re: [RFC PATCH 2/3] mm, compaction: use MIN_COMPACT_COSTLY_PRIORITY everywhere for costly orders

2019-08-01 Thread Mike Kravetz
HP requests. Any suggestions on how to test that? -- Mike Kravetz > 8< > diff --git a/include/linux/compaction.h b/include/linux/compaction.h > index 9569e7c786d3..b8bfe8d5d2e9 100644 > --- a/include/linux/compaction.h > +++ b/include/linux/compaction.h > @@ -129,11 +129,7 @

Re: [RFC PATCH 3/3] hugetlbfs: don't retry when pool page allocations start to fail

2019-07-31 Thread Mike Kravetz
On 7/31/19 6:23 AM, Vlastimil Babka wrote: > On 7/25/19 7:15 PM, Mike Kravetz wrote: >> On 7/25/19 1:13 AM, Mel Gorman wrote: >>> On Wed, Jul 24, 2019 at 10:50:14AM -0700, Mike Kravetz wrote: >>> >>> set_max_huge_pages can fail the NODEMASK_ALLOC() alloc which

Re: [RFC PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection

2019-07-31 Thread Mike Kravetz
pages and none of those are reclaimed. Can we not get nr_scanned == 0 on an arbitrary chunk of the LRU? I must be missing something, because I do not see how nr_scanned == 0 guarantees a full scan. -- Mike Kravetz
