Re: [RFC PATCH 2/3] mm, compaction: use MIN_COMPACT_COSTLY_PRIORITY everywhere for costly orders

2019-07-31 Thread Mike Kravetz
On 7/31/19 5:06 AM, Vlastimil Babka wrote: > On 7/24/19 7:50 PM, Mike Kravetz wrote: >> For PAGE_ALLOC_COSTLY_ORDER allocations, MIN_COMPACT_COSTLY_PRIORITY is >> minimum (highest priority). Other places in the compaction code key off >> of MIN_COMPACT_PRIORITY. Costly o

Re: [RFC PATCH 3/3] hugetlbfs: don't retry when pool page allocations start to fail

2019-07-25 Thread Mike Kravetz
On 7/25/19 1:13 AM, Mel Gorman wrote: > On Wed, Jul 24, 2019 at 10:50:14AM -0700, Mike Kravetz wrote: >> When allocating hugetlbfs pool pages via /proc/sys/vm/nr_hugepages, >> the pages will be interleaved between all nodes of the system. If >> nodes are not equal, it is q

Re: [PATCH] mm/hugetlb.c: check the failure case for find_vma

2019-07-25 Thread Mike Kravetz
e routines (or their callers) it has been verified that address is within a vma. In addition, mmap_sem is held so that vmas can not change. Therefore, there should be no way for find_vma to return NULL here. Please let me know if there is something I have overlooked. Otherwise, there is no

Re: [PATCH] mm/rmap.c: remove set but not used variable 'cstart'

2019-07-24 Thread Mike Kravetz
> commit cdb07bdea28e ("mm/rmap.c: remove redundant variable cend") It appears Commit 0f10851ea475 ("mm/mmu_notifier: avoid double notification when it is useless") is what removed the use of cstart and cend. And, they should have been removed then. > Reported-by: Hulk Robot

[RFC PATCH 0/3] fix hugetlb page allocation stalls

2019-07-24 Thread Mike Kravetz
f Danton (1): mm, reclaim: make should_continue_reclaim perform dryrun detection Mike Kravetz (2): mm, compaction: use MIN_COMPACT_COSTLY_PRIORITY everywhere for costly orders hugetlbfs: don't retry when pool page allocations start to fail mm/compaction.c | 18 +++--- mm/h

[RFC PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection

2019-07-24 Thread Mike Kravetz
From: Hillf Danton Address the issue of should_continue_reclaim continuing true too often for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned. This could happen during hugetlb page allocation causing stalls for minutes or hours. Restructure code so that false will be returned in t

[RFC PATCH 2/3] mm, compaction: use MIN_COMPACT_COSTLY_PRIORITY everywhere for costly orders

2019-07-24 Thread Mike Kravetz
. Signed-off-by: Mike Kravetz --- mm/compaction.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 952dc2fb24e5..325b746068d1 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2294,9 +2294,15 @@ static enum

[RFC PATCH 3/3] hugetlbfs: don't retry when pool page allocations start to fail

2019-07-24 Thread Mike Kravetz
will still succeed if there is memory available, but it will not try as hard to free up memory. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 87 ++-- 1 file changed, 77 insertions(+), 10 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index

Re: [Question] Should direct reclaim time be bounded?

2019-07-12 Thread Mike Kravetz
On 7/11/19 10:47 PM, Hillf Danton wrote: > > On Thu, 11 Jul 2019 02:42:56 +0800 Mike Kravetz wrote: >> >> It is quite easy to hit the condition where: >> nr_reclaimed == 0 && nr_scanned == 0 is true, but we skip the previous test >> > Then skipping check

Re: [Question] Should direct reclaim time be bounded?

2019-07-10 Thread Mike Kravetz
On 7/10/19 12:44 PM, Michal Hocko wrote: > On Wed 10-07-19 11:42:40, Mike Kravetz wrote: > [...] >> As Michal suggested, I'm going to do some testing to see what impact >> dropping the __GFP_RETRY_MAYFAIL flag for these huge page allocations >> will have on the number

Re: [Question] Should direct reclaim time be bounded?

2019-07-10 Thread Mike Kravetz
On 7/7/19 10:19 PM, Hillf Danton wrote: > On Mon, 01 Jul 2019 20:15:51 -0700 Mike Kravetz wrote: >> On 7/1/19 1:59 AM, Mel Gorman wrote: >>> >>> I think it would be reasonable to have should_continue_reclaim allow an >>> exit if scanning at higher priority

Re: [Question] Should direct reclaim time be bounded?

2019-07-04 Thread Mike Kravetz
On 7/4/19 4:09 AM, Michal Hocko wrote: > On Wed 03-07-19 16:54:35, Mike Kravetz wrote: >> On 7/3/19 2:43 AM, Mel Gorman wrote: >>> Indeed. I'm getting knocked offline shortly so I didn't give this the >>> time it deserves but it appears that part of this pro

Re: [Question] Should direct reclaim time be bounded?

2019-07-03 Thread Mike Kravetz
and __GFP_NORETRY and back to hopefully take into account transient conditions. From 528c52397301f02acb614c610bd65f0f9a107481 Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Wed, 3 Jul 2019 13:36:24 -0700 Subject: [PATCH] hugetlbfs: don't retry when pool page allocations start to fail When alloc

Re: [Question] Should direct reclaim time be bounded?

2019-07-01 Thread Mike Kravetz
On 7/1/19 1:59 AM, Mel Gorman wrote: > On Fri, Jun 28, 2019 at 11:20:42AM -0700, Mike Kravetz wrote: >> On 4/24/19 7:35 AM, Vlastimil Babka wrote: >>> On 4/23/19 6:39 PM, Mike Kravetz wrote: >>>>> That being said, I do not think __GFP_RETRY_MAYFAIL is wrong h

Re: [Question] Should direct reclaim time be bounded?

2019-06-28 Thread Mike Kravetz
On 4/24/19 7:35 AM, Vlastimil Babka wrote: > On 4/23/19 6:39 PM, Mike Kravetz wrote: >>> That being said, I do not think __GFP_RETRY_MAYFAIL is wrong here. It >>> looks like there is something wrong in the reclaim going on. >> >> Ok, I will start digging into that.

Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

2019-06-27 Thread Mike Kravetz
On 6/24/19 2:53 PM, Mike Kravetz wrote: > On 6/24/19 2:30 PM, Qian Cai wrote: >> So the problem is that ipcget_public() has held the semaphore "ids->rwsem" >> for >> too long seems unnecessarily and then goes to sleep sometimes due to direct >> rec

Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

2019-06-24 Thread Mike Kravetz
T1315] el0_svc_handler+0x19c/0x26c > [ 788.922088][ T1315] el0_svc+0x8/0xc > > Ideally, it seems only ipc_findkey() and newseg() in this path needs to hold > the > semaphore to protect concurrency access, so it could just be converted to a > spinlock instead. I do not have enough experience with this ipc code to comment on your proposed change. But, I will look into it. [1] https://lkml.org/lkml/2019/4/23/2 -- Mike Kravetz

Re: [PATCH v3 2/2] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge

2019-06-18 Thread Mike Kravetz
age is suitable for > dissolving, where we should return success for !PageHuge() case because > the given hugepage is considered as already dissolved. > > This change also affects other callers of dissolve_free_huge_page(), > which are cleaned up together. > > Reported-by: Che

Re: [PATCH v3 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails

2019-06-18 Thread Mike Kravetz
origuchi Thanks for the updates, Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH 2/3] hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race

2019-06-14 Thread Mike Kravetz
s patch? > I hope you do nothing with this as the patch is not upstream. -- Mike Kravetz

Re: [PATCH v2 2/2] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge

2019-06-11 Thread Mike Kravetz
ast the PageHuge(page) check before calling dissolve_free_huge_page(). dissolve_free_huge_pages is called as part of memory offline processing. We do not know if the memory to be offlined contains huge pages or not. With your changes, we are taking hugetlb_lock on each call to dissolve_free_huge_page just to discover that the page is not a huge page. You 'could' add a PageHuge(page) check to dissolve_free_huge_page before taking the lock. However, you would need to check again after taking the lock. -- Mike Kravetz
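The check, lock, recheck idea described above can be sketched in userspace C. This is only an illustrative model and not the kernel code: struct page_model, pool_lock, lock_acquisitions and dissolve_model() are hypothetical names, an atomic_flag spinlock stands in for hugetlb_lock, and returning 0 for a page that is not huge is an assumption of the sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page; 'huge' models the PageHuge() state. */
struct page_model {
	atomic_bool huge;
	bool dissolved;
};

/* Hypothetical stand-in for hugetlb_lock (a simple spinlock). */
static atomic_flag pool_lock = ATOMIC_FLAG_INIT;

/* Counts how often the lock was taken, to show what the fast path saves. */
static unsigned long lock_acquisitions;

static int dissolve_model(struct page_model *page)
{
	/* Cheap unlocked check: most pages are not huge, so skip the lock. */
	if (!atomic_load(&page->huge))
		return 0;

	while (atomic_flag_test_and_set(&pool_lock))
		;	/* spin until the lock is free */
	lock_acquisitions++;

	/*
	 * Recheck under the lock: the state may have changed between the
	 * unlocked check and the lock acquisition.
	 */
	if (atomic_load(&page->huge)) {
		atomic_store(&page->huge, false);
		page->dissolved = true;
	}

	atomic_flag_clear(&pool_lock);
	return 0;
}
```

The unlocked check is only an optimization; correctness still rests on the second check made while the lock is held.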

Re: [PATCH v2 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails

2019-06-10 Thread Mike Kravetz
ng to fix it. > > Signed-off-by: Naoya Horiguchi > Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining") > Cc: # v4.19+ Reviewed-by: Mike Kravetz To follow-up on Andrew's comment/question about user visible effects. Without this fix, there are cases where madvise

Re: question: should_compact_retry limit

2019-06-05 Thread Mike Kravetz
On 6/5/19 12:58 AM, Vlastimil Babka wrote: > On 6/5/19 1:30 AM, Mike Kravetz wrote: >> While looking at some really long hugetlb page allocation times, I noticed >> instances where should_compact_retry() was returning true more often than >> I expected. In one allocation atte

question: should_compact_retry limit

2019-06-04 Thread Mike Kravetz
goto out; } Just curious, is this intentional? -- Mike Kravetz

Re: [PATCH -mm] mm, swap: Fix bad swap file entry warning

2019-05-31 Thread Mike Kravetz
the swap devices that may cause warning messages. > > Fixes: 6a946753dbe6 ("mm/swap_state.c: simplify total_swapcache_pages() with > get_swap_device()") > Signed-off-by: "Huang, Ying" Thank you, this eliminates the messages for me: Tested-by: Mike Kravetz -- Mike Kravetz

Re: mmotm 2019-05-29-20-52 uploaded

2019-05-30 Thread Mike Kravetz
ould seem to be related to commit 3e2c19f9bef7e > * mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch -- Mike Kravetz

Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage

2019-05-29 Thread Mike Kravetz
On 5/28/19 2:49 AM, Wanpeng Li wrote: > Cc Paolo, > Hi all, > On Wed, 14 Feb 2018 at 06:34, Mike Kravetz wrote: >> >> On 02/12/2018 06:48 PM, Michael Ellerman wrote: >>> Andrew Morton writes: >>> >>>> On Thu, 08 Feb 2018 12:30:45 + Punit

Re: [PATCH v1] mm: hugetlb: soft-offline: fix wrong return value of soft offline

2019-05-29 Thread Mike Kravetz
allers of dissolve_free_huge_page(), > which are also cleaned up by this patch. It may just be me, but I am having a hard time separating the fix for this issue from the change to the dissolve_free_huge_page routine. Would it be more clear or possible to create separate patches for these? -- Mike Kravetz

Re: [PATCH, RFC 2/2] Implement sharing/unsharing of PMDs for FS/DAX

2019-05-10 Thread Mike Kravetz
lb specific. I do not know if any of this applies in the case of DAX. -- Mike Kravetz

Re: [PATCH] hugetlbfs: always use address space in inode for resv_map pointer

2019-05-09 Thread Mike Kravetz
On 5/9/19 4:11 PM, Andrew Morton wrote: > On Wed, 8 May 2019 13:16:09 -0700 Mike Kravetz > wrote: > >>> I think it is better to add fixes label, like: >>> Fixes: 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map") >>> >>> Since the c

Re: [PATCH] hugetlbfs: always use address space in inode for resv_map pointer

2019-05-08 Thread Mike Kravetz
On 5/8/19 12:10 AM, yuyufen wrote: > On 2019/4/20 4:44, Mike Kravetz wrote: >> Continuing discussion about commit 58b6e5e8f1ad ("hugetlbfs: fix memory >> leak for resv_map") brought up the issue that inode->i_mapping may not >> point to the address space embedded

Re: [PATCH v2] mm/hugetlb: Don't put_page in lock of hugetlb_lock

2019-05-06 Thread Mike Kravetz
d-off-by: Kai Shen > Signed-off-by: Feilong Lin > Reported-by: Wang Wang > Acked-by: Michal Hocko Good catch. Sorry, for the late reply. Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [Question] Should direct reclaim time be bounded?

2019-04-23 Thread Mike Kravetz
On 4/23/19 12:19 AM, Michal Hocko wrote: > On Mon 22-04-19 21:07:28, Mike Kravetz wrote: >> In our distro kernel, I am thinking about making allocations try "less hard" >> on nodes where we start to see failures. less hard == NORETRY/NORECLAIM. >> I was going t

[Question] Should direct reclaim time be bounded?

2019-04-22 Thread Mike Kravetz
stuck, or trying really hard. My question is, "Is this expected or should direct reclaim be somewhat bounded?" With __alloc_pages_slowpath getting 'stuck' in direct reclaim, the documented behavior for huge page allocation is not going to happen. -- Mike Kravetz

[PATCH] hugetlbfs: always use address space in inode for resv_map pointer

2019-04-19 Thread Mike Kravetz
r to explicitly get it from the address space embedded within the inode. In addition, add more comments in the code to indicate why this is being done. Reported-by: Yufen Yu Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 11 +-- mm/hugetlb.c | 19 ++-

Re: [PATCH v2 2/2] hugetlb: use same fault hash key for shared and private mappings

2019-04-11 Thread Mike Kravetz
On 3/28/19 4:47 PM, Mike Kravetz wrote: > hugetlb uses a fault mutex hash table to prevent page faults of the > same pages concurrently. The key for shared and private mappings is > different. Shared keys off address_space and file index. Private > keys off mm and virtual address.

Re: [PATCH v2 0/2] A couple hugetlbfs fixes

2019-04-08 Thread Mike Kravetz
On 4/8/19 12:48 PM, Davidlohr Bueso wrote: > On Thu, 28 Mar 2019, Mike Kravetz wrote: > >> - A BUG can be triggered (not easily) due to temporarily mapping a >> page before doing a COW. > > But you actually _have_ seen it? Do you have the traces? I ask > not becaus

Re: [PATCH] mm/hugetlb: Get rid of NODEMASK_ALLOC

2019-04-02 Thread Mike Kravetz
ned-off-by: Oscar Salvador Not a huge deal, but a few typos in the commit message. Thanks for the clean up. Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH] mm/hugetlb: Get rid of NODEMASK_ALLOC

2019-04-02 Thread Mike Kravetz
> have been -stableified? I also think not, but I > bet it happens anyway). I don't see a great reason for sending to stable. IIRC, nobody actually hit this issue: it was found through code inspection. -- Mike Kravetz

[PATCH] hugetlbfs: fix memory leak for resv_map

2019-04-01 Thread Mike Kravetz
tructures are only needed for inodes which can have associated page allocations. To fix the leak, only allocate resv_map for those inodes which could possibly be associated with page allocations. Reported-by: Yufen Yu Suggested-by: Yufen Yu Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 2

[PATCH v2 2/2] hugetlb: use same fault hash key for shared and private mappings

2019-03-28 Thread Mike Kravetz
getlb: improve page-fault scalability"). Since every hugetlb mapping (even anon and private) is actually a file mapping, just use the address_space index key for all mappings. This results in potentially more hash collisions. However, this should not be the common case. Signed-off-by: Mi

[PATCH v2 1/2] huegtlbfs: on restore reserve error path retain subpool reservation

2019-03-28 Thread Mike Kravetz
Size Used Avail Use% Mounted on nodev 4.0G -4.0M 4.1G - /opt/hugepool To fix, when freeing a huge page do not adjust filesystem usage if PagePrivate() is set to indicate the reservation should be restored. Signed-off-by: Mike Kravetz --- mm/huge

[PATCH v2 0/2] A couple hugetlbfs fixes

2019-03-28 Thread Mike Kravetz
nd is very hard to hit/reproduce. v2 - Update definition and all callers of hugetlb_fault_mutex_hash as the arguments mm and vma are no longer used or necessary. Mike Kravetz (2): huegtlbfs: on restore reserve error path retain subpool reservation hugetlb: use same fault hash key for

[PATCH REBASED] hugetlbfs: fix potential over/underflow setting node specific nr_hugepages

2019-03-28 Thread Mike Kravetz
, simply return ENOMEM. Reported-by: Jing Xiangfeng Signed-off-by: Mike Kravetz --- This was sent upstream during 5.1 merge window, but dropped as it was based on an earlier version of Alex Ghiti's patch which was dropped. Now rebased on top of Alex Ghiti's "[PATCH v8 0/4] Fix fre

Re: [PATCH] include/linux/hugetlb.h: Convert to use vm_fault_t

2019-03-18 Thread Mike Kravetz
nvert to return vm_fault_t type for hugetlb_fault() > when CONFIG_HUGETLB_PAGE =n. > > Signed-off-by: Souptick Joarder Thanks for fixing this. The BUG() here and in several other places in this file is unnecessary and IMO should be cleaned up. But that is beyond the scope of this fix. Added to my to do list. Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users

2019-03-13 Thread Mike Kravetz
On 3/13/19 4:55 PM, Andrea Arcangeli wrote: > On Wed, Mar 13, 2019 at 01:01:40PM -0700, Mike Kravetz wrote: >> On 3/13/19 11:52 AM, Andrea Arcangeli wrote: >>> Unless somebody suggests a consistent way to make hugetlbfs "just >>> work" (like we could achi

Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users

2019-03-13 Thread Mike Kravetz
tup process enable uffd for all users. Correct? This may be too simple, and I don't really like group access, but how about just defining a uffd group? If you are in the group you can make uffd system calls. -- Mike Kravetz

Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users

2019-03-13 Thread Mike Kravetz
On 3/12/19 11:00 PM, Peter Xu wrote: > On Tue, Mar 12, 2019 at 12:59:34PM -0700, Mike Kravetz wrote: >> On 3/11/19 2:36 AM, Peter Xu wrote: >>> >>> The "kvm" entry is a bit special here only to make sure that existing >>> users like QEMU/KVM won'

Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users

2019-03-12 Thread Mike Kravetz
in controls who can have access to hugetlbfs, so I think adding code to the open routine as in patch 2 of this series would seem to work. However, I can imagine more special cases being added for other users. And, once you have more than one special case then you may want to combine them. For example, kvm and hugetlbfs together. -- Mike Kravetz

Re: [PATCH 2/2] hugetlb: use same fault hash key for shared and private mappings

2019-03-08 Thread Mike Kravetz
On 3/8/19 2:48 PM, Mike Kravetz wrote: > mm/hugetlb.c | 9 ++--- > 1 file changed, 2 insertions(+), 7 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 64ef640126cd..0527732c71f0 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@

[PATCH 1/2] huegtlbfs: on restore reserve error path retain subpool reservation

2019-03-08 Thread Mike Kravetz
Size Used Avail Use% Mounted on nodev 4.0G -4.0M 4.1G - /opt/hugepool To fix, when freeing a huge page do not adjust filesystem usage if PagePrivate() is set to indicate the reservation should be restored. Signed-off-by: Mike Kravetz --- mm/huge

[PATCH 0/2] A couple hugetlbfs fixes

2019-03-08 Thread Mike Kravetz
nd is very hard to hit/reproduce. Mike Kravetz (2): huegtlbfs: on restore reserve error path retain subpool reservation hugetlb: use same fault hash key for shared and private mappings mm/hugetlb.c | 30 ++ 1 file changed, 18 insertions(+), 12 deletions(-) -- 2.17.2

[PATCH 2/2] hugetlb: use same fault hash key for shared and private mappings

2019-03-08 Thread Mike Kravetz
getlb: improve page-fault scalability"). Since every hugetlb mapping (even anon and private) is actually a file mapping, just use the address_space index key for all mappings. This results in potentially more hash collisions. However, this should not be the common case. Signed-off-by: Mi

Re: [PATCH v6 4/4] hugetlb: allow to free gigantic pages regardless of the configuration

2019-03-08 Thread Mike Kravetz
> Signed-off-by: Alexandre Ghiti > Acked-by: David S. Miller [sparc] Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH v2] hugetlbfs: fix memory leak for resv_map

2019-03-07 Thread Mike Kravetz
Adding others on Cc to see if they have comments or opinions. On 3/6/19 3:52 PM, Mike Kravetz wrote: > On 3/5/19 10:10 PM, Yufen Yu wrote: >> When .mknod create a block device file in hugetlbfs, it will >> allocate an inode, and kmalloc a 'struct resv_map' in resv_map_a

Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common()

2019-03-06 Thread Mike Kravetz
On 3/6/19 1:41 AM, Oscar Salvador wrote: > On Mon, Mar 04, 2019 at 08:15:40PM -0800, Mike Kravetz wrote: >> In addition, the code in __nr_hugepages_store_common() which tries to >> handle the case of not being able to allocate a node mask would likely >> result in incorrect b

Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common()

2019-03-05 Thread Mike Kravetz
On 3/5/19 1:16 PM, Andrew Morton wrote: > On Mon, 4 Mar 2019 20:15:40 -0800 Mike Kravetz > wrote: > >> Andrew, this is on top of Alexandre Ghiti's "hugetlb: allow to free gigantic >> pages regardless of the configuration" patch. Both patches modify >

Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common()

2019-03-04 Thread Mike Kravetz
On 3/4/19 4:03 PM, Naoya Horiguchi wrote: > On Tue, Feb 26, 2019 at 04:03:23PM -0800, Mike Kravetz wrote: >> On 2/26/19 2:36 PM, Andrew Morton wrote: > ... >>>> >>>> + } else { >>>>/* >>>> - * per node hstate attrib

Re: [PATCH v4 4/4] hugetlb: allow to free gigantic pages regardless of the configuration

2019-03-01 Thread Mike Kravetz
On 3/1/19 5:21 AM, Alexandre Ghiti wrote: > On 03/01/2019 07:25 AM, Alex Ghiti wrote: >> On 2/28/19 5:26 PM, Mike Kravetz wrote: >>> On 2/28/19 12:23 PM, Dave Hansen wrote: >>>> On 2/28/19 11:50 AM, Mike Kravetz wrote: >>>>> On 2/28/19

Re: [RFC PATCH] mm,memory_hotplug: Unlock 1GB-hugetlb on x86_64

2019-02-27 Thread Mike Kravetz
when we want to allocate/use it. But, you would at least catch 'most' cases of looping forever. > But I would rather not convolute has_unmovable_pages() with such checks and > "trust" > the administrator. Agree -- Mike Kravetz

Re: [PATCH] huegtlbfs: fix races and page leaks during migration

2019-02-26 Thread Mike Kravetz
On 2/25/19 11:44 PM, Naoya Horiguchi wrote: > Hi Mike, > > On Thu, Feb 21, 2019 at 11:11:06AM -0800, Mike Kravetz wrote: ... >> From: Mike Kravetz >> Date: Thu, 21 Feb 2019 11:01:04 -0800 >> Subject: [PATCH] huegtlbfs: fix races and page leaks during migration > &

Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common()

2019-02-26 Thread Mike Kravetz
conversion routines. So, chances are good we would operate on the wrong node. The same goes for a request to 'free' huge pages. Since, we can't allocate a node mask we are likely to free them from the wrong node. Unless my reasoning above is incorrect, I think that final else block in __nr_hugepages_store_common() is wrong. Any additional thoughts? -- Mike Kravetz

Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common()

2019-02-26 Thread Mike Kravetz
On 2/25/19 10:21 PM, David Rientjes wrote: > On Tue, 26 Feb 2019, Jing Xiangfeng wrote: >> On 2019/2/26 3:17, David Rientjes wrote: >>> On Mon, 25 Feb 2019, Mike Kravetz wrote: >>> >>>> Ok, what about just moving the calculation/check inside the l

Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common()

2019-02-25 Thread Mike Kravetz
> On 2/24/19 7:17 PM, David Rientjes wrote: >> On Sun, 24 Feb 2019, Mike Kravetz wrote: >>>> @@ -2423,7 +2423,14 @@ static ssize_t __nr_hugepages_store_common(bool >>>> obey_mempolicy, >>>> * per node hstate attribute: adjust count to globa

Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common()

2019-02-25 Thread Mike Kravetz
On 2/24/19 7:17 PM, David Rientjes wrote: > On Sun, 24 Feb 2019, Mike Kravetz wrote: > >>> User can change a node specific hugetlb count. i.e. >>> /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages >>> the calculated value of count is a tot

Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common()

2019-02-24 Thread Mike Kravetz
gned-off-by: Jing Xiangfeng Thank you. Acked-by: Mike Kravetz > --- > mm/hugetlb.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index afef616..6688894 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -2423

[LSF/MM ATTEND] MM track: contig allocation, thp numa, userfaultfd

2019-02-22 Thread Mike Kravetz
interest in Andrea's NUMA remote THP vs NUMA local non-THP under MADV_HUGEPAGE proposal. - Userfaultfd and Peter Xu approach to write protect support. -- Mike Kravetz

Re: [PATCH v3] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common()

2019-02-22 Thread Mike Kravetz
* If user specified count causes overflow, set to * largest possible value. */ -- Mike Kravetz > + if (count < old_count) > + count = ULONG_MAX; > init_nodemask_of_node(nodes_allowed, nid); > } else > nodes_allowed = &node_states[N_MEMORY]; >

Re: [RFC PATCH] mm,memory_hotplug: Unlock 1GB-hugetlb on x86_64

2019-02-21 Thread Mike Kravetz
> - page_huge_active(head)) > + if (page_huge_active(head)) I'm confused as to why the removal of the hugepage_migration_supported() check is required. Seems that commit aa9d95fa40a2 ("mm/hugetlb: enable arch specific huge page size support for migration") should make the check work as desired for all architectures. -- Mike Kravetz

Re: [PATCH] huegtlbfs: fix races and page leaks during migration

2019-02-21 Thread Mike Kravetz
On 2/20/19 10:09 PM, Andrew Morton wrote: > On Tue, 12 Feb 2019 14:14:00 -0800 Mike Kravetz > wrote: > > cc:stable. It would be nice to get some review of this one, please? > >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index a80832487981..f859e319e3eb 100644 >

Re: [RFC PATCH 00/31] Generating physically contiguous memory after page allocation

2019-02-19 Thread Mike Kravetz
On 2/19/19 9:19 PM, Zi Yan wrote: > On 19 Feb 2019, at 19:18, Mike Kravetz wrote: >> Another high level question. One of the benefits of this approach is >> that exchanging pages does not require N free pages as you describe >> above. This assumes that the vma which

Re: [RFC PATCH 00/31] Generating physically contiguous memory after page allocation

2019-02-19 Thread Mike Kravetz
On 2/19/19 6:33 PM, Zi Yan wrote: > On 19 Feb 2019, at 17:42, Mike Kravetz wrote: > >> On 2/15/19 2:08 PM, Zi Yan wrote: >> >> Thanks for working on this issue! >> >> I have not yet had a chance to take a look at the code. However, I do have >> some

Re: [RFC PATCH 00/31] Generating physically contiguous memory after page allocation

2019-02-19 Thread Mike Kravetz
d to keep trying? My apologies if this is addressed in the code. This was just one of the first thoughts that came to mind when giving the series a quick look. -- Mike Kravetz

Re: [PATCH] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common()

2019-02-19 Thread Mike Kravetz
ytes to overflow the node specific counts. In the case of a user entering a crazy high value and causing an overflow, an error return might not be out of line. Another option would be to simply set count to ULONG_MAX if we detect overflow (or UINT_MAX if we are paranoid) and continue on. This may be more in line with user's intention of allocating as many huge pages as possible. Thoughts? -- Mike Kravetz
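The "set count to ULONG_MAX on overflow" option mentioned above could look roughly like the following. It is a minimal sketch under stated assumptions, not the code from the patch: saturating_total() and its parameter names are hypothetical, and it only illustrates the usual wraparound test for unsigned addition.

```c
#include <limits.h>

/*
 * Hypothetical helper: add a node-specific request to the number of huge
 * pages held by the other nodes, clamping to ULONG_MAX if the sum wraps.
 */
static unsigned long saturating_total(unsigned long requested,
				      unsigned long others)
{
	/* Unsigned addition wraps modulo ULONG_MAX + 1; this is well defined. */
	unsigned long total = requested + others;

	/* A wrapped sum is always smaller than either operand. */
	if (total < requested)
		total = ULONG_MAX;

	return total;
}
```

Clamping rather than returning an error treats a huge request as "allocate as many as possible", which is the intention the message describes.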

Re: [PATCH] huegtlbfs: fix races and page leaks during migration

2019-02-13 Thread Mike Kravetz
On 2/12/19 2:14 PM, Mike Kravetz wrote: > > Hugetlb pages can also be leaked at migration time if the pages are > associated with a file in an explicitly mounted hugetlbfs filesystem. > For example, a test program which hole punches, faults and migrates > pages in such a file (1

Re: [PATCH] mm,memory_hotplug: Explicitly pass the head to isolate_huge_page

2019-02-12 Thread Mike Kravetz
sequent changes I suspect this no longer works. > This check doesn't make much sense in principle. Why should we bail out > based on a section size? We are offlining a pfn range. All that we care > about is whether the hugetlb is migrateable. Yes. Do note that the do_migrate_range is only called from __offline_pages with a start_pfn that was returned by scan_movable_pages. scan_movable_pages has the hugepage_migration_supported check for PageHuge pages. So, it would seem to be redundant to do another check in do_migrate_range. -- Mike Kravetz

[PATCH] huegtlbfs: fix races and page leaks during migration

2019-02-12 Thread Mike Kravetz
22309c ("mm: hugetlb: introduce page_huge_active") Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 12 mm/hugetlb.c | 9 ++--- 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 32920a10100e

Re: [PATCH] huegtlbfs: fix page leak during migration of file pages

2019-02-11 Thread Mike Kravetz
On 2/11/19 6:24 PM, Naoya Horiguchi wrote: > On Mon, Feb 11, 2019 at 03:06:27PM -0800, Mike Kravetz wrote: >> While looking at this, I think there is another issue. When a hugetlb >> page is migrated, we do not migrate the 'page_huge_active' state of the >> page. T

Re: [PATCH] huegtlbfs: fix page leak during migration of file pages

2019-02-11 Thread Mike Kravetz
On 2/7/19 11:31 PM, Naoya Horiguchi wrote: > On Thu, Feb 07, 2019 at 09:50:30PM -0800, Mike Kravetz wrote: >> On 2/7/19 6:31 PM, Naoya Horiguchi wrote: >>> On Thu, Feb 07, 2019 at 10:50:55AM -0800, Mike Kravetz wrote: >>>> On 1/30/19 1:14 PM, Mike Kravetz wrote:

Re: [PATCH] huegtlbfs: fix page leak during migration of file pages

2019-02-07 Thread Mike Kravetz
On 2/7/19 6:31 PM, Naoya Horiguchi wrote: > On Thu, Feb 07, 2019 at 10:50:55AM -0800, Mike Kravetz wrote: >> On 1/30/19 1:14 PM, Mike Kravetz wrote: >>> +++ b/fs/hugetlbfs/inode.c >>> @@ -859,6 +859,16 @@ static int hugetlbfs_migrate_page(struct address_spac

Re: [PATCH] huegtlbfs: fix page leak during migration of file pages

2019-02-07 Thread Mike Kravetz
On 1/30/19 1:14 PM, Mike Kravetz wrote: > Files can be created and mapped in an explicitly mounted hugetlbfs > filesystem. If pages in such files are migrated, the filesystem > usage will not be decremented for the associated pages. This can > result in mmap or page allocation fa

Re: [PATCH -next] hugetlbfs: a terminator for hugetlb_param_specs[]

2019-02-04 Thread Mike Kravetz
91.658122] do_mount+0x11f0/0x1640 > [ 91.658125] ksys_mount+0xc0/0xd0 > [ 91.658129] __arm64_sys_mount+0xcc/0xe4 > [ 91.658137] el0_svc_handler+0x28c/0x338 > [ 91.681740] el0_svc+0x8/0xc > > Fixes: 2284cf59cbce ("hugetlbfs: Convert to fs_context") > Signed-off-by:

Re: [PATCH] huegtlbfs: fix page leak during migration of file pages

2019-02-01 Thread Mike Kravetz
c_inode list") > 8b26ef98da33 ("f2fs: use rw_semaphore for nat entry lock") > 8c402946f074 ("f2fs: introduce the number of inode entries") > 9be32d72becc ("f2fs: do retry operations with cond_resched") > 9e4ded3f309e ("f2fs: act

[PATCH RFC v3 0/2] hugetlbfs: use i_mmap_rwsem for more synchronization

2019-02-01 Thread Mike Kravetz
code for page faults and it got so ugly and complicated I went down the path of adding synchronization to avoid the races. Suggestions on how to proceed would be appreciated. If you think the following patches are not too ugly, comments on those would also be welcome. Mike Kravetz (2):

[PATCH RFC v3 2/2] hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race

2019-02-01 Thread Mike Kravetz
mplified and removed. remove_inode_hugepages no longer needs to take hugetlb_fault_mutex in the case of truncation. Comments are expanded to explain reasoning behind locking. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 32 ++-- mm/hugetlb.c | 23 +++-

[PATCH RFC v3 1/2] hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization

2019-02-01 Thread Mike Kravetz
ges we do not have an associated vma. A new routine _get_hugetlb_page_mapping() will use anon_vma to get address_space in these cases. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 2 + include/linux/fs.h | 5 ++ include/linux/hugetlb.h | 8 +

Re: [LSF/MM TOPIC] NUMA remote THP vs NUMA local non-THP under MADV_HUGEPAGE

2019-01-30 Thread Mike Kravetz
support for normal base pages: not too well :). Once base page support is finalized, I suspect I will be involved in hugetlbfs support. -- Mike Kravetz

[PATCH] huegtlbfs: fix page leak during migration of file pages

2019-01-30 Thread Mike Kravetz
: 290408d4a250 ("hugetlb: hugepage migration core") Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 10 ++ mm/migrate.c | 10 -- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 32

Re: [LSF/MM TOPIC] Page flags, can we free up space ?

2019-01-22 Thread Mike Kravetz
discuss my finding to > get people opinions on the matter. > > > I think everyone interested in mm will be interested in this topic :) Explicitly adding Matthew on Cc as I am pretty sure he has been working in this area. -- Mike Kravetz

Re: [PATCH] hugetlb: allow to free gigantic pages regardless of the configuration

2019-01-17 Thread Mike Kravetz
ntime allocation of gigantic pages is not supported, > one can still allocate boottime gigantic pages if the architecture supports > it. > > Signed-off-by: Alexandre Ghiti Thank you for doing this! Reviewed-by: Mike Kravetz > --- a/include/linux/gfp.h > +++ b/i

Re: [RFC PATCH] mm: align anon mmap for THP

2019-01-15 Thread Mike Kravetz
On 1/15/19 12:24 AM, Kirill A. Shutemov wrote: > On Mon, Jan 14, 2019 at 10:54:45AM -0800, Mike Kravetz wrote: >> On 1/14/19 7:35 AM, Steven Sistare wrote: >>> On 1/11/2019 6:28 PM, Mike Kravetz wrote: >>>> On 1/11/19 1:55 PM, Kirill A. Shutemov wrote: >>&g

Re: [RFC PATCH] mm: align anon mmap for THP

2019-01-14 Thread Mike Kravetz
On 1/14/19 7:35 AM, Steven Sistare wrote: > On 1/11/2019 6:28 PM, Mike Kravetz wrote: >> On 1/11/19 1:55 PM, Kirill A. Shutemov wrote: >>> On Fri, Jan 11, 2019 at 08:10:03PM +0000, Mike Kravetz wrote: >>>> At LPC last year, Boaz Harrosh asked why he had to '

Re: [RFC PATCH] mm: align anon mmap for THP

2019-01-11 Thread Mike Kravetz
On 1/11/19 1:55 PM, Kirill A. Shutemov wrote: > On Fri, Jan 11, 2019 at 08:10:03PM +0000, Mike Kravetz wrote: >> At LPC last year, Boaz Harrosh asked why he had to 'jump through hoops' >> to get an address returned by mmap() suitably aligned for THP. It seems >&g
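
For context on the 'hoops' mentioned above: the excerpt does not spell them out, but a common user-space workaround is to over-allocate an anonymous mapping and trim it back to an aligned start. The sketch below shows that workaround only; it is not code from the thread or from the patch. The helper name mmap_thp_aligned and the 2 MiB THP_ALIGN value are assumptions for illustration (2 MiB is the usual x86-64 huge page size).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed huge page size; 2 MiB is typical on x86-64 but not stated in the thread. */
#define THP_ALIGN ((size_t)2 << 20)

/*
 * Hypothetical helper: map at least `len` bytes of anonymous memory whose
 * start address is a multiple of THP_ALIGN. Returns MAP_FAILED on error.
 */
static void *mmap_thp_aligned(size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    assert(len > 0);
    assert(THP_ALIGN % page == 0);

    /* Round the request up to whole pages so the trimmed tail stays page aligned. */
    size_t rounded = (len + page - 1) & ~(page - 1);
    size_t padded = rounded + THP_ALIGN;
    if (rounded < len || padded < rounded)
        return MAP_FAILED; /* size overflow */

    /* Over-allocate by one alignment unit so an aligned start must exist inside. */
    uint8_t *raw = mmap(NULL, padded, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return MAP_FAILED;

    uintptr_t start = (uintptr_t)raw;
    uintptr_t aligned = (start + THP_ALIGN - 1) & ~(uintptr_t)(THP_ALIGN - 1);
    size_t head = (size_t)(aligned - start); /* unused bytes before the aligned start */
    size_t tail = THP_ALIGN - head;          /* unused bytes after the aligned region */

    /* Give back the unaligned head and the surplus tail. */
    if (head)
        munmap(raw, head);
    if (tail)
        munmap((uint8_t *)aligned + rounded, tail);

    return (void *)aligned;
}
```

A caller would use the result like any other mapping and release it with munmap() on the page-rounded length. The extra mmap/munmap bookkeeping is the kind of overhead an in-kernel alignment policy, as proposed in the RFC, would make unnecessary.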

[RFC PATCH] mm: align anon mmap for THP

2019-01-11 Thread Mike Kravetz
difies the common vm_unmapped_area routine. It may be too simplistic, but I wanted to throw out some code while asking if something like this has ever been considered. Signed-off-by: Mike Kravetz --- include/linux/huge_mm.h | 6 ++ include/linux/mm.h | 3 +++ mm/mmap.c

Re: [PATCH 1/1] mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT

2019-01-08 Thread Mike Kravetz
asn't in the FOLL_NOWAIT case. > > Fixes: ce53053ce378 ("kvm: switch get_user_page_nowait() to > get_user_pages_unlocked()") > Signed-off-by: Andrea Arcangeli > Tested-by: "Dr. David Alan Gilbert" > Reported-by: "Dr. David Alan Gilbert" Tha

[PATCH 2/2] hugetlbfs: Revert "use i_mmap_rwsem for more pmd sharing synchronization"

2019-01-03 Thread Mike Kravetz
hugetlbfs synchronization issues. Therefore, revert this patch while working on another approach to the underlying issues. Reported-by: Jan Stancek Signed-off-by: Mike Kravetz --- mm/hugetlb.c| 64 +++-- mm/memory-failure.c | 16 ++-- mm

[PATCH 1/2] hugetlbfs: Revert "Use i_mmap_rwsem to fix page fault/truncate race"

2019-01-03 Thread Mike Kravetz
issue. Reported-by: Jan Stancek Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 61 mm/hugetlb.c | 21 +++ 2 files changed, 44 insertions(+), 38 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index

Re: [LKP] [hugetlbfs] 9c83282117: vm-scalability.throughput -4.3% regression

2019-01-02 Thread Mike Kravetz
ng synchronization") > url: > https://github.com/0day-ci/linux/commits/Mike-Kravetz/hugetlbfs-use-i_mmap_rwsem-for-better-synchronization/20181223-095226 > > > in testcase: vm-scalability > on test machine: 104 threads Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz > with 64G me

Re: bug report: hugetlbfs: use i_mmap_rwsem for more pmd sharing, synchronization

2018-12-27 Thread Mike Kravetz
On 12/27/18 6:45 PM, Andrew Morton wrote: > On Thu, 27 Dec 2018 11:24:31 -0800 Mike Kravetz > wrote: >> It would be better to make an explicit check for mapping != null before >> calling i_mmap_lock_write/try_to_unmap. In this way, unrelated changes to >> code above will

Re: bug report: hugetlbfs: use i_mmap_rwsem for more pmd sharing, synchronization

2018-12-27 Thread Mike Kravetz
On 12/27/18 3:44 AM, Colin Ian King wrote: > Hi, > > Static analysis with CoverityScan on linux-next detected a potential > null pointer dereference with the following commit: > > From d8a1051ed4ba55679ef24e838a1942c9c40f0a14 Mon Sep 17 00:00:00 2001 > From: Mike Kravetz
