> Signed-off-by: Joonsoo Kim
Thanks for consolidating these.
Reviewed-by: Mike Kravetz
--
Mike Kravetz
> parameter is preceded by an invalid hugepagesz parameter, it will
> be ignored.
> -default_hugepagesz - Specify the default huge page size. This parameter can
> +default_hugepagesz
> + pecify the default huge page size. This parameter can
Oops, should be 'Specify'.
case
of PMD sharing. I'm afraid a regression is unavoidable in that case.
I'll put together a patch.
--
Mike Kravetz
On 6/15/20 12:53 AM, Miklos Szeredi wrote:
> On Sat, Jun 13, 2020 at 9:12 PM Mike Kravetz wrote:
>> On 6/12/20 11:53 PM, Amir Goldstein wrote:
>>>
>>> The simplest thing for you to do in order to shush syzbot is what procfs
>>> does:
>>> /*
is?
My apologies!!!
I reviewed my testing and found that it was incorrectly writing to the
lower filesystem. Writing to any file in the union will fail.
--
Mike Kravetz
> So you may only take that option if you do not care about the combination
> of hugetlbfs with any of the above.
>
> overlayfs support of mmap is not as good as one might hope.
> overlayfs.rst says:
> "If a file residing on a lower layer is opened for read-only and then
> memory mapped with MAP_SHARED, then subsequent changes to
> the file are not reflected in the memory mapping."
>
> So if I were you, I wouldn't go trying to fix overlayfs-hugetlb interop...
Thanks again,
I'll look at something as simple as s_stack_depth.
--
Mike Kravetz
On 6/11/20 6:58 PM, Al Viro wrote:
> On Thu, Jun 11, 2020 at 05:46:43PM -0700, Mike Kravetz wrote:
>> The routine is_file_hugepages() checks f_op == hugetlbfs_file_operations
>> to determine if the file resides in hugetlbfs. This is problematic when
>> the file is on a union
t in the BUG as shown in [1].
[1] https://lore.kernel.org/linux-mm/b4684e05a2968...@google.com/
Reported-by: syzbot+d6ec23007e951dadf...@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi
Signed-off-by: Mike Kravetz
---
fs/overlayfs/file.c | 21 +
1 file cha
FS in overlayfs.
Suggested-by: Al Viro
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c| 7 +++
fs/io_uring.c | 2 +-
include/linux/fs.h | 3 +++
include/linux/hugetlb.h | 10 --
include/linux/shm.h | 5 -
ipc/shm.c
w!
I knew adding a file op for this was overkill and was looking for other
suggestions.
--
Mike Kravetz
On 6/4/20 2:16 AM, Miklos Szeredi wrote:
> On Thu, May 28, 2020 at 11:01 PM Mike Kravetz wrote:
>>
>> Well yuck! get_unmapped_area is not part of mm_struct if !CONFIG_MMU.
>>
>> Miklos, would adding '#ifdef CONFIG_MMU' around the overlayfs code be too
different CMA areas.
>
> Cc: Roman Gushchin
> Signed-off-by: Barry Song
Thank you
Reviewed-by: Mike Kravetz
--
Mike Kravetz
-ENOMEM if users set name parameter as NULL.
>
> Cc: Roman Gushchin
> Signed-off-by: Barry Song
Thank you
Reviewed-by: Mike Kravetz
--
Mike Kravetz
patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see
> https://stackoverflow.com/a/37406982]
>
> url:
> https://githu
On 5/22/20 3:05 AM, Miklos Szeredi wrote:
> On Wed, May 20, 2020 at 10:27:15AM -0700, Mike Kravetz wrote:
>
>> I am fairly confident it is all about checking limits and alignment. The
>> filesystem knows if it can/should align to base or huge page size. DAX has
>> som
gs as is done in the existing code does not bother me
too much, but that is just my opinion. Adding __gfp_mask for modifications
is fine with me if others think it is a good thing.
Does dequeue_huge_page_vma() need to be modified so that it will set
ac.__gfp_mask before calling dequeue_huge_page_nodemask
callee side
> to get better result.
>
> Signed-off-by: Joonsoo Kim
Thank you!
Avoiding CMA works much better with this new skip_cma field.
Acked-by: Mike Kravetz
--
Mike Kravetz
alloc_huge_page_nodemask() calling sequences.
However, it appears that node (preferred_nid) is always set to something
other than NUMA_NO_NODE in those callers.
It obviously makes sense to add the field to guarantee no changes to
functionality while making the conversions. However, it is not
compound_head(page)),
> - preferred_nid, nodemask);
> + if (PageHuge(page)) {
> + struct hstate *h = page_hstate(page);
I assume the removal of compound_head(page) was intentional? Just asking
because PageHuge will look at the head page while page_hstate will not. So,
if passed a non-head page, things could go bad.
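A minimal illustration of the concern (hstate_of() is just a hypothetical
helper for this note, not something in the series):

	/*
	 * page_hstate() derives the hstate from the compound page size, so it
	 * must be given the head page.  PageHuge() resolves tail pages itself,
	 * which is why dropping compound_head() still passes the PageHuge()
	 * check but can return the wrong hstate for a tail page.
	 */
	static struct hstate *hstate_of(struct page *page)
	{
		if (!PageHuge(page))
			return NULL;
		return page_hstate(compound_head(page));
	}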
--
Mike Kravetz
On 5/17/20 6:20 PM, js1...@gmail.com wrote:
> From: Joonsoo Kim
>
> It's not a performance-sensitive function. Move it to .c.
> This is a preparation step for future change.
>
> Signed-off-by: Joonsoo Kim
Agreed, this is not performance sensitive and can be moved.
On 5/20/20 4:20 AM, Miklos Szeredi wrote:
> On Tue, May 19, 2020 at 2:35 AM Mike Kravetz wrote:
>>
>> On 5/18/20 4:41 PM, Colin Walters wrote:
>>>
>>> On Tue, May 12, 2020, at 11:04 AM, Miklos Szeredi wrote:
>>>
>>>>> However, in this
dding whitelist
capability to overlayfs.
IMO - This BUG/report revealed two issues. The first is the BUG from mmap'ing
a hugetlbfs file on overlayfs. The other is that core mmap code will skip
any filesystem-specific get_unmapped_area routine if the file is on a union/overlay.
My patch fixes both, but if we go with a whitelist approach and don't allow
hugetlbfs, I think we still need to address the filesystem-specific
get_unmapped_area issue. That is easy enough to do by adding a routine to
overlayfs which calls the routine for the underlying fs, for example:
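(A rough sketch only; ovl_real_file() is a placeholder name for whatever
overlayfs uses internally to reach the underlying file.)

	static unsigned long ovl_get_unmapped_area(struct file *file,
			unsigned long addr, unsigned long len,
			unsigned long pgoff, unsigned long flags)
	{
		/* forward to the real (underlying) file's routine if it has one */
		struct file *realfile = ovl_real_file(file);

		if (realfile->f_op->get_unmapped_area)
			return realfile->f_op->get_unmapped_area(realfile, addr,
								 len, pgoff, flags);
		/* otherwise fall back to the arch default placement */
		return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
	}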
--
Mike Kravetz
On 5/18/20 4:12 AM, Miklos Szeredi wrote:
> On Sat, May 16, 2020 at 12:15 AM Mike Kravetz wrote:
>> Any suggestions on how to move forward? It seems like there may be the
>> need for a real_file() routine? I see a d_real dentry_op was added to
>> deal with this issue fo
apped_area.
>> +*/
>> + if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
>> + return hugetlb_get_unmapped_area_topdown(file, addr, len,
>> + pgoff, flags);
>> + return hugetlb_get_unmapped_area_botto
On 5/12/20 11:11 AM, Mike Kravetz wrote:
> On 5/12/20 8:04 AM, Miklos Szeredi wrote:
>> On Tue, Apr 7, 2020 at 12:06 AM Mike Kravetz wrote:
>>> On 4/5/20 8:06 PM, syzbot wrote:
>>>
>>> The routine is_file_hugepages() is just comparing the file ops to huegt
hat we call the bottomup routine in this
default case.
In reality, this does not impact powerpc as that architecture has its
own hugetlb_get_unmapped_area routine.
Because of this, I suggest we add a comment above this code and switch
the if/else order. For example,
+	/*
+	 * Use mm->get_unmapped_area value as a hint to use topdown routine.
+	 * If architectures have special needs, they should define their own
+	 * version of hugetlb_get_unmapped_area.
+	 */
+	if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
+		return hugetlb_get_unmapped_area_topdown(file, addr, len,
+				pgoff, flags);
+	return hugetlb_get_unmapped_area_bottomup(file, addr, len,
+			pgoff, flags);
Thoughts?
--
Mike Kravetz
> }
> #endif
>
>
, but that is pretty
straightforward.
I'm guessing this may not reproduce easily. To help reproduce, you could
change the
#define FALLOCATE_ITERATIONS 10
in .../libhugetlbfs/tests/fallocate_stress.c to a larger number to force
the stress test to run longer.
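For example (the value here is arbitrary; anything much larger than 10 just
widens the race window):

	/* in libhugetlbfs/tests/fallocate_stress.c */
	#define FALLOCATE_ITERATIONS	10000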
--
Mike Kravetz
On 5/12/20 8:04 AM, Miklos Szeredi wrote:
> On Tue, Apr 7, 2020 at 12:06 AM Mike Kravetz wrote:
>> On 4/5/20 8:06 PM, syzbot wrote:
>>
>> The routine is_file_hugepages() is just comparing the file ops to hugetlbfs:
>>
>> if (file->
> Cc: Palmer Dabbelt
> Cc: Heiko Carstens
> Cc: Vasily Gorbik
> Cc: Christian Borntraeger
> Cc: Yoshinori Sato
> Cc: Rich Felker
> Cc: "David S. Miller"
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: "H. Peter Anvin&q
On 5/10/20 8:14 PM, Anshuman Khandual wrote:
> On 05/09/2020 03:52 AM, Mike Kravetz wrote:
>> On 5/7/20 8:07 PM, Anshuman Khandual wrote:
>>
>> Did you try building without CONFIG_HUGETLB_PAGE defined? I'm guessing
>
> Yes I did for multiple platforms (s39
On 5/10/20 9:02 PM, Anshuman Khandual wrote:
> On 05/09/2020 03:39 AM, Mike Kravetz wrote:
>> On 5/7/20 8:07 PM, Anshuman Khandual wrote:
>> I know you made this change in response to Will's comment. And, since
>> changes were made to consistently use READ_ONCE in arm
> Cc: Palmer Dabbelt
> Cc: Heiko Carstens
> Cc: Vasily Gorbik
> Cc: Christian Borntraeger
> Cc: Yoshinori Sato
> Cc: Rich Felker
> Cc: "David S. Miller"
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: "H. Peter Anvin"
s not used before. Could this possibly
introduce inconsistencies in their use of READ_ONCE? To be honest, I
am not very good at identifying any possible issues this could cause.
However, it does seem possible.
Will was nervous about dropping this from arm64. I'm just a little nervous
about adding it to other architectures.
--
Mike Kravetz
. However the bootmem allocator required
for gigantic allocations is not available at this time.
Signed-off-by: Mike Kravetz
Acked-by: Gerald Schaefer [s390]
Acked-by: Will Deacon
Tested-by: Sandipan Das
---
.../admin-guide/kernel-parameters.txt | 40 +++--
Documentation/admi
ed by some
architectures to set up ALL huge page sizes.
Signed-off-by: Mike Kravetz
Acked-by: Mina Almasry
Reviewed-by: Peter Xu
Acked-by: Gerald Schaefer [s390]
Acked-by: Will Deacon
---
arch/arm64/mm/hugetlbpage.c | 15 ---
arch/powerpc/mm/hugetlbpage.c | 15 ---
t routine processing "hugepagesz=".
After this, calls to size_to_hstate() in arch specific code can be
removed and hugetlb_add_hstate can be called without worrying about
warning messages.
Signed-off-by: Mike Kravetz
Acked-by: Mina Almasry
Acked-by: Gerald Schaefer [s390]
Acked-by: Will De
an arch independent routine.
- Clean up command line processing to follow desired semantics and
document those semantics.
[1] https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpe...@huawei.com
Mike Kravetz (4):
hugetlbfs: add arch_hugetlb_valid_size
hugetlbfs: move hugepagesz= parsi
"hugepagesz=" in arch specific code to a common
routine in arch independent code.
Signed-off-by: Mike Kravetz
Acked-by: Gerald Schaefer [s390]
Acked-by: Will Deacon
---
arch/arm64/mm/hugetlbpage.c | 17 +
arch/powerpc/mm/hugetlbpage.c | 20 +---
arc
On 10/22/19 12:09 AM, Piotr Sarna wrote:
> On 10/21/19 7:17 PM, Mike Kravetz wrote:
>> On 10/15/19 4:37 PM, Mike Kravetz wrote:
>>> On 10/15/19 3:50 AM, Michal Hocko wrote:
>>>> On Tue 15-10-19 11:01:12, Piotr Sarna wrote:
>>>>> With hugetlbfs, a co
resv->region_cache_count++;
> - goto retry_locked;
> }
I know that I suggested allocating the worst case number of entries, but this
is going to be too much of a hit for existing hugetlbfs users. It is not
uncommon for DBs to have shared areas in excess of 1TB mapped by hugetlbfs.
With this new scheme, the above while loop will allocate over a half million
file region entries and end up only using one.
I think we need to step back and come up with a different approach. Let me
give it some more thought before throwing out ideas that may waste more of
your time. Sorry.
--
Mike Kravetz
On 10/15/19 4:37 PM, Mike Kravetz wrote:
> On 10/15/19 3:50 AM, Michal Hocko wrote:
>> On Tue 15-10-19 11:01:12, Piotr Sarna wrote:
>>> With hugetlbfs, a common pattern for mapping anonymous huge pages
>>> is to create a temporary file first.
>>
>> Really?
om kdump
> perspective. The tricky part is exactly preventing the sysctl from getting applied
> heh
>
Please do let us know if this can be done in tooling.
I am not opposed to the approach taken in your v2 patch as it essentially
uses the hugepages_supported() functionality that exists today. However,
it seems that other distros have ways around this issue. As such, I would
prefer if the issue was addressed in the tooling.
--
Mike Kravetz
Sorry for noise, left off David
On 10/17/19 5:08 PM, Mike Kravetz wrote:
> Cc: David
> On 10/17/19 3:38 AM, Chengguang Xu wrote:
>> In order to avoid using incorrect mnt, we should set
>> mnt to NULL when we get an error from mount_one_hugetlbfs().
>>
>> Signed-off-by
x hstate. It now does
that for the '0' hstate, and 0 is not always equal to default_hstate_idx.
David, was that intentional or an oversight? I can fix it up, just wanted to
make sure there was not some reason for the change.
--
Mike Kravetz
ns
in parallel. The new interface is pretty straightforward, but the idea
was to stress the underlying code. In fact, it did identify issues with
isolation which were corrected.
I exercised this new interface in the same way and am happy to report that
no issues were detected.
--
Mike Kravetz
/ext4/ialloc.o
>
> Fix the warning by adding parentheses around the sizeof(u32) expression.
>
> Cc: Mike Kravetz
> Signed-off-by: Vincenzo Frascino
Thanks,
However, this is already addressed in Andrew's tree.
https://ozlabs.org/~akpm/mmotm/broken-out/hugetlbfs-hugetlb_fault_mutex_hash-cleanup.patch
--
Mike Kravetz
s implemented. So, that is why it does not make (more) use of that
option.
The implementation looks to be straightforward. However, I really do
not want to add more functionality to hugetlbfs unless there is a specific
use case that needs it.
--
Mike Kravetz
.@dhcp22.suse.cz
>
> Reported-by: Michal Hocko
> Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to
> zones until online") # visible after d0dc12e86b319
> Cc: sta...@vger.kernel.org # v4.13+
> Cc: Anshuman Khandual
> Cc: Mike Kravetz
>
ified. Perhaps just a level of naming
indirection. This would use the existing code to prevent all hugetlb usage.
It seems like there may be some discussion about 'the right' way to
do kdump. I can't add to that discussion, but if such an option as
nohugepages is needed, I can help.
--
Mike Kravetz
On 10/11/19 1:41 PM, Mina Almasry wrote:
> On Fri, Oct 11, 2019 at 12:10 PM Mina Almasry wrote:
>>
>> On Mon, Sep 23, 2019 at 10:47 AM Mike Kravetz
>> wrote:
>>>
>>> On 9/19/19 3:24 PM, Mina Almasry wrote:
>>
>> Mike, note your suggestion a
On 10/9/19 8:30 PM, Wei Yang wrote:
> On Wed, Oct 09, 2019 at 07:25:18PM -0700, Mike Kravetz wrote:
>> On 10/9/19 6:23 PM, Wei Yang wrote:
>>> On Wed, Oct 09, 2019 at 05:45:57PM -0700, Mike Kravetz wrote:
>>>> On 10/9/19 5:27 AM, YueHaibing wrote:
>>>
On 10/9/19 6:23 PM, Wei Yang wrote:
> On Wed, Oct 09, 2019 at 05:45:57PM -0700, Mike Kravetz wrote:
>> On 10/9/19 5:27 AM, YueHaibing wrote:
>>> Fixes gcc '-Wunused-but-set-variable' warning:
>>>
>>> mm/userfaultfd.c: In function '__mcopy_at
>
> It is not used since commit 78911d0e18ac ("userfaultfd: use vma_pagesize
> for all huge page size calculation")
>
Thanks! That should have been removed with the recent cleanups.
> Signed-off-by: YueHaibing
Reviewed-by: Mike Kravetz
--
Mike Kravetz
hat if b39d0ee2632d went
forward there should be an exception for __GFP_RETRY_MAYFAIL requests.
[1] https://lkml.kernel.org/r/3468b605-a3a9-6978-9699-57c52a90b...@oracle.com
--
Mike Kravetz
b39d0ee2632d to cause regressions and noticeable
behavior changes.
My quick/limited testing in [1] was insufficient. It was also mentioned that
if something like b39d0ee2632d went forward, I would like exemptions for
__GFP_RETRY_MAYFAIL requests as in this patch.
>
> [mho...@suse.com: rewo
On 9/27/19 3:51 PM, Mina Almasry wrote:
> On Fri, Sep 27, 2019 at 2:59 PM Mike Kravetz wrote:
>>
>> On 9/26/19 5:55 PM, Mina Almasry wrote:
>>> Provided we keep the existing controller untouched, should the new
>>> controller track:
>>>
>>> 1.
> fits all.
>
> I think the only sticking point left is whether an added controller
> can support both cgroup-v2 and cgroup-v1. If I could get confirmation
> on that I'll provide a patchset.
Sorry, but I can not provide cgroup expertise.
--
Mike Kravetz
ress -= regions_needed;
Consider this example,
- region_chg(1,2)
adds_in_progress = 1
cache entries 1
- region_chg(3,4)
adds_in_progress = 2
cache entries 2
- region_chg(5,6)
adds_in_progress = 3
cache entries 3
At this point, no region descriptors are in the map because only
region_chg has been called.
- region_chg(0,6)
adds_in_progress = 4
cache entries 4
Is that correct so far?
Then the following sequence happens,
- region_add(1,2)
adds_in_progress = 3
cache entries 3
- region_add(3,4)
adds_in_progress = 2
cache entries 2
- region_add(5,6)
adds_in_progress = 1
cache entries 1
list of region descriptors is:
[1->2] [3->4] [5->6]
- region_add(0,6)
This is going to require 3 cache entries but only one is in the cache.
I think we are going to BUG in get_file_region_entry_from_cache() the
second time it is called from add_reservation_in_range().
I stopped looking at the code here as things will need to change if this
is a real issue.
--
Mike Kravetz
ion of reservations and allocations? If a combined
controller will work for new use cases, that would be my preference. Of
course, I have not prototyped such a controller so there may be issues when
we get into the details. For a reservation only or combined controller,
the region_* changes proposed by Mina would be used.
--
Mike Kravetz
On 9/23/19 12:18 PM, Mina Almasry wrote:
> On Mon, Sep 23, 2019 at 10:47 AM Mike Kravetz wrote:
>>
>> On 9/19/19 3:24 PM, Mina Almasry wrote:
>>> Patch series implements hugetlb_cgroup reservation usage and limits, which
>>> track hugetlb reservations rath
sers.
I really would like to get feedback from anyone who knows how the existing
hugetlb cgroup controller may be used today. Comments from Aneesh would
be especially welcome, to know whether reservations were considered in the
development of the existing code.
--
Mike Kravetz
longer used. So, remove it from the
definition and all callers.
No functional change.
Reported-by: Nathan Chancellor
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c| 4 ++--
include/linux/hugetlb.h | 2 +-
mm/hugetlb.c| 10 +-
mm/userfaultfd.c| 2 +-
4 f
ere done in the
region_chg call, and it was relatively easy to do in existing code when
region_chg would only need one additional region at most.
I'm thinking that we may have to make region_chg allocate the worst case
number of regions (t - f)/2, OR change the code such that region_add
could return an error.
--
Mike Kravetz
n_add, and I want to make that change in one place
> only. It should improve maintainability anyway on its own.
>
> Signed-off-by: Mina Almasry
Like the previous patch, this is a good improvement independent of the
rest of the series. Thanks!
Reviewed-by: Mike Kravetz
--
Mike Kravetz
h
> region_del exists.
>
> Signed-off-by: Mina Almasry
Thanks. I like this modification as it does simplify the code and could
be added as a general cleanup independent of the other changes.
Reviewed-by: Mike Kravetz
--
Mike Kravetz
> ---
> mm/hugetlb.c | 63 +---
tes the long stalls?
If so, can you try the simple change of taking the semaphore in read mode
in huge_pmd_share.
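Something along these lines, assuming the semaphore in question is
i_mmap_rwsem as taken in huge_pmd_share() today (untested sketch, for the
experiment only):

	i_mmap_lock_read(mapping);	/* instead of i_mmap_lock_write() */
	/* ... existing vma_interval_tree_foreach() walk unchanged ... */
	i_mmap_unlock_read(mapping);	/* instead of i_mmap_unlock_write() */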
--
Mike Kravetz
to ask the question in case someone already
knows.
At one time, I thought it was safe to acquire the semaphore in read mode for
huge_pmd_share, but write mode for huge_pmd_unshare. See commit b43a99900559.
This was reverted along with another patch for other reasons.
If we change from write to read mode, this may have a significant impact
on the stalls.
--
Mike Kravetz
igger than you describe above. I have never looked at/for delays in
these environments around pmd sharing (page faults), but that does not mean
they do not exist. I will try to get the DB group to give me access to one
of their large environments for analysis.
We may want to consider making the timeout value and disable threshold user
configurable.
--
Mike Kravetz
Patch 3 in that series causes allocations to fail sooner in the case of
COMPACT_DEFERRED:
http://lkml.kernel.org/r/20190806014744.15446-4-mike.krav...@oracle.com
hugetlb allocations have the __GFP_RETRY_MAYFAIL flag set. They are willing
to retry and wait and callers are aware of this. Even though my limited
testing did not show regressions caused by this patch, I would prefer if the
quick exit did not apply to __GFP_RETRY_MAYFAIL requests.
--
Mike Kravetz
On 9/3/19 10:57 AM, Mike Kravetz wrote:
> On 8/29/19 12:18 AM, Michal Hocko wrote:
>> [Cc cgroups maintainers]
>>
>> On Wed 28-08-19 10:58:00, Mina Almasry wrote:
>>> On Wed, Aug 28, 2019 at 4:23 AM Michal Hocko wrote:
>>>>
>>>> On Mon 26-0
> + i += pages_per_huge_page(h);
> + spin_unlock(ptl);
> + continue;
> + }
> +
> same_page:
> if (pages) {
> pages[i] = mem_map_offset(page, pfn_offset);
>
With a comment added to the code,
Reviewed-by: Mike Kravetz
--
Mike Kravetz
_*
changes separately. If not a standalone patch, at least the first patch of
the series. This new code will be exercised even if cgroup reservation
accounting is not enabled, so it is very important that no subtle regressions
be introduced.
--
Mike Kravetz
On 8/15/19 4:08 PM, Mina Almasry wrote:
> On Tue, Aug 13, 2019 at 4:54 PM Mike Kravetz wrote:
>>> mm/hugetlb.c | 208 +--
>>> 1 file changed, 170 insertions(+), 38 deletions(-)
>>>
>>> diff --git
On 8/15/19 4:04 PM, Mina Almasry wrote:
> On Wed, Aug 14, 2019 at 9:46 AM Mike Kravetz wrote:
>>
>> On 8/13/19 4:54 PM, Mike Kravetz wrote:
>>> On 8/8/19 4:13 PM, Mina Almasry wrote:
>>>> For shared mappings, the pointer to the hugetlb_cgroup to uncharge li
On 8/13/19 4:54 PM, Mike Kravetz wrote:
> On 8/8/19 4:13 PM, Mina Almasry wrote:
>> For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives
>> in the resv_map entries, in file_region->reservation_counter.
>>
>> When a file_region entry is added to t
> + if (!dry_run) {
> + list_del(&rg->link);
> + kfree(rg);
Is it possible that the region struct we are deleting pointed to
a reservation_counter? Perhaps even for another cgroup?
Just concerned that, given the way regions are coalesced, we may be
deleting counters.
--
Mike Kravetz
On 8/10/19 3:01 PM, Mina Almasry wrote:
> On Sat, Aug 10, 2019 at 11:58 AM Mike Kravetz wrote:
>>
>> On 8/9/19 12:42 PM, Mina Almasry wrote:
>>> On Fri, Aug 9, 2019 at 10:54 AM Mike Kravetz
>>> wrote:
>>>> On 8/8/19 4:13 PM, Mina Almasry wrote:
>&
On 8/9/19 12:42 PM, Mina Almasry wrote:
> On Fri, Aug 9, 2019 at 10:54 AM Mike Kravetz wrote:
>> On 8/8/19 4:13 PM, Mina Almasry wrote:
>>> Problem:
>>> Currently tasks attempting to allocate more hugetlb memory than is
>>> available get
>>> a f
On 8/9/19 1:57 PM, Mina Almasry wrote:
> On Fri, Aug 9, 2019 at 1:39 PM Mike Kravetz wrote:
>>
>> On 8/9/19 11:05 AM, Mina Almasry wrote:
>>> On Fri, Aug 9, 2019 at 4:27 AM Michal Koutný wrote:
>>>>> Alternatives considered:
>>>>> [...]
>
hose 7
> pages, and will SIGBUS you when you try to access the remaining 2
> pages. So the problem persists. Folks would still like to know they
> are crossing the limits on mmap time.
If you got the failure at mmap time in the MAP_POPULATE case, would this
be useful?
Just thinking that would be a relatively simple change.
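Roughly this, from the application side (minimal sketch; the hugetlbfs mount
point and sizes are made up for illustration):

	#include <err.h>
	#include <fcntl.h>
	#include <stddef.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 10UL * 2 * 1024 * 1024;	/* ten 2MB huge pages */
		int fd = open("/dev/hugepages/test", O_CREAT | O_RDWR, 0600);

		if (fd < 0)
			err(1, "open");
		/*
		 * The idea being discussed: with MAP_POPULATE the whole range
		 * is faulted in at mmap() time, so crossing the limit could be
		 * reported here as MAP_FAILED instead of a later SIGBUS.
		 */
		if (mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_POPULATE, fd, 0) == MAP_FAILED)
			err(1, "mmap");
		return 0;
	}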
--
Mike Kravetz
ntents of the page cache to the resv_map to determine how
many reservations were actually consumed. I did not look closely enough to
determine whether the code drops reservation usage counts as pages are added
to shared mappings.
--
Mike Kravetz
On 8/8/19 12:47 AM, Michal Hocko wrote:
> On Thu 08-08-19 09:46:07, Michal Hocko wrote:
>> On Wed 07-08-19 17:05:33, Mike Kravetz wrote:
>>> Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS
>>> in the kernel-v5.2.3 testing. This is caused by a ra
ptep)))
goto backout;
--
Mike Kravetz
page table lock and check for huge_pte_none before
returning an error. This is the same check that must be made further
in the code even if page allocation is successful.
Reported-by: Li Wang
Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
Signed-off-by: Mike Kravetz
Tested-b
as been scanned" with nr_scanned == 0 didn't really work.
Signed-off-by: Vlastimil Babka
Acked-by: Mike Kravetz
Signed-off-by: Mike Kravetz
---
Commit message reformatted to avoid line wrap.
mm/vmscan.c | 43 ++-
1 file changed, 14 insertions(
From: Vlastimil Babka
Mike Kravetz reports that "hugetlb allocations could stall for minutes
or hours when should_compact_retry() would return true more often then
it should. Specifically, this was in the case where compact_result was
COMPACT_DEFERRED and COMPACT_PARTIAL_SKIPPED and no pro
will still succeed if there is memory available, but it will not try
as hard to free up memory.
Signed-off-by: Mike Kravetz
---
v2 - Removed __GFP_NORETRY from bit mask allocations and added more
comments. OK to pass NULL to NODEMASK_FREE.
mm/hugetlb.c | 89
as we could.
Cc: Mike Kravetz
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Johannes Weiner
Signed-off-by: Hillf Danton
Tested-by: Mike Kravetz
Acked-by: Mel Gorman
Acked-by: Vlastimil Babka
Signed-off-by: Mike Kravetz
---
v2 - Updated commit message and added SOB.
mm/vmscan.c
n (1):
mm, reclaim: make should_continue_reclaim perform dryrun detection
Mike Kravetz (1):
hugetlbfs: don't retry when pool page allocations start to fail
Vlastimil Babka (2):
mm, reclaim: cleanup should_continue_reclaim()
mm, compaction: raise compaction priority after it withdrawns
On 8/5/19 2:28 AM, Vlastimil Babka wrote:
> On 8/3/19 12:39 AM, Mike Kravetz wrote:
>> When allocating hugetlbfs pool pages via /proc/sys/vm/nr_hugepages,
>> the pages will be interleaved between all nodes of the system. If
>> nodes are not equal, it is quite possible fo
On 8/5/19 3:57 AM, Vlastimil Babka wrote:
> On 8/5/19 10:42 AM, Vlastimil Babka wrote:
>> On 8/3/19 12:39 AM, Mike Kravetz wrote:
>>> From: Hillf Danton
>>>
>>> Address the issue of should_continue_reclaim continuing true too often
>>> for __GFP_
On 8/5/19 1:42 AM, Vlastimil Babka wrote:
> On 8/3/19 12:39 AM, Mike Kravetz wrote:
>> From: Hillf Danton
>>
>> Address the issue of should_continue_reclaim continuing true too often
>> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
>> This
] http://lkml.kernel.org/r/d38a095e-dc39-7e82-bb76-2c9247929...@oracle.com
[2] http://lkml.kernel.org/r/20190724175014.9935-1-mike.krav...@oracle.com
Hillf Danton (1):
mm, reclaim: make should_continue_reclaim perform dryrun detection
Mike Kravetz (1):
hugetlbfs: don't retry when pool page
are not
enough inactive lru pages left to satisfy the costly allocation.
We can give up reclaiming pages too if we see dryrun occur, with the
certainty of plenty of inactive pages. IOW with dryrun detected, we are
sure we have reclaimed as many pages as we could.
Cc: Mike Kravetz
Cc: Mel Gorman
will still succeed if there is memory available, but it will not try
as hard to free up memory.
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 86 ++--
1 file changed, 76 insertions(+), 10 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index
From: Vlastimil Babka
Mike Kravetz reports that "hugetlb allocations could stall for minutes or hours
when should_compact_retry() would return true more often then it should.
Specifically, this was in the case where compact_result was COMPACT_DEFERRED
and COMPACT_PARTIAL_SKIPPED and no pro
On 8/2/19 5:05 AM, Vlastimil Babka wrote:
>
> On 8/1/19 10:33 PM, Mike Kravetz wrote:
>> On 8/1/19 6:01 AM, Vlastimil Babka wrote:
>>> Could you try testing the patch below instead? It should hopefully
>>> eliminate the stalls. If it makes hugepage allocation give u
HP
requests. Any suggestions on how to test that?
--
Mike Kravetz
> 8<
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 9569e7c786d3..b8bfe8d5d2e9 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -129,11 +129,7 @
On 7/31/19 6:23 AM, Vlastimil Babka wrote:
> On 7/25/19 7:15 PM, Mike Kravetz wrote:
>> On 7/25/19 1:13 AM, Mel Gorman wrote:
>>> On Wed, Jul 24, 2019 at 10:50:14AM -0700, Mike Kravetz wrote:
>>>
>>> set_max_huge_pages can fail the NODEMASK_ALLOC() alloc which
pages and none of those are reclaimed.
Can we not get nr_scanned == 0 on an arbitrary chunk of the LRU?
I must be missing something, because I do not see how nr_scanned == 0
guarantees a full scan.
--
Mike Kravetz