> Signed-off-by: Wei Yang
I agree with Baoquan He that this is more of a style change. Certainly
there is the potential to avoid an extra check, and that is always good.
The real value in this patch (IMO) is the removal of the stale comment.
Reviewed-by: Mike Kravetz
--
Mike Kravetz
On 8/7/20 2:12 AM, Wei Yang wrote:
> set_hugetlb_cgroup_[rsvd] just manipulates page-local data, which does
> not need to be protected by hugetlb_lock.
>
> Let's take this out.
>
> Signed-off-by: Wei Yang
Thanks!
Reviewed-by: Mike Kravetz
--
Mike Kravetz
ol resize which itself could cause surplus
to exceed overcommit.
IMO both approaches are valid.
- Advantage of temporary page is that it can not cause surplus to exceed
overcommit. Disadvantage is as mentioned in the comment 'abuse of temporary
page'.
- Advantage of this patch is that it uses existing counters. Disadvantage
is that it can momentarily cause surplus to exceed overcommit.
Unless someone has a strong opinion, I prefer the changes in this patch.
--
Mike Kravetz
Cc: Michal
On 8/10/20 7:11 PM, Baoquan He wrote:
> Hi Mike,
>
> On 07/23/20 at 11:21am, Mike Kravetz wrote:
>> On 7/23/20 2:11 AM, Baoquan He wrote:
> ...
>>>> But is kernel expected to warn for all such situations where the user
>>>> requested r
tlb pages.
https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg02407.html
More care/coordination would be needed to support double mapping with
this new option. However, this series provides a boot option to disable
freeing of unneeded page structs.
--
Mike Kravetz
On 9/15/20 5:59 AM, Muchun Song wrote:
> Move bootmem info registration common API to individual bootmem_info.c
> for later patch use.
>
> Signed-off-by: Muchun Song
This is just code movement.
Acked-by: Mike Kravetz
--
Mike Kravetz
> ---
> arch/x86/mm/init_64.c
On 9/15/20 5:59 AM, Muchun Song wrote:
> In the later patch, we will use {get,put}_page_bootmem() to initialize
> the page for vmemmap or free vmemmap page to buddy. So move them out of
> CONFIG_MEMORY_HOTPLUG_SPARSE.
>
> Signed-off-by: Muchun Song
More code movement.
Acked-b
that support it, say Y here.
> +
> + If unsure, say N.
> +
I could be wrong, but I believe the convention is to introduce a config
option in the same patch as the code which depends on it. Therefore, it
might be better to combine this with the next patch.
Also, it looks like most of your development was done on x86. Should
this option be limited to x86 only for now?
--
Mike Kravetz
ree_kbytes is not updated to 11334 because user defined
value 9 is preferred
# cat /proc/sys/vm/min_free_kbytes
90112
--
Mike Kravetz
On 9/30/20 1:47 PM, Vijay Balakrishna wrote:
> On 9/30/2020 11:20 AM, Mike Kravetz wrote:
>> On 9/29/20 9:49 AM, Vijay Balakrishna wrote:
>>
>> Sorry for jumping in so late. Should we use this as an opportunity to
>> also fix up the messages logged when (re)calcu
at, but hopefully avoids some confusion.
--
Mike Kravetz
> So we introduce a new nr_free_vmemmap_pages field in the hstate to
> indicate how many vmemmap pages associated with a hugetlb page we
> can free to the buddy system.
>
> Signed-off-by: Muchun Song
> ---
> inc
migration statistics. While here, this updates current trace event
> 'mm_migrate_pages' to accommodate now available HugeTLB based statistics.
>
> Cc: Daniel Jordan
> Cc: Zi Yan
> Cc: John Hubbard
> Cc: Mike Kravetz
> Cc: Oscar Salvador
> Cc: Andrew Morton
> Cc: linux
ks for having a look. I started poking around myself but,
> being new to cgroup code, I even failed to understand why that code gets
> triggered though the hugetlb controller isn't even enabled.
>
> I assume you at least have to make sure that there is
> a page populated (MMAP_POPULATE, or rea
On 10/14/20 11:31 AM, Mike Kravetz wrote:
> On 10/14/20 11:18 AM, David Hildenbrand wrote:
>
> FWIW - I ran libhugetlbfs tests which do a bunch of hole punching
> with (and without) hugetlb controller enabled and did not see this issue.
>
I took a closer look aft
On 10/15/20 4:05 PM, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Oct 13, 2020 at 04:10:59PM -0700, Mike Kravetz wrote:
>> Due to pmd sharing, the huge PTE pointer returned by huge_pte_alloc
>> may not be valid. This can happen if a call to huge_pmd_unshare for
>> the same p
On 9/29/20 2:58 PM, Mike Kravetz wrote:
> On 9/15/20 5:59 AM, Muchun Song wrote:
>> Hi all,
>>
>> This patch series will free some vmemmap pages(struct page structures)
>> associated with each hugetlbpage when preallocated to save memory.
> ...
>> T
> But my belief (best confirmed by you running your tests with a
> suitably placed BUG_ON or WARN_ON) is that you'll never find a
> PageAnon in a vma_shareable() area, so will never need try_to_unmap()
> to unshare a pagetable in the PageAnon case, so won't need i_mmap_rwsem
> for
On 10/8/20 10:50 PM, Hugh Dickins wrote:
> On Thu, 8 Oct 2020, Mike Kravetz wrote:
>> On 10/7/20 8:21 PM, Hugh Dickins wrote:
>>>
>>> Mike, j'accuse... your 5.7 commit c0d0381ade79 ("hugetlbfs:
>>> use i_mmap_rwsem for more pmd sharing synchronization"
On 10/9/20 3:23 PM, Hugh Dickins wrote:
> On Fri, 9 Oct 2020, Mike Kravetz wrote:
>> On 10/8/20 10:50 PM, Hugh Dickins wrote:
>>>
>>> It's a problem I've faced before in tmpfs, keeping a hold on the
>>> mapping while page lock is dropped. Quite awkward: ig
On 9/15/20 9:32 PM, Christoph Hellwig wrote:
> On Wed, Sep 02, 2020 at 08:02:04PM -0700, Mike Kravetz wrote:
>> --- a/arch/arm/mm/dma-mapping.c
>> +++ b/arch/arm/mm/dma-mapping.c
>> @@ -383,25 +383,34 @@ postcore_initcall(atomic_pool_init);
>> struct dma_contig_early_re
On 9/16/20 2:14 AM, Song Bao Hua (Barry Song) wrote:
>>> -----Original Message-----
>>> From: Mike Kravetz [mailto:mike.krav...@oracle.com]
>>> Sent: Wednesday, September 16, 2020 8:57 AM
>>> To: linux...@kvack.org; linux-kernel@vger.kernel.org;
>>>
n Xing and improved performance by 20
something percent. That seems in line with this report/improvement.
Perhaps the tooling is not always accurate in determining the commit which
causes the performance changes?
Perhaps I am misreading information in the reports?
--
Mike Kravetz
lb ctl_table
entries are defined and initialized. This is not something you introduced.
The unnecessary assignments are in the existing code. However, there is no
need to carry them forward.
--
Mike Kravetz
sted' it by allocating
an 'extra' page and freeing it via this method in alloc_surplus_huge_page.
From 864c5f8ef4900c95ca3f6f2363a85f3cb25e793e Mon Sep 17 00:00:00 2001
From: Mike Kravetz
Date: Tue, 11 Aug 2020 12:45:41 -0700
Subject: [PATCH] hugetlb: optimize race error return in
alloc_s
of
this, let's just leave things as they are and not add the message.
It is pretty clear that a user needs to read the value after writing to
determine if all pages were allocated. The log message would add little
benefit to the end user.
--
Mike Kravetz
On 8/11/20 4:19 PM, Wei Yang wrote:
> On Tue, Aug 11, 2020 at 02:43:28PM -0700, Mike Kravetz wrote:
>> Subject: [PATCH] hugetlb: optimize race error return in
>> alloc_surplus_huge_page
>>
>> The routine alloc_surplus_huge_page() could race with a pool
>&
On 9/3/20 6:58 PM, Song Bao Hua (Barry Song) wrote:
>
>> -----Original Message-----
>> From: Mike Kravetz [mailto:mike.krav...@oracle.com]
>> Sent: Thursday, September 3, 2020 3:02 PM
>> To: linux...@kvack.org; linux-kernel@vger.kernel.org;
>> linux-arm-ke
break;
Previously, when encountering a PageHWPoison(page) the loop would continue
and check the next page in the list. It now breaks the loop and returns
NULL. Isn't this a change in behavior? Perhaps you want to change that
break to a continue. Or, restructure in some other wa
> Signed-off-by: Wei Yang
Thank you!
Reviewed-by: Mike Kravetz
--
Mike Kravetz
ppropriate
locking.
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization")
Cc:
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 4
1 file changed, 4 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 590111ea6975..0f6716422a53 100644
--- a/m
On 8/3/20 3:52 PM, Matthew Wilcox wrote:
> On Mon, Aug 03, 2020 at 03:43:35PM -0700, Mike Kravetz wrote:
>> Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
>> synchronization") requires callers of huge_pte_alloc to hold i_mmap_rwsem
>> in at le
On 8/3/20 4:00 PM, Mike Kravetz wrote:
> On 8/3/20 3:52 PM, Matthew Wilcox wrote:
>> On Mon, Aug 03, 2020 at 03:43:35PM -0700, Mike Kravetz wrote:
>>> Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
>>> synchronization") requi
asn't too worried because of
the limited hugetlb use case. However, this series is adding another user
of per-node CMA areas.
With more users, should we try to sync up the number of CMA areas and the
number of nodes? Or, perhaps I am worrying about nothing?
--
Mike Kravetz
On 8/21/20 1:39 AM, Xing Zhengjun wrote:
>
>
> On 6/26/2020 5:33 AM, Mike Kravetz wrote:
>> On 6/22/20 3:01 PM, Mike Kravetz wrote:
>>> On 6/21/20 5:55 PM, kernel test robot wrote:
>>>> Greeting,
>>>>
>>>> FYI, we noticed a -33.4%
On 8/21/20 1:47 PM, Song Bao Hua (Barry Song) wrote:
>
>
>> -----Original Message-----
>> From: Song Bao Hua (Barry Song)
>> Sent: Saturday, August 22, 2020 7:27 AM
>> To: 'Mike Kravetz' ; h...@lst.de;
>> m.szyprow...@samsung.com; robin.mur...@arm.com;
On 8/21/20 2:02 PM, Mike Kravetz wrote:
> Would you be willing to test this series on top of 34ae204f1851? I will need
> to rebase the series to take the changes made by 34ae204f1851 into account.
Actually, the series in this thread will apply/run cleanly on top of
34ae204f1851. N
have been included with commit 34ae204f1851
("hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem").
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 15 +++
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 81
before normal memory allocators, so use the memblock
allocator.
Acked-by: Roman Gushchin
Signed-off-by: Mike Kravetz
---
rfc->v1
- Made minor changes suggested by Song Bao Hua (Barry Song)
- Removed check for late calls to cma_init_reserved_mem that was part
of RFC.
- Added ACK f
; 2 files changed, 38 insertions(+)
Patch looks fine with updated commit message.
Acked-by: Mike Kravetz
--
Mike Kravetz
currently
assume a zero return code indicates success. Change the callers to look
for true to indicate success. No functional change, only code cleanup.
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c| 4 ++--
include/linux/hugetlb.h | 2 +-
mm/hugetlb.c| 37
routine hugetlbfs_set_page_dirty with
__set_page_dirty_no_writeback as it addresses both of these issues.
Suggested-by: Matthew Wilcox
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c | 13 +
1 file changed, 1 insertion(+), 12 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs
r_page_bootmem_info is enabled if HUGETLB_PAGE_FREE_VMEMMAP
> is defined.
>
> Signed-off-by: Muchun Song
> ---
> arch/x86/mm/init_64.c | 2 +-
> fs/Kconfig| 18 ++
> 2 files changed, 19 insertions(+), 1 deletion(-)
Thanks for updating,
Acked-by: Mike Kravetz
--
Mike Kravetz
t with reuse addr and all subsequent pages in the range are mapped to
reuse addr. I know it is not very generic or flexible. But, it might be
easier to understand than the adjustments (+- PAGE_SIZE) currently being
made in the code.
Just a thought.
--
Mike Kravetz
= HUGEPAGE_REPORTING_CAPACITY;
> +
> + /* update budget to reflect call to report function */
> + budget--;
> +
> + /* reacquire zone lock and resume processing */
> + spin_lock_irq(&hugetlb_lock);
> +
> + /* flush reported pages from the sg list */
> + hugepage_reporting_drain(prdev, h, sgl,
> + HUGEPAGE_REPORTING_CAPACITY, !ret);
> +
> + /*
> + * Reset next to first entry, the old next isn't valid
> + * since we dropped the lock to report the pages
> + */
> + next = list_first_entry(list, struct page, lru);
> +
> + /* exit on error */
> + if (ret)
> + break;
> + }
> +
> + /* Rotate any leftover pages to the head of the freelist */
> + if (&next->lru != list && !list_is_first(&next->lru, list))
> + list_rotate_to_front(&next->lru, list);
> +
> + spin_unlock_irq(&hugetlb_lock);
> +
> + return ret;
> +}
--
Mike Kravetz
an be allocated from the buddy is
(MAX_ORDER - 1). So, the check should be '>='.
--
Mike Kravetz
avior. Correct?
>> On some systems, hugetlb pages are a precious resource and the sysadmin
>> carefully configures the number needed by applications. Removing a hugetlb
>> page (even for a very short period of time) could cause serious application
>> failure.
>
> That' true, especially for 1G pages. Any suggestions?
> Let the hugepage allocator be aware of this situation and retry ?
I would hate to add that complexity to the allocator.
This question is likely based on my lack of understanding of virtio-balloon
usage and this reporting mechanism. But, why do the hugetlb pages have to
be 'temporarily' allocated for reporting purposes?
--
Mike Kravetz
+)
Thanks!
Reviewed-by: Mike Kravetz
--
Mike Kravetz
);
> + putback_active_hugepage(page);
I'm curious why you used putback_active_hugepage() here instead of simply
calling set_page_huge_active() before the put_page()?
When the page was allocated, it was placed on the active list (alloc_huge_page).
Therefore, the hug
in_unlock(&hugetlb_lock)
> + * spin_lock(&hugetlb_lock)
> + * enqueue_huge_page(page)
> + * // It is wrong, the page is already freed
> + * spin_unlock(&hugetlb_lock)
> + *
> + * The race window is bet
queue. Is it acceptable
to keep retrying in that case? In addition, the 'Free some vmemmap' series
may slow the free_huge_page path even more.
In these worst case scenarios, I am not sure we want to just spin retrying.
--
Mike Kravetz
>
> Signed-off-by: Muchun Song
> ---
>
the buddy allocator.
>
> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle
> hugepage")
> Signed-off-by: Muchun Song
> ---
> mm/hugetlb.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Thanks!
Reviewed-by: Mike Kravetz
--
Mike Kravetz
ged, 1 deletion(-)
Thanks!
Reviewed-by: Mike Kravetz
--
Mike Kravetz
On 1/4/21 6:44 PM, Muchun Song wrote:
> On Tue, Jan 5, 2021 at 6:40 AM Mike Kravetz wrote:
>>
>> On 1/3/21 10:58 PM, Muchun Song wrote:
>>> Because we can only isolate an active page via isolate_huge_page()
>>> and hugetlbfs_fallocate() forgets to mark it
On 1/4/21 6:55 PM, Muchun Song wrote:
> On Tue, Jan 5, 2021 at 8:02 AM Mike Kravetz wrote:
>>
>> On 1/3/21 10:58 PM, Muchun Song wrote:
>>> There is a race condition between __free_huge_page()
>>> and dissolve_free_huge_page().
>>>
>>> CPU0:
On 1/4/21 7:46 PM, Muchun Song wrote:
> On Tue, Jan 5, 2021 at 11:14 AM Muchun Song wrote:
>>
>> On Tue, Jan 5, 2021 at 9:33 AM Mike Kravetz wrote:
>>>
>>> On 1/3/21 10:58 PM, Muchun Song wrote:
>>>> When dissolve_free_huge_page() races with __free
.com
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization")
Cc:
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 22 +-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d029d938d26d.
On 12/14/20 5:06 PM, Mike Kravetz wrote:
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d029d938d26d..8713f8ef0f4c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4106,10 +4106,30 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm,
> struct v
an be saved for each 1GB HugeTLB page.
When a HugeTLB page is allocated or freed, the vmemmap array
representing the range associated with the page will need to be
remapped. When a page is allocated, vmemmap pages are freed
after remapping. When a page
On 12/15/20 5:03 PM, Mike Kravetz wrote:
> On 12/13/20 7:45 AM, Muchun Song wrote:
>> diff --git a/fs/Kconfig b/fs/Kconfig
>> index 976e8b9033c4..4c3a9c614983 100644
>> --- a/fs/Kconfig
>> +++ b/fs/Kconfig
>> @@ -245,6 +245,21 @@ config HUGETLBFS
>> config
the name
implies this routine will reuse vmemmap pages. Perhaps it makes more sense
to rename as 'vmemmap_remap_free'? It will first remap, then free vmemmap.
But, then I looked at the code above and perhaps you are using the word
'_reuse' because the page before the range will be reused? The vmemmap
p
On 12/16/20 2:25 PM, Oscar Salvador wrote:
> On Wed, Dec 16, 2020 at 02:08:30PM -0800, Mike Kravetz wrote:
>>> + * vmemmap_rmap_walk - walk vmemmap page table
>>> +
>>> +static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
>>> +
e GFP_ATOMIC to allocate the vmemmap pages.
>
> Signed-off-by: Muchun Song
It is unfortunate we need to add this complexity, but I cannot think
of another way. One small comment (no required change) below.
Reviewed-by: Mike Kravetz
> ---
> m
* vmemmap pages successfully, then we can free
> + * a HugeTLB page.
> + */
> + goto retry;
> + }
> + list_add_tail(&page->lru, list);
> + }
> +}
> +
--
Mike Kravetz
On 11/10/20 7:41 PM, Muchun Song wrote:
> On Wed, Nov 11, 2020 at 8:47 AM Mike Kravetz wrote:
>>
>> On 11/8/20 6:10 AM, Muchun Song wrote:
>> I am reading the code incorrectly it does not appear page->lru (of the huge
>> page) is being used for
; __must_hold(&hugetlb_lock)
> {
> struct list_head surplus_list;
Thank you for noticing the type difference.
However, if the parameter delta is changed to long then we should also change
the local variables in gather_surplus_pages that are used with delta.
Specifically, the local va
def_bool HUGETLB_PAGE
> + depends on X86
> + depends on SPARSEMEM_VMEMMAP
> + depends on HAVE_BOOTMEM_INFO_NODE
> + help
> + When using SPARSEMEM_VMEMMAP, the system can save up some memory
Should that read,
When using HUGETLB_PAGE_FREE_VMEMMAP, ...
as the help message is for this config option?
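A sketch of how the hunk might read with that wording fix; the option name and dependency lines are taken from the quoted hunk, but the second help line is assumed for illustration:

```kconfig
# sketch only: dependencies copied from the quoted hunk, help body partly assumed
config HUGETLB_PAGE_FREE_VMEMMAP
	def_bool HUGETLB_PAGE
	depends on X86
	depends on SPARSEMEM_VMEMMAP
	depends on HAVE_BOOTMEM_INFO_NODE
	help
	  When using HUGETLB_PAGE_FREE_VMEMMAP, the system can save up some
	  memory by freeing unneeded vmemmap pages associated with each
	  HugeTLB page.
```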
--
Mike Kravetz
0;
>
> should not be needed.
> Actually, we do not initialize other values like resv_huge_pages
> or surplus_huge_pages.
>
> If that is the case, the "else" could go.
>
> Mike?
Correct. Those assignments have been in the code for a very long time.
> The changes itself look good to me.
> I think that putting all the vemmap stuff into hugetlb-vmemmap.* was
> the right choice.
Agree!
--
Mike Kravetz
| 4 | ---+ | | |
> + * | 2M| | 5 | -+ | |
> + * | | | 6 | ---+ |
> + * | | | 7 | -+
> + * | | +---+
> + * | |
> + * | |
> + * +---+
--
Mike Kravetz
ile changed, 2 deletions(-)
Thanks,
Reviewed-by: Mike Kravetz
--
Mike Kravetz
7 ---
> 1 file changed, 4 insertions(+), 3 deletions(-)
Thank you,
Reviewed-by: Mike Kravetz
--
Mike Kravetz
one before removing
the pages of struct pages. This keeps everything 'consistent' as things
are remapped.
If you want to use one of the 'pages of struct pages' for the new pte
page, then there will be a period of time when things are inconsistent.
Before setting up the mapping, some code could potentially access that
pages of struct pages.
I tend to agree that allocating a new page is the safest thing
to do here. Or, perhaps someone can think of a way to make this safe.
--
Mike Kravetz
changed, 85 insertions(+)
Thanks for the cleanup.
Oscar made some other comments. I only have one additional minor comment
below.
With those minor cleanups,
Acked-by: Mike Kravetz
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
...
> +int vmemmap_pgtable_prealloc(struct hstate *h, struc
On 11/12/20 4:35 PM, Mike Kravetz wrote:
> On 11/10/20 7:41 PM, Muchun Song wrote:
>> On Wed, Nov 11, 2020 at 8:47 AM Mike Kravetz wrote:
>>>
>>> On 11/8/20 6:10 AM, Muchun Song wrote:
>>> I am reading the code incorrectly it does not appear page->l
g seen with the qemu
use case.
I'm still doing more testing and code inspection to look for other issues.
From 861bcd7d0443f18a5fed3c3ddc5f1c71e78c4ef4 Mon Sep 17 00:00:00 2001
From: Mike Kravetz
Date: Tue, 20 Oct 2020 20:21:42 -0700
Subject: [PATCH] hugetlb_cgroup: fix reservation accounting
S
> On 21.10.20 15:11, David Hildenbrand wrote:
>> On 21.10.20 14:57, Michal Privoznik wrote:
>>> On 10/21/20 5:35 AM, Mike Kravetz wrote:
>>>> On 10/20/20 6:38 AM, David Hildenbrand wrote:
>>>>
>>>> It would be good if Mina (at least) would lo
")
Cc:
Reported-by: Michal Privoznik
Co-developed-by: David Hildenbrand
Signed-off-by: David Hildenbrand
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 20 +++-
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 67fc6383995b.
he bitmap in cma_clear_bitmap and could
block. However, I do not see why cma->lock has to be a mutex. I may be
missing something, but I do not see any code protected by the mutex doing
anything that could sleep.
Could we simply change that mutex to a spinlock?
--
Mike Kravetz
On 10/21/20 7:33 PM, Roman Gushchin wrote:
> On Wed, Oct 21, 2020 at 05:15:53PM -0700, Mike Kravetz wrote:
>> On 10/16/20 3:52 PM, Roman Gushchin wrote:
>>> This small patchset makes cma_release() non-blocking and simplifies
>>> the code in hugetlbfs, where previously
it could be used by the hugetlb
code to make it simpler.
--
Mike Kravetz
hinode_lock_write() helper.
- Split out addition of hinode_rwsem and helper routines to a separate
patch.
[1]
https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/
Mike Kravetz (4):
hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync
hugetlbfs: add
as necessary.
File truncation (remove_inode_hugepages) needs to handle page mapping
changes that could have happened before locking the page. This could
happen if page was added to page cache and later backed out in fault
processing.
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c | 34
s per hugetlb calculation")
commit 87bf91d39bb5 ("hugetlbfs: Use i_mmap_rwsem to address page
fault/truncate race")
commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization")
Signed-off-by
is possible in an attempt to
minimize performance impacts. In addition, routines which can be used with
lockdep to help with proper locking are also added.
Use of the new semaphore and supporting routines will be provided in a
subsequent patch.
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c
the semaphore if pmd sharing is possible. Also,
add lockdep_assert calls to huge_pmd_share/unshare to help catch callers
not using the proper locking.
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c| 11 +-
include/linux/hugetlb.h | 8 ++--
mm/hugetlb.c| 83
es from day 1 of their existence? I would prefer
not to touch them in case someone is depending on the current format.
--
Mike Kravetz
> ---
> drivers/base/node.c | 4 ++--
> mm/hugetlb.c| 6 +++---
> 2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/
On 10/28/20 11:13 PM, Muchun Song wrote:
> On Thu, Oct 29, 2020 at 7:42 AM Mike Kravetz wrote:
>>
>> On 10/26/20 7:51 AM, Muchun Song wrote:
>>> +
>>> +static inline spinlock_t *vmemmap_pmd_lockptr(pmd_t *pmd)
>>> +{
>>> + static DEFINE_S
is would eliminate a bunch
of the complex code doing page table manipulation. It does not address
the issue of struct page pages going away which is being discussed here,
but it could be a way to simplify the first version of this code. If this
is going to be an 'opt in' feature as previously suggested, then eliminating
the PMD/huge page vmemmap mapping may be acceptable. My guess is that
sysadmins would only 'opt in' if they expect most of system memory to be used
by hugetlb pages. We certainly have database and virtualization use cases
where this is true.
--
Mike Kravetz
ool. I remember the
use case pointed out in commit 099730d67417. It says, "I have a hugetlbfs
user which is never explicitly allocating huge pages with 'nr_hugepages'.
They only set 'nr_overcommit_hugepages' and then let the pages be allocated
from the buddy allocator at fault time." In this case, I suspect they were
using 'page fault' allocation for initialization much like someone using
/proc/sys/vm/nr_hugepages. So, the overhead may not be as noticeable.
--
Mike Kravetz
is might be a
> trade-off between saving up memory and increasing the cost
> of certain operations on allocation/free path.
> That is why I mentioned it there.
Yes, this is somewhat a trade-off.
As a config option, this is something that would likely be decided by
distros. I almost hate to suggest this, but is it something that an
end user would want to decide? Is this something that perhaps should
be a boot/kernel command line option?
--
Mike Kravetz
>
> Hi Mike, what's your opinion?
I would be happy to see this in a separate file. As Oscar mentions, the
hugetlb.c file/code is already somewhat difficult to read and understand.
--
Mike Kravetz
On 11/10/20 11:50 AM, Matthew Wilcox wrote:
> On Tue, Nov 10, 2020 at 11:31:31AM -0800, Mike Kravetz wrote:
>> On 11/9/20 5:52 AM, Oscar Salvador wrote:
>>> On Sun, Nov 08, 2020 at 10:10:55PM +0800, Muchun Song wrote:
>
> I don't like config options. I like boot opt
r(page, HUGETLB_PAGE_DTOR);
> set_hugetlb_cgroup(page, NULL);
When I saw that comment in a previous patch series, I assumed page->lru was
being used to store preallocated pages and pages to free. However, unless
I am reading the code incorrectly, it does not appear page->lru (of the huge
page) is being used for this purpose. Is that correct?
If it is correct, would using page->lru of the huge page make this code
simpler? I am just missing the reason why you are using
page_huge_pte(page)->lru
--
Mike Kravetz
On 11/22/20 11:38 PM, Michal Hocko wrote:
> On Fri 20-11-20 09:45:12, Mike Kravetz wrote:
>> On 11/20/20 1:43 AM, David Hildenbrand wrote:
> [...]
>>>>> To keep things easy, maybe simply never allow to free these hugetlb pages
>>>>> again for now? If they
As previously mentioned, I feel qualified to review the hugetlb changes
and some other closely related changes. However, this patch set is
touching quite a few areas and I do not feel qualified to make authoritative
statements about them all. I too hope others will take a look.
--
Mike Kravetz
ugetlbfs: allow
> registration of ranges containing huge pages").
>
> Fix it.
>
> Cc: Mike Kravetz
> Cc: Jens Axboe
> Cc: Andrea Arcangeli
> Cc: Peter Xu
> Cc: Alexander Viro
> Cc: io-ur...@vger.kernel.org
> Cc: linux-fsde...@vger.kernel.org
> Cc: linu
ss_offline was noticed.
The hstate index is not reinitialized each time through the do-while loop.
Fix this as well.
Fixes: 1adc4d419aa2 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb
reservations")
Cc:
Reported-by: Adrian Moreno
Tested-by: Adrian Moreno
Signed-off-by: M
/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/
Reported-by: Qian Cai
Suggested-by: Hugh Dickins
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization")
Cc:
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c| 90 +++
On 11/2/20 4:28 PM, Mike Kravetz wrote:
> The RFC series reverted all patches where i_mmap_rwsem was used for
> pmd sharing synchronization, and then added code to use hinode_rwsem.
> This series ends up with the same code in the end, but is structured
> as follows:
>
> - Rev
ges separately. If not a standalone patch, at least the first patch of
the series. This new code will be exercised even if cgroup reservation
accounting is not enabled, so it is very important that no subtle regressions
be introduced.
--
Mike Kravetz
i += pages_per_huge_page(h);
> + spin_unlock(ptl);
> + continue;
> + }
> +
> same_page:
> if (pages) {
> pages[i] = mem_map_offset(page, pfn_offset);
>
With a comment added to the code,
Reviewed-by: Mike Kravetz
--
Mike Kravetz