ions.
- If we see this being used for more general purpose remapping, then we
should go with a more general purpose implementation.
Again, just my opinion.
--
Mike Kravetz
e/need for this macro in the following
patch. Looks like Oscar has already done that, and free_bootmem_page
will now use VM_BUG_ON_PAGE. So, this patch can be dropped.
--
Mike Kravetz
On 1/13/21 6:45 AM, Matthew Wilcox wrote:
> On Mon, Jan 11, 2021 at 01:01:50PM -0800, Mike Kravetz wrote:
>> +if (hpage_spool(page)) {
>
> Could this be named hpage_subpool(), please?
>
Of course!
--
Mike Kravetz
On 1/13/21 5:54 AM, Oscar Salvador wrote:
> On Mon, Jan 11, 2021 at 01:01:50PM -0800, Mike Kravetz wrote:
>> As hugetlbfs evolved, state information about hugetlb pages was added.
>> One 'convenient' way of doing this was to use available fields in tail
>> pages. Ove
of that code. Free huge pages have a ref
count of 0, and this code is checking for ref count of 1.
That is a long way of saying that I still agree with this patch. The
only modification/suggestion would be enhancing the commit message as
suggested by Michal.
--
Mike Kravetz
f fail.
However, we are very unlikely to ever exercise this code. Adding an
optimization that is unlikely to be exercised is certainly questionable.
Memory offline is the only code that could benefit from this optimization.
As someone with more memory offline user experience, what is your opinion
Mic
nel.org
> ---
> mm/hugetlb.c | 26 ++
> 1 file changed, 26 insertions(+)
Thanks,
It is unfortunate that we have to add more huge page state information
to fix this issue. However, I believe we have explored all other options.
Reviewed-by: Mike Kravetz
--
Mike Kravetz
associated with those regions
are shared by all the mappings. Suitably aligned means 'on a 1GB boundary'
and 1GB in size.
When pmds are shared, your mappings will never see a 'minor fault'. This
is because the PMD page (the page of page table entries) is shared.
--
Mike Kravetz
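To make the alignment rule above concrete, here is a rough standalone model of
the eligibility check. The 1GB PUD size is an x86_64 assumption, and the
function name is mine; the kernel's real test is more involved:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rough model: a hugetlbfs mapping can share PMD pages only for the part
 * of its range that covers a fully aligned, PUD-sized (1GB on x86_64)
 * region. Illustration only, not the kernel's exact check. */
#define PUD_SIZE (1UL << 30)          /* 1GB */
#define PUD_MASK (~(PUD_SIZE - 1))

/* true if [start, end) contains at least one fully aligned 1GB region,
 * i.e. the mapping could share a PMD page with other mappings of the file */
static bool range_pmd_shareable(uint64_t start, uint64_t end)
{
    uint64_t base = (start + PUD_SIZE - 1) & PUD_MASK;  /* round start up */
    return base + PUD_SIZE <= end;
}
```

An unaligned mapping, even one a full 1GB in size, never qualifies, which is
why the text stresses "suitably aligned".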
; ---
> fs/hugetlbfs/inode.c| 3 ++-
> include/linux/hugetlb.h | 2 ++
> mm/hugetlb.c| 2 +-
> 3 files changed, 5 insertions(+), 2 deletions(-)
Thanks.
Although page_huge_active is declared in page-flags.h, I much prefer the
declaration of set_page_huge_active to be in
On 1/11/21 3:08 PM, Peter Xu wrote:
> On Mon, Jan 11, 2021 at 02:42:48PM -0800, Mike Kravetz wrote:
>> On 1/7/21 11:04 AM, Axel Rasmussen wrote:
>>> Overview
>>>
>>>
>>> This series adds a new userfaultfd registration mode,
>>
patches.
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c| 11 +++---
include/linux/hugetlb.h | 2 ++
mm/hugetlb.c| 80 +++--
3 files changed, 67 insertions(+), 26 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index
Use new hugetlb specific flag HPageTempSurplus to replace the
PageHugeTemporary() interfaces.
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 38 +-
1 file changed, 9 insertions(+), 29 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 34ce82f4823c
, it can not be
static. Therefore, a new set of hugetlb page flag macros is added for
non-static flag functions.
Signed-off-by: Mike Kravetz
---
include/linux/hugetlb.h| 17 +++
include/linux/page-flags.h | 6
mm/hugetlb.c | 60
feature is enabled.
[1] https://lore.kernel.org/r/20210106084739.63318-1-songmuc...@bytedance.com
[2] https://lore.kernel.org/r/20210110124017.86750-4-songmuc...@bytedance.com
Mike Kravetz (3):
hugetlb: use page.private for hugetlb specific page flags
hugetlb: convert page_huge_active
ger a BUG_ON which is in the
> page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the
> VM_BUG_ON_PAGE.
>
> Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()")
> Signed-off-by: Muchun Song
> Reviewed-by: Mike Kravetz
> ---
> mm
touch
> the page->mapping. So we need to check PageHuge().
> The check and llist_add() should be protected by
> hugetlb_lock. But we cannot do that. Right? If dissolve
> happens after it is linked to the list. We also should
> remove it from the list (hpage_freelist). It seems to make
> the thing more complex.
You are correct. This is also an issue/potential problem with this
race. It seems that adding the state information might be the least
complex way to address the issue.
--
Mike Kravetz
On 1/7/21 12:40 AM, Michal Hocko wrote:
> On Wed 06-01-21 12:58:29, Mike Kravetz wrote:
>> On 1/6/21 8:56 AM, Michal Hocko wrote:
>>> On Wed 06-01-21 16:47:36, Muchun Song wrote:
>>>> There is a race condition between __free_huge_page()
>>>> and
to_nid(page);
> +
> + list_move(&page->lru, &h->hugepage_freelists[nid]);
> +}
The above routines move pages between the free and active lists without any
update to free page counts. How does that work? Will the number of entries
on the free list get out of sync with the free_huge_pages counters?
--
Mike Kravetz
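For contrast, a helper that keeps the counter consistent would bump
free_huge_pages in the same step as the list move. A toy standalone model of
that invariant (names echo the kernel's, but this is not kernel code):

```c
#include <assert.h>

/* Toy model of the bookkeeping questioned above: any routine that moves
 * a page onto or off of the free list must adjust free_huge_pages in the
 * same critical section, or the counter diverges from the list length. */
static int free_huge_pages;   /* what the counter reports */
static int free_list_len;     /* what the list actually holds */

static void enqueue_huge_page(void)
{
    free_list_len++;
    free_huge_pages++;        /* counter updated together with the list */
}

static void dequeue_huge_page(void)
{
    free_list_len--;
    free_huge_pages--;
}

/* the pattern the review comment objects to: list changes, counter does not */
static void buggy_move_to_free_list(void)
{
    free_list_len++;
}
```

After any mix of enqueue/dequeue calls the two values stay equal; one call to
the buggy variant and they are permanently out of sync.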
.
>
> Fixes: a3437870160c ("hugetlb: new sysfs interface")
> Signed-off-by: Miaohe Lin
> Cc:
> ---
> mm/hugetlb.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
Thanks, this is a potential issue that should be fixed.
Reviewed-by: Mike Kravetz
This
-off-by: Miaohe Lin
> Cc:
> ---
> mm/hugetlb.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Thanks!
Reviewed-by: Mike Kravetz
--
Mike Kravetz
On 1/6/21 12:02 PM, Michal Hocko wrote:
> On Wed 06-01-21 11:30:25, Mike Kravetz wrote:
>> On 1/6/21 8:35 AM, Michal Hocko wrote:
>>> On Wed 06-01-21 16:47:35, Muchun Song wrote:
>>>> Because we only can isolate a active page via isolate_huge_page()
>>>>
d_gigantic_page(struct page *page,
>> unsigned int order) { }
>> #endif
>>
>> +/*
>> + * Because we reuse the mapping field of some tail page structs, we should
>> + * reset those mapping to initial value before @
tion could race with the page fault
and the page could be migrated before being added to the page table of
the faulting task. This was an issue when hugetlb_no_page called
set_page_huge_active right after allocating and clearing the huge page.
Commit cb6acd01e2e4 moved the set_page_huge_active call to after adding
the page to the page table
to address this issue.
--
Mike Kravetz
On 1/4/21 7:46 PM, Muchun Song wrote:
> On Tue, Jan 5, 2021 at 11:14 AM Muchun Song wrote:
>>
>> On Tue, Jan 5, 2021 at 9:33 AM Mike Kravetz wrote:
>>>
>>> On 1/3/21 10:58 PM, Muchun Song wrote:
>>>> When dissolve_free_huge_page() races with __free
On 1/4/21 6:55 PM, Muchun Song wrote:
> On Tue, Jan 5, 2021 at 8:02 AM Mike Kravetz wrote:
>>
>> On 1/3/21 10:58 PM, Muchun Song wrote:
>>> There is a race condition between __free_huge_page()
>>> and dissolve_free_huge_page().
>>>
>>> CPU0:
On 1/4/21 6:44 PM, Muchun Song wrote:
> On Tue, Jan 5, 2021 at 6:40 AM Mike Kravetz wrote:
>>
>> On 1/3/21 10:58 PM, Muchun Song wrote:
>>> Because we only can isolate a active page via isolate_huge_page()
>>> and hugetlbfs_fallocate() forget to mark it
ged, 1 deletion(-)
Thanks!
Reviewed-by: Mike Kravetz
--
Mike Kravetz
the buddy allocator.
>
> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle
> hugepage")
> Signed-off-by: Muchun Song
> ---
> mm/hugetlb.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Thanks!
Reviewed-by: Mike Kravetz
--
Mike Kravetz
queue. Is it acceptable
to keep retrying in that case? In addition, the 'Free some vmemmap' series
may slow the free_huge_page path even more.
In these worst case scenarios, I am not sure we want to just spin retrying.
--
Mike Kravetz
>
> Signed-off-by: Muchun Song
> ---
>
in_unlock(&hugetlb_lock)
> + * spin_lock(&hugetlb_lock)
> + * enqueue_huge_page(page)
> + * // It is wrong, the page is already freed
> + * spin_unlock(&hugetlb_lock)
> + *
> + * The race window is bet
);
> + putback_active_hugepage(page);
I'm curious why you used putback_active_hugepage() here instead of simply
calling set_page_huge_active() before the put_page()?
When the page was allocated, it was placed on the active list (alloc_huge_page).
Therefore, the hug
+)
Thanks!
Reviewed-by: Mike Kravetz
--
Mike Kravetz
avior. Correct?
>> On some systems, hugetlb pages are a precious resource and the sysadmin
>> carefully configures the number needed by applications. Removing a hugetlb
>> page (even for a very short period of time) could cause serious application
>> failure.
>
> That' true, especially for 1G pages. Any suggestions?
> Let the hugepage allocator be aware of this situation and retry ?
I would hate to add that complexity to the allocator.
This question is likely based on my lack of understanding of virtio-balloon
usage and this reporting mechanism. But, why do the hugetlb pages have to
be 'temporarily' allocated for reporting purposes?
--
Mike Kravetz
an be allocated from the buddy is
(MAX_ORDER - 1). So, the check should be '>='.
--
Mike Kravetz
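A minimal sketch of that boundary condition (the MAX_ORDER value is assumed
from common x86_64 configs, and the function name is mine for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* The buddy allocator's largest allocation is order MAX_ORDER - 1, so an
 * order equal to MAX_ORDER is already beyond what the buddy can supply
 * and must be treated as gigantic: the test has to be '>=', not '>'. */
#define MAX_ORDER 11  /* typical default: largest buddy order is 10 */

static bool order_is_gigantic(unsigned int order)
{
    return order >= MAX_ORDER;
}
```

With '>' instead of '>=', order MAX_ORDER would wrongly be treated as
buddy-allocatable, which is exactly the off-by-one the comment points out.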
= HUGEPAGE_REPORTING_CAPACITY;
> +
> + /* update budget to reflect call to report function */
> + budget--;
> +
> + /* reacquire zone lock and resume processing */
> + spin_lock_irq(_lock);
> +
> + /* flush reported pages from the sg list */
> + hugepage_reporting_drain(prdev, h, sgl,
> + HUGEPAGE_REPORTING_CAPACITY, !ret);
> +
> + /*
> + * Reset next to first entry, the old next isn't valid
> + * since we dropped the lock to report the pages
> + */
> + next = list_first_entry(list, struct page, lru);
> +
> + /* exit on error */
> + if (ret)
> + break;
> + }
> +
> + /* Rotate any leftover pages to the head of the freelist */
> + if (&next->lru != list && !list_is_first(&next->lru, list))
> + list_rotate_to_front(&next->lru, list);
> +
> + spin_unlock_irq(_lock);
> +
> + return ret;
> +}
--
Mike Kravetz
t with reuse addr and all subsequent pages in the range are mapped to
reuse addr. I know it is not very generic or flexible. But, it might be
easier to understand than the adjustments (+- PAGE_SIZE) currently being
made in the code.
Just a thought.
--
Mike Kravetz
r_page_bootmem_info is enabled if HUGETLB_PAGE_FREE_VMEMMAP
> is defined.
>
> Signed-off-by: Muchun Song
> ---
> arch/x86/mm/init_64.c | 2 +-
> fs/Kconfig| 18 ++
> 2 files changed, 19 insertions(+), 1 deletion(-)
Thanks for updating,
Acked-by: Mike Kravetz
--
Mike Kravetz
currently
assume a zero return code indicates success. Change the callers to look
for true to indicate success. No functional change, only code cleanup.
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c| 4 ++--
include/linux/hugetlb.h | 2 +-
mm/hugetlb.c| 37
routine hugetlbfs_set_page_dirty with
__set_page_dirty_no_writeback as it addresses both of these issues.
Suggested-by: Matthew Wilcox
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c | 13 +
1 file changed, 1 insertion(+), 12 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs
* vmemmap pages successfully, then we can free
> + * a HugeTLB page.
> + */
> + goto retry;
> + }
> + list_add_tail(&page->lru, list);
> + }
> +}
> +
--
Mike Kravetz
e GFP_ATOMIC to allocate the vmemmap pages.
>
> Signed-off-by: Muchun Song
It is unfortunate we need to add this complexity, but I cannot think
of another way. One small comment (no required change) below.
Reviewed-by: Mike Kravetz
> ---
> m
On 12/16/20 2:25 PM, Oscar Salvador wrote:
> On Wed, Dec 16, 2020 at 02:08:30PM -0800, Mike Kravetz wrote:
>>> + * vmemmap_rmap_walk - walk vmemmap page table
>>> +
>>> +static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
>>> +
the name
implies this routine will reuse vmemmap pages. Perhaps, it makes more sense
to rename as 'vmemmap_remap_free'? It will first remap, then free vmemmap.
But, then I looked at the code above and perhaps you are using the word
'_reuse' because the page before the range will be reused? The vmemmap
p
On 12/15/20 5:03 PM, Mike Kravetz wrote:
> On 12/13/20 7:45 AM, Muchun Song wrote:
>> diff --git a/fs/Kconfig b/fs/Kconfig
>> index 976e8b9033c4..4c3a9c614983 100644
>> --- a/fs/Kconfig
>> +++ b/fs/Kconfig
>> @@ -245,6 +245,21 @@ config HUGETLBFS
>> config
an be saved for each 1GB HugeTLB page.
When a HugeTLB page is allocated or freed, the vmemmap array
representing the range associated with the page will need to be
remapped. When a page is allocated, vmemmap pages are freed
after remapping. When a page
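A back-of-envelope calculation of what that remapping is worth. The 4KB base
page and 64-byte struct page are assumptions typical of x86_64, not figures
taken from the patch itself:

```c
#include <assert.h>

/* Size of the vmemmap (struct page array) backing one HugeTLB page,
 * under assumed x86_64 defaults: 4KB base pages, 64-byte struct page. */
#define BASE_PAGE        4096UL
#define STRUCT_PAGE_SIZE 64UL

static unsigned long vmemmap_bytes(unsigned long huge_size)
{
    return (huge_size / BASE_PAGE) * STRUCT_PAGE_SIZE;
}
```

A 1GB huge page is backed by roughly 16MB of struct pages (4096 vmemmap
pages), almost all of which the series can free after remapping; a 2MB huge
page is backed by 32KB (8 vmemmap pages).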
On 12/14/20 5:06 PM, Mike Kravetz wrote:
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d029d938d26d..8713f8ef0f4c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4106,10 +4106,30 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm,
> struct v
.com
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization")
Cc:
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 22 +-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d029d938d26d.
As previously mentioned, I feel qualified to review the hugetlb changes
and some other closely related changes. However, this patch set is
touching quite a few areas and I do not feel qualified to make authoritative
statements about them all. I too hope others will take a look.
--
Mike Kravetz
ss_offline was noticed.
The hstate index is not reinitialized each time through the do-while loop.
Fix this as well.
Fixes: 1adc4d419aa2 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb
reservations")
Cc:
Reported-by: Adrian Moreno
Tested-by: Adrian Moreno
Signed-off-by: M
ugetlbfs: allow
> registration of ranges containing huge pages").
>
> Fix it.
>
> Cc: Mike Kravetz
> Cc: Jens Axboe
> Cc: Andrea Arcangeli
> Cc: Peter Xu
> Cc: Alexander Viro
> Cc: io-ur...@vger.kernel.org
> Cc: linux-fsde...@vger.kernel.org
> Cc: linu
On 11/22/20 11:38 PM, Michal Hocko wrote:
> On Fri 20-11-20 09:45:12, Mike Kravetz wrote:
>> On 11/20/20 1:43 AM, David Hildenbrand wrote:
> [...]
>>>>> To keep things easy, maybe simply never allow to free these hugetlb pages
>>>>> again for now? If they
is would eliminate a bunch
of the complex code doing page table manipulation. It does not address
the issue of struct page pages going away which is being discussed here,
but it could be a way to simplify the first version of this code. If this
is going to be an 'opt in' feature as previously suggested, then eliminating
the PMD/huge page vmemmap mapping may be acceptable. My guess is that
sysadmins would only 'opt in' if they expect most of system memory to be used
by hugetlb pages. We certainly have database and virtualization use cases
where this is true.
--
Mike Kravetz
changed, 85 insertions(+)
Thanks for the cleanup.
Oscar made some other comments. I only have one additional minor comment
below.
With those minor cleanups,
Acked-by: Mike Kravetz
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
...
> +int vmemmap_pgtable_prealloc(struct hstate *h, struc
one before removing
the pages of struct pages. This keeps everything 'consistent' as things
are remapped.
If you want to use one of the 'pages of struct pages' for the new pte
page, then there will be a period of time when things are inconsistent.
Before setting up the mapping, some code could potentially access those
pages of struct pages.
I tend to agree that allocating a new page is the safest thing
to do here. Or, perhaps someone can think of a way to make this safe.
--
Mike Kravetz
7 ---
> 1 file changed, 4 insertions(+), 3 deletions(-)
Thank you,
Reviewed-by: Mike Kravetz
--
Mike Kravetz
ile changed, 2 deletions(-)
Thanks,
Reviewed-by: Mike Kravetz
--
Mike Kravetz
> + * [garbled ASCII diagram from the quoted patch: the vmemmap pages
> + * backing a 2MB HugeTLB page and how the tail entries are remapped]
--
Mike Kravetz
0;
>
> should not be needed.
> Actually, we do not initialize other values like resv_huge_pages
> or surplus_huge_pages.
>
> If that is the case, the "else" could go.
>
> Mike?
Correct. Those assignments have been in the code for a very long time.
> The changes itself look good to me.
> I think that putting all the vemmap stuff into hugetlb-vmemmap.* was
> the right choice.
Agree!
--
Mike Kravetz
def_bool HUGETLB_PAGE
> + depends on X86
> + depends on SPARSEMEM_VMEMMAP
> + depends on HAVE_BOOTMEM_INFO_NODE
> + help
> + When using SPARSEMEM_VMEMMAP, the system can save up some memory
Should that read,
When using HUGETLB_PAGE_FREE_VMEMMAP, ...
as the help message is for this config option.
--
Mike Kravetz
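If the suggestion is taken, the entry would read something like the following
(the surrounding Kconfig lines are reproduced from the quoted patch; only the
first help line carries the proposed wording, and the rest is elided):

```
config HUGETLB_PAGE_FREE_VMEMMAP
	def_bool HUGETLB_PAGE
	depends on X86
	depends on SPARSEMEM_VMEMMAP
	depends on HAVE_BOOTMEM_INFO_NODE
	help
	  When using HUGETLB_PAGE_FREE_VMEMMAP, the system can save up some
	  memory ...
```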
; __must_hold(&hugetlb_lock)
> {
> struct list_head surplus_list;
Thank you for noticing the type difference.
However, if the parameter delta is changed to long then we should also change
the local variables in gather_surplus_pages that are used with delta.
Specifically, the local va
On 11/12/20 4:35 PM, Mike Kravetz wrote:
> On 11/10/20 7:41 PM, Muchun Song wrote:
>> On Wed, Nov 11, 2020 at 8:47 AM Mike Kravetz wrote:
>>>
>>> On 11/8/20 6:10 AM, Muchun Song wrote:
>>> I am reading the code incorrectly it does not appear page->l
On 11/10/20 7:41 PM, Muchun Song wrote:
> On Wed, Nov 11, 2020 at 8:47 AM Mike Kravetz wrote:
>>
>> On 11/8/20 6:10 AM, Muchun Song wrote:
>> I am reading the code incorrectly it does not appear page->lru (of the huge
>> page) is being used for
r(page, HUGETLB_PAGE_DTOR);
> set_hugetlb_cgroup(page, NULL);
When I saw that comment in a previous patch series, I assumed page->lru was
being used to store preallocated pages and pages to free. However, unless
I am reading the code incorrectly, it does not appear page->lru (of the huge
page) is being used for this purpose. Is that correct?
If it is correct, would using page->lru of the huge page make this code
simpler? I am just missing the reason why you are using
page_huge_pte(page)->lru
--
Mike Kravetz
On 11/10/20 11:50 AM, Matthew Wilcox wrote:
> On Tue, Nov 10, 2020 at 11:31:31AM -0800, Mike Kravetz wrote:
>> On 11/9/20 5:52 AM, Oscar Salvador wrote:
>>> On Sun, Nov 08, 2020 at 10:10:55PM +0800, Muchun Song wrote:
>
> I don't like config options. I like boot opt
>
> Hi Mike, what's your opinion?
I would be happy to see this in a separate file. As Oscar mentions, the
hugetlb.c file/code is already somewhat difficult to read and understand.
--
Mike Kravetz
is might be a
> trade-off between saving up memory and increasing the cost
> of certain operations on allocation/free path.
> That is why I mentioned it there.
Yes, this is somewhat a trade-off.
As a config option, this is something that would likely be decided by
distros. I almost hate to suggest this, but is it something that an
end user would want to decide? Is this something that perhaps should
be a boot/kernel command line option?
--
Mike Kravetz
ool. I remember the
use case pointed out in commit 099730d67417. It says, "I have a hugetlbfs
user which is never explicitly allocating huge pages with 'nr_hugepages'.
They only set 'nr_overcommit_hugepages' and then let the pages be allocated
from the buddy allocator at fault time." In this case, I suspect they were
using 'page fault' allocation for initialization much like someone using
/proc/sys/vm/nr_hugepages. So, the overhead may not be as noticeable.
--
Mike Kravetz
On 11/2/20 4:28 PM, Mike Kravetz wrote:
> The RFC series reverted all patches where i_mmap_rwsem was used for
> pmd sharing synchronization, and then added code to use hinode_rwsem.
> This series ends up with the same code in the end, but is structured
> as follows:
>
> - Rev
/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/
Reported-by: Qian Cai
Suggested-by: Hugh Dickins
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization")
Cc:
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c| 90 +++
ensure proper locking are also added.
Use of the new semaphore and supporting routines will be provided in a
later patch.
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization")
Cc:
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c| 12
inc
sem to address page
fault/truncate race")
Cc:
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c | 28
mm/hugetlb.c | 23 ---
2 files changed, 20 insertions(+), 31 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetl
c:
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c | 34 --
mm/hugetlb.c | 40 ++--
2 files changed, 58 insertions(+), 16 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index bc9979382a1e..6b
d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization")
Cc:
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c| 31 +--
include/linux/fs.h | 15
include/linux/hugetlb.h | 8 --
mm/hugetlb.c| 188 +++
is that this will be easier to review.
Mike Kravetz (4):
Revert hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
hugetlbfs: add hinode_rwsem to hugetlb specific inode
hugetlbfs: use hinode_rwsem for pmd sharing synchronization
hugetlbfs: handle page fault/truncate races
fs/hugetlbfs/inode.c
On 10/28/20 11:13 PM, Muchun Song wrote:
> On Thu, Oct 29, 2020 at 7:42 AM Mike Kravetz wrote:
>>
>> On 10/26/20 7:51 AM, Muchun Song wrote:
>>> +
>>> +static inline spinlock_t *vmemmap_pmd_lockptr(pmd_t *pmd)
>>> +{
>>> + static DEFINE_S
(page, HUGETLB_PAGE_DTOR);
> set_hugetlb_cgroup(page, NULL);
> @@ -1783,6 +1892,14 @@ static struct page *alloc_fresh_huge_page(struct
> hstate *h,
> if (!page)
> return NULL;
>
> + if (vmemmap_pgtable_prealloc(h, page)) {
> + if (hstate_is_gigantic(h))
> + free_gigantic_page(page, huge_page_order(h));
> + else
> + put_page(page);
> + return NULL;
> + }
> +
It seems a bit strange that we will fail a huge page allocation if
vmemmap_pgtable_prealloc fails. Not sure, but it almost seems like we should
allow the allocation and log a warning? It is somewhat unfortunate that
we need to allocate a page to free pages.
> if (hstate_is_gigantic(h))
> prep_compound_gigantic_page(page, huge_page_order(h));
> prep_new_huge_page(h, page, page_to_nid(page));
>
--
Mike Kravetz
mentioned.
>
More eyes on that series would be appreciated.
That series will dynamically free and allocate memmap pages as hugetlb
pages are allocated or freed. I haven't looked through this series, but
my first thought is that we would need to ensure those allocs/frees are
directed to t
On 10/28/20 12:26 AM, Muchun Song wrote:
> On Wed, Oct 28, 2020 at 8:33 AM Mike Kravetz wrote:
>> On 10/26/20 7:51 AM, Muchun Song wrote:
>>
>> I see the following routines follow the pattern for vmemmap manipulation
>> in dax.
>
> Did you mean move those
gd_none(*pgd))
> + return NULL;
> + p4d = p4d_offset(pgd, addr);
> + if (p4d_none(*p4d))
> + return NULL;
> + pud = pud_offset(p4d, addr);
> +
> + WARN_ON_ONCE(pud_bad(*pud));
> + if (pud_none(*pud) || pud_bad(*pud))
> + return NULL;
> + pmd = pmd_offset(pud, addr);
> +
> + return pmd;
> +}
That routine is not really hugetlb specific. Perhaps we could move it
to sparse-vmemmap.c? Or elsewhere?
--
Mike Kravetz
; 2 files changed, 38 insertions(+)
Patch looks fine with updated commit message.
Acked-by: Mike Kravetz
--
Mike Kravetz
es from day 1 of their existence? I would prefer
not to touch them in case someone is depending on the current format.
--
Mike Kravetz
> ---
> drivers/base/node.c | 4 ++--
> mm/hugetlb.c| 6 +++---
> 2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/
the semaphore if pmd sharing is possible. Also,
add lockdep_assert calls to huge_pmd_share/unshare to help catch callers
not using the proper locking.
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c| 11 +-
include/linux/hugetlb.h | 8 ++--
mm/hugetlb.c| 83
s per hugetlb calculation")
commit 87bf91d39bb5 ("hugetlbfs: Use i_mmap_rwsem to address page
fault/truncate race")
commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization")
Signed-off-by
is possible in an attempt to
minimize performance impacts. In addition, routines which can be used with
lockdep to help with proper locking are also added.
Use of the new semaphore and supporting routines will be provided in a
subsequent patch.
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c
as necessary.
File truncation (remove_inode_hugepages) needs to handle page mapping
changes that could have happened before locking the page. This could
happen if page was added to page cache and later backed out in fault
processing.
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c | 34
hinode_lock_write() helper.
- Split out addition of hinode_rwsem and helper routines to a separate
patch.
[1]
https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/
Mike Kravetz (4):
hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync
hugetlbfs: add
it could be used by the hugetlb
code to make it simpler.
--
Mike Kravetz
On 10/21/20 7:33 PM, Roman Gushchin wrote:
> On Wed, Oct 21, 2020 at 05:15:53PM -0700, Mike Kravetz wrote:
>> On 10/16/20 3:52 PM, Roman Gushchin wrote:
>>> This small patchset makes cma_release() non-blocking and simplifies
>>> the code in hugetlbfs, where previously
he bitmap in cma_clear_bitmap and could
block. However, I do not see why cma->lock has to be a mutex. I may be
missing something, but I do not see any code protected by the mutex doing
anything that could sleep?
Could we simply change that mutex to a spinlock?
--
Mike Kravetz
quot;)
Cc:
Reported-by: Michal Privoznik
Co-developed-by: David Hildenbrand
Signed-off-by: David Hildenbrand
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 20 +++-
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 67fc6383995b.
> On 21.10.20 15:11, David Hildenbrand wrote:
>> On 21.10.20 14:57, Michal Privoznik wrote:
>>> On 10/21/20 5:35 AM, Mike Kravetz wrote:
>>>> On 10/20/20 6:38 AM, David Hildenbrand wrote:
>>>>
>>>> It would be good if Mina (at least) would lo
g seen with the qemu
use case.
I'm still doing more testing and code inspection to look for other issues.
From 861bcd7d0443f18a5fed3c3ddc5f1c71e78c4ef4 Mon Sep 17 00:00:00 2001
From: Mike Kravetz
Date: Tue, 20 Oct 2020 20:21:42 -0700
Subject: [PATCH] hugetlb_cgroup: fix reservation accounting
S
On 10/15/20 4:05 PM, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Oct 13, 2020 at 04:10:59PM -0700, Mike Kravetz wrote:
>> Due to pmd sharing, the huge PTE pointer returned by huge_pte_alloc
>> may not be valid. This can happen if a call to huge_pmd_unshare for
>> the same p
On 10/14/20 11:31 AM, Mike Kravetz wrote:
> On 10/14/20 11:18 AM, David Hildenbrand wrote:
>
> FWIW - I ran libhugetlbfs tests which do a bunch of hole punching
> with (and without) hugetlb controller enabled and did not see this issue.
>
I took a closer look aft
ks for having a look. I started poking around myself but,
> being new to cgroup code, I even failed to understand why that code gets
> triggered though the hugetlb controller isn't even enabled.
>
> I assume you at least have to make sure that there is
> a page populated (MMAP_POPULATE, or rea
is not taken
if the caller knows the target can not possibly be part of a shared pmd.
lockdep_assert calls are added to huge_pmd_share and huge_pmd_unshare to
help catch callers not using the proper locking.
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c| 8
include/linux/hugetlb.h | 66
as necessary.
File truncation (remove_inode_hugepages) needs to handle page mapping
changes that could have happened before locking the page. This could
happen if page was added to page cache and later backed out in fault
processing.
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c | 34
slate' approach seemed best but I am open to whatever
would be easiest to review.
[1]
https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/
Mike Kravetz (3):
hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync
hugetlbfs: introduce hinode_rwsem for pmd
s per hugetlb calculation")
commit 87bf91d39bb5 ("hugetlbfs: Use i_mmap_rwsem to address page
fault/truncate race")
commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization")
Signed-off-by
On 10/12/20 6:59 PM, Xing Zhengjun wrote:
>
>
> On 10/13/2020 1:40 AM, Mike Kravetz wrote:
>> On 10/11/20 10:29 PM, Xing Zhengjun wrote:
>>> Hi Mike,
>>>
>>> I re-test it in v5.9-rc8, the regression still existed. It is almost
>>>