as it can no longer happen. Remove the dead code
and expand comments to explain reasoning. Similarly, checks for races
with truncation in the page fault path can be simplified and removed.
Cc:
Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: Mike
Comments made by Naoya were addressed.
Mike Kravetz (2):
hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
fs/hugetlbfs/inode.c | 61 +++-
mm/hugetl
dition, callers
of huge_pte_alloc continue to hold the semaphore until finished with
the ptep.
- i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.
Cc:
Fixes: 39dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz
On 12/21/18 12:21 PM, Kirill A. Shutemov wrote:
> On Fri, Dec 21, 2018 at 10:28:25AM -0800, Mike Kravetz wrote:
>> On 12/21/18 2:28 AM, Kirill A. Shutemov wrote:
>>> On Tue, Dec 18, 2018 at 02:35:57PM -0800, Mike Kravetz wrote:
>>>> Instead of writing the required
On 12/21/18 2:28 AM, Kirill A. Shutemov wrote:
> On Tue, Dec 18, 2018 at 02:35:57PM -0800, Mike Kravetz wrote:
>> Instead of writing the required complicated code for this rare
>> occurrence, just eliminate the race. i_mmap_rwsem is now held in read
>> mode for the d
On 12/21/18 2:05 AM, Kirill A. Shutemov wrote:
> On Tue, Dec 18, 2018 at 02:35:56PM -0800, Mike Kravetz wrote:
>> While looking at BUGs associated with invalid huge page map counts,
>> it was discovered and observed that a huge pte pointer could become
>> 'invalid' a
as it can no longer happen. Remove the dead code
and expand comments to explain reasoning. Similarly, checks for races
with truncation in the page fault path can be simplified and removed.
Cc:
Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: Mike
7-1-mike.krav...@oracle.com
Comments made by Naoya were addressed.
Mike Kravetz (2):
hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
fs/hugetlbfs/inode.c | 50 +
mm/hugetlb.c
dition, callers
of huge_pte_alloc continue to hold the semaphore until finished with
the ptep.
- i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.
Cc:
Fixes: 39dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz
On 12/18/18 2:10 PM, Andrew Morton wrote:
> On Mon, 17 Dec 2018 16:17:52 -0800 Mike Kravetz
> wrote:
>
>> ...
>>
>>> As you suggested in a comment to the subsequent patch, it would be better to
>>> combine the patches and remove the dead code when it
On 12/17/18 10:42 AM, Mike Kravetz wrote:
> On 12/17/18 2:25 AM, Aneesh Kumar K.V wrote:
>> On 12/4/18 1:38 AM, Mike Kravetz wrote:
>>>
>>> Instead of writing the required complicated code for this rare
>>> occurrence, just eliminate the race. i_mmap_rwsem
Tkhai
Thanks for cleaning this up!
Reviewed-by: Mike Kravetz
--
Mike Kravetz
On 12/17/18 2:25 AM, Aneesh Kumar K.V wrote:
> On 12/4/18 1:38 AM, Mike Kravetz wrote:
>> hugetlbfs page faults can race with truncate and hole punch operations.
>> Current code in the page fault path attempts to handle this by 'backing
>> out' operations if we e
o separate patches addressing each issue.
Hopefully, this is easier to understand/review.
Mike Kravetz (3):
hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
hugetlbfs: remove unnecessary code after i_mmap_rwsem synchroniz
After expanding i_mmap_rwsem use for better shared pmd and page fault/
truncation synchronization, remove code that is no longer necessary.
Cc:
Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/in
t eliminate the race. i_mmap_rwsem is now held in read
mode for the duration of page fault processing. Hold i_mmap_rwsem
longer in truncation and hole punch code to cover the call to
remove_inode_hugepages.
Cc:
Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: M
dition, callers
of huge_pte_alloc continue to hold the semaphore until finished with
the ptep.
- i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.
Cc:
Fixes: 39dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz
>
>> Signed-off-by: Yongkai Wu
>> Acked-by: Michal Hocko
>> Acked-by: Mike Kravetz
Thank you for fixing the formatting and commit message.
Adding Andrew so he can add it to his tree as appropriate. Also Cc'ing Michal.
>> ---
>> mm/hugetlb.c | 5 +++--
>
ation to the
change log (commit message) and fix the formatting of the patch.
Can you tell us more about the root cause of your issue? What was the issue?
How could you reproduce it? How did you solve it?
--
Mike Kravetz
On 11/5/18 1:30 PM, Andrew Morton wrote:
> On Mon, 5 Nov 2018 13:23:15 -0800 Mike Kravetz
> wrote:
>
>> This bug has been experienced several times by Oracle DB team.
>> The BUG is in the routine remove_inode_hugepages() as follows:
>> /*
>> * If
iated page.
This is how we end up with an elevated map count.
To solve, check the dst_pte entry for huge_pte_none. If !none, this
implies PMD sharing so do not copy.
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 23 +++
1 file changed, 19 insertions(+), 4 deletions(-)
diff
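The dst_pte check described above could be sketched as follows (userspace model; the types, the map counter, and the helper name are illustrative, not the kernel's). At fork time, a destination pte that is already !none means the PMD was populated via sharing, and copying into it again would wrongly raise the map count, so the copy is skipped.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_t;
#define HUGE_PTE_NONE ((pte_t)0)

static int map_count;                 /* stand-in for the page map count */

static bool huge_pte_none(pte_t pte)
{
    return pte == HUGE_PTE_NONE;
}

/* returns 1 if copied, 0 if skipped because dst was populated via sharing */
static int copy_one_huge_pte(pte_t *dst, pte_t src)
{
    if (!huge_pte_none(*dst))
        return 0;                     /* shared PMD: do not copy again */
    *dst = src;
    map_count++;                      /* count the mapping exactly once */
    return 1;
}
```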
On 10/25/18 5:42 PM, Naoya Horiguchi wrote:
> Hi Mike,
>
> On Tue, Oct 23, 2018 at 09:50:53PM -0700, Mike Kravetz wrote:
>> Now, another task truncates the hugetlbfs file. As part of truncation,
>> it unmaps everyone who has the file mapped. If a task has a shared p
worse. This leads to bad
things such as incorrect page map/reference counts or invalid memory
references.
Fix this all by modifying the usage of i_mmap_rwsem to cover
fault/truncate races as well as handling of shared pmds.
Mike Kravetz (1):
hugetlbfs: use i_mmap_rwsem for pmd sharing and tru
d in read mode after huge_pte_alloc, until the caller
is finished with the returned ptep.
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c | 21 ++
mm/hugetlb.c | 65 +---
mm/rmap.c| 10 +++
mm/userfaultfd.c |
On 10/23/18 12:43 AM, Michal Hocko wrote:
> On Wed 17-10-18 21:10:22, Mike Kravetz wrote:
>> Some test systems were experiencing negative huge page reserve
>> counts and incorrect file block counts. This was traced to
>> /proc/sys/vm/drop_caches removing clean pages f
On 10/18/18 6:47 PM, Andrew Morton wrote:
> On Thu, 18 Oct 2018 20:46:21 -0400 Andrea Arcangeli
> wrote:
>
>> On Thu, Oct 18, 2018 at 04:16:40PM -0700, Mike Kravetz wrote:
>>> I was not sure about this, and expected someone could come up with
>>> something
On 10/18/18 4:08 PM, Andrew Morton wrote:
> On Wed, 17 Oct 2018 21:10:22 -0700 Mike Kravetz
> wrote:
>
>> Some test systems were experiencing negative huge page reserve
>> counts and incorrect file block counts. This was traced to
>> /proc/sys/vm/drop_caches removing
tle sense to even try to drop hugetlbfs
pagecache pages, so disable calls to these filesystems in drop_caches
code.
Fixes: 70c3547e36f5 ("hugetlbfs: add hugetlbfs_fallocate()")
Cc: sta...@vger.kernel.org
Signed-off-by: Mike Kravetz
---
fs/drop_caches.c | 7 +++
mm/hugetlb.c | 6 ++
On 10/8/18 1:03 AM, Kirill A. Shutemov wrote:
> On Sun, Oct 07, 2018 at 04:38:48PM -0700, Mike Kravetz wrote:
>> The following hugetlbfs truncate/page fault race can be recreated
>> with programs doing something like the following.
>>
>> A hugetlbfs file is mmap(MAP_SHA
following patch describes the current race in detail and adds the mutex
to prevent truncate/fault races.
Mike Kravetz (1):
hugetlbfs: introduce truncation/fault mutex to avoid races
fs/hugetlbfs/inode.c| 24
include/linux/hugetlb.h | 1 +
mm/hugetlb.c| 25
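The truncation/fault mutex idea could be sketched like this (userspace model; the mutex name, the file-size stand-in, and both helpers are hypothetical). Both paths take the same mutex, so a fault cannot instantiate a page inside a range that truncation is concurrently removing.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t fault_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool page_present;             /* stand-in for a pagecache page */
static long file_size = 1;            /* nonzero: offset 0 still in file */

/* fault: only instantiate the page if it is still inside the file */
static bool fault_in_page(void)
{
    bool ok;
    pthread_mutex_lock(&fault_mutex);
    ok = file_size > 0;
    if (ok)
        page_present = true;
    pthread_mutex_unlock(&fault_mutex);
    return ok;
}

/* truncate: shrink the file and drop the page under the same mutex */
static void truncate_file(void)
{
    pthread_mutex_lock(&fault_mutex);
    file_size = 0;
    page_present = false;             /* no fault can race this removal */
    pthread_mutex_unlock(&fault_mutex);
}
```

With both operations serialized on one mutex, a fault that loses the race sees the shrunken size and backs out instead of re-creating the page.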
ation takes in write mode.
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c| 24
include/linux/hugetlb.h | 1 +
mm/hugetlb.c| 25 +++--
mm/userfaultfd.c| 8 +++-
4 files changed, 47 insertions(+), 11 deletions(-)
diff
nitions per Mike
Thanks Anshuman,
Acked-by: Mike Kravetz
--
Mike Kravetz
>
> include/uapi/asm-generic/hugetlb_encode.h | 2 ++
> include/uapi/linux/memfd.h| 2 ++
> include/uapi/linux/mman.h | 2 ++
> include/uapi/linux/shm.h
; include/uapi/linux/mman.h | 2 ++
> 2 files changed, 4 insertions(+)
Thanks Anshuman,
However, I think we should also add similar definitions in:
uapi/linux/memfd.h
uapi/linux/shm.h
--
Mike Kravetz
>> Can we come up with a 2M base page VM or something? We have possible
>> memory sizes of a couple TB now. That should give us a million or so 2M
>> pages to work with.
>
> That sounds a good idea. Don't know whether someone has tried this.
IIRC, Hugh Dickins and some others at Google tried going down this path.
There was a brief discussion at LSF/MM. It is something I too would like
to explore in my spare time.
--
Mike Kravetz
On 09/05/2018 04:07 PM, Matthew Wilcox wrote:
> On Wed, Sep 05, 2018 at 03:00:08PM -0700, Andrew Morton wrote:
>> On Wed, 5 Sep 2018 14:35:11 -0700 Mike Kravetz
>> wrote:
>>
>>>>so perhaps we could put some
>>
he powerpc iommu code was added, I doubt this was taken into account.
I would be afraid of someone adding put_page from hardirq context.
--
Mike Kravetz
> And attention will need to be paid to -stable backporting. How long
> has mm_iommu_free() existed, and been doing this?
ly need/want a separate patch. We
could just add the notifiers to the shared pmd patch. Back porting the
shared pmd patch will also require some fixup.
Either would work. I'll admit I do not know what stable maintainers would
prefer.
--
Mike Kravetz
se wrote:
>>>>> On Wed, Aug 29, 2018 at 08:39:06PM +0200, Michal Hocko wrote:
>>>>>> On Wed 29-08-18 14:14:25, Jerome Glisse wrote:
>>>>>>> On Wed, Aug 29, 2018 at 10:24:44AM -0700, Mike Kravetz wrote:
>>>>>> [...]
>
On 08/29/2018 02:11 PM, Jerome Glisse wrote:
> On Wed, Aug 29, 2018 at 08:39:06PM +0200, Michal Hocko wrote:
>> On Wed 29-08-18 14:14:25, Jerome Glisse wrote:
>>> On Wed, Aug 29, 2018 at 10:24:44AM -0700, Mike Kravetz wrote:
>> [...]
>>>> What would be the best
On 08/27/2018 06:46 AM, Jerome Glisse wrote:
> On Mon, Aug 27, 2018 at 09:46:45AM +0200, Michal Hocko wrote:
>> On Fri 24-08-18 11:08:24, Mike Kravetz wrote:
>>> Here is an updated patch which does as you suggest above.
>> [...]
>>> @@ -1409,6 +1419,32 @@ static
On 08/27/2018 12:46 AM, Michal Hocko wrote:
> On Fri 24-08-18 11:08:24, Mike Kravetz wrote:
>> On 08/24/2018 01:41 AM, Michal Hocko wrote:
>>> On Thu 23-08-18 13:59:16, Mike Kravetz wrote:
>>>
>>> Acked-by: Michal Hocko
>>>
>>> One nit be
On 08/24/2018 01:41 AM, Michal Hocko wrote:
> On Thu 23-08-18 13:59:16, Mike Kravetz wrote:
>
> Acked-by: Michal Hocko
>
> One nit below.
>
> [...]
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 3103099f64fd..a73c5728e961 100644
>> --- a/mm/hugetl
that this range is flushed
if huge_pmd_unshare succeeds and unmaps a PUD_SIZE area.
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 53 +++-
1 file changed, 44 insertions(+), 9 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index
routine as suggested by
Kirill.
v3-v5: Address build errors if !CONFIG_HUGETLB_PAGE and
!CONFIG_ARCH_WANT_HUGE_PMD_SHARE
Mike Kravetz (2):
mm: migration: fix migration of huge PMD shared pages
hugetlb: take PMD sharing into account when flushing tlb/caches
include/linux/huge
page")
Cc: sta...@vger.kernel.org
Signed-off-by: Mike Kravetz
---
include/linux/hugetlb.h | 14 ++
mm/hugetlb.c| 40 +--
mm/rmap.c | 42 ++---
3 files changed, 91 insertions(+), 5 deleti
On 08/23/2018 10:01 AM, Mike Kravetz wrote:
> On 08/23/2018 05:48 AM, Michal Hocko wrote:
>> On Tue 21-08-18 18:10:42, Mike Kravetz wrote:
>> [...]
>>
>> OK, after burning myself when trying to be clever here it seems like
>> your proposed solutio
On 08/23/2018 05:48 AM, Michal Hocko wrote:
> On Tue 21-08-18 18:10:42, Mike Kravetz wrote:
> [...]
>
> OK, after burning myself when trying to be clever here it seems like
> your proposed solution is indeed simpler.
>
>> +bool huge_pmd_sharing_possible(st
On 08/23/2018 01:21 AM, Kirill A. Shutemov wrote:
> On Thu, Aug 23, 2018 at 09:30:35AM +0200, Michal Hocko wrote:
>> On Wed 22-08-18 09:48:16, Mike Kravetz wrote:
>>> On 08/22/2018 05:28 AM, Michal Hocko wrote:
>>>> On Tue 21-08-18 18:10:42, Mike Kravetz wrote:
>
On 08/22/2018 02:05 PM, Kirill A. Shutemov wrote:
> On Tue, Aug 21, 2018 at 06:10:42PM -0700, Mike Kravetz wrote:
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 3103099f64fd..f085019a4724 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -4555,6 +45
On 08/22/2018 05:28 AM, Michal Hocko wrote:
> On Tue 21-08-18 18:10:42, Mike Kravetz wrote:
> [...]
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index eb477809a5c0..8cf853a4b093 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1362,11 +1362,21 @@ static boo
ration of huge PMD shared pages issue. That is sort
of a Tested-by :).
Just wanted to point out that it was pretty easy to hit this issue. It
was easier than the issue I am working. And, the issue I am trying to
address was seen in a real customer environment. So, I would not be
surprised to see this issue in real customer environments as well.
If you (or others) think we should go forward with these patches, I can
spend some time doing a review. Already did a 'quick look' some time back.
--
Mike Kravetz
te to
> help improve the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Mike-Kravetz/huge_pmd_unshare-migration-and-flushing/20180822-050255
> config: sparc64-allyesconfig (attached as .config)
> compiler: sparc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2
te to
> help improve the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Mike-Kravetz/huge_pmd_unshare-migration-and-flushing/20180822-050255
> config: i386-tinyconfig (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
> reproduce:
> #
page")
Cc: sta...@vger.kernel.org
Signed-off-by: Mike Kravetz
---
include/linux/hugetlb.h | 14 ++
mm/hugetlb.c| 40 +++
mm/rmap.c | 42 ++---
3 files changed, 93 insertions(+), 3 deleti
advantage of the new routine
huge_pmd_sharing_possible() to adjust flushing ranges in the cases
where huge PMD sharing is possible. There is no copy to stable for this
patch as it has not been reported as an issue and discovered only via
code inspection.
Mike Kravetz (2):
mm: migration: fix migration of
range is flushed if
huge_pmd_unshare succeeds and unmaps a PUD_SIZE area.
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 53 +++-
1 file changed, 44 insertions(+), 9 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fd155dc52117
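The range widening described above could be sketched as follows (illustrative: the constants assume x86-64 values, and the helper name is modeled on the discussion, not guaranteed to match the posted patch). If huge_pmd_unshare can succeed, the region actually unmapped is the whole PUD_SIZE area containing the range, so the flush range is rounded out to PUD_SIZE boundaries.

```c
#include <stdint.h>

#define PUD_SIZE (1UL << 30)          /* assumed: 1GB on x86-64 */
#define PUD_MASK (~(PUD_SIZE - 1))

/* widen [start, end) so it covers whole PUD_SIZE-aligned regions */
static void adjust_range_for_pmd_sharing(uint64_t *start, uint64_t *end)
{
    *start &= PUD_MASK;                       /* round start down */
    *end = (*end + PUD_SIZE - 1) & PUD_MASK;  /* round end up */
}
```

Flushing this widened range is conservative but safe: it covers everything a successful unshare could have unmapped.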
On 08/14/2018 01:48 AM, Kirill A. Shutemov wrote:
> On Mon, Aug 13, 2018 at 11:21:41PM +0000, Mike Kravetz wrote:
>> On 08/13/2018 03:58 AM, Kirill A. Shutemov wrote:
>>> On Sun, Aug 12, 2018 at 08:41:08PM -0700, Mike Kravetz wrote:
>>>> I am not 100% sure on the req
dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz
---
v2: Fixed build issue for !CONFIG_HUGETLB_PAGE and typos in comment
include/linux/hugetlb.h | 6 ++
mm/rmap.c | 21 +
2 files changed, 27 insertions(+)
diff --git a/inclu
On 08/13/2018 03:58 AM, Kirill A. Shutemov wrote:
> On Sun, Aug 12, 2018 at 08:41:08PM -0700, Mike Kravetz wrote:
>> The page migration code employs try_to_unmap() to try and unmap the
>> source page. This is accomplished by using rmap_walk to find all
>> vmas where the
huge_pmd_unshare for hugetlbfs huge pages. If it is a
shared mapping it will be 'unshared' which removes the page table
entry and drops reference on PMD page. After this, flush caches and
TLB.
Signed-off-by: Mike Kravetz
---
I am not 100% sure on the required flushing, so suggestions would be
a
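The unshare sequence described above could be modeled like this (userspace sketch; every name and the refcount model are illustrative stand-ins, not kernel internals). A shared PMD page carries a reference per mapping; unsharing clears this mapping's entry, drops the reference, and then flushes caches and TLB for the covered range.

```c
#include <stdbool.h>

static int pmd_page_refcount = 2;     /* two mappings share the PMD page */
static bool pud_entry_present = true; /* this mapping's entry to the page */
static int flushes;                   /* counts cache/TLB flush calls */

static void flush_cache_and_tlb(void) { flushes++; }

/* returns 1 if the PMD was shared and has now been unshared */
static int huge_pmd_unshare_sketch(void)
{
    if (pmd_page_refcount <= 1)
        return 0;                     /* not shared: nothing to unshare */
    pud_entry_present = false;        /* remove this mapping's entry */
    pmd_page_refcount--;              /* drop reference on the PMD page */
    flush_cache_and_tlb();            /* flush only after the unmap */
    return 1;
}
```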
On 07/05/2018 04:07 AM, Alexandre Ghiti wrote:
> arm, arm64, ia64, parisc, powerpc, sh, sparc, x86 architectures
> use the same version of huge_pte_none, so move this generic
> implementation into asm-generic/hugetlb.h.
>
Reviewed-by: Mike Kravetz
--
Mike Kravetz
> Signed-of
On 07/17/2018 06:28 PM, Naoya Horiguchi wrote:
> On Tue, Jul 17, 2018 at 01:10:39PM -0700, Mike Kravetz wrote:
>> It seems that soft_offline_free_page can be called for in use pages.
>> Certainly, that is the case in the first workflow above. With the
>> suggested changes, I
workflow above. With the
suggested changes, I think this is OK for huge pages. However, it seems
that setting HWPoison on an in-use non-huge page could cause issues?
While looking at the code, I noticed this comment in __get_any_page()
/*
* When the target page is a free hugepage, just remove it
* from free hugepage list.
*/
Did that apply to some code that was removed? It does not seem to make
any sense in that routine.
--
Mike Kravetz
unt of time as with 0, as opposed to the 25+
> minutes it would take before.
>
> Signed-off-by: Cannon Matthews
Thanks,
Acked-by: Mike Kravetz
--
Mike Kravetz
> ---
> v2: removed the memset of the huge_bootmem_page area and added
> INIT_LIST_HEAD instead.
>
> mm/hugetlb.c
On 07/11/2018 01:57 PM, Davidlohr Bueso wrote:
> On Wed, 11 Jul 2018, Mike Kravetz wrote:
>
>> This reverts commit ee8f248d266e ("hugetlb: add phys addr to struct
>> huge_bootmem_page")
>>
>> At one time powerpc used this field and supporting code.
line").
There are no users of this field and supporting code, so remove it.
Signed-off-by: Mike Kravetz
---
include/linux/hugetlb.h | 3 ---
mm/hugetlb.c| 9 +
2 files changed, 1 insertion(+), 11 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.
add. We do
>> initialize the rest explicitly already.
>
> Forgot to mention that after that is addressed you can add
> Acked-by: Michal Hocko
Cannon,
How about if you make this change suggested by Michal, and I will submit
a separate patch to revert the patch which added the phys field to
huge_bootmem_page structure.
FWIW,
Reviewed-by: Mike Kravetz
--
Mike Kravetz
n't find any code that sets phys.
Although, it is potentially used in gather_bootmem_prealloc(). It appears
powerpc used this field at one time, but no longer does.
Am I missing something?
Not an issue with this patch, rather existing code. I'd prefer not to do
the memset(
off-by: Cannon Matthews
My only suggestion would be to remove the mention of 2M pages in the
commit message. Thanks for adding this.
Reviewed-by: Mike Kravetz
--
Mike Kravetz
> ---
> mm/hugetlb.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>
tfd is/can be enabled for impacted architectures.
Reviewed-by: Mike Kravetz
--
Mike Kravetz
> ---
> fs/userfaultfd.c | 12 +++-
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 123bf7d516fc..594d192b2331 1006
ection. Unlike the case in ad55eac74f20, the access
permissions on section 1 (RE) are different than section 2 (RW). If we
allowed the previous MAP_FIXED behavior, we would be changing part of a
read only section to read write. This is exactly what MAP_FIXED_NOREPLACE
was designed to prevent.
--
Mike Kravetz
nmapped the overlapping page. However, this does not seem right.
--
Mike Kravetz
On 05/30/2018 01:02 AM, Michal Hocko wrote:
> On Tue 29-05-18 15:21:14, Mike Kravetz wrote:
>> Just a quick heads up. I noticed a change in libhugetlbfs testing starting
>> with v4.17-rc1.
>>
>> V4.16 libhugetlbfs test results
>> ** TEST SUM
libhugetlbfs tests for an unrelated
issue/change and, will do some analysis to see exactly what is happening.
Also, will take it upon myself to run libhugetlbfs test suite on a
regular (at least weekly) basis.
--
Mike Kravetz
ssh_to_dbg # sudo ./test_mmap 4
mapping 4 huge pages
address 7f62bba0 read (-)
address 7f62bbc0 read (-)
Connection to dbg closed by remote host.
Connection to dbf closed.
OOM did kick in (lots of console/log output) and killed the shell
as well.
--
Mike Kravetz
ge after the page fault under heavy cache contention.
>
> Signed-off-by: "Huang, Ying"
Reviewed-by: Mike Kravetz
--
Mike Kravetz
> Cc: Mike Kravetz
> Cc: Michal Hocko
> Cc: David Rientjes
> Cc: Andrea Arcangeli
> Cc: "Kirill A. Shutemov"
> Cc: Andi
> the "address" is renamed to "haddr" in hugetlb_cow() in this patch.
> Next patch will use target subpage address in hugetlb_cow() too.
>
> The patch is just code cleanup without any functionality changes.
>
> Signed-off-by: "Huang, Ying"
> Suggested-
ry area from the begin to the end, so
> cause copy on write. For each child process, other child processes
> could be seen as other workloads which generate heavy cache pressure.
> At the same time, the IPC (instruction per cycle) increased from 0.63
> to 0.78, and the time spent in user sp
er inline support of the compilers,
> the indirect call will be optimized to be the direct call. Our tests
> show no performance change with the patch.
>
> This patch is a code cleanup without functionality change.
>
> Signed-off-by: "Huang, Ying"
> Suggested-by:
allocator.
I really do not think hugetlbfs overcommit will provide any benefit over
THP for your use case. Also, new user space code is required to "fall back"
to normal pages in the case of hugetlbfs page allocation failure. This
is not needed in the THP case.
--
Mike Kravetz
On 05/22/2018 09:41 AM, Reinette Chatre wrote:
> On 5/21/2018 4:48 PM, Mike Kravetz wrote:
>> On 05/21/2018 01:54 AM, Vlastimil Babka wrote:
>>> On 05/04/2018 01:29 AM, Mike Kravetz wrote:
>>>> +/**
>>>> + * find_alloc_contig_pages() --
ages are not accounted for when
they are allocated as 'reserves'. It is not until these reserves are actually
used that accounting limits are checked. This 'seems' to align with general
allocation of huge pages within the pool. No accounting is done until they
are actually allocated to a mapping/file.
--
Mike Kravetz
On 05/21/2018 05:00 AM, Vlastimil Babka wrote:
> On 05/04/2018 01:29 AM, Mike Kravetz wrote:
>> Vlastimil and Michal brought up the issue of allocation alignment. The
>> routine will currently align to 'nr_pages' (which is the requested size
>> argument). It does
On 05/21/2018 01:54 AM, Vlastimil Babka wrote:
> On 05/04/2018 01:29 AM, Mike Kravetz wrote:
>> find_alloc_contig_pages() is a new interface that attempts to locate
>> and allocate a contiguous range of pages. It is provided as a more
>
> How about dropping the 'fin
On 05/18/2018 03:32 AM, Vlastimil Babka wrote:
> On 05/04/2018 01:29 AM, Mike Kravetz wrote:
>> The routine start_isolate_page_range and alloc_contig_range have
>> comments saying that migratetype must be either MIGRATE_MOVABLE or
>> MIGRATE_CMA. However, this is not enforc
On 05/17/2018 09:27 PM, TSUKADA Koutaro wrote:
> Thanks to Mike Kravetz for comment on the previous version patch.
>
> The purpose of this patch-set is to make it possible to control whether or
> not to charge surplus hugetlb pages obtained by overcommitting to memory
> cgroup. I
. hugetlb.c and hugetlb.h are
not 100% hugetlbfs, but a majority of their content is hugetlbfs
related.
Signed-off-by: Mike Kravetz
---
MAINTAINERS | 8 +++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 9051a9ca24a2..c7a5eb074eb1 100644
--- a
On 05/18/2018 02:12 AM, Vlastimil Babka wrote:
> On 05/04/2018 01:29 AM, Mike Kravetz wrote:
>> free_contig_range() is currently defined as:
>> void free_contig_range(unsigned long pfn, unsigned nr_pages);
>> change to,
>> void free_contig_range(unsigned long
r in which sub-pages are to be copied. IIUC, you
added the same algorithm for sub-page ordering to copy_huge_page()
that was previously added to clear_huge_page(). Correct? If so,
then perhaps a common helper could be used by both the clear and copy
huge page routines. It would also make maintenance easier.
--
Mike Kravetz
oad which generates heavy cache
> pressure. At the same time, the cache miss rate reduced from ~36.3%
> to ~25.6%, the IPC (instruction per cycle) increased from 0.3 to 0.37,
> and the time spent in user space is reduced ~19.3%.
>
Agree with Michal that commit message looks better.
that another area to consider?
That gets back to Michal's question of a specific use case or generic
optimization. Unless code is simple (as in this patch), seems like we should
hold off on considering additional optimizations unless there is a specific
use case.
I'm still OK with this change.
--
Mike Kravetz
to ~25.6%, the
> IPC (instruction per cycle) increased from 0.3 to 0.37, and the time
> spent in user space is reduced ~19.3%
Since this patch only addresses hugetlbfs huge pages, I would suggest
making that more explicit in the commit message. Other than that, the
changes look fine to me.
DONE
> ok 1..2 selftests: memfd: run_fuse_test.sh [PASS]
> selftests: memfd: run_hugetlbfs_test.sh
>
> Please run memfd with hugetlbfs test as root
> not ok 1..3 selftests: memfd: run_hugetlbfs_test.sh [SKIP]
>
> Signed-off-by: Shuah Khan (Samsung OSG)
Thanks for all your
n it.
>
> In addition, return skip code when not enough huge pages are available to
> run the test.
>
> Kselftest framework SKIP code is 4 and the framework prints appropriate
> messages to indicate that the test is skipped.
>
> Signed-off-by: Shuah Khan (Samsung OSG)
Tha
ksft_skip
We now KNOW that we are running as root because of the check above. We
can delete this test, and rely on the later check to determine if the
number of huge pages was actually increased.
How about this instead (untested)?
Signed-off-by: Mike Kravetz
diff --git a/tools/testing/selftests/
On 05/03/2018 05:09 PM, TSUKADA Koutaro wrote:
> On 2018/05/03 11:33, Mike Kravetz wrote:
>> On 05/01/2018 11:54 PM, TSUKADA Koutaro wrote:
>>> On 2018/05/02 13:41, Mike Kravetz wrote:
>>>> What is the reason for not charging pages at allocation/reserve time? I
is employed
if possible. There is no guarantee that the routine will succeed.
So, the user must be prepared for failure and have a fall back plan.
Signed-off-by: Mike Kravetz
---
include/linux/gfp.h | 12 +
mm/page_alloc.c | 136 +++-
2
an unsigned int.
However, this should be changed to an unsigned long to be consistent
with other page counts.
Signed-off-by: Mike Kravetz
---
include/linux/gfp.h | 2 +-
mm/cma.c| 2 +-
mm/hugetlb.c| 2 +-
mm/page_alloc.c | 6 +++---
4 files changed, 6 insertions(+), 6
as there are two primary
users. Contiguous range allocation which wants to enforce migration
type checking. Memory offline (hotplug) which is not concerned about
type checking.
Signed-off-by: Mike Kravetz
---
include/linux/page-isolation.h | 8 +++-
mm/memory_hotplug.c| 2
Use the new find_alloc_contig_pages() interface for the allocation of
gigantic pages and remove associated code in hugetlb.c.
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 87 +---
1 file changed, 6 insertions(+), 81 deletions(-)
diff