implementation and
your new implementation. Originally, the PMD is restored after trying to
migrate the misplaced THP. I think this can reduce the TLB
shootdown IPIs.
Best Regards,
Huang, Ying
> In the old code anon_vma lock was needed to serialize THP migration
> against THP split, but si
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2127,7 +2127,7 @@ static inline bool is_shared_exec_page(struct
> vm_area_struct *vma,
> * the page that will be dropped by this function before returning.
> */
> int migrate_misplaced_page(struct page *page, struct vm_area_struc
inaccessible. But the difference in the accessible window is small,
because the page will be made inaccessible soon for migration.
Signed-off-by: "Huang, Ying"
Cc: Peter Zijlstra
Cc: Mel Gorman
Cc: Peter Xu
Cc: Johannes Weiner
Cc: Vlastimil Babka
Cc: "Matthew Wilcox"
Mel Gorman writes:
> On Thu, Mar 25, 2021 at 12:33:45PM +0800, Huang, Ying wrote:
>> > I caution against this patch.
>> >
>> > It's non-deterministic for a number of reasons. As it requires NUMA
>> > balancing to be enabled, the pageout behaviour of a
Hi, Mel,
Thanks for the comment!
Mel Gorman writes:
> On Wed, Mar 24, 2021 at 04:32:09PM +0800, Huang Ying wrote:
>> One idea behind the LRU page reclaiming algorithm is to put the
>> access-once pages in the inactive list and access-more-than-once pages
>> in the activ
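The access-once vs. access-more-than-once split quoted above can be illustrated with a toy two-list model. This is a hedged sketch in Python, not the kernel implementation; class and variable names are invented:

```python
from collections import OrderedDict

class TwoListLRU:
    """Toy model of the active/inactive split: a page is promoted to the
    active list only on its second access, so access-once (streaming)
    pages are reclaimed from the inactive list without disturbing the
    working set."""
    def __init__(self, capacity):
        self.capacity = capacity       # total pages held across both lists
        self.inactive = OrderedDict()  # page -> True, oldest first
        self.active = OrderedDict()

    def access(self, page):
        if page in self.active:
            self.active.move_to_end(page)  # refresh LRU position
        elif page in self.inactive:
            del self.inactive[page]        # second access: promote
            self.active[page] = True
        else:
            self.inactive[page] = True     # first access: inactive list
            if len(self.inactive) + len(self.active) > self.capacity:
                self._reclaim()

    def _reclaim(self):
        # Reclaim the oldest inactive page first; fall back to active.
        victim_list = self.inactive if self.inactive else self.active
        victim_list.popitem(last=False)
```

With this model, a long access-once stream cycles through the inactive list while twice-accessed pages survive on the active list.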
and cold pages. But generally, I don't think it is a good idea to
improve performance purely by increasing the system overhead.
Signed-off-by: "Huang, Ying"
Inspired-by: Yu Zhao
Cc: Hillf Danton
Cc: Johannes Weiner
Cc: Joonsoo Kim
Cc: Matthew Wilcox
Cc: Mel Gorman
Cc: Michal
Yu Zhao writes:
> On Mon, Mar 22, 2021 at 11:13:19AM +0800, Huang, Ying wrote:
>> Yu Zhao writes:
>>
>> > On Wed, Mar 17, 2021 at 11:37:38AM +0800, Huang, Ying wrote:
>> >> Yu Zhao writes:
>> >>
>> >> > On Tue, Mar 16, 20
Yu Zhao writes:
> On Wed, Mar 17, 2021 at 11:37:38AM +0800, Huang, Ying wrote:
>> Yu Zhao writes:
>>
>> > On Tue, Mar 16, 2021 at 02:44:31PM +0800, Huang, Ying wrote:
>> > The scanning overhead is only one of the two major problems of the
>> &g
Yu Zhao writes:
> On Tue, Mar 16, 2021 at 02:44:31PM +0800, Huang, Ying wrote:
>> Yu Zhao writes:
>>
>> > On Tue, Mar 16, 2021 at 10:07:36AM +0800, Huang, Ying wrote:
>> >> Rik van Riel writes:
>> >>
>>
Yu Zhao writes:
> On Tue, Mar 16, 2021 at 02:52:52PM +0800, Huang, Ying wrote:
>> Yu Zhao writes:
>>
>> > On Tue, Mar 16, 2021 at 10:08:51AM +0800, Huang, Ying wrote:
>> >> Yu Zhao writes:
>> >> [snip]
>> >>
>> >&g
Yu Zhao writes:
> On Tue, Mar 16, 2021 at 10:08:51AM +0800, Huang, Ying wrote:
>> Yu Zhao writes:
>> [snip]
>>
>> > +/* Main function used by foreground, background and user-triggered aging.
>> > */
>> > +static bool walk_mm_li
Yu Zhao writes:
> On Tue, Mar 16, 2021 at 10:07:36AM +0800, Huang, Ying wrote:
>> Rik van Riel writes:
>>
>> > On Sat, 2021-03-13 at 00:57 -0700, Yu Zhao wrote:
>> >
>> >> +/*
>> >> + * After pages are faulted in, they become the younge
ation of the function?
And maybe the number of mm_structs and the number of pages scanned.
In comparison, in the traditional LRU algorithm, for each round, only a
small subset of the whole physical memory is scanned.
Best Regards,
Huang, Ying
> +
> + if (!last) {
> +
scheduled after the previous
scanning will not be scanned. I guess that this helps OOM kills?
If so, how about just taking advantage of that information for OOM killing
and page reclaiming? For example, if a process hasn't been scheduled
for a long time, just reclaim its private pages.
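The heuristic suggested here might look like the following sketch. It is purely illustrative Python; the process list, field names, and the threshold value are all invented:

```python
def pick_reclaim_candidates(procs, now, idle_threshold=600):
    """Return PIDs of processes that haven't been scheduled for
    idle_threshold seconds; their private pages are candidates for
    reclaim (or demotion) before touching pages of active processes.

    procs: list of dicts with hypothetical "pid" and "last_scheduled"
    (seconds) fields; "now" is the current time in seconds.
    """
    return [p["pid"] for p in procs
            if now - p["last_scheduled"] > idle_threshold]
```

For example, with a 600-second threshold, a process idle for 1000 seconds is selected while one scheduled 50 seconds ago is not.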
Best Regards,
Huang, Ying
Hi, Butt,
Shakeel Butt writes:
> On Wed, Mar 10, 2021 at 4:47 PM Huang, Ying wrote:
>>
>> From: Huang Ying
>>
>> In shrink_node(), to determine whether to enable cache trim mode, the
>> LRU size is gotten via lruvec_page_state(). That gets th
decreases 51.4% (from 213.0 MB/s to 103.6
MB/s) with the patch, while the benchmark score decreases only 1.8%.
A new sysctl knob kernel.numa_balancing_rate_limit_mbps is added for
the users to specify the limit.
TODO: Add ABI document for new sysctl knob.
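A minimal sketch of how such a per-second rate limit might be enforced. The 1-second window, 4KB page size, and function shape are assumptions for illustration, not the kernel code:

```python
def allow_migration(state, now_ms, nr_pages, rate_limit_mbps, page_kb=4):
    """Windowed rate limiting as the knob suggests: allow at most
    rate_limit_mbps worth of page promotion per 1-second window.
    `state` is a (window_start_ms, bytes_used) tuple carried between
    calls; returns (new_state, allowed)."""
    window_start, used = state
    if now_ms - window_start >= 1000:       # start a new 1-second window
        window_start, used = now_ms, 0
    budget = rate_limit_mbps * 1024 * 1024  # bytes allowed per second
    cost = nr_pages * page_kb * 1024
    if used + cost > budget:
        return (window_start, used), False  # over the limit, skip promotion
    return (window_start, used + cost), True
```

With a 1 MB/s limit, 256 4KB pages exhaust one window's budget; further requests are refused until the next window opens.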
Signed-off-by: "Huang, Ying"
% with 32.4% fewer NUMA page migrations on a 2-socket Intel server
with Optane DC Persistent Memory, because it improves the accuracy
of hot page selection.
Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Ingo
nse.
- If fast response is more important for system performance, the
administrator can set a higher hot threshold.
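The trade-off reads naturally as a threshold on the hint-page-fault latency. A sketch under assumed names; the real selection logic lives in the patch itself:

```python
def is_hot(access_time_ms, scan_time_ms, hot_threshold_ms):
    """A page is considered hot if it was accessed within
    hot_threshold_ms after the NUMA hint fault scanner made it
    inaccessible. A larger threshold admits more pages (faster response
    to workload changes, more migration); a smaller one selects only the
    hottest pages (less migration overhead, slower response)."""
    return access_time_ms - scan_time_ms <= hot_threshold_ms
```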
Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Dave Hansen
Cc: Dan Williams
Cc:
-by: "Huang, Ying"
Suggested-by: Dave Hansen
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Dan Williams
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
mm/huge_memory.c | 30 +-
mm/mprotect
cleanup.
- Rebased on the latest page demotion patchset.
v2:
- Addressed comments for V1.
- Rebased on v5.5.
Huang Ying (6):
NUMA balancing: optimize page placement for memory tiering system
memory tiering: add page promotion counter
memory tiering: skip to scan fast memory
memo
TODO:
- Update ABI document: Documentation/sysctl/kernel.txt
Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Dave Hansen
Cc: Dan Williams
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.or
To distinguish the number of memory-tiering promoted pages from
that of the original inter-socket NUMA balancing migrated pages.
The counter is per-node (counted in the target node), so it can be
used to identify promotion imbalance among the NUMA nodes.
Signed-off-by: "Huang, Ying
From: Huang Ying
In shrink_node(), to determine whether to enable cache trim mode, the
LRU size is obtained via lruvec_page_state(). That gets the value from
a per-CPU counter (mem_cgroup_per_node->lruvec_stat[]). The error of
the per-CPU counter from CPU-local counting and the descendant mem
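The error being discussed comes from batching in the per-CPU counters. A toy model (illustrative Python; the batch value is invented) shows how the globally visible value can lag the exact one by up to nr_cpus * batch:

```python
class PerCPUCounter:
    """Toy model of a batched per-CPU counter like lruvec_stat: each CPU
    accumulates deltas locally and folds them into the global value only
    when the local delta reaches a batch threshold, so readers of
    global_count can be off by up to nr_cpus * batch."""
    def __init__(self, nr_cpus, batch=32):
        self.global_count = 0
        self.local = [0] * nr_cpus
        self.batch = batch

    def add(self, cpu, delta):
        self.local[cpu] += delta
        if abs(self.local[cpu]) >= self.batch:
            self.global_count += self.local[cpu]  # fold into global
            self.local[cpu] = 0

    def exact(self):
        # The precise value, requiring a walk of all per-CPU deltas.
        return self.global_count + sum(self.local)
```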
Hillf Danton writes:
> On Thu, 4 Feb 2021 18:10:51 +0800 Huang Ying wrote:
>> With the advent of various new memory types, some machines will have
>> multiple types of memory, e.g. DRAM and PMEM (persistent memory). The
>> memory subsystem of these machines can be
-by: "Huang, Ying"
Suggested-by: Dave Hansen
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Dan Williams
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
include/linux/node.h | 5 +
mm/huge_memory.
ebased on the latest page demotion patchset.
v2:
- Addressed comments for V1.
- Rebased on v5.5.
Huang Ying (6):
NUMA balancing: optimize page placement for memory tiering system
memory tiering: skip to scan fast memory
memory tiering: hot page selection with hint page fault latency
memory
d to me.
Acked-by: "Huang, Ying"
> ---
> mm/swap_state.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index d0d417efeecc..3cdee7b11da9 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@
"Alejandro Colomar (man-pages)" writes:
> Hi Huang Ying,
>
> On 1/20/21 7:12 AM, Huang Ying wrote:
>> Signed-off-by: "Huang, Ying"
>> Cc: "Alejandro Colomar"
>
> Sorry, for the confusion.
> I have a different email for reading lists.
Matthew Wilcox writes:
> On Wed, Jan 20, 2021 at 03:27:11PM +0800, Huang Ying wrote:
>> To catch the error in updating the swap cache shadow entries or their count.
>
> I just resent a patch that removes nrexceptional tracking.
>
> Can you use !mapping_empty() inst
Michal Hocko writes:
> On Wed 20-01-21 15:27:11, Huang Ying wrote:
>> To catch the error in updating the swap cache shadow entries or their count.
>
> What is the error?
There's no error in the current code. But we will change the related
code in the future. So this checki
To catch the error in updating the swap cache shadow entries or their count.
Signed-off-by: "Huang, Ying"
Cc: Minchan Kim
Cc: Joonsoo Kim
Cc: Johannes Weiner
Cc: Vlastimil Babka
Cc: Hugh Dickins
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Dan Williams
Cc: Christoph Hellwig, Il
Signed-off-by: "Huang, Ying"
Cc: "Alejandro Colomar"
---
man2/set_mempolicy.2 | 22 ++
1 file changed, 22 insertions(+)
diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
index 68011eecb..fa64a1820 100644
--- a/man2/set_mempolicy.2
+++ b/man2/set_m
be used before the --membind/-m memory policy in the command
line. With it, the Linux kernel NUMA balancing will be enabled for
the process if --membind/-m is used and the feature is supported by
the kernel.
Signed-off-by: "Huang, Ying"
---
libnuma.c | 14 ++
numa.3
from
node 1 to node 3 after killing the memory eater, and the pmbench score
can increase about 17.5%.
Signed-off-by: "Huang, Ying"
Acked-by: Mel Gorman
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: "Matthew Wilcox (Oracle)"
Cc: Dave Hanse
necessary.
v4:
- Use new flags instead of reuse MPOL_MF_LAZY.
v3:
- Rebased on latest upstream (v5.10-rc3)
- Revised the change log.
v2:
- Rebased on latest upstream (v5.10-rc1)
Best Regards,
Huang, Ying
Linus Torvalds writes:
> On Tue, Jan 12, 2021 at 9:24 PM huang ying
> wrote:
>> >
>> > Couldn't we just move it to the tail of the LRU list so it's reclaimed
>> > first? Or is locking going to be a problem here?
>>
>> Yes. That's a way to
On Wed, Jan 13, 2021 at 11:12 AM Matthew Wilcox wrote:
>
> On Wed, Jan 13, 2021 at 11:08:56AM +0800, huang ying wrote:
> > On Wed, Jan 13, 2021 at 10:47 AM Linus Torvalds
> > wrote:
> > >
> > > On Tue, Jan 12, 2021 at 6:43 PM Huang Ying wrote:
> >
On Wed, Jan 13, 2021 at 10:47 AM Linus Torvalds
wrote:
>
> On Tue, Jan 12, 2021 at 6:43 PM Huang Ying wrote:
> >
> > So in this patch, at the end of wp_page_copy(), the old unused swap
> > cache page will be tried to be freed.
>
> I'd much rather free it later
SwapCached: 1240 kB
AnonPages: 1904 kB
BTW: I think this should be in stable after v5.9.
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: "Huang, Ying"
Cc: Linus Torvalds
Cc: Peter Xu
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Mel Gorman
Hi, Peter,
Huang Ying writes:
> Now, NUMA balancing can only optimize the page placement among the
> NUMA nodes if the default memory policy is used. Because the memory
> policy specified explicitly should take precedence. But this seems
> too strict in some situations.
from
node 1 to node 3 after killing the memory eater, and the pmbench score
can increase about 17.5%.
Signed-off-by: "Huang, Ying"
Acked-by: Mel Gorman
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: "Matthew Wilcox (Oracle)"
Cc: Dave Hanse
:
- Rebased on latest upstream (v5.10-rc3)
- Revised the change log.
v2:
- Rebased on latest upstream (v5.10-rc1)
Best Regards,
Huang, Ying
"Alejandro Colomar (mailing lists; readonly)"
writes:
> Hi Huang, Ying,
>
> Sorry I forgot to answer.
> See below.
>
> BTW, Linux 5.10 has been released recently;
> is this series already merged for 5.11?
> If not yet, could you just write '5.??' and we'll
On Fri, 11 Dec 2020, Borislav Petkov wrote:
>
> On Mon, Dec 07, 2020 at 02:12:26PM +0800, Ying-Tsun Huang wrote:
> > In mtrr_type_lookup, if the input memory address region is not in the
> > MTRR, over 4GB, and not over the top of memory, write-back attribute
> >
"Huang, Ying" writes:
> Peter Zijlstra writes:
>
>> On Wed, Dec 02, 2020 at 11:40:54AM +, Mel Gorman wrote:
>>> On Wed, Dec 02, 2020 at 04:42:32PM +0800, Huang Ying wrote:
>>> > Now, NUMA balancing can only optimize the page placement among the
&g
Hi, Alex,
Sorry for late, I just notice this email today.
"Alejandro Colomar (mailing lists; readonly)"
writes:
> Hi Huang Ying,
>
> Please see a few fixes below.
>
> Michael, as always, some question for you too ;)
>
> Thanks,
>
> Alex
>
> On 12/2/
Hi, Alex,
"Alejandro Colomar (man-pages)" writes:
> Hi Huang Ying,
>
> Please, see a few fixes below.
>
> Thanks,
>
> Alex
>
> On 12/4/20 10:15 AM, Huang Ying wrote:
>> Signed-off-by: "Huang, Ying"
>> ---
>> man2/set_
Peter Zijlstra writes:
> On Wed, Dec 02, 2020 at 11:40:54AM +, Mel Gorman wrote:
>> On Wed, Dec 02, 2020 at 04:42:32PM +0800, Huang Ying wrote:
>> > Now, NUMA balancing can only optimize the page placement among the
>> > NUMA nodes if the default memory policy i
Signed-off-by: "Huang, Ying"
---
man2/set_mempolicy.2 | 14 ++
1 file changed, 14 insertions(+)
diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
index 68011eecb..fb2e6fd96 100644
--- a/man2/set_mempolicy.2
+++ b/man2/set_mempolicy.2
@@ -113,6 +113,15 @@ A no
the change log.
v2:
- Rebased on latest upstream (v5.10-rc1)
Best Regards,
Huang, Ying
Mel Gorman writes:
> On Wed, Dec 02, 2020 at 04:42:33PM +0800, Huang Ying wrote:
>> Signed-off-by: "Huang, Ying"
>> ---
>> man2/set_mempolicy.2 | 9 +
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/man2/set_mempolicy.2 b/man2/set
Signed-off-by: "Huang, Ying"
---
man2/set_mempolicy.2 | 9 +
1 file changed, 9 insertions(+)
diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
index 68011eecb..3754b3e12 100644
--- a/man2/set_mempolicy.2
+++ b/man2/set_mempolicy.2
@@ -113,6 +113,12 @@ A nonempty
.
es can be migrated from
node 1 to node 3 after killing the memory eater, and the pmbench score
can increase about 17.5%.
Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: "Matthew Wilcox (Oracle)"
Cc: Dav
be used before the memory policy options in the command
line. With it, the Linux kernel NUMA balancing will be enabled for
the process if the feature is supported by the kernel.
Signed-off-by: "Huang, Ying"
---
libnuma.c | 14 ++
numa.3| 15 +
, because it's not clear that it's necessary.
v4:
- Use new flags instead of reuse MPOL_MF_LAZY.
v3:
- Rebased on latest upstream (v5.10-rc3)
- Revised the change log.
v2:
- Rebased on latest upstream (v5.10-rc1)
Best Regards,
Huang, Ying
Dave Hansen writes:
> On 11/25/20 9:32 PM, Huang Ying wrote:
>> --- a/man2/set_mempolicy.2
>> +++ b/man2/set_mempolicy.2
>> @@ -113,6 +113,11 @@ A nonempty
>> .I nodemask
>> specifies node IDs that are relative to the set of
>> node IDs allowed by the pr
From: Huang Ying
A new API: numa_set_membind_balancing() is added to libnuma. It is
the same as numa_set_membind() except that the Linux kernel NUMA balancing
will be enabled for the task if the feature is supported by the
kernel.
At the same time, a new option: --balancing (-b) is added
From: Huang Ying
Signed-off-by: "Huang, Ying"
---
man2/set_mempolicy.2 | 8
1 file changed, 8 insertions(+)
diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
index 68011eecb..fb16bb351 100644
--- a/man2/set_mempolicy.2
+++ b/man2/set_mempolicy.2
@@ -113,6 +113,11 @@
st may
>> become too long in some cases. And the code/algorithm changes that are
>> needed by controlling the length of the purging list is much less than
>> that are needed by merging. So I suggest to do length controlling
>> firstly, then merging. Again, just my 2 cents.
>>
> All such kind of tuning parameters work for one case and does not for
> others. Therefore i prefer to have something more generic that tends
> to improve the things, instead of thinking how to tune parameters to
> cover all test cases and workloads.
It's a new mechanism to control the length of the purging list directly,
so I don't think that's just parameter tuning; it's a simple and
direct method. It can work together with the merging method to control the
purging latency even if the vmap areas cannot be merged in some cases.
But these cases may not exist in practice, so I will not insist on using
this method.
Best Regards,
Huang, Ying
Do you think so?
>> >>
>> > If we set lazy_max_pages() to vague value such as 100, the performance
>> > will be just destroyed.
>>
>> Sorry, my original words weren't clear enough. What I really want to
>> suggest is to control the length of the purging list instead of reduce
>> lazy_max_pages() directly. That is, we can have a "atomic_t
>> nr_purge_item" to record the length of the purging list and start
>> purging if (vmap_lazy_nr > lazy_max_pages && nr_purge_item >
>> max_purge_item). vmap_lazy_nr is to control the virtual address space,
>> nr_purge_item is to control the batching purging latency. "100" is just
>> an example, the real value should be determined according to the test
>> results.
>>
> OK. Now i see what you meant. Please note, the merging is in place, so
> the list size gets reduced.
Yes. In theory, even with merging, the length of the purging list may
become too long in some cases. And the code/algorithm changes needed
to control the length of the purging list are much smaller than those
needed for merging. So I suggest doing length control first, then
merging. Again, just my 2 cents.
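The condition proposed earlier in the thread can be written as a simple predicate. Names follow the discussion; the thresholds are illustrative, not tested kernel values:

```python
def should_purge(vmap_lazy_nr, nr_purge_item, lazy_max_pages, max_purge_item):
    """Start purging only when both limits are exceeded: vmap_lazy_nr
    bounds the lazily-freed virtual address space, while nr_purge_item
    bounds the purge-list length and hence the latency of one batched
    purge, even when vmap areas cannot be merged."""
    return vmap_lazy_nr > lazy_max_pages and nr_purge_item > max_purge_item
```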
Best Regards,
Huang, Ying
Uladzislau Rezki writes:
> On Thu, Nov 19, 2020 at 09:40:29AM +0800, Huang, Ying wrote:
>> Uladzislau Rezki writes:
>>
>> > On Wed, Nov 18, 2020 at 10:44:13AM +0800, huang ying wrote:
>> >> On Tue, Nov 17, 2020 at 9:04 PM Uladzislau Rezki wrote:
>>
Mel Gorman writes:
> On Thu, Nov 19, 2020 at 02:17:21PM +0800, Huang, Ying wrote:
>> >> Various page placement optimization based on the NUMA balancing can be
>> >> done with these flags. As the first step, in this patch, if the
>> >> memory of the
Mel Gorman writes:
> On Wed, Nov 18, 2020 at 01:19:52PM +0800, Huang Ying wrote:
>> Now, AutoNUMA can only optimize the page placement among the NUMA
>
> Note that the feature is referred to as NUMA_BALANCING in the kernel
> configs as AUTONUMA as it was first presen
Uladzislau Rezki writes:
> On Wed, Nov 18, 2020 at 10:44:13AM +0800, huang ying wrote:
>> On Tue, Nov 17, 2020 at 9:04 PM Uladzislau Rezki wrote:
>> >
>> > On Tue, Nov 17, 2020 at 10:37:34AM +0800, huang ying wrote:
>> > > On Tue, Nov 17, 2020 at 6:00 A
AutoNUMA
for a specific memory area inside an application, so we only add the
flag at the thread level (set_mempolicy()) instead of the memory area
level (mbind()). We can do that when it become necessary.
Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Mel Gorman
C
On Tue, Nov 17, 2020 at 9:04 PM Uladzislau Rezki wrote:
>
> On Tue, Nov 17, 2020 at 10:37:34AM +0800, huang ying wrote:
> > On Tue, Nov 17, 2020 at 6:00 AM Uladzislau Rezki (Sony)
> > wrote:
> > >
> > > A current "lazy drain" model suffers f
_vmap_area_lazy() as
follows,
    if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
            cond_resched_lock(&free_vmap_area_lock);
If it works properly, the latency problem can be solved. Can you
check whether this works for you?
Best Regards,
Huang, Ying
U(page) ||
> !get_page_unless_zero(page))
> return NULL;
>
> - pgdat = page_pgdat(page);
> - spin_lock_irq(&pgdat->lru_lock);
get_page_unless_zero() is a full memory barrier. But do we need a
compiler barrier here to prevent the compiler from caching Pag
Hi, Mel,
Mel Gorman writes:
> On Wed, Nov 04, 2020 at 01:36:58PM +0800, Huang, Ying wrote:
>> > I've no specific objection to the patch or the name change. I can't
>> > remember exactly why I picked the name, it was 8 years ago but I think it
>> > was because t
it seems not
a good API/ABI for the purpose of the patch.
Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: "Matthew Wilcox (Oracle)"
Cc: Dave Hansen
Cc: Andi Kleen
Cc: Michal Hocko
Cc: David Rientje
.]
Signed-off-by: "Huang, Ying"
Acked-by: Mel Gorman
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: "Matthew Wilcox (Oracle)"
Cc: Dave Hansen
Cc: Andi Kleen
Cc: Michal Hocko
Cc: David Rientjes
---
mm/mempolicy.c | 17 +++--
1 file ch
To make it possible to optimize cross-socket memory access with
AutoNUMA even if the memory of the application is bound to multiple
NUMA nodes.
Changes:
v3:
- Rebased on latest upstream (v5.10-rc3)
- Revised the change log.
v2:
- Rebased on latest upstream (v5.10-rc1)
Huang Ying (2
. The flag is upper case with prefix, so
it looks generally OK by itself. But in the following patch, we will
introduce a label named after the flag, which is lower case and
without prefix, so it's better to rename it.
Signed-off-by: "Huang, Ying"
Suggested-by: "Matthew Wilcox (
Mel Gorman writes:
> On Wed, Nov 04, 2020 at 01:36:58PM +0800, Huang, Ying wrote:
>> But from another point of view, I suggest to remove the constraints of
>> MPOL_F_MOF in the future. If the overhead of AutoNUMA isn't acceptable,
>> why not just disable AutoNUMA glo
performance of PMEM is much worse than that of DRAM. If we found
that some pages on PMEM are accessed frequently (hot), we may want to
move them to DRAM to optimize the system performance. If the unmovable
pages are allocated on PMEM and hot, it's possible that we cannot move
the pages to DRAM unless we reboot the system. So we think we should
make the PMEM nodes MOVABLE only.
Best Regards,
Huang, Ying
Hi, Mel,
Thanks for the comments!
Mel Gorman writes:
> On Wed, Oct 28, 2020 at 10:34:11AM +0800, Huang Ying wrote:
>> Now, AutoNUMA can only optimize the page placement among the NUMA nodes if
>> the
>> default memory policy is used. Because the memory policy specified
&g
Michal Hocko writes:
> On Fri 30-10-20 15:27:51, Huang, Ying wrote:
>> Michal Hocko writes:
>>
>> > On Wed 28-10-20 10:34:10, Huang Ying wrote:
>> >> To follow code-of-conduct better.
>> >
>> > This is changing a user visible interface an
Michal Hocko writes:
> On Wed 28-10-20 10:34:10, Huang Ying wrote:
>> To follow code-of-conduct better.
>
> This is changing a user visible interface and any userspace which refers
> to the existing name will fail to compile unless I am missing something.
Although these flags
MB/s to 105.9
MB/s) with the patch, while the benchmark score decreases only 3.3%.
A new sysctl knob kernel.numa_balancing_rate_limit_mbps is added for
the users to specify the limit.
TODO: Add ABI document for new sysctl knob.
Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Michal
To distinguish the number of promotions from the original inter-socket
NUMA balancing migrations, the counter is per-node (target node).
This is to identify imbalance among the NUMA nodes.
Signed-off-by: "Huang, Ying"
---
include/linux/mmzone.h | 1 +
mm/migrate.c | 10
ocumentation/sysctl/kernel.txt
Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Dave Hansen
Cc: Dan Williams
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
include/linux/sched/
. So that the page faults could be avoided
too.
In the test, if only the memory tiering AutoNUMA mode is enabled, the
number of the AutoNUMA hint faults for the DRAM node is reduced to
almost 0 with the patch. While the benchmark score doesn't change
visibly.
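The effect described here follows from skipping fast-memory pages in the hint-fault scanner. A hedged sketch; the node sets and function name are invented:

```python
def should_install_hint_fault(page_node, fast_nodes, tiering_mode):
    """In memory-tiering mode, pages already in fast memory (DRAM)
    cannot be promoted further, so making them inaccessible for NUMA
    hint faults only adds fault overhead; skip them and scan only
    slow-memory (e.g. PMEM) pages. Outside tiering mode, scan all
    nodes as normal NUMA balancing does."""
    return not (tiering_mode and page_node in fast_nodes)
```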
Signed-off-by: "Huang,
ant for system performance, the
administrator can set a higher hot threshold.
Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Dave Hansen
Cc: Dan Williams
Cc: linux-kernel@vger.kernel.
NUMA page migrations on a 2-socket Intel server
with Optane DC Persistent Memory, because it improves the accuracy
of hot page selection.
Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: D
ssed comments for V1.
- Rebased on v5.5.
Huang Ying (6):
autonuma: Optimize page placement for memory tiering system
autonuma, memory tiering: Skip to scan fast memory
autonuma, memory tiering: Hot page selection with hint page fault latency
autonuma, memory tiering: Rate limit NUMA
David Rientjes writes:
> On Tue, 20 Oct 2020, Huang, Ying wrote:
>
>> >> =
>> >> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
>>
t memory access latency at the hardware level when
> running on a NUMA system.
So you think it's better to bind processes to a NUMA node or CPU? But we
want to use this test case to capture NUMA/CPU placement/balancing issues
too.
0day solves the problem in another way. We run the test case
multiple times and calculate the average and standard deviation, then
compare.
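The multiple-run comparison can be sketched as follows. The 2-sigma rule here is an assumption for illustration; 0day's actual statistics may differ:

```python
from statistics import mean, stdev

def significant_change(base_runs, new_runs, nsigma=2.0):
    """Average repeated runs and flag a change only when the means
    differ by more than nsigma times the combined standard deviations,
    so run-to-run noise is not reported as a regression."""
    diff = abs(mean(new_runs) - mean(base_runs))
    noise = stdev(base_runs) + stdev(new_runs)
    return diff > nsigma * noise
```

So three runs around 100 against three more runs around 100 report no change, while a shift to around 120 stands out well beyond the noise.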
For this specific regression, I found something strange,
10.93 ± 15% +10.8 21.78 ± 10%
perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.shmem_getpage_gfp.shmem_fault
It appears the lock contention becomes heavier with the patch. But I
cannot understand why too.
Best Regards,
Huang, Ying
t; /proc/vmstat.
>
> [ daveh:
>- __count_vm_events() a bit, and made them look at the THP
> size directly rather than getting data from migrate_pages()
It appears that we get the data from migrate_pages() now.
> ]
>
> Signed-off-by: Yang Shi
> Signed-off-by: Dave Hansen
> Cc: Dav
Matthew Wilcox writes:
> On Fri, Oct 09, 2020 at 03:36:47PM +0800, Huang, Ying wrote:
>> +if (PageSwapCache(head)) {
>> +swp_entry_t entry = { .val = page_private(head) };
>> +
>> +split_swap_cluster(entry);
>> +}
> ...
&g