Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct

2018-03-27 Thread Yang Shi
On 3/27/18 3:32 AM, Cyrill Gorcunov wrote: On Mon, Mar 26, 2018 at 05:59:49PM -0400, Yang Shi wrote: Say we've two syscalls running prctl_set_mm_map in parallel, and imagine one have @start_brk = 20 @brk = 10 and second caller has @start_brk = 30 and @brk = 20. Since now the call is gu

[QUESTION] About VM_LOCKONFAULT for file page

2018-06-04 Thread Yang Shi
Hi folks, I did a quick test with mlock2 + VM_LOCKONFAULT flag. The test just creates a 1MB anonymous mapping and a 1MB file mapping, each with VM_LOCKONFAULT. Then it tries to access one page of each mapping. From /proc/meminfo, I can see 1 page marked mlocked from the anonymous mapping. But, the

[RFC v5 PATCH] mm: shmem: make stat.st_blksize return huge page size if THP is on

2018-04-25 Thread Yang Shi
SIZE. Signed-off-by: Yang Shi Cc: "Kirill A. Shutemov" Cc: Hugh Dickins Cc: Michal Hocko Cc: Alexander Viro Suggested-by: Christoph Hellwig --- v4 --> v5: * Adopted suggestion from Kirill to use IS_ENABLED and check 'force' and 'deny'. Extracted the condition

Re: [v2 PATCH 7/9] mm: vmscan: check if the demote target node is contended or not

2019-04-15 Thread Yang Shi
On 4/11/19 9:06 AM, Dave Hansen wrote: On 4/10/19 8:56 PM, Yang Shi wrote: When demoting to PMEM node, the target node may have memory pressure, then the memory pressure may cause migrate_pages() fail. If the failure is caused by memory pressure (i.e. returning -ENOMEM), tag the node with

Re: [v2 PATCH 5/9] mm: vmscan: demote anon DRAM pages to PMEM node

2019-04-15 Thread Yang Shi
On 4/11/19 7:31 AM, Dave Hansen wrote: On 4/10/19 8:56 PM, Yang Shi wrote: include/linux/gfp.h| 12 include/linux/migrate.h| 1 + include/trace/events/migrate.h | 3 +- mm/debug.c | 1 + mm/internal.h | 13 + mm

Re: [v2 PATCH 7/9] mm: vmscan: check if the demote target node is contended or not

2019-04-15 Thread Yang Shi
On 4/15/19 3:13 PM, Dave Hansen wrote: On 4/15/19 3:06 PM, Yang Shi wrote: This seems like an actively bad idea to me. Why do we need an *active* note to say the node is contended?  Why isn't just getting a failure back from migrate_pages() enough?  Have you observed this in practice?

Re: [v2 PATCH 5/9] mm: vmscan: demote anon DRAM pages to PMEM node

2019-04-15 Thread Yang Shi
On 4/15/19 3:14 PM, Dave Hansen wrote: On 4/15/19 3:10 PM, Yang Shi wrote: Also, I don't see anything in the code tying this to strictly demote from DRAM to PMEM.  Is that the end effect, or is it really implemented that way and I missed it? No, not restrict to PMEM. It just tries to d

Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-15 Thread Yang Shi
On 4/12/19 1:47 AM, Michal Hocko wrote: On Thu 11-04-19 11:56:50, Yang Shi wrote: [...] Design == Basically, the approach is aimed to spread data from DRAM (closest to local CPU) down further to PMEM and disk (typically assume the lower tier storage is slower, larger and cheaper than the

Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-16 Thread Yang Shi
On 4/16/19 12:47 AM, Michal Hocko wrote: On Mon 15-04-19 17:09:07, Yang Shi wrote: On 4/12/19 1:47 AM, Michal Hocko wrote: On Thu 11-04-19 11:56:50, Yang Shi wrote: [...] Design == Basically, the approach is aimed to spread data from DRAM (closest to local CPU) down further to PMEM

Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-16 Thread Yang Shi
On 4/16/19 2:22 PM, Dave Hansen wrote: On 4/16/19 12:19 PM, Yang Shi wrote: would we prefer to try all the nodes in the fallback order to find the first less contended one (i.e. DRAM0 -> PMEM0 -> DRAM1 -> PMEM1 -> Swap)? Once a page went to DRAM1, how would we tell that it o

Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-16 Thread Yang Shi
On 4/16/19 4:04 PM, Dave Hansen wrote: On 4/16/19 2:59 PM, Yang Shi wrote: On 4/16/19 2:22 PM, Dave Hansen wrote: Keith Busch had a set of patches to let you specify the demotion order via sysfs for fun.  The rules we came up with were: 1. Pages keep no history of where they have been 2

Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-16 Thread Yang Shi
Why cannot we start simple and build from there? In other words I do not think we really need anything like N_CPU_MEM at all. In this patchset N_CPU_MEM is used to tell us what nodes are cpuless nodes. They would be the preferred demotion target.  Of course, we could rely on firmware to just

Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-17 Thread Yang Shi
On 4/17/19 9:39 AM, Michal Hocko wrote: On Wed 17-04-19 09:37:39, Keith Busch wrote: On Wed, Apr 17, 2019 at 05:39:23PM +0200, Michal Hocko wrote: On Wed 17-04-19 09:23:46, Keith Busch wrote: On Wed, Apr 17, 2019 at 11:23:18AM +0200, Michal Hocko wrote: On Tue 16-04-19 14:22:33, Dave Hanse

Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-17 Thread Yang Shi
I would also not touch the numa balancing logic at this stage and rather see how the current implementation behaves. I agree we would prefer start from something simpler and see how it works. The "twice access" optimization is aimed to reduce the PMEM bandwidth burden since the bandwidth

[QUESTIONS] THP allocation in NUMA fault migration path

2019-04-17 Thread Yang Shi
Hi folks, I noticed that there might be new THP allocation in NUMA fault migration path (migrate_misplaced_transhuge_page()) even when THP is disabled (set to "never"). When THP is set to "never", there should be not any new THP allocation, but the migration path is kind of special. So I'm no

Re: [QUESTIONS] THP allocation in NUMA fault migration path

2019-04-18 Thread Yang Shi
On 4/17/19 11:32 PM, Michal Hocko wrote: On Wed 17-04-19 21:15:41, Yang Shi wrote: Hi folks, I noticed that there might be new THP allocation in NUMA fault migration path (migrate_misplaced_transhuge_page()) even when THP is disabled (set to "never"). When THP is set to "

Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-18 Thread Yang Shi
On 4/17/19 10:51 AM, Michal Hocko wrote: On Wed 17-04-19 10:26:05, Yang Shi wrote: On 4/17/19 9:39 AM, Michal Hocko wrote: On Wed 17-04-19 09:37:39, Keith Busch wrote: On Wed, Apr 17, 2019 at 05:39:23PM +0200, Michal Hocko wrote: On Wed 17-04-19 09:23:46, Keith Busch wrote: On Wed, Apr

Re: [PATCH] mm: use mm.arg_lock in get_cmdline()

2019-04-18 Thread Yang Shi
re. While reading the code, I found that this new spinlock was not used in get_cmdline() to protect access to these fields. Fixing this even if there is no issue reported yet for this. Fixes: 88aa7cc688d4 ("mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct")

Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility

2019-04-23 Thread Yang Shi
On 4/23/19 11:34 AM, Yang Shi wrote: On 4/23/19 10:52 AM, Michal Hocko wrote: On Wed 24-04-19 00:43:01, Yang Shi wrote: The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma") introduced THPeligible bit for processes' smaps. But, when checking the

[PATCH] mm: filemap: correct the comment about VM_FAULT_RETRY

2019-04-25 Thread Yang Shi
The commit 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations") changed when mmap_sem is dropped during filemap page fault and when returning VM_FAULT_RETRY. Correct the comment to reflect the change. Cc: Josef Bacik Signed-off-by: Yang Shi --- mm/filemap.c | 6

Re: [RFC PATCH] mm: vmscan: do not iterate all mem cgroups for global direct reclaim

2019-01-25 Thread Yang Shi
On 1/24/19 12:43 AM, Michal Hocko wrote: On Wed 23-01-19 12:24:38, Yang Shi wrote: On 1/23/19 1:59 AM, Michal Hocko wrote: On Wed 23-01-19 04:09:42, Yang Shi wrote: In current implementation, both kswapd and direct reclaim has to iterate all mem cgroups. It is not a problem before

Re: [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node

2019-03-26 Thread Yang Shi
On 3/26/19 6:58 AM, Michal Hocko wrote: On Sat 23-03-19 12:44:25, Yang Shi wrote: With Dave Hansen's patches merged into Linus's tree https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c221c0b0308fd01d9fb33a16f64d2fd95f8830a4 PMEM could be hot plugg

Re: [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node

2019-03-26 Thread Yang Shi
On 3/26/19 11:37 AM, Michal Hocko wrote: On Tue 26-03-19 11:33:17, Yang Shi wrote: On 3/26/19 6:58 AM, Michal Hocko wrote: On Sat 23-03-19 12:44:25, Yang Shi wrote: With Dave Hansen's patches merged into Linus's tree https://git.kernel.org/pub/scm/linux/kernel/git/torvalds

Re: [PATCH 06/10] mm: vmscan: demote anon DRAM pages to PMEM node

2019-03-26 Thread Yang Shi
On 3/26/19 5:35 PM, Keith Busch wrote: On Mon, Mar 25, 2019 at 12:49:21PM -0700, Yang Shi wrote: On 3/24/19 3:20 PM, Keith Busch wrote: How do these pages eventually get to swap when migration fails? Looks like that's skipped. Yes, they will be just put back to LRU. Actually, I

Re: [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node

2019-03-27 Thread Yang Shi
On 3/27/19 10:34 AM, Dan Williams wrote: On Wed, Mar 27, 2019 at 2:01 AM Michal Hocko wrote: On Tue 26-03-19 19:58:56, Yang Shi wrote: On 3/26/19 11:37 AM, Michal Hocko wrote: On Tue 26-03-19 11:33:17, Yang Shi wrote: On 3/26/19 6:58 AM, Michal Hocko wrote: On Sat 23-03-19 12:44:25

Re: [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node

2019-03-27 Thread Yang Shi
On 3/27/19 1:09 PM, Michal Hocko wrote: On Wed 27-03-19 11:59:28, Yang Shi wrote: On 3/27/19 10:34 AM, Dan Williams wrote: On Wed, Mar 27, 2019 at 2:01 AM Michal Hocko wrote: On Tue 26-03-19 19:58:56, Yang Shi wrote: [...] It is still NUMA, users still can see all the NUMA nodes. No

[v2 PATCH 3/9] mm: numa: promote pages to DRAM when it gets accessed twice

2019-04-10 Thread Yang Shi
accurately. Signed-off-by: Yang Shi --- mm/huge_memory.c | 11 ++ mm/internal.h| 80 ++ mm/memory.c | 21 ++ mm/vmscan.c | 116 --- 4 files changed, 146 insertions(+), 82 deletions

[v2 PATCH 4/9] mm: migrate: make migrate_pages() return nr_succeeded

2019-04-10 Thread Yang Shi
pages are reclaimed (demoted) since page reclaim behavior depends on this. Add *nr_succeeded parameter to make migrate_pages() return how many pages are demoted successfully for all cases. Signed-off-by: Yang Shi --- include/linux/migrate.h | 5 +++-- mm/compaction.c | 3 ++- mm/gup.c

[v2 PATCH 2/9] mm: page_alloc: make find_next_best_node find return cpuless node

2019-04-10 Thread Yang Shi
Need to find the closest cpuless node to demote DRAM pages. Add a "cpuless" parameter to find_next_best_node() to skip DRAM nodes on demand. Signed-off-by: Yang Shi --- mm/internal.h | 11 +++ mm/page_alloc.c | 14 ++ 2 files changed, 21 insertions(+), 4 deletions(-)

[v2 PATCH 1/9] mm: define N_CPU_MEM node states

2019-04-10 Thread Yang Shi
N_CPU_MEMORY node states. The nodes with both CPUs and memory are called "primary" nodes. /sys/devices/system/node/primary would show the current online "primary" nodes. Signed-off-by: Yang Shi --- drivers/base/node.c | 2 ++ include/linux/nodemask.h | 3 ++- mm/memory

[v2 PATCH 8/9] mm: vmscan: add page demotion counter

2019-04-10 Thread Yang Shi
Account the number of demoted pages into reclaim_state->nr_demoted. Add pgdemote_kswapd and pgdemote_direct VM counters shown in /proc/vmstat. Signed-off-by: Yang Shi --- include/linux/vm_event_item.h | 2 ++ include/linux/vmstat.h| 1 + mm/internal.h | 1 +

[v2 PATCH 5/9] mm: vmscan: demote anon DRAM pages to PMEM node

2019-04-10 Thread Yang Shi
And, define a new migration reason for demotion, called MR_DEMOTE. Demote page via async migration to avoid blocking. Signed-off-by: Yang Shi --- include/linux/gfp.h| 12 include/linux/migrate.h| 1 + include/trace/events/migrate.h | 3 +- mm/debug.c

[v2 PATCH 9/9] mm: numa: add page promotion counter

2019-04-10 Thread Yang Shi
Add counter for page promotion for NUMA balancing. Signed-off-by: Yang Shi --- include/linux/vm_event_item.h | 1 + mm/huge_memory.c | 4 mm/memory.c | 4 mm/vmstat.c | 1 + 4 files changed, 10 insertions(+) diff --git a/include/linux

[v2 PATCH 7/9] mm: vmscan: check if the demote target node is contended or not

2019-04-10 Thread Yang Shi
. Check if the target node is PGDAT_CONTENDED or not, if it is just skip demotion. Signed-off-by: Yang Shi --- include/linux/mmzone.h | 3 +++ mm/vmscan.c| 28 2 files changed, 31 insertions(+) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h

[v2 PATCH 6/9] mm: vmscan: don't demote for memcg reclaim

2019-04-10 Thread Yang Shi
The memcg reclaim happens when the limit is breached, but demotion just migrates pages to the other node instead of reclaiming them. This sounds pointless for memcg reclaim since the usage is not reduced at all. Signed-off-by: Yang Shi --- mm/vmscan.c | 38

[v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-10 Thread Yang Shi
ace kernel pages (i.e. page table, slabs, etc) on DRAM only. [1]: https://lore.kernel.org/linux-mm/20181226131446.330864...@intel.com/ [2]: https://lore.kernel.org/linux-mm/20190321200157.29678-1-keith.bu...@intel.com/ [3]: https://lore.kernel.org/linux-mm/20190404071312.gd12...@dhcp22.suse.cz

Re: [v7 PATCH 05/12] mm: memcontrol: rename shrinker_map to shrinker_info

2021-02-11 Thread Yang Shi
On Thu, Feb 11, 2021 at 8:47 AM Kirill Tkhai wrote: > > On 10.02.2021 02:33, Yang Shi wrote: > > On Tue, Feb 9, 2021 at 12:50 PM Roman Gushchin wrote: > >> > >> On Tue, Feb 09, 2021 at 09:46:39AM -0800, Yang Shi wrote: > >>> The following patch is going

Re: [v7 PATCH 12/12] mm: vmscan: shrink deferred objects proportional to priority

2021-02-11 Thread Yang Shi
On Thu, Feb 11, 2021 at 5:10 AM Vlastimil Babka wrote: > > On 2/9/21 6:46 PM, Yang Shi wrote: > > The number of deferred objects might get windup to an absurd number, and it > > results in clamp of slab objects. It is undesirable for sustaining > > workingset. > >

Re: [v7 PATCH 12/12] mm: vmscan: shrink deferred objects proportional to priority

2021-02-11 Thread Yang Shi
On Thu, Feb 11, 2021 at 10:52 AM Vlastimil Babka wrote: > > On 2/11/21 6:29 PM, Yang Shi wrote: > > On Thu, Feb 11, 2021 at 5:10 AM Vlastimil Babka wrote: > >> > trace_mm_shrink_slab_start(shrinker, shrinkctl, nr, > >> >

Re: [v7 PATCH 03/12] mm: vmscan: use shrinker_rwsem to protect shrinker_maps allocation

2021-02-09 Thread Yang Shi
On Tue, Feb 9, 2021 at 12:33 PM Roman Gushchin wrote: > > On Tue, Feb 09, 2021 at 09:46:37AM -0800, Yang Shi wrote: > > Since memcg_shrinker_map_size just can be changed under holding > > shrinker_rwsem > > exclusively, the read side can be protected by holding read

Re: [v7 PATCH 04/12] mm: vmscan: remove memcg_shrinker_map_size

2021-02-09 Thread Yang Shi
On Tue, Feb 9, 2021 at 12:43 PM Roman Gushchin wrote: > > On Tue, Feb 09, 2021 at 09:46:38AM -0800, Yang Shi wrote: > > Both memcg_shrinker_map_size and shrinker_nr_max is maintained, but > > actually the > > map size can be calculated via shrinker_nr_max, so it seems

Re: [v7 PATCH 05/12] mm: memcontrol: rename shrinker_map to shrinker_info

2021-02-09 Thread Yang Shi
On Tue, Feb 9, 2021 at 12:50 PM Roman Gushchin wrote: > > On Tue, Feb 09, 2021 at 09:46:39AM -0800, Yang Shi wrote: > > The following patch is going to add nr_deferred into shrinker_map, the > > change will > > make shrinker_map not only include map

Re: [v7 PATCH 06/12] mm: vmscan: add shrinker_info_protected() helper

2021-02-09 Thread Yang Shi
On Tue, Feb 9, 2021 at 4:22 PM Roman Gushchin wrote: > > On Tue, Feb 09, 2021 at 09:46:40AM -0800, Yang Shi wrote: > > The shrinker_info is dereferenced in a couple of places via > > rcu_dereference_protected > > with different calling conventions, for example, u

Re: [v7 PATCH 07/12] mm: vmscan: use a new flag to indicate shrinker is registered

2021-02-09 Thread Yang Shi
On Tue, Feb 9, 2021 at 4:39 PM Roman Gushchin wrote: > > On Tue, Feb 09, 2021 at 09:46:41AM -0800, Yang Shi wrote: > > Currently registered shrinker is indicated by non-NULL > > shrinker->nr_deferred. > > This approach is fine with nr_deferred at the shrinker le

Re: [v7 PATCH 08/12] mm: vmscan: add per memcg shrinker nr_deferred

2021-02-09 Thread Yang Shi
On Tue, Feb 9, 2021 at 5:10 PM Roman Gushchin wrote: > > On Tue, Feb 09, 2021 at 09:46:42AM -0800, Yang Shi wrote: > > Currently the number of deferred objects are per shrinker, but some slabs, > > for example, > > vfs inode/dentry cache are per memcg, this would

Re: [v7 PATCH 09/12] mm: vmscan: use per memcg nr_deferred of shrinker

2021-02-09 Thread Yang Shi
On Tue, Feb 9, 2021 at 5:27 PM Roman Gushchin wrote: > > On Tue, Feb 09, 2021 at 09:46:43AM -0800, Yang Shi wrote: > > Use per memcg's nr_deferred for memcg aware shrinkers. The shrinker's > > nr_deferred > > will be used in the following cases: > >

Re: [v7 PATCH 07/12] mm: vmscan: use a new flag to indicate shrinker is registered

2021-02-09 Thread Yang Shi
On Tue, Feb 9, 2021 at 5:34 PM Roman Gushchin wrote: > > On Tue, Feb 09, 2021 at 05:12:51PM -0800, Yang Shi wrote: > > On Tue, Feb 9, 2021 at 4:39 PM Roman Gushchin wrote: > > > > > > On Tue, Feb 09, 2021 at 09:46:41AM -0800, Yang Shi wrote: > > > > Cur

Re: [v7 PATCH 08/12] mm: vmscan: add per memcg shrinker nr_deferred

2021-02-09 Thread Yang Shi
On Tue, Feb 9, 2021 at 5:40 PM Roman Gushchin wrote: > > On Tue, Feb 09, 2021 at 05:25:16PM -0800, Yang Shi wrote: > > On Tue, Feb 9, 2021 at 5:10 PM Roman Gushchin wrote: > > > > > > On Tue, Feb 09, 2021 at 09:46:42AM -0800, Yang Shi wrote: > > > > Cu

Re: [v7 PATCH 09/12] mm: vmscan: use per memcg nr_deferred of shrinker

2021-02-10 Thread Yang Shi
On Wed, Feb 10, 2021 at 6:37 AM Kirill Tkhai wrote: > > On 10.02.2021 04:52, Yang Shi wrote: > > On Tue, Feb 9, 2021 at 5:27 PM Roman Gushchin wrote: > >> > >> On Tue, Feb 09, 2021 at 09:46:43AM -0800, Yang Shi wrote: > >>> Use per memcg's nr_defe

Re: [v7 PATCH 01/12] mm: vmscan: use nid from shrink_control for tracepoint

2021-02-10 Thread Yang Shi
On Tue, Feb 9, 2021 at 11:14 AM Shakeel Butt wrote: > > On Tue, Feb 9, 2021 at 9:47 AM Yang Shi wrote: > > > > The tracepoint's nid should show what node the shrink happens on, the start > > tracepoint > > uses nid from shrinkctl, but the nid might be set to

Re: [v7 PATCH 07/12] mm: vmscan: use a new flag to indicate shrinker is registered

2021-02-10 Thread Yang Shi
On Tue, Feb 9, 2021 at 4:39 PM Roman Gushchin wrote: > > On Tue, Feb 09, 2021 at 09:46:41AM -0800, Yang Shi wrote: > > Currently registered shrinker is indicated by non-NULL > > shrinker->nr_deferred. > > This approach is fine with nr_deferred at the shrinker le

Re: [RFC][PATCH 05/13] mm/numa: automatically generate node migration order

2021-02-02 Thread Yang Shi
On Mon, Feb 1, 2021 at 11:13 AM Dave Hansen wrote: > > On 1/29/21 12:46 PM, Yang Shi wrote: > ... > >> int next_demotion_node(int node) > >> { > >> - return node_demotion[node]; > >> + /* > >> +* node_demotion[] is update

Re: [RFC][PATCH 08/13] mm/migrate: demote pages during reclaim

2021-02-02 Thread Yang Shi
On Mon, Jan 25, 2021 at 4:41 PM Dave Hansen wrote: > > > From: Dave Hansen > > This is mostly derived from a patch from Yang Shi: > > > https://lore.kernel.org/linux-mm/1560468577-101178-10-git-send-email-yang@linux.alibaba.com/ > > Add code to the

Re: [RFC][PATCH 11/13] mm/vmscan: Consider anonymous pages without swap

2021-02-02 Thread Yang Shi
ibility of future reclaim. > > #Signed-off-by: Keith Busch > Cc: Keith Busch > [vishal: fixup the migration->demotion rename] > Signed-off-by: Vishal Verma > Signed-off-by: Dave Hansen > Cc: Yang Shi > Cc: David Rientjes > Cc: Huang Ying > Cc: Dan Williams

Re: [RFC][PATCH 11/13] mm/vmscan: Consider anonymous pages without swap

2021-02-02 Thread Yang Shi
On Tue, Feb 2, 2021 at 1:35 PM Dave Hansen wrote: > > On 2/2/21 10:56 AM, Yang Shi wrote: > >> > >> /* If we have no swap space, do not bother scanning anon pages. */ > >> - if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <

Re: [RFC][PATCH 08/13] mm/migrate: demote pages during reclaim

2021-02-02 Thread Yang Shi
On Tue, Feb 2, 2021 at 3:55 AM Oscar Salvador wrote: > > On Mon, Jan 25, 2021 at 04:34:27PM -0800, Dave Hansen wrote: > > > > From: Dave Hansen > > > > This is mostly derived from a patch from Yang Shi: > > > > > > https://lore.kernel.org/l

[v6 PATCH 0/11] Make shrinker's nr_deferred memcg aware

2021-02-03 Thread Yang Shi
ytes. 10K memcgs would need ~3.2MB memory. It seems fine. We have been running the patched kernel on some hosts of our fleet (test and production) for months, it works very well. The monitor data shows the working set is sustained as expected. Yang Shi (11): mm: vmscan: use nid from shrink

[v6 PATCH 01/11] mm: vmscan: use nid from shrink_control for tracepoint

2021-02-03 Thread Yang Shi
. It seems confusing. And the following patch will remove using nid directly in do_shrink_slab(), this patch also helps cleanup the code. Acked-by: Vlastimil Babka Signed-off-by: Yang Shi --- mm/vmscan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/vmscan.c b/mm/vms

[v6 PATCH 05/11] mm: memcontrol: rename shrinker_map to shrinker_info

2021-02-03 Thread Yang Shi
he "memcg_" prefix. Acked-by: Vlastimil Babka Signed-off-by: Yang Shi --- include/linux/memcontrol.h | 8 ++--- mm/memcontrol.c| 6 ++-- mm/vmscan.c| 62 +++--- 3 files changed, 38 insertions(+), 38 deletions(-) diff -

[v6 PATCH 04/11] mm: vmscan: remove memcg_shrinker_map_size

2021-02-03 Thread Yang Shi
Both memcg_shrinker_map_size and shrinker_nr_max is maintained, but actually the map size can be calculated via shrinker_nr_max, so it seems unnecessary to keep both. Remove memcg_shrinker_map_size since shrinker_nr_max is also used by iterating the bit map. Signed-off-by: Yang Shi --- mm
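
The relationship this patch relies on, that the map size follows from shrinker_nr_max, can be sketched in userspace C. This is only an illustration, assuming the map is a bitmap with one bit per shrinker id rounded up to whole unsigned long words. The helper name shrinker_map_size() and the local BITS_PER_LONG and DIV_ROUND_UP definitions are stand-ins written for this sketch and are not taken from the patch itself.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel macros of the same name. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: bytes needed for a bitmap that holds one bit per
 * shrinker id, rounded up to whole unsigned long words.
 */
static size_t shrinker_map_size(size_t nr_items)
{
	assert(nr_items > 0);	/* an empty map is not modelled here */
	return DIV_ROUND_UP(nr_items, BITS_PER_LONG) * sizeof(unsigned long);
}
```

With a helper like this, a separately maintained size variable would be redundant: any caller that knows the maximum shrinker id can recompute the size on demand.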

[v6 PATCH 06/11] mm: vmscan: use a new flag to indicate shrinker is registered

2021-02-03 Thread Yang Shi
This would prevent the shrinkers from unregistering correctly. Remove SHRINKER_REGISTERING since we could check if shrinker is registered successfully by the new flag. Signed-off-by: Yang Shi --- include/linux/shrinker.h | 7 --- mm/vmscan.c | 31 +-- 2 fi

[v6 PATCH 02/11] mm: vmscan: consolidate shrinker_maps handling code

2021-02-03 Thread Yang Shi
can.c for tighter integration with shrinker code, and remove the "memcg_" prefix. There is no functional change. Acked-by: Vlastimil Babka Signed-off-by: Yang Shi --- include/linux/memcontrol.h | 11 ++-- mm/huge_memory.c | 4 +- mm/list_lru.c | 6 +-

[v6 PATCH 03/11] mm: vmscan: use shrinker_rwsem to protect shrinker_maps allocation

2021-02-03 Thread Yang Shi
larity. And a test with heavy paging workload didn't show write lock makes things worse. Acked-by: Vlastimil Babka Signed-off-by: Yang Shi --- mm/vmscan.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 96b08c79f18d..e

[v6 PATCH 09/11] mm: vmscan: don't need allocate shrinker->nr_deferred for memcg aware shrinkers

2021-02-03 Thread Yang Shi
rinker's SHRINKER_MEMCG_AWARE flag would be cleared. This makes the implementation of this patch simpler. Acked-by: Vlastimil Babka Signed-off-by: Yang Shi --- mm/vmscan.c | 31 --- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index

[v6 PATCH 08/11] mm: vmscan: use per memcg nr_deferred of shrinker

2021-02-03 Thread Yang Shi
Use per memcg's nr_deferred for memcg aware shrinkers. The shrinker's nr_deferred will be used in the following cases: 1. Non memcg aware shrinkers 2. !CONFIG_MEMCG 3. memcg is disabled by boot parameter Signed-off-by: Yang Shi --- mm/vms
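
The three fallback cases listed in this patch description can be read as a small decision function. The sketch below is a userspace model under stated assumptions: struct deferred_ctx, enum deferred_source and pick_nr_deferred() are hypothetical names invented for illustration, and the real patch may structure the check quite differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state the patch description refers to. */
struct deferred_ctx {
	bool shrinker_memcg_aware;	/* shrinker is memcg aware */
	bool config_memcg;		/* kernel built with CONFIG_MEMCG */
	bool memcg_disabled;		/* memcg disabled by boot parameter */
};

enum deferred_source { DEFERRED_SHRINKER, DEFERRED_MEMCG };

/* Decide which nr_deferred counter would be used for this context. */
static enum deferred_source pick_nr_deferred(const struct deferred_ctx *c)
{
	assert(c);			/* caller must pass a valid context */
	if (!c->shrinker_memcg_aware)	/* case 1: non memcg aware shrinker */
		return DEFERRED_SHRINKER;
	if (!c->config_memcg)		/* case 2: !CONFIG_MEMCG */
		return DEFERRED_SHRINKER;
	if (c->memcg_disabled)		/* case 3: disabled at boot */
		return DEFERRED_SHRINKER;
	return DEFERRED_MEMCG;		/* otherwise use the per memcg counter */
}
```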

[v6 PATCH 11/11] mm: vmscan: shrink deferred objects proportional to priority

2021-02-03 Thread Yang Shi
's patch: https://lore.kernel.org/linux-xfs/20191031234618.15403-13-da...@fromorbit.com/ Tested with kernel build and vfs metadata heavy workload in our production environment, no regression is spotted so far. Signed-off-by: Yang Shi --- mm/vmscan.c | 40 +-

[v6 PATCH 07/11] mm: vmscan: add per memcg shrinker nr_deferred

2021-02-03 Thread Yang Shi
ed all the time. Signed-off-by: Yang Shi --- include/linux/memcontrol.h | 7 +++--- mm/vmscan.c| 45 -- 2 files changed, 33 insertions(+), 19 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 4c92538

[v6 PATCH 10/11] mm: memcontrol: reparent nr_deferred when memcg offline

2021-02-03 Thread Yang Shi
Now shrinker's nr_deferred is per memcg for memcg aware shrinkers, add to parent's corresponding nr_deferred when memcg offline. Acked-by: Vlastimil Babka Signed-off-by: Yang Shi --- include/linux/memcontrol.h | 1 + mm/memcontrol.c| 1 + mm/vmscan.c

Re: [RFC][PATCH 05/13] mm/numa: automatically generate node migration order

2021-02-03 Thread Yang Shi
On Tue, Feb 2, 2021 at 4:43 PM Dave Hansen wrote: > > On 2/2/21 9:46 AM, Yang Shi wrote: > > On Mon, Feb 1, 2021 at 11:13 AM Dave Hansen wrote: > >> On 1/29/21 12:46 PM, Yang Shi wrote: > >> ... > >>>> int next_demotion_node(int node) >

Re: [v5 PATCH 02/11] mm: vmscan: consolidate shrinker_maps handling code

2021-01-28 Thread Yang Shi
On Thu, Jan 28, 2021 at 8:10 AM Vlastimil Babka wrote: > > On 1/28/21 12:33 AM, Yang Shi wrote: > > The shrinker map management is not purely memcg specific, it is at the > > intersection > > between memory cgroup and shrinkers. It's allocation and assignment of

Re: [v5 PATCH 04/11] mm: vmscan: remove memcg_shrinker_map_size

2021-01-28 Thread Yang Shi
On Thu, Jan 28, 2021 at 8:53 AM Vlastimil Babka wrote: > > On 1/28/21 12:33 AM, Yang Shi wrote: > > Both memcg_shrinker_map_size and shrinker_nr_max is maintained, but > > actually the > > map size can be calculated via shrinker_nr_max, so it seems unnecessary to >

Re: [v5 PATCH 05/11] mm: memcontrol: rename shrinker_map to shrinker_info

2021-01-28 Thread Yang Shi
On Thu, Jan 28, 2021 at 9:38 AM Vlastimil Babka wrote: > > On 1/28/21 12:33 AM, Yang Shi wrote: > > The following patch is going to add nr_deferred into shrinker_map, the > > change will > > make shrinker_map not only include map anymore, so rename it to a more > >

Re: [v5 PATCH 06/11] mm: vmscan: use a new flag to indicate shrinker is registered

2021-01-28 Thread Yang Shi
On Thu, Jan 28, 2021 at 9:56 AM Vlastimil Babka wrote: > > On 1/28/21 12:33 AM, Yang Shi wrote: > > Currently registered shrinker is indicated by non-NULL > > shrinker->nr_deferred. > > This approach is fine with nr_deferred at the shrinker level, but the > >

Re: [v5 PATCH 04/11] mm: vmscan: remove memcg_shrinker_map_size

2021-01-29 Thread Yang Shi
On Fri, Jan 29, 2021 at 3:22 AM Vlastimil Babka wrote: > > On 1/28/21 10:22 PM, Yang Shi wrote: > >> > @@ -266,12 +265,13 @@ int alloc_shrinker_maps(struct mem_cgroup *memcg) > >> > static int expand_shrinker_maps(int new_id) > >> > { > >> >

Re: [v5 PATCH 02/11] mm: vmscan: consolidate shrinker_maps handling code

2021-01-29 Thread Yang Shi
On Fri, Jan 29, 2021 at 6:34 AM Kirill Tkhai wrote: > > On 28.01.2021 02:33, Yang Shi wrote: > > The shrinker map management is not purely memcg specific, it is at the > > intersection > > between memory cgroup and shrinkers. It's allocation and assignment of a >

Re: [v5 PATCH 08/11] mm: vmscan: use per memcg nr_deferred of shrinker

2021-01-29 Thread Yang Shi
On Fri, Jan 29, 2021 at 6:59 AM Kirill Tkhai wrote: > > On 29.01.2021 17:55, Kirill Tkhai wrote: > > On 28.01.2021 02:33, Yang Shi wrote: > >> Use per memcg's nr_deferred for memcg aware shrinkers. The shrinker's > >> nr_deferred > >> will be u

Re: [v5 PATCH 07/11] mm: vmscan: add per memcg shrinker nr_deferred

2021-01-29 Thread Yang Shi
On Fri, Jan 29, 2021 at 5:00 AM Vlastimil Babka wrote: > > On 1/28/21 12:33 AM, Yang Shi wrote: > > Currently the number of deferred objects are per shrinker, but some slabs, > > for example, > > vfs inode/dentry cache are per memcg, this would result in poor iso

Re: [v5 PATCH 08/11] mm: vmscan: use per memcg nr_deferred of shrinker

2021-01-29 Thread Yang Shi
On Fri, Jan 29, 2021 at 7:13 AM Vlastimil Babka wrote: > > On 1/28/21 12:33 AM, Yang Shi wrote: > > Use per memcg's nr_deferred for memcg aware shrinkers. The shrinker's > > nr_deferred > > will be used in the following cases: > > 1. Non memcg

Re: [v5 PATCH 09/11] mm: vmscan: don't need allocate shrinker->nr_deferred for memcg aware shrinkers

2021-01-29 Thread Yang Shi
On Fri, Jan 29, 2021 at 7:40 AM Vlastimil Babka wrote: > > On 1/28/21 12:33 AM, Yang Shi wrote: > > Now nr_deferred is available on per memcg level for memcg aware shrinkers, > > so don't need > > allocate shrinker->nr_deferred for such shrinkers anymore. >

Re: [v5 PATCH 10/11] mm: memcontrol: reparent nr_deferred when memcg offline

2021-01-29 Thread Yang Shi
On Fri, Jan 29, 2021 at 7:52 AM Vlastimil Babka wrote: > > On 1/28/21 12:33 AM, Yang Shi wrote: > > Now shrinker's nr_deferred is per memcg for memcg aware shrinkers, add to > > parent's > > corresponding nr_deferred when memcg offline. > > > > Si

Re: [v5 PATCH 07/11] mm: vmscan: add per memcg shrinker nr_deferred

2021-01-29 Thread Yang Shi
On Fri, Jan 29, 2021 at 9:20 AM Yang Shi wrote: > > On Fri, Jan 29, 2021 at 5:00 AM Vlastimil Babka wrote: > > > > On 1/28/21 12:33 AM, Yang Shi wrote: > > > Currently the number of deferred objects are per shrinker, but some > > > slabs, for example, > >

Re: [RFC][PATCH 05/13] mm/numa: automatically generate node migration order

2021-01-29 Thread Yang Shi
hat node_demotion[] > locking has no chance of becoming a bottleneck on large systems > with lots of CPUs in direct reclaim. > > This code is unused for now. It will be called later in the > series. > > Signed-off-by: Dave Hansen > Cc: Yang Shi > Cc: David Rientjes >

Re: [RFC][PATCH 06/13] mm/migrate: update migration order during on hotplug events

2021-01-29 Thread Yang Shi
; > This recalculation is far from optimal, most glaringly that it does > not even attempt to figure out if nodes are actually coming or going. > But, given the expected paucity of hotplug events, this should be > fine. > > Signed-off-by: Dave Hansen > Cc: Yang Shi > Cc:

Re: [RFC][PATCH 07/13] mm/migrate: make migrate_pages() return nr_succeeded

2021-01-29 Thread Yang Shi
On Mon, Jan 25, 2021 at 4:41 PM Dave Hansen wrote: > > > From: Yang Shi > > The migrate_pages() returns the number of pages that were not migrated, > or an error code. When returning an error code, there is no way to know > how many pages were migrated or not migrated.
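
The return-value problem described in the quoted patch can be shown with a toy model. This is not the kernel's migrate_pages(); toy_migrate_pages(), its ok[] input and its fatal_at parameter are hypothetical and exist only to show why an extra nr_succeeded out-parameter helps when the return value is an error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Toy model, not the kernel function: ok[i] says whether page i can be
 * moved, and fatal_at (or -1 for none) is an index where a hard error hits.
 * Returns the number of pages not migrated, or -ENOMEM on a hard error.
 * *nr_succeeded is valid in both cases.
 */
static int toy_migrate_pages(const int *ok, size_t n, long fatal_at,
			     unsigned int *nr_succeeded)
{
	int nr_failed = 0;

	assert(nr_succeeded);
	*nr_succeeded = 0;
	for (size_t i = 0; i < n; i++) {
		if ((long)i == fatal_at)
			return -ENOMEM;	/* the count so far stays visible */
		if (ok[i])
			(*nr_succeeded)++;
		else
			nr_failed++;
	}
	return nr_failed;
}
```

Without the out-parameter, a caller that receives -ENOMEM cannot tell how many pages had already been moved before the error.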

Re: [v5 PATCH 07/11] mm: vmscan: add per memcg shrinker nr_deferred

2021-02-01 Thread Yang Shi
On Mon, Feb 1, 2021 at 7:17 AM Vlastimil Babka wrote: > > On 1/29/21 7:04 PM, Yang Shi wrote: > > >> > > @@ -209,9 +214,15 @@ static int expand_one_shrinker_info(struct > >> > > mem_cgroup *memcg, > >> > > i

Re: [v6 PATCH 07/11] mm: vmscan: add per memcg shrinker nr_deferred

2021-02-04 Thread Yang Shi
On Thu, Feb 4, 2021 at 12:31 AM Kirill Tkhai wrote: > > On 03.02.2021 20:20, Yang Shi wrote: > > Currently the number of deferred objects are per shrinker, but some slabs, > > for example, > > vfs inode/dentry cache are per memcg, this would result in poor iso

Re: [v6 PATCH 11/11] mm: vmscan: shrink deferred objects proportional to priority

2021-02-04 Thread Yang Shi
On Thu, Feb 4, 2021 at 2:23 AM Kirill Tkhai wrote: > > On 03.02.2021 20:20, Yang Shi wrote: > > The number of deferred objects might get windup to an absurd number, and it > > results in clamp of slab objects. It is undesirable for sustaining > > workingset. > >

Re: [v6 PATCH 09/11] mm: vmscan: don't need allocate shrinker->nr_deferred for memcg aware shrinkers

2021-02-04 Thread Yang Shi
On Thu, Feb 4, 2021 at 2:14 AM Kirill Tkhai wrote: > > On 04.02.2021 12:29, Kirill Tkhai wrote: > > On 03.02.2021 20:20, Yang Shi wrote: > >> Now nr_deferred is available on per memcg level for memcg aware shrinkers, > >> so don't need > >> alloc

Re: [v6 PATCH 08/11] mm: vmscan: use per memcg nr_deferred of shrinker

2021-02-04 Thread Yang Shi
On Thu, Feb 4, 2021 at 12:42 AM Kirill Tkhai wrote: > > On 03.02.2021 20:20, Yang Shi wrote: > > Use per memcg's nr_deferred for memcg aware shrinkers. The shrinker's > > nr_deferred > > will be used in the following cases: > > 1. Non memcg aware shri

Re: [v6 PATCH 07/11] mm: vmscan: add per memcg shrinker nr_deferred

2021-02-05 Thread Yang Shi
On Fri, Feb 5, 2021 at 6:38 AM Kirill Tkhai wrote: > > On 04.02.2021 20:17, Yang Shi wrote: > > On Thu, Feb 4, 2021 at 12:31 AM Kirill Tkhai wrote: > >> > >> On 03.02.2021 20:20, Yang Shi wrote: > >>> Currently the number of deferred objects are per sh

Re: [v6 PATCH 08/11] mm: vmscan: use per memcg nr_deferred of shrinker

2021-02-05 Thread Yang Shi
On Fri, Feb 5, 2021 at 6:42 AM Kirill Tkhai wrote: > > On 04.02.2021 20:23, Yang Shi wrote: > > On Thu, Feb 4, 2021 at 12:42 AM Kirill Tkhai wrote: > >> > >> On 03.02.2021 20:20, Yang Shi wrote: > >>> Use per memcg's nr_deferred for memcg awar

[v7 PATCH 11/12] mm: memcontrol: reparent nr_deferred when memcg offline

2021-02-09 Thread Yang Shi
Now shrinker's nr_deferred is per memcg for memcg aware shrinkers, add to parent's corresponding nr_deferred when memcg offline. Acked-by: Vlastimil Babka Acked-by: Kirill Tkhai Signed-off-by: Yang Shi --- include/linux/memcontrol.h | 1 + mm/memcontrol.c| 1 + m

[v7 PATCH 10/12] mm: vmscan: don't need allocate shrinker->nr_deferred for memcg aware shrinkers

2021-02-09 Thread Yang Shi
rinker's SHRINKER_MEMCG_AWARE flag would be cleared. This makes the implementation of this patch simpler. Acked-by: Vlastimil Babka Reviewed-by: Kirill Tkhai Signed-off-by: Yang Shi --- mm/vmscan.c | 33 ++--- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/mm/

[v7 PATCH 12/12] mm: vmscan: shrink deferred objects proportional to priority

2021-02-09 Thread Yang Shi
's patch: https://lore.kernel.org/linux-xfs/20191031234618.15403-13-da...@fromorbit.com/ Tested with kernel build and vfs metadata heavy workload in our production environment, no regression is spotted so far. Signed-off-by: Yang Shi --- mm/vmscan.c | 40 +-

[v7 PATCH 09/12] mm: vmscan: use per memcg nr_deferred of shrinker

2021-02-09 Thread Yang Shi
Use per memcg's nr_deferred for memcg aware shrinkers. The shrinker's nr_deferred will be used in the following cases: 1. Non memcg aware shrinkers 2. !CONFIG_MEMCG 3. memcg is disabled by boot parameter Signed-off-by: Yang Shi --- mm/vms

[v7 PATCH 08/12] mm: vmscan: add per memcg shrinker nr_deferred

2021-02-09 Thread Yang Shi
ed all the time. Signed-off-by: Yang Shi --- include/linux/memcontrol.h | 7 +++--- mm/vmscan.c| 49 +- 2 files changed, 37 insertions(+), 19 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 4c92538

[v7 PATCH 06/12] mm: vmscan: add shrinker_info_protected() helper

2021-02-09 Thread Yang Shi
ct the dereference into a helper to make the code more readable. No functional change. Signed-off-by: Yang Shi --- mm/vmscan.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 9436f9246d32..273efbf4d53c 100644 --- a/mm/vmscan.c ++

[v7 PATCH 05/12] mm: memcontrol: rename shrinker_map to shrinker_info

2021-02-09 Thread Yang Shi
he "memcg_" prefix. Acked-by: Vlastimil Babka Acked-by: Kirill Tkhai Signed-off-by: Yang Shi --- include/linux/memcontrol.h | 8 ++--- mm/memcontrol.c| 6 ++-- mm/vmscan.c| 62 +++--- 3 files changed, 38 insertions(+), 38 deleti

[v7 PATCH 04/12] mm: vmscan: remove memcg_shrinker_map_size

2021-02-09 Thread Yang Shi
-by: Yang Shi --- mm/vmscan.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index e4ddaaaeffe2..641077b09e5d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -185,8 +185,10 @@ static LIST_HEAD(shrinker_list); static DECLARE_RWSEM

[v7 PATCH 03/12] mm: vmscan: use shrinker_rwsem to protect shrinker_maps allocation

2021-02-09 Thread Yang Shi
larity. And a test with heavy paging workload didn't show write lock makes things worse. Acked-by: Vlastimil Babka Acked-by: Kirill Tkhai Signed-off-by: Yang Shi --- mm/vmscan.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/mm/vmscan.c b/mm/vms
