[PATCH 20/25] mm, compaction: Reduce unnecessary skipping of migration target scanner

2019-01-04 Thread Mel Gorman
and scan rates is marginal but avoiding unnecessary restarts is important. It helps later patches that are more careful about how pageblocks are treated as earlier iterations of those patches hit corner cases where the restarts were punishing and very visible. Signed-off-by: Mel Gorman --- mm

[PATCH 19/25] mm, compaction: Do not consider a need to reschedule as contention

2019-01-04 Thread Mel Gorman
( 0.00%) 16249.30 * 20.32%* Amean fault-both-32 17450.76 ( 0.00%) 14904.71 * 14.59%* Signed-off-by: Mel Gorman --- mm/compaction.c | 12 ++-- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 1a41a2dbff24..75eb0d40d4d7

[PATCH 18/25] mm, compaction: Rework compact_should_abort as compact_check_resched

2019-01-04 Thread Mel Gorman
patches but it just makes the review slightly harder. Signed-off-by: Mel Gorman --- mm/compaction.c | 61 ++--- 1 file changed, 23 insertions(+), 38 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index be27e4fa1b40..1a41a2dbff24 100644

[PATCH 17/25] mm, compaction: Keep cached migration PFNs synced for unusable pageblocks

2019-01-04 Thread Mel Gorman
recently so overall the reduction in scan rates is a mere 2.8% which is borderline noise. Signed-off-by: Mel Gorman --- mm/compaction.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/mm/compaction.c b/mm/compaction.c index 921720f7a416..be27e4fa1b40 100644 --- a/mm

[PATCH 16/25] mm, compaction: Check early for huge pages encountered by the migration scanner

2019-01-04 Thread Mel Gorman
are not materially different. Signed-off-by: Mel Gorman --- mm/compaction.c | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 608d274f9880..921720f7a416 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1071,6 +1071,9 @@ static

[PATCH 14/25] mm, compaction: Avoid rescanning the same pageblock multiple times

2019-01-04 Thread Mel Gorman
in this case. When it does happen, the scan rates multiply by factors measured in the hundreds and would be misleading to present. Signed-off-by: Mel Gorman --- mm/compaction.c | 32 ++-- mm/internal.h | 1 + 2 files changed, 27 insertions(+), 6 deletions(-) diff --git

[PATCH 15/25] mm, compaction: Finish pageblock scanning on contention

2019-01-04 Thread Mel Gorman
success rate but also by the fact that the scanners do not meet for longer when pageblocks are actually used. Overall this is justified and completing a pageblock scan is very important for later patches. Signed-off-by: Mel Gorman --- mm/compaction.c | 95

[PATCH 13/25] mm, compaction: Use free lists to quickly locate a migration target

2019-01-04 Thread Mel Gorman
by 35%. The 2-socket reductions for the free scanner are more dramatic which is a likely reflection that the machine has more memory. Signed-off-by: Mel Gorman --- mm/compaction.c | 203 ++-- 1 file changed, 198 insertions(+), 5 deletions(-) diff

[PATCH 11/25] mm, compaction: Use free lists to quickly locate a migration source

2019-01-04 Thread Mel Gorman
showed similar benefits. Signed-off-by: Mel Gorman --- mm/compaction.c | 179 +++- mm/internal.h | 2 + 2 files changed, 179 insertions(+), 2 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 8f0ce44dba41..137e32e8a2f5 100644

[PATCH 12/25] mm, compaction: Keep migration source private to a single compaction instance

2019-01-04 Thread Mel Gorman
( 0.00%) 95.17 ( 5.54%) Percentage huge-32 89.72 ( 0.00%) 93.59 ( 4.32%) Compaction migrate scanned 54168306 25516488 Compaction free scanned 800530954 87603321 Migration scan rates are reduced by 52%. Signed-off-by: Mel Gorman --- mm/compaction.c | 126

[PATCH 06/25] mm, compaction: Skip pageblocks with reserved pages

2019-01-04 Thread Mel Gorman
but it would also be considered a bug given that such a change would ruin fragmentation. On both 1-socket and 2-socket machines, scan rates are reduced slightly on workloads that intensively allocate THP while the system is fragmented. Signed-off-by: Mel Gorman --- mm/compaction.c | 16 1

[PATCH 10/25] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction

2019-01-04 Thread Mel Gorman
was increased by less than 1% which is marginal. However, detailed tracing indicated that failures of migration due to a premature ENOMEM triggered by watermark checks were eliminated. Signed-off-by: Mel Gorman --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm

[PATCH 08/25] mm, compaction: Always finish scanning of a full pageblock

2019-01-04 Thread Mel Gorman
it is offset by future reductions in scanning. Hence, the results are not presented this time due to a misleading mix of gains/losses without any clear pattern. However, full scanning of the pageblock is important for later patches. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c

[PATCH 09/25] mm, compaction: Use the page allocator bulk-free helper for lists of pages

2019-01-04 Thread Mel Gorman
but it should reduce lock contention slightly in some cases. The main benefit is removing some partially duplicated code. Signed-off-by: Mel Gorman --- include/linux/gfp.h | 7 ++- mm/compaction.c | 12 +++- mm/page_alloc.c | 10 +- 3 files changed, 18 insertions(+), 11

[PATCH 07/25] mm, migrate: Immediately fail migration of a page with no migration handler

2019-01-04 Thread Mel Gorman
.00%) 21707.05 ( 4.43%) Amean fault-both-32 21692.92 ( 0.00%) 21968.16 ( -1.27%) The 2-socket results are not materially different. Scan rates are similar as expected. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 delet

[PATCH 04/25] mm, compaction: Remove unnecessary zone parameter in some instances

2019-01-04 Thread Mel Gorman
. The change could be much deeper but this was enough to briefly clarify the flow. No functional change. Signed-off-by: Mel Gorman --- mm/compaction.c | 54 ++ 1 file changed, 26 insertions(+), 28 deletions(-) diff --git a/mm/compaction.c b/mm

[PATCH 05/25] mm, compaction: Rename map_pages to split_map_pages

2019-01-04 Thread Mel Gorman
It's non-obvious that high-order free pages are split into order-0 pages from the function name. Fix it. Signed-off-by: Mel Gorman --- mm/compaction.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 7acb43f07303..3afa4e9188b6

[PATCH 02/25] mm, compaction: Rearrange compact_control

2019-01-04 Thread Mel Gorman
compact_control spans two cache lines with write-intensive lines on both. Rearrange so the most write-intensive fields are in the same cache line. This has a negligible impact on the overall performance of compaction and is more a tidying exercise than anything. Signed-off-by: Mel Gorman Acked

[PATCH 00/25] Increase success rates and reduce latency of compaction v2

2019-01-04 Thread Mel Gorman
This series reduces scan rates and increases success rates of compaction, primarily by using the free lists to shorten scans, better controlling of skip information and whether multiple scanners can target the same block and capturing pageblocks before being stolen by parallel requests. The series is based

[PATCH 01/25] mm, compaction: Shrink compact_control

2019-01-04 Thread Mel Gorman
The isolate and migrate scanners should never isolate more than a pageblock of pages so unsigned int is sufficient saving 8 bytes on a 64-bit build. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/internal.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm

[PATCH 03/25] mm, compaction: Remove last_migrated_pfn from compact_control

2019-01-04 Thread Mel Gorman
The last_migrated_pfn field is a bit dubious as to whether it really helps but either way, the information from it can be inferred without increasing the size of compact_control so remove the field. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- mm/compaction.c | 25

Re: [PATCH] mm, page_alloc: Do not wake kswapd with zone lock held

2019-01-04 Thread Mel Gorman
On Fri, Jan 04, 2019 at 09:18:38AM +0100, Vlastimil Babka wrote: > On 1/3/19 11:57 PM, Mel Gorman wrote: > > While zone->flag could have continued to be unused, there is potential > > for moving some existing fields into the flags field instead. Particularly > > re

[PATCH] mm, page_alloc: Do not wake kswapd with zone lock held

2019-01-03 Thread Mel Gorman
degradation in fragmentation treatment. While zone->flag could have continued to be unused, there is potential for moving some existing fields into the flags field instead. Particularly read-mostly ones like zone->initialized and zone->contiguous. Reported-by: syzbot+93d94a001cfbce9e6...@

Re: possible deadlock in __wake_up_common_lock

2019-01-03 Thread Mel Gorman
On Thu, Jan 03, 2019 at 02:40:35PM -0500, Qian Cai wrote: > > Signed-off-by: Mel Gorman > > Tested-by: Qian Cai Thanks! -- Mel Gorman SUSE Labs

Re: possible deadlock in __wake_up_common_lock

2019-01-03 Thread Mel Gorman
possible that the flag setting context is not the same as the flag clearing context or for small races to occur. However, each race possibility is harmless and there is no visible degradation in fragmentation treatment. While zone->flag could have continued to be unused, there is potential for moving so

Re: [RFC][PATCH v2 00/21] PMEM NUMA node and hotness accounting/migration

2019-01-03 Thread Mel Gorman
well understood, it's not as clear to me whether distance is appropriate to describe "local-but-different-speed" memory given that accessing a remote NUMA node can saturate a single link whereas the same may not be true of local-but-different-speed memory which probably has dedicated channels. In an ideal world, application developers interested in higher-speed-memory-reserved-for-important-use and cheaper-lower-speed-memory could describe what sort of application modifications they'd be willing to do but that might be unlikely. -- Mel Gorman SUSE Labs

Re: possible deadlock in __wake_up_common_lock

2019-01-02 Thread Mel Gorman
ly. 2. Use another alloc_flag in steal_suitable_fallback that is set when a wakeup is required but do the actual wakeup in rmqueue() after the zone locks are dropped and the allocation request is completed 3. Always wakeup kswapd if watermarks are boosted. I like this the least because it means doing wakeups that are unrelated to fragmentation that occurred in the current context. Any particular preference? While I recognise there is no test case available, how often does this trigger in syzbot as it would be nice to have some confirmation any patch is really fixing the problem. -- Mel Gorman SUSE Labs

Re: [PATCH] mm: compaction.c: Propagate return value upstream

2019-01-02 Thread Mel Gorman
not just ... > > Mel, Randy? You seem to have been the prime instigators on this. > Patch seems fine. Acked-by: Mel Gorman -- Mel Gorman SUSE Labs

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-22 Thread Mel Gorman
rget o The exit condition for compaction is not when scanners meet but when fast_isolate_freepages cannot find any pageblock that is MIGRATE_MOVABLE && !pageblock_skip -- Mel Gorman SUSE Labs

Re: [PATCH 06/14] mm, migrate: Immediately fail migration of a page with no migration handler

2018-12-20 Thread Mel Gorman
On Thu, Dec 20, 2018 at 11:44:57AM -0800, Yang Shi wrote: > On Fri, Dec 14, 2018 at 3:03 PM Mel Gorman > wrote: > > > > Pages with no migration handler use a fallback handler which sometimes > > works and sometimes persistently fails such as blockdev pages. Migration

Re: [PATCH 08/14] mm, compaction: Use the page allocator bulk-free helper for lists of pages

2018-12-19 Thread Mel Gorman
On Tue, Dec 18, 2018 at 10:55:31AM +0100, Vlastimil Babka wrote: > On 12/15/18 12:03 AM, Mel Gorman wrote: > > release_pages() is a simpler version of free_unref_page_list() but it > > tracks the highest PFN for caching the restart point of the compaction > > free scanner.

Re: [PATCH 09/14] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction

2018-12-18 Thread Mel Gorman
On Tue, Dec 18, 2018 at 02:58:33PM +0100, Vlastimil Babka wrote: > On 12/18/18 2:51 PM, Mel Gorman wrote: > > On Tue, Dec 18, 2018 at 01:36:42PM +0100, Vlastimil Babka wrote: > >> On 12/15/18 12:03 AM, Mel Gorman wrote: > >>> When pageblocks get fragmented, wate

Re: [PATCH 09/14] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction

2018-12-18 Thread Mel Gorman
On Tue, Dec 18, 2018 at 01:36:42PM +0100, Vlastimil Babka wrote: > On 12/15/18 12:03 AM, Mel Gorman wrote: > > When pageblocks get fragmented, watermarks are artificially boosted so that pages > > are reclaimed to avoid further fragmentation events. However, compaction > > is often

Re: [PATCH 06/14] mm, migrate: Immediately fail migration of a page with no migration handler

2018-12-18 Thread Mel Gorman
On Tue, Dec 18, 2018 at 10:06:31AM +0100, Vlastimil Babka wrote: > On 12/15/18 12:03 AM, Mel Gorman wrote: > > Pages with no migration handler use a fallback handler which sometimes > > works and sometimes persistently fails such as blockdev pages. Migration > > will re

Re: [PATCH 05/14] mm, compaction: Skip pageblocks with reserved pages

2018-12-18 Thread Mel Gorman
On Tue, Dec 18, 2018 at 09:08:02AM +0100, Vlastimil Babka wrote: > On 12/15/18 12:03 AM, Mel Gorman wrote: > > Reserved pages are set at boot time, tend to be clustered and almost > > never become unreserved. When isolating pages for migrating, skip > > the entire pagebloc

Re: [PATCH 04/14] mm, compaction: Rename map_pages to split_map_pages

2018-12-17 Thread Mel Gorman
On Mon, Dec 17, 2018 at 03:06:59PM +0100, Vlastimil Babka wrote: > On 12/15/18 12:03 AM, Mel Gorman wrote: > > It's non-obvious that high-order free pages are split into order-0 > > pages from the function name. Fix it. > > That's fine, but looks like the patch has an

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-14 Thread Mel Gorman
determine migration targets and set a bit if it should be > considered a migration source or a migration target. If all pages for a > pageblock are not on free_areas, they are fully used. > Series has patches which implement something similar to this idea. -- Mel Gorman SUSE Labs

[PATCH 13/14] mm, compaction: Capture a page under direct compaction

2018-12-14 Thread Mel Gorman
%) 99.22 ( 3.86%) Percentage huge-32 94.94 ( 0.00%) 98.97 ( 4.25%) And scan rates are reduced Compaction migrate scanned 27634284 19002941 Compaction free scanned 55279519 46395714 Signed-off-by: Mel Gorman --- include/linux/compaction.h | 3 ++- include/linux

[PATCH 14/14] mm, compaction: Do not direct compact remote memory

2018-12-14 Thread Mel Gorman
, they are forbidden at the time of writing but if __GFP_THISNODE is ever removed, then it would still be preferable to fallback to small local base pages over remote THP in the general case. kcompactd is still woken via kswapd so compaction happens eventually. Signed-off-by: Mel Gorman --- mm

[PATCH 12/14] mm, compaction: Use free lists to quickly locate a migration target

2018-12-14 Thread Mel Gorman
isolmig-v1r4 findfree-v1r8 Compaction migrate scanned 25587453 27634284 Compaction free scanned 87735894 55279519 The free scan rates are reduced by 37%. Signed-off-by: Mel Gorman --- mm/compaction.c | 201

[PATCH 11/14] mm, compaction: Keep migration source private to a single compaction instance

2018-12-14 Thread Mel Gorman
%) Compaction migrate scanned 51005450 25587453 Compaction free scanned 780359464 87735894 Migration scan rates are reduced by 49%. At the time of writing, the 2-socket results are not yet available. Signed-off-by: Mel Gorman --- mm/compaction.c | 112

[PATCH 10/14] mm, compaction: Use free lists to quickly locate a migration source

2018-12-14 Thread Mel Gorman
This is showing a 16% reduction in migration scanning with some mild improvements on latency. A 2-socket machine showed similar reductions of scan rates in percentage terms. Signed-off-by: Mel Gorman --- mm/compaction.c | 179 +++- mm/internal.h | 2

[PATCH 08/14] mm, compaction: Use the page allocator bulk-free helper for lists of pages

2018-12-14 Thread Mel Gorman
release_pages() is a simpler version of free_unref_page_list() but it tracks the highest PFN for caching the restart point of the compaction free scanner. This patch optionally tracks the highest PFN in the core helper and converts compaction to use it. Signed-off-by: Mel Gorman --- include

[PATCH 03/14] mm, compaction: Remove last_migrated_pfn from compact_control

2018-12-14 Thread Mel Gorman
The last_migrated_pfn field is a bit dubious as to whether it really helps but either way, the information from it can be inferred without increasing the size of compact_control so remove the field. Signed-off-by: Mel Gorman --- mm/compaction.c | 25 + mm/internal.h

[PATCH 07/14] mm, compaction: Always finish scanning of a full pageblock

2018-12-14 Thread Mel Gorman
of the pageblock and sometimes it is offset by future reductions in scanning. Hence, the results are not presented this time as it's a mix of gains/losses without any clear pattern. However, completing scanning of the pageblock is important for later patches. Signed-off-by: Mel Gorman --- mm/compaction.c

[PATCH 04/14] mm, compaction: Rename map_pages to split_map_pages

2018-12-14 Thread Mel Gorman
It's non-obvious that high-order free pages are split into order-0 pages from the function name. Fix it. Signed-off-by: Mel Gorman --- mm/compaction.c | 60 - 1 file changed, 29 insertions(+), 31 deletions(-) diff --git a/mm/compaction.c

[PATCH 02/14] mm, compaction: Rearrange compact_control

2018-12-14 Thread Mel Gorman
compact_control spans two cache lines with write-intensive lines on both. Rearrange so the most write-intensive fields are in the same cache line. This has a negligible impact on the overall performance of compaction and is more a tidying exercise than anything. Signed-off-by: Mel Gorman --- mm

[PATCH 09/14] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction

2018-12-14 Thread Mel Gorman
sensitive to timing and whether the boost was active or not. However, detailed tracing indicated that failures of migration due to a premature ENOMEM triggered by watermark checks were eliminated. Signed-off-by: Mel Gorman --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion

[PATCH 05/14] mm, compaction: Skip pageblocks with reserved pages

2018-12-14 Thread Mel Gorman
( 0.00%) 1052.64 * 10.52%* Compaction migrate scanned 3860713 3294284 Compaction free scanned 613786341 433423502 Kcompactd migrate scanned 408711 291915 Kcompactd free scanned 242509759 217164988 Signed-off-by: Mel Gorman --- mm/compaction.c | 7

[PATCH 01/14] mm, compaction: Shrink compact_control

2018-12-14 Thread Mel Gorman
The isolate and migrate scanners should never isolate more than a pageblock of pages so unsigned int is sufficient saving 8 bytes on a 64-bit build. Signed-off-by: Mel Gorman --- mm/internal.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/internal.h b/mm/internal.h

[RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1

2018-12-14 Thread Mel Gorman
This is a very preliminary RFC. I'm posting this early as the __GFP_THISNODE discussion continues and has started looking at the compaction implementation and it'd be worth looking at this series first. The cc list is based on that discussion just to make them aware it exists. A v2 will have a

[PATCH 06/14] mm, migrate: Immediately fail migration of a page with no migration handler

2018-12-14 Thread Mel Gorman
( 4.62%) Amean fault-both-32 22461.41 ( 0.00%) 21415.35 ( 4.66%) The 2-socket results are not materially different. Scan rates are similar as expected. Signed-off-by: Mel Gorman --- mm/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/migrate.c b

Re: [RFC/RFT][PATCH v6] cpuidle: New timer events oriented governor for tickless systems

2018-12-07 Thread Mel Gorman
a regular user, but > they seem to want to modify: > > /sys/kernel/mm/transparent_hugepage/enabled > Red herring in this case. Even if transparent hugepages are left as the default, it still tries to write it stupidly. An irritating, but harmless bug. -- Mel Gorman SUSE Labs

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Mel Gorman
On Wed, Dec 05, 2018 at 10:08:56AM +0100, Michal Hocko wrote: > On Tue 04-12-18 16:47:23, David Rientjes wrote: > > On Tue, 4 Dec 2018, Mel Gorman wrote: > > > > > What should also be kept in mind is that we should avoid conflating > > > locality preferences with

Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-05 Thread Mel Gorman
t affects the level of work the system does as well as the overall success rate of operations (be it reclaim, THP allocation, compaction, whatever). This is why a reproduction case that is representative of the problem you're facing on the real workload would have been helpful because then any alternative proposal could have taken your workload into account during testing. -- Mel Gorman SUSE Labs

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Mel Gorman
On Tue, Dec 04, 2018 at 10:45:58AM +0000, Mel Gorman wrote: > I have *one* result of the series on a 1-socket machine running > "thpscale". It creates a file, punches holes in it to create a > very light form of fragmentation and then tries THP allocations > using mad

Re: [PATCH 5/5] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

2018-12-05 Thread Mel Gorman
robably worthwhile > > for long-term allocation success rates. It is possible to eliminate > > fragmentation events entirely with tuning due to this patch although that > > would require careful evaluation to determine if it's worthwhile. > > > > Signed-off-by: Mel Go

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-04 Thread Mel Gorman
r to put this special case > out of the main reclaim/compaction retry-with-increasing-priority loop > for non-costly-order allocations that in general can't fail. > Again, this is accurate. Scanning/compaction costs a lot. This has improved over time, but minimally it's unmapping pages, copying data and a bunch of TLB flushes. During migration, any access to the data being migrated stalls. The harm of reclaiming a little first so that the compaction is more likely to succeed incurred fewer stalls of small magnitude in general -- or at least it was the case when that behaviour was developed. -- Mel Gorman SUSE Labs

Re: [PATCH 5/5] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

2018-11-27 Thread Mel Gorman
robably worthwhile > > for long-term allocation success rates. It is possible to eliminate > > fragmentation events entirely with tuning due to this patch although that > > would require careful evaluation to determine if it's worthwhile. > > > > Signed-off-by: Mel Go

Re: Hackbench pipes regression bisected to PSI

2018-11-26 Thread Mel Gorman
icated it would) and that disabling PSI by default is reasonably close in terms of performance for this particular workload on this particular machine so; Tested-by: Mel Gorman Thanks! -- Mel Gorman SUSE Labs

Re: Hackbench pipes regression bisected to PSI

2018-11-26 Thread Mel Gorman
On Mon, Nov 26, 2018 at 12:32:18PM -0500, Johannes Weiner wrote: > On Mon, Nov 26, 2018 at 04:54:47PM +0000, Mel Gorman wrote: > > On Mon, Nov 26, 2018 at 11:07:24AM -0500, Johannes Weiner wrote: > > > @@ -509,6 +509,15 @@ config PSI > > > > > > Sa

Re: Hackbench pipes regression bisected to PSI

2018-11-26 Thread Mel Gorman
On Mon, Nov 26, 2018 at 11:07:24AM -0500, Johannes Weiner wrote: > Hi Mel, > > On Mon, Nov 26, 2018 at 01:34:20PM +0000, Mel Gorman wrote: > > Hi Johannes, > > > > PSI is a great idea but it does have overhead and if enabled by Kconfig > > then it incur

[PATCH] mm: Use alloc_flags to record if kswapd can wake -fix

2018-11-26 Thread Mel Gorman
Vlastimil Babka correctly pointed out that the ALLOC_KSWAPD flag needs to be applied in the !CONFIG_ZONE_DMA32 case. This is a fix for the mmotm patch mm-use-alloc_flags-to-record-if-kswapd-can-wake.patch Signed-off-by: Mel Gorman --- mm/page_alloc.c | 10 ++ 1 file changed, 2 insertions

Hackbench pipes regression bisected to PSI

2018-11-26 Thread Mel Gorman
60] psi: cgroup support git bisect bad 2ce7135adc9ad081aa3c49744144376ac74fea60 # first bad commit: [2ce7135adc9ad081aa3c49744144376ac74fea60] psi: cgroup support -- Mel Gorman SUSE Labs

[PATCH 3/5] mm: Use alloc_flags to record if kswapd can wake

2018-11-23 Thread Mel Gorman
be claimed that this has nothing to do with ALLOC_NO_FRAGMENT. That's true in this patch but is not true later so it's done now for easier review to show where the flag needs to be recorded. No functional change. Signed-off-by: Mel Gorman --- mm/internal.h | 1 + mm/page_alloc.c | 25

[PATCH 4/5] mm: Reclaim small amounts of memory when an external fragmentation event occurs

2018-11-23 Thread Mel Gorman
erm allocation success rate would be higher. Signed-off-by: Mel Gorman --- Documentation/sysctl/vm.txt | 21 +++ include/linux/mm.h | 1 + include/linux/mmzone.h | 11 ++-- kernel/sysctl.c | 8 +++ mm/page_alloc.c | 43 +- mm/vmscan.c

[PATCH 2/5] mm: Move zone watermark accesses behind an accessor

2018-11-23 Thread Mel Gorman
This is a preparation patch only, no functional change. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- include/linux/mmzone.h | 9 + mm/compaction.c| 2 +- mm/page_alloc.c| 12 ++-- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git

[PATCH 1/5] mm, page_alloc: Spread allocations across zones before introducing fragmentation

2018-11-23 Thread Mel Gorman
the relevance is reduced later in the series. Overall, the patch reduces the number of external fragmentation causing events so the success of THP over long periods of time would be improved for this adverse workload. Signed-off-by: Mel Gorman --- mm/inte

[PATCH 0/5] Fragmentation avoidance improvements v5

2018-11-23 Thread Mel Gorman
There are some big changes due to both Vlastimil's review feedback on v4 and some oddities spotted while answering his review. In some respects, the series is slightly less effective but the approach is more consistent and logical overall. The overhead is also lower from the first patch and

[PATCH 5/5] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

2018-11-23 Thread Mel Gorman
n be enough for kswapd to catch up. How much that helps is variable but probably worthwhile for long-term allocation success rates. It is possible to eliminate fragmentation events entirely with tuning due to this patch although that would require careful evaluation to determine if it's worthwhil

Re: [PATCH 4/4] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

2018-11-22 Thread Mel Gorman
On Thu, Nov 22, 2018 at 06:02:10PM +0100, Vlastimil Babka wrote: > On 11/21/18 11:14 AM, Mel Gorman wrote: > > An event that potentially causes external fragmentation problems has > > already been described but there are degrees of severity. A "serious" > > even

Re: [PATCH 3/4] mm: Reclaim small amounts of memory when an external fragmentation event occurs

2018-11-22 Thread Mel Gorman
sn't seem worth the trouble. Indeed. While it works in some cases, it'll be full of holes and while I could close them, it just turns into a subtle mess. I've prepared a preparation patch that encodes __GFP_KSWAPD_RECLAIM in alloc_flags and checks based on that. It's a lot cleaner overall, it's less of a mess than passing gfp_flags all the way through for one test and there are fewer side-effects. Thanks! -- Mel Gorman SUSE Labs
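A minimal user-space sketch of the idea described in this reply: translate the caller's permission to wake kswapd into an allocation flag once, then test that flag deeper in the allocator instead of passing gfp_flags down. The `TOY_*` names and bit values are hypothetical and chosen only for illustration; they are not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define TOY_GFP_KSWAPD_RECLAIM 0x800u /* caller allows waking kswapd */
#define TOY_ALLOC_KSWAPD       0x200u /* same permission, as an alloc flag */

/* Encode the gfp permission into alloc_flags once, at the entry point. */
static unsigned int toy_gfp_to_alloc_flags(unsigned int gfp_mask,
					   unsigned int alloc_flags)
{
	if (gfp_mask & TOY_GFP_KSWAPD_RECLAIM)
		alloc_flags |= TOY_ALLOC_KSWAPD;
	return alloc_flags;
}

/* Deeper code only needs alloc_flags to decide whether kswapd may be woken. */
static bool toy_may_wake_kswapd(unsigned int alloc_flags)
{
	return (alloc_flags & TOY_ALLOC_KSWAPD) != 0;
}
```

With this shape, only the entry point looks at the gfp mask; everything below it checks a single alloc flag.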

Re: [PATCH 3/4] mm: Reclaim small amounts of memory when an external fragmentation event occurs

2018-11-22 Thread Mel Gorman
> But returning 0 here means
> actually allowing the allocation go through steal_suitable_fallback()?
> So should it return ALLOC_NOFRAGMENT below, or was the intent different?
>

I want to avoid waking kswapd in steal_suitable_fallback if waking kswapd is not allowed. If the calling context does not allow it, it does mean that fragmentation will be allowed to occur. I'm banking on it being a relatively rare case but potentially it'll be problematic. The main source of allocation requests that I expect to hit this are THP and as they are already at pageblock_order, it has limited impact from a fragmentation perspective -- particularly as pageblock_order stealing is allowed even with ALLOC_NOFRAGMENT.

-- 
Mel Gorman
SUSE Labs
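The two rules stated in this reply can be sketched as a small user-space model: fragmentation avoidance is only requested when kswapd may be woken, and a steal of at least pageblock_order is permitted even when avoidance is requested. The `TOY_*` constants and function names are hypothetical, and the pageblock order of 9 is an assumption (2 MiB pageblocks with 4 KiB pages), not something the thread specifies.

```c
#include <stdbool.h>

#define TOY_ALLOC_NOFRAGMENT 0x1u /* hypothetical flag value */
#define TOY_PAGEBLOCK_ORDER  9u   /* assumed pageblock order */

/*
 * Initial flags: avoid fragmentation only if the context may wake kswapd.
 * Returning 0 means a fragmenting fallback is allowed to proceed.
 */
static unsigned int toy_nofragment_flags(bool may_wake_kswapd)
{
	return may_wake_kswapd ? TOY_ALLOC_NOFRAGMENT : 0u;
}

/* May this request steal pages from another migratetype's pageblock? */
static bool toy_may_steal(unsigned int alloc_flags, unsigned int order)
{
	/* Whole-pageblock steals do not mix types within a pageblock. */
	if (order >= TOY_PAGEBLOCK_ORDER)
		return true;
	return !(alloc_flags & TOY_ALLOC_NOFRAGMENT);
}
```

In this model a THP-sized request (order 9 under the stated assumption) always passes the check, which matches the point that such requests have limited impact from a fragmentation perspective.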

Re: [PATCH 1/4] mm, page_alloc: Spread allocations across zones before introducing fragmentation

2018-11-21 Thread Mel Gorman
> > zoneref *z = ac->preferred_zoneref;
> > struct zone *zone;
> > struct pglist_data *last_pgdat_dirty_limit = NULL;
> > + bool no_fallback;
> >
> > +retry:
>
> Ugh, I think 'z = ac->preferred_zoneref' should be moved here under
> retry. AFAICS without that, the preference of local node to
> fragmentation avoidance doesn't work?
>

Yup, you're right! In the event of fragmentation of both normal and dma32 zone, it doesn't restart on the local node and instead falls over to the remote node prematurely. This is obviously not desirable. I'll fix it and thanks for spotting it.

-- 
Mel Gorman
SUSE Labs
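The problem discussed above can be reproduced in a toy user-space model. This is a sketch under stated assumptions, not the kernel's get_page_from_freelist(): `struct toy_zone`, `toy_pick_zone()` and the `reset_on_retry` switch are hypothetical names introduced only to show why the scan cursor has to be rewound under the retry label.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a zone; not the kernel's struct zone. */
struct toy_zone {
	int node;            /* NUMA node the zone belongs to */
	bool would_fragment; /* allocating here needs a fragmenting fallback */
	bool has_free;       /* zone has free pages at all */
};

/*
 * Scan zones from 'preferred'. The first pass refuses zones that would
 * fragment. On reaching a remote node, drop that restriction and retry,
 * so a fragmenting local allocation is preferred over a remote one.
 * 'reset_on_retry' models moving 'z = preferred' under the retry label.
 * Returns the index of the chosen zone, or -1 if none is usable.
 */
static int toy_pick_zone(const struct toy_zone *zones, size_t nr,
			 size_t preferred, bool reset_on_retry)
{
	bool no_fallback = true;
	size_t z = preferred;

retry:
	if (reset_on_retry)
		z = preferred;

	for (; z < nr; z++) {
		if (no_fallback && zones[z].node != zones[preferred].node) {
			no_fallback = false;
			goto retry;
		}
		if (!zones[z].has_free)
			continue;
		if (no_fallback && zones[z].would_fragment)
			continue;
		return (int)z;
	}

	if (no_fallback) {
		no_fallback = false;
		goto retry;
	}
	return -1;
}
```

With two fragmented local zones followed by one remote zone, the rewound variant picks the first local zone, while the variant without the rewind resumes at the remote zone and allocates there prematurely.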

[PATCH 0/4] Fragmentation avoidance improvements v4

2018-11-21 Thread Mel Gorman
No major change from v3 really, mostly resending to see if there is any review reaction. It's rebased but a partial test indicated that the behaviour is similar to the previous baseline.

Changelog since v3
o Rebase to 4.20-rc3
o Remove a stupid warning from the last patch

Changelog since v2
o

[PATCH 2/4] mm: Move zone watermark accesses behind an accessor

2018-11-21 Thread Mel Gorman
This is a preparation patch only, no functional change. Signed-off-by: Mel Gorman --- include/linux/mmzone.h | 9 + mm/compaction.c| 2 +- mm/page_alloc.c| 12 ++-- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/include/linux/mmzone.h b
