and scan rates is marginal but
avoiding unnecessary restarts is important. It helps later patches that
are more careful about how pageblocks are treated, as earlier iterations
of those patches hit corner cases where the restarts were punishing and
very visible.
Signed-off-by: Mel Gorman
---
mm
( 0.00%) 16249.30 * 20.32%*
Amean fault-both-32 17450.76 ( 0.00%) 14904.71 * 14.59%*
Signed-off-by: Mel Gorman
---
mm/compaction.c | 12 ++--
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 1a41a2dbff24..75eb0d40d4d7
patches but it just makes the review slightly
harder.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 61 ++---
1 file changed, 23 insertions(+), 38 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index be27e4fa1b40..1a41a2dbff24 100644
recently so overall the reduction in scan rates is a mere 2.8% which
is borderline noise.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 18 ++
1 file changed, 18 insertions(+)
diff --git a/mm/compaction.c b/mm/compaction.c
index 921720f7a416..be27e4fa1b40 100644
--- a/mm
are not materially different.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 16
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 608d274f9880..921720f7a416 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1071,6 +1071,9 @@ static
in this case. When it does happen,
the scan rates multiply by factors measured in the hundreds and would be
misleading to present.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 32 ++--
mm/internal.h | 1 +
2 files changed, 27 insertions(+), 6 deletions(-)
diff --git
success
rate but also by the fact that the scanners do not meet for longer when
pageblocks are actually used. Overall this is justified and completing
a pageblock scan is very important for later patches.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 95
by 35%. The 2-socket reductions for the
free scanner are more dramatic which is a likely reflection that the
machine has more memory.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 203 ++--
1 file changed, 198 insertions(+), 5 deletions(-)
diff
showed similar benefits.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 179 +++-
mm/internal.h | 2 +
2 files changed, 179 insertions(+), 2 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 8f0ce44dba41..137e32e8a2f5 100644
( 0.00%) 95.17 ( 5.54%)
Percentage huge-32 89.72 ( 0.00%) 93.59 ( 4.32%)
Compaction migrate scanned 54168306 25516488
Compaction free scanned 800530954 87603321
Migration scan rates are reduced by 52%.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 126
but it would also be considered a bug given that such a change
would ruin fragmentation.
On both 1-socket and 2-socket machines, scan rates are reduced slightly
on workloads that intensively allocate THP while the system is fragmented.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 16
1
was increased by less
than 1% which is marginal. However, detailed tracing indicated that
failures of migration due to premature ENOMEM triggered by watermark
checks were eliminated.
Signed-off-by: Mel Gorman
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm
it is offset by future reductions
in scanning. Hence, the results are not presented this time due to a
misleading mix of gains/losses without any clear pattern. However, full
scanning of the pageblock is important for later patches.
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
mm/compaction.c
but it should reduce lock contention slightly in some cases.
The main benefit is removing some partially duplicated code.
Signed-off-by: Mel Gorman
---
include/linux/gfp.h | 7 ++-
mm/compaction.c | 12 +++-
mm/page_alloc.c | 10 +-
3 files changed, 18 insertions(+), 11
.00%) 21707.05 ( 4.43%)
Amean fault-both-32 21692.92 ( 0.00%) 21968.16 ( -1.27%)
The 2-socket results are not materially different. Scan rates are similar
as expected.
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 delet
. The
change could be much deeper but this was enough to briefly clarify the
flow.
No functional change.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 54 ++
1 file changed, 26 insertions(+), 28 deletions(-)
diff --git a/mm/compaction.c b/mm
It's non-obvious that high-order free pages are split into order-0 pages
from the function name. Fix it.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 7acb43f07303..3afa4e9188b6
compact_control spans two cache lines with write-intensive lines on
both. Rearrange so the most write-intensive fields are in the same
cache line. This has a negligible impact on the overall performance of
compaction and is more a tidying exercise than anything.
Signed-off-by: Mel Gorman
Acked
This series reduces scan rates and success rates of compaction, primarily
by using the free lists to shorten scans, better controlling of skip
information and whether multiple scanners can target the same block and
capturing pageblocks before being stolen by parallel requests. The series
is based
The isolate and migrate scanners should never isolate more than a pageblock
of pages so unsigned int is sufficient saving 8 bytes on a 64-bit build.
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
mm/internal.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm
The last_migrated_pfn field is a bit dubious as to whether it really helps
but either way, the information from it can be inferred without increasing
the size of compact_control so remove the field.
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
mm/compaction.c | 25
On Fri, Jan 04, 2019 at 09:18:38AM +0100, Vlastimil Babka wrote:
> On 1/3/19 11:57 PM, Mel Gorman wrote:
> > While zone->flag could have continued to be unused, there is potential
> > for moving some existing fields into the flags field instead. Particularly
> > re
degradation in fragmentation treatment.
While zone->flag could have continued to be unused, there is potential
for moving some existing fields into the flags field instead. Particularly
read-mostly ones like zone->initialized and zone->contiguous.
Reported-by: syzbot+93d94a001cfbce9e6...@
On Thu, Jan 03, 2019 at 02:40:35PM -0500, Qian Cai wrote:
> > Signed-off-by: Mel Gorman
>
> Tested-by: Qian Cai
Thanks!
--
Mel Gorman
SUSE Labs
possible that the flag setting context is not
the same as the flag clearing context or for small races to occur.
However, each race possibility is harmless and there is no visible
degradation in fragmentation treatment.
While zone->flag could have continued to be unused, there is potential
for moving so
well understood,
it's not as clear to me whether distance is appropriate to describe
"local-but-different-speed" memory given that accessing a remote
NUMA node can saturate a single link whereas the same may not
be true of local-but-different-speed memory which probably has
dedicated channels. In an ideal world, application developers
interested in higher-speed-memory-reserved-for-important-use and
cheaper-lower-speed-memory could describe what sort of application
modifications they'd be willing to do but that might be unlikely.
--
Mel Gorman
SUSE Labs
ly.
2. Use another alloc_flag in steal_suitable_fallback that is set when a
wakeup is required but do the actual wakeup in rmqueue() after the
zone locks are dropped and the allocation request is completed
3. Always wakeup kswapd if watermarks are boosted. I like this the least
because it means doing wakeups that are unrelated to fragmentation
that occurred in the current context.
Any particular preference?
While I recognise there is no test case available, how often does this
trigger in syzbot as it would be nice to have some confirmation any
patch is really fixing the problem.
--
Mel Gorman
SUSE Labs
not just ...
>
> Mel, Randy? You seem to have been the prime instigators on this.
>
Patch seems fine.
Acked-by: Mel Gorman
--
Mel Gorman
SUSE Labs
rget
o The exit condition for compaction is not when scanners meet but when
fast_isolate_freepages cannot find any pageblock that is
MIGRATE_MOVABLE && !pageblock_skip
--
Mel Gorman
SUSE Labs
On Thu, Dec 20, 2018 at 11:44:57AM -0800, Yang Shi wrote:
> On Fri, Dec 14, 2018 at 3:03 PM Mel Gorman
> wrote:
> >
> > Pages with no migration handler use a fallback handler which sometimes
> > works and sometimes persistently fails such as blockdev pages. Migration
On Tue, Dec 18, 2018 at 10:55:31AM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > release_pages() is a simpler version of free_unref_page_list() but it
> > tracks the highest PFN for caching the restart point of the compaction
> > free scanner.
On Tue, Dec 18, 2018 at 02:58:33PM +0100, Vlastimil Babka wrote:
> On 12/18/18 2:51 PM, Mel Gorman wrote:
> > On Tue, Dec 18, 2018 at 01:36:42PM +0100, Vlastimil Babka wrote:
> >> On 12/15/18 12:03 AM, Mel Gorman wrote:
> >>> When pageblocks get fragmented, wate
On Tue, Dec 18, 2018 at 01:36:42PM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > When pageblocks get fragmented, watermarks are artificially boosted so pages
> > are reclaimed to avoid further fragmentation events. However, compaction
> > is often
On Tue, Dec 18, 2018 at 10:06:31AM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > Pages with no migration handler use a fallback handler which sometimes
> > works and sometimes persistently fails such as blockdev pages. Migration
> > will re
On Tue, Dec 18, 2018 at 09:08:02AM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > Reserved pages are set at boot time, tend to be clustered and almost
> > never become unreserved. When isolating pages for migrating, skip
> > the entire pagebloc
On Mon, Dec 17, 2018 at 03:06:59PM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > It's non-obvious that high-order free pages are split into order-0
> > pages from the function name. Fix it.
>
> That's fine, but looks like the patch has an
determine migration targets and set a bit if it should be
> considered a migration source or a migration target. If all pages for a
> pageblock are not on free_areas, they are fully used.
>
Series has patches which implement something similar to this idea.
--
Mel Gorman
SUSE Labs
%) 99.22 ( 3.86%)
Percentage huge-32 94.94 ( 0.00%) 98.97 ( 4.25%)
And scan rates are reduced
Compaction migrate scanned 27634284 19002941
Compaction free scanned 55279519 46395714
Signed-off-by: Mel Gorman
---
include/linux/compaction.h | 3 ++-
include/linux
, they are forbidden at the time of writing but if __GFP_THISNODE
is ever removed, then it would still be preferable to fallback to small
local base pages over remote THP in the general case. kcompactd is still
woken via kswapd so compaction happens eventually.
Signed-off-by: Mel Gorman
---
mm
isolmig-v1r4 findfree-v1r8
Compaction migrate scanned 25587453 27634284
Compaction free scanned 87735894 55279519
The free scan rates are reduced by 37%.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 201
%)
Compaction migrate scanned 51005450 25587453
Compaction free scanned 780359464 87735894
Migration scan rates are reduced by 49%. At the time of writing, the
2-socket results are not yet available.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 112
This is showing a 16% reduction in migration scanning with some mild
improvements on latency. A 2-socket machine showed similar reductions
of scan rates in percentage terms.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 179 +++-
mm/internal.h | 2
release_pages() is a simpler version of free_unref_page_list() but it
tracks the highest PFN for caching the restart point of the compaction
free scanner. This patch optionally tracks the highest PFN in the core
helper and converts compaction to use it.
Signed-off-by: Mel Gorman
---
include
The last_migrated_pfn field is a bit dubious as to whether it really helps
but either way, the information from it can be inferred without increasing
the size of compact_control so remove the field.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 25 +
mm/internal.h
of the pageblock and sometimes it is offset by future
reductions in scanning. Hence, the results are not presented this time as
it's a mix of gains/losses without any clear pattern. However, completing
scanning of the pageblock is important for later patches.
Signed-off-by: Mel Gorman
---
mm/compaction.c
It's non-obvious that high-order free pages are split into order-0
pages from the function name. Fix it.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 60 -
1 file changed, 29 insertions(+), 31 deletions(-)
diff --git a/mm/compaction.c
compact_control spans two cache lines with write-intensive lines on
both. Rearrange so the most write-intensive fields are in the same
cache line. This has a negligible impact on the overall performance of
compaction and is more a tidying exercise than anything.
Signed-off-by: Mel Gorman
---
mm
sensitive to timing and whether the boost was active or not. However,
detailed tracing indicated that failures of migration due to premature
ENOMEM triggered by watermark checks were eliminated.
Signed-off-by: Mel Gorman
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion
( 0.00%) 1052.64 * 10.52%*
Compaction migrate scanned 3860713 3294284
Compaction free scanned 613786341 433423502
Kcompactd migrate scanned 408711 291915
Kcompactd free scanned 242509759 217164988
Signed-off-by: Mel Gorman
---
mm/compaction.c | 7
The isolate and migrate scanners should never isolate more than a pageblock
of pages so unsigned int is sufficient saving 8 bytes on a 64-bit build.
Signed-off-by: Mel Gorman
---
mm/internal.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
This is a very preliminary RFC. I'm posting this early as the
__GFP_THISNODE discussion continues and has started looking at the
compaction implementation and it'd be worth looking at this series
first. The cc list is based on that discussion just to make them aware
it exists. A v2 will have a
( 4.62%)
Amean fault-both-32 22461.41 ( 0.00%) 21415.35 ( 4.66%)
The 2-socket results are not materially different. Scan rates are
similar as expected.
Signed-off-by: Mel Gorman
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/migrate.c b
a regular user, but
> they seem to want to modify:
>
> /sys/kernel/mm/transparent_hugepage/enabled
>
Red herring in this case. Even if transparent hugepages are left as the
default, it still tries to write it stupidly. An irritating, but
harmless bug.
--
Mel Gorman
SUSE Labs
On Wed, Dec 05, 2018 at 10:08:56AM +0100, Michal Hocko wrote:
> On Tue 04-12-18 16:47:23, David Rientjes wrote:
> > On Tue, 4 Dec 2018, Mel Gorman wrote:
> >
> > > What should also be kept in mind is that we should avoid conflating
> > > locality preferences with
t affects the level of work the system does
as well as the overall success rate of operations (be it reclaim, THP
allocation, compaction, whatever). This is why a reproduction case that is
representative of the problem you're facing on the real workload matters
would have been helpful because then any alternative proposal could have
taken your workload into account during testing.
--
Mel Gorman
SUSE Labs
On Tue, Dec 04, 2018 at 10:45:58AM +, Mel Gorman wrote:
> I have *one* result of the series on a 1-socket machine running
> "thpscale". It creates a file, punches holes in it to create a
> very light form of fragmentation and then tries THP allocations
> using mad
robably worthwhile
> > for long-term allocation success rates. It is possible to eliminate
> > fragmentation events entirely with tuning due to this patch although that
> > would require careful evaluation to determine if it's worthwhile.
> >
> > Signed-off-by: Mel Go
r to put this special case
> out of the main reclaim/compaction retry-with-increasing-priority loop
> for non-costly-order allocations that in general can't fail.
>
Again, this is accurate. Scanning/compaction costs a lot. This has improved
over time, but minimally it's unmapping pages, copying data and a bunch
of TLB flushes. During migration, any access to the data being migrated
stalls. The harm of reclaiming a little first so that the compaction is
more likely to succeed incurred fewer stalls of small magnitude in
general -- or at least it was the case when that behaviour was
developed.
--
Mel Gorman
SUSE Labs
icated it would) and that disabling PSI by default is reasonably
close in terms of performance for this particular workload on this
particular machine so;
Tested-by: Mel Gorman
Thanks!
--
Mel Gorman
SUSE Labs
On Mon, Nov 26, 2018 at 12:32:18PM -0500, Johannes Weiner wrote:
> On Mon, Nov 26, 2018 at 04:54:47PM +0000, Mel Gorman wrote:
> > On Mon, Nov 26, 2018 at 11:07:24AM -0500, Johannes Weiner wrote:
> > > @@ -509,6 +509,15 @@ config PSI
> > >
> > > Sa
On Mon, Nov 26, 2018 at 11:07:24AM -0500, Johannes Weiner wrote:
> Hi Mel,
>
> On Mon, Nov 26, 2018 at 01:34:20PM +0000, Mel Gorman wrote:
> > Hi Johannes,
> >
> > PSI is a great idea but it does have overhead and if enabled by Kconfig
> > then it incur
Vlastimil Babka correctly pointed out that the ALLOC_KSWAPD flag needs to be
applied in the !CONFIG_ZONE_DMA32 case. This is a fix for the mmotm patch
mm-use-alloc_flags-to-record-if-kswapd-can-wake.patch
Signed-off-by: Mel Gorman
---
mm/page_alloc.c | 10 ++
1 file changed, 2 insertions
60] psi: cgroup support
git bisect bad 2ce7135adc9ad081aa3c49744144376ac74fea60
# first bad commit: [2ce7135adc9ad081aa3c49744144376ac74fea60] psi: cgroup
support
--
Mel Gorman
SUSE Labs
be claimed that this has nothing to do with ALLOC_NO_FRAGMENT.
That's true in this patch but is not true later so it's done now for
easier review to show where the flag needs to be recorded.
No functional change.
Signed-off-by: Mel Gorman
---
mm/internal.h | 1 +
mm/page_alloc.c | 25
erm allocation success rate would be higher.
Signed-off-by: Mel Gorman
---
Documentation/sysctl/vm.txt | 21 +++
include/linux/mm.h | 1 +
include/linux/mmzone.h | 11 ++--
kernel/sysctl.c | 8 +++
mm/page_alloc.c | 43 +-
mm/vmscan.c
This is a preparation patch only, no functional change.
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
include/linux/mmzone.h | 9 +
mm/compaction.c| 2 +-
mm/page_alloc.c| 12 ++--
3 files changed, 12 insertions(+), 11 deletions(-)
diff --git
the relevance is
reduced later in the series.
Overall, the patch reduces the number of external fragmentation causing
events so the success of THP over long periods of time would be improved
for this adverse workload.
Signed-off-by: Mel Gorman
---
mm/inte
There are some big changes due to both Vlastimil's review feedback on v4 and
some oddities spotted while answering his review. In some respects, the
series is slightly less effective but the approach is more consistent and
logical overall. The overhead is also lower from the first patch and
n be enough for
kswapd to catch up. How much that helps is variable but probably worthwhile
for long-term allocation success rates. It is possible to eliminate
fragmentation events entirely with tuning due to this patch although that
would require careful evaluation to determine if it's worthwhil
On Thu, Nov 22, 2018 at 06:02:10PM +0100, Vlastimil Babka wrote:
> On 11/21/18 11:14 AM, Mel Gorman wrote:
> > An event that potentially causes external fragmentation problems has
> > already been described but there are degrees of severity. A "serious"
> > even
sn't seem worth the trouble.
Indeed. While it works in some cases, it'll be full of holes and while
I could close them, it just turns into a subtle mess. I've prepared a
preparation patch that encodes __GFP_KSWAPD_RECLAIM in alloc_flags and checks
based on that. It's a lot cleaner overall, it's less of a mess than passing
gfp_flags all the way through for one test and there are fewer side-effects.
Thanks!
--
Mel Gorman
SUSE Labs
But returning 0 here means
> actually allowing the allocation go through steal_suitable_fallback()?
> So should it return ALLOC_NOFRAGMENT below, or was the intent different?
>
I want to avoid waking kswapd in steal_suitable_fallback if waking
kswapd is not allowed. If the calling context does not allow it, it does
mean that fragmentation will be allowed to occur. I'm banking on it
being a relatively rare case but potentially it'll be problematic. The
main source of allocation requests that I expect to hit this are THP and
as they are already at pageblock_order, it has limited impact from a
fragmentation perspective -- particularly as pageblock_order stealing is
allowed even with ALLOC_NOFRAGMENT.
--
Mel Gorman
SUSE Labs
zoneref *z = ac->preferred_zoneref;
> > struct zone *zone;
> > struct pglist_data *last_pgdat_dirty_limit = NULL;
> > + bool no_fallback;
> >
> > +retry:
>
> Ugh, I think 'z = ac->preferred_zoneref' should be moved here under
> retry. AFAICS without that, the preference of local node to
> fragmentation avoidance doesn't work?
>
Yup, you're right!
In the event of fragmentation of both normal and dma32 zone, it doesn't
restart on the local node and instead falls over to the remote node
prematurely. This is obviously not desirable. I'll fix it, and thanks
for spotting it.
--
Mel Gorman
SUSE Labs
No major change from v3 really, mostly resending to see if there is any
review reaction. It's rebased but a partial test indicated that the
behaviour is similar to the previous baseline
Changelog since v3
o Rebase to 4.20-rc3
o Remove a stupid warning from the last patch
Changelog since v2
o
This is a preparation patch only, no functional change.
Signed-off-by: Mel Gorman
---
include/linux/mmzone.h | 9 +
mm/compaction.c| 2 +-
mm/page_alloc.c| 12 ++--
3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/include/linux/mmzone.h b