Re: [PATCH 1/2] sched: Move '\n' to the prior seq_printf in show_schedstat()

2020-12-03 Thread Mel Gorman
On Thu, Dec 03, 2020 at 02:46:23PM +0800, Yunfeng Ye wrote: > A little cleanup that moves the '\n' to the prior seq_printf and > removes the separate seq_printf that exists only for line breaks. > > No functional changes. > > Signed-off-by: Yunfeng Ye Acked-by: Mel Gorman -- Mel Gorman SUSE Labs
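
For illustration, the pattern being cleaned up looks like this (a minimal sketch with made-up fields, not the actual show_schedstat() output):

	/* before: a separate seq_printf() exists only to emit the newline */
	seq_printf(seq, "cpu%d %u", cpu, stats);
	seq_printf(seq, "\n");

	/* after: the '\n' is folded into the preceding format string */
	seq_printf(seq, "cpu%d %u\n", cpu, stats);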

Re: [PATCH 2/2] sched: Split the function show_schedstat()

2020-12-03 Thread Mel Gorman
I could understand if there was a follow-up patch that adjusted some subset or there was a difference in checking for schedstat_enabled, locking or inserting new schedstat information. That can happen in the general case when the end result is easier to review; here it seems to be just moving code around.

Re: [PATCH -V6 RESEND 2/3] NOT kernel/man-pages: man2/set_mempolicy.2: Add mode flag MPOL_F_NUMA_BALANCING

2020-12-03 Thread Mel Gorman
go in this direction, MPOL_F_NUMA_BALANCING should be checked against MPOL_INTERLEAVE and explicitly fail now so support can be detected at runtime. > So, I prefer to make MPOL_F_NUMA_BALANCING mean > > Optimizing with NUMA balancing if possible, and we may add more > optimization in the future. > Maybe, but I think it's best that the actual behaviour of the kernel is documented instead of desired behaviour or future planning. -- Mel Gorman SUSE Labs

[tip: sched/core] sched/fair: Clear SMT siblings after determining the core is not idle

2020-12-03 Thread tip-bot2 for Mel Gorman
The following commit has been merged into the sched/core branch of tip: Commit-ID: 82b738de57d571cd366d89e75b5fd60f3060852b Gitweb: https://git.kernel.org/tip/82b738de57d571cd366d89e75b5fd60f3060852b Author: Mel Gorman AuthorDate: Mon, 30 Nov 2020 14:40:20 Committer

Re: [RFC] Documentation/scheduler/schedutil.txt

2020-12-02 Thread Mel Gorman
person to talk about HWP; even though I work for > Intel I know remarkably little of it. I don't even think I've got a > machine that has it on. > > Latest version below... I'll probably send it as a patch soon and get it > merged. We can always muck with it more later. > True. At least any confusion can then be driven by specific questions :) FWIW, after reading the new version I'll ack the patch when it shows up. Thanks! -- Mel Gorman SUSE Labs

Re: [sched/fair] 8d86968ac3: netperf.Throughput_tps -29.5% regression

2020-12-02 Thread Mel Gorman
the feature to be disabled to see if it fixes > > it. Over time, that might give an idea of exactly what sort of workloads > > benefit and what suffers. > > Okay, I'll add a sched_feat() for this feature. > If the series I'm preparing works out ok and your patch can be integrated, the sched_feat() may not be necessary because your patch would further reduce time complexity without worrying about when the information gets reset. -- Mel Gorman SUSE Labs

Re: [RFC] Documentation/scheduler/schedutil.txt

2020-12-02 Thread Mel Gorman
information see: kernel/sched/cpufreq_schedutil.c

> NOTES
> -----
>
> - On low-load scenarios, where DVFS is most relevant, the 'running' numbers
>   will closely reflect utilization.
>
> - In saturated scenarios task movement will cause some transient dips,
>   suppose we have a CPU saturated with 4 tasks, then when we migrate a task
>   to an idle CPU, the old CPU will have a 'running' value of 0.75 while the
>   new CPU will gain 0.25. This is inevitable and time progression will
>   correct this. XXX do we still guarantee f_max due to no idle-time?
>
> - Much of the above is about avoiding DVFS dips, and independent DVFS domains
>   having to re-learn / ramp-up when load shifts.

-- Mel Gorman SUSE Labs

Re: [PATCH -V6 RESEND 3/3] NOT kernel/numactl: Support to enable Linux kernel NUMA balancing

2020-12-02 Thread Mel Gorman
Maybe --balance-bind? The intent is to hint that it's specific to MPOL_BIND at this time. -- Mel Gorman SUSE Labs

Re: [PATCH -V6 RESEND 2/3] NOT kernel/man-pages: man2/set_mempolicy.2: Add mode flag MPOL_F_NUMA_BALANCING

2020-12-02 Thread Mel Gorman
including a small test program in the manual page for a sequence that tests whether MPOL_F_NUMA_BALANCING works. They might have a better recommendation on how it should be handled. -- Mel Gorman SUSE Labs
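
Such a test sequence would look roughly like the following (a minimal sketch, not from the manual page or any posted patch: the helper name is made up, the flag's value is an assumption for older headers, and set_mempolicy() needs -lnuma):

	#include <errno.h>
	#include <numaif.h>	/* set_mempolicy(), MPOL_BIND */

	#ifndef MPOL_F_NUMA_BALANCING
	#define MPOL_F_NUMA_BALANCING (1 << 13)	/* assumed value for older headers */
	#endif

	/* Illustrative helper: try MPOL_BIND plus NUMA balancing, falling
	 * back to a strict binding on kernels that reject the flag with
	 * EINVAL. */
	static int bind_with_optional_balancing(const unsigned long *nodemask,
						unsigned long maxnode)
	{
		if (!set_mempolicy(MPOL_BIND | MPOL_F_NUMA_BALANCING,
				   nodemask, maxnode))
			return 0;
		if (errno == EINVAL)
			return set_mempolicy(MPOL_BIND, nodemask, maxnode);
		return -1;
	}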

Re: [PATCH -V6 RESEND 1/3] numa balancing: Migrate on fault among multiple bound nodes

2020-12-02 Thread Mel Gorman
policy. > Ok, I think this part is ok and while the test case is somewhat superficial, it at least demonstrated that the NUMA balancing overhead did not offset any potential benefit. Acked-by: Mel Gorman -- Mel Gorman SUSE Labs

Re: [PATCH] cpuidle: Select polling interval based on a c-state with a longer target residency

2020-12-01 Thread Mel Gorman
changes again resulting > > in bugs because the driver.poll parameter means something new. > > Right. > > > Using min_cstate was definitely a hazard because it showed up in both > > microbenchmarks and real workloads but you were right, lets only > > introduce a tunable when and if there is no other choice in the matter. > > > > So, informally the following patch is the next candidate. I'm happy to > > resend it as a separate mail if you prefer and think the patch is ok. > > I actually can apply it right away, so no need to resend. > Thanks very much. -- Mel Gorman SUSE Labs

Re: [PATCH 4/6] sched: make schedstats helpers independent of fair sched class

2020-12-01 Thread Mel Gorman
The helpers are in sched/stats.c, > > __update_stats_wait_*(struct rq *rq, struct task_struct *p, > struct sched_statistics *stats) > > Cc: Mel Gorman > Signed-off-by: Yafang Shao Think it's ok, it's mostly code shuffling. I'd have been happier if there was

Re: [PATCH 3/6] sched: make struct sched_statistics independent of fair sched class

2020-12-01 Thread Mel Gorman
increased overhead now when schedstats are *enabled* because _schedstat_from_sched_entity() has to be called but it appears that it is protected by a schedstat_enabled() check. So ok, schedstats when enabled are now a bit more expensive but they were expensive in the first place so does it matter? I'd have been happier if there was a comparison with schedstats enabled just in case the overhead is too high but it could also do with a second set of eyeballs. It's somewhat tentative but Acked-by: Mel Gorman -- Mel Gorman SUSE Labs

Re: [PATCH 2/6] sched, fair: use __schedstat_set() in set_next_entity()

2020-12-01 Thread Mel Gorman
On Tue, Dec 01, 2020 at 07:54:12PM +0800, Yafang Shao wrote: > schedstat_enabled() has been already checked, so we can use > __schedstat_set() directly. > > Signed-off-by: Yafang Shao Acked-by: Mel Gorman -- Mel Gorman SUSE Labs

Re: [PATCH] cpuidle: Select polling interval based on a c-state with a longer target residency

2020-11-30 Thread Mel Gorman
Hmean 2      975.88 (  0.00%)   1059.73 *  8.59%*
Hmean 4     1953.97 (  0.00%)   2081.37 *  6.52%*
Hmean 8     3645.76 (  0.00%)   4052.95 * 11.17%*
Hmean 16    6882.21 (  0.00%)   6995.93 *  1.65%*
Hmean 32   10752.20 (  0.00%)  10731.53 * -0.19%*
Hm

[PATCH] sched/fair: Clear SMT siblings after determining the core is not idle

2020-11-30 Thread Mel Gorman
Signed-off-by: Mel Gorman
---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0d54d69ba1a5..d9acd55d309b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6087,10 +6087,11 @@ static int select_idle_core

[PATCH] cpuidle: Select polling interval based on a c-state with a longer target residency

2020-11-30 Thread Mel Gorman
20580.64 * -1.12%*

Signed-off-by: Mel Gorman
---
 Documentation/admin-guide/kernel-parameters.txt | 18 +
 drivers/cpuidle/cpuidle.c                       | 49 -
 2 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kern

Re: [PATCH] cpuidle: Allow configuration of the polling interval before cpuidle enters a c-state

2020-11-27 Thread Mel Gorman
metal. If that is not the case, then it would make more sense that the haltpoll adaptive logic would simply be part of the core and not a per-governor decision. -- Mel Gorman SUSE Labs

Re: [PATCH] cpuidle: Allow configuration of the polling interval before cpuidle enters a c-state

2020-11-27 Thread Mel Gorman
and increases exit latency that it would be unpopular on bare metal. I'm prototyping a v2 that simply picks a different balance point to see if that gets a better reception. -- Mel Gorman SUSE Labs

Re: [PATCH] cpuidle: Allow configuration of the polling interval before cpuidle enters a c-state

2020-11-27 Thread Mel Gorman
On Thu, Nov 26, 2020 at 08:31:51PM +0000, Mel Gorman wrote: > > > and it is reasonable behaviour but it should be tunable. > > > > Only if there is no way to cover all of the relevant use cases in a > > generally acceptable way without adding more module params etc.

Re: [PATCH] cpuidle: Allow configuration of the polling interval before cpuidle enters a c-state

2020-11-26 Thread Mel Gorman
On Thu, Nov 26, 2020 at 07:24:41PM +0100, Rafael J. Wysocki wrote: > On Thu, Nov 26, 2020 at 6:25 PM Mel Gorman > wrote: > > It was noted that a few workloads that idle rapidly regressed when commit > > 36fcb4292473 ("cpuidle: use first valid target residency as pol

[PATCH] cpuidle: Allow configuration of the polling interval before cpuidle enters a c-state

2020-11-26 Thread Mel Gorman
0%)  12608.15 * -0.51%*
Hmean 128  20744.83 (  0.00%)  21147.02 *  1.94%*
Hmean 256  20646.60 (  0.00%)  20608.48 * -0.18%*
Hmean 320  20892.89 (  0.00%)  20831.99 * -0.29%*

Signed-off-by: Mel Gorman
---
 Documentation/admin-guide/kernel-parameters.txt

Re: [sched/fair] 8d86968ac3: netperf.Throughput_tps -29.5% regression

2020-11-26 Thread Mel Gorman
easy enough to ask for the feature to be disabled to see if it fixes it. Over time, that might give an idea of exactly what sort of workloads benefit and what suffers. Note that the cost of select_idle_cpu() can also be reduced by enabling SIS_AVG_CPU so it would be interesting to know if the idle mask is superior or inferior to SIS_AVG_CPU for workloads that show regressions. -- Mel Gorman SUSE Labs

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-26 Thread Mel Gorman
. > > As far as I understood Mel, rounding these ranges up/down to cover full > MAX_ORDER blocks/pageblocks might work. > Yes, round down the lower end of the hole and round up the higher end to the MAX_ORDER boundary for the purposes of having valid zone/node linkages even if the underlying
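
In code terms, the rounding being suggested is roughly the following (a hypothetical sketch; the pfn variable names are illustrative and not from a posted patch):

	/* Expand the hole outward to MAX_ORDER boundaries so every
	 * struct page in the rounded range gets valid zone/node
	 * linkages from the surrounding zone. */
	unsigned long start_pfn = round_down(hole_start_pfn, MAX_ORDER_NR_PAGES);
	unsigned long end_pfn = round_up(hole_end_pfn, MAX_ORDER_NR_PAGES);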

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-26 Thread Mel Gorman
On Wed, Nov 25, 2020 at 12:59:58PM -0500, Andrea Arcangeli wrote: > On Wed, Nov 25, 2020 at 10:30:53AM +0000, Mel Gorman wrote: > > On Tue, Nov 24, 2020 at 03:56:22PM -0500, Andrea Arcangeli wrote: > > > Hello, > > > > > > On Tue, Nov 24, 2020 at 01:32:05P

[tip: sched/core] sched/numa: Allow a floating imbalance between NUMA nodes

2020-11-25 Thread tip-bot2 for Mel Gorman
The following commit has been merged into the sched/core branch of tip: Commit-ID: 7d2b5dd0bcc48095651f1b85f751eef610b3e034 Gitweb: https://git.kernel.org/tip/7d2b5dd0bcc48095651f1b85f751eef610b3e034 Author: Mel Gorman AuthorDate: Fri, 20 Nov 2020 09:06:29 Committer

[tip: sched/core] sched: Limit the amount of NUMA imbalance that can exist at fork time

2020-11-25 Thread tip-bot2 for Mel Gorman
The following commit has been merged into the sched/core branch of tip: Commit-ID: 23e6082a522e32232f7377540b4d42d8304253b8 Gitweb: https://git.kernel.org/tip/23e6082a522e32232f7377540b4d42d8304253b8 Author: Mel Gorman AuthorDate: Fri, 20 Nov 2020 09:06:30 Committer

[tip: sched/core] sched/numa: Rename nr_running and break out the magic number

2020-11-25 Thread tip-bot2 for Mel Gorman
The following commit has been merged into the sched/core branch of tip: Commit-ID: abeae76a47005aa3f07c9be12d8076365622e25c Gitweb: https://git.kernel.org/tip/abeae76a47005aa3f07c9be12d8076365622e25c Author: Mel Gorman AuthorDate: Fri, 20 Nov 2020 09:06:27 Committer

[tip: sched/core] sched: Avoid unnecessary calculation of load imbalance at clone time

2020-11-25 Thread tip-bot2 for Mel Gorman
The following commit has been merged into the sched/core branch of tip: Commit-ID: 5c339005f854fa75aa46078ad640919425658b3e Gitweb: https://git.kernel.org/tip/5c339005f854fa75aa46078ad640919425658b3e Author: Mel Gorman AuthorDate: Fri, 20 Nov 2020 09:06:28 Committer

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-25 Thread Mel Gorman
On Wed, Nov 25, 2020 at 12:04:15PM +0100, David Hildenbrand wrote: > On 25.11.20 11:39, Mel Gorman wrote: > > On Wed, Nov 25, 2020 at 07:45:30AM +0100, David Hildenbrand wrote: > >>> Something must have changed more recently than v5.1 that caused the > >>> zon

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-25 Thread Mel Gorman
ed page. I agree that compaction should not be returning pfns that are outside of the zone range because that is buggy in itself but valid struct pages should have valid information. I don't think we want to paper over that with unnecessary PageReserved checks. -- Mel Gorman SUSE Labs

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-25 Thread Mel Gorman
On Tue, Nov 24, 2020 at 03:56:22PM -0500, Andrea Arcangeli wrote: > Hello, > > On Tue, Nov 24, 2020 at 01:32:05PM +0000, Mel Gorman wrote: > > I would hope that is not the case because they are not meant to overlap. > > However, if the beginning of the pageblock was no

Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages

2020-11-24 Thread Mel Gorman
s-zone which would be surprising. Maybe this untested patch?

diff --git a/mm/compaction.c b/mm/compaction.c
index 13cb7a961b31..ef1b5dacc289 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1330,6 +1330,10 @@ fast_isolate_freepages(struct compact_control *cc)
 	low_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 2));
 	min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1));

+	/* Ensure the PFN is within the zone */
+	low_pfn = max(cc->zone->zone_start_pfn, low_pfn);
+	min_pfn = max(cc->zone->zone_start_pfn, min_pfn);
+
 	if (WARN_ON_ONCE(min_pfn > low_pfn))
 		low_pfn = min_pfn;

-- Mel Gorman SUSE Labs

Re: [RFC PATCH v2 3/5] sched: make schedstats helper independent of cfs_rq

2020-11-24 Thread Mel Gorman
update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	if (!schedstat_enabled())
		return;

	__update_stats_wait_start(cfs_rq, se);
}

where __update_stats_wait_start then lives in kernel/sched/stats.c -- Mel Gorman SUSE Labs

Re: [PATCH v3 0/4] Revisit NUMA imbalance tolerance and fork balancing

2020-11-20 Thread Mel Gorman
On Fri, Nov 20, 2020 at 01:58:11PM +0100, Peter Zijlstra wrote: > On Fri, Nov 20, 2020 at 09:06:26AM +0000, Mel Gorman wrote: > > > Mel Gorman (4): > > sched/numa: Rename nr_running and break out the magic number > > sched: Avoid unnecessary calculation of load

[PATCH 3/4] sched/numa: Allow a floating imbalance between NUMA nodes

2020-11-20 Thread Mel Gorman
Signed-off-by: Mel Gorman
---
 kernel/sched/fair.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9aded12aaa90..e17e6c5da1d5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1550,7 +1550,8 @@ struct

[PATCH v3 0/4] Revisit NUMA imbalance tolerance and fork balancing

2020-11-20 Thread Mel Gorman
ed as long as there are idle CPUs until the load balancer gets involved. This caused serious problems with a real workload that unfortunately I cannot share many details about but there is a proxy reproducer.
--
2.26.2

Mel Gorman (4):
  sched/numa: Rename nr_running and break out

[PATCH 2/4] sched: Avoid unnecessary calculation of load imbalance at clone time

2020-11-20 Thread Mel Gorman
confusing one type of imbalance with another depending on the group_type in the next patch. No functional change.

Signed-off-by: Mel Gorman
---
 kernel/sched/fair.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index

[PATCH 1/4] sched/numa: Rename nr_running and break out the magic number

2020-11-20 Thread Mel Gorman
This is simply a preparation patch to make the following patches easier to read. No functional change.

Signed-off-by: Mel Gorman
---
 kernel/sched/fair.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6d78b68847f9

[PATCH 4/4] sched: Limit the amount of NUMA imbalance that can exist at fork time

2020-11-20 Thread Mel Gorman
Generally performance was either neutral or better in the tests conducted. The main consideration with this patch is the point where fork stops spreading a task so some workloads may benefit from different balance points but it would be a risky tuning parameter. Signed-off-by: Mel Gorman --- kern

Re: [patch V4 4/8] sched: Make migrate_disable/enable() independent of RT

2020-11-19 Thread Mel Gorman
On Thu, Nov 19, 2020 at 12:14:11PM +0100, Peter Zijlstra wrote: > On Thu, Nov 19, 2020 at 09:38:34AM +0000, Mel Gorman wrote: > > On Wed, Nov 18, 2020 at 08:48:42PM +0100, Thomas Gleixner wrote: > > > From: Thomas Gleixner > > > > > > Now that the scheduler ca

Re: [patch V4 4/8] sched: Make migrate_disable/enable() independent of RT

2020-11-19 Thread Mel Gorman
threads sharing the same address space. I know there can be other examples that are rt-specific and some tricks on percpu page alloc draining that rely on a combination of migrate_disable and interrupt disabling to protect the structures, but the above example might be understandable to a non-RT audience.

Re: [patch V4 2/8] mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP

2020-11-19 Thread Mel Gorman
> > because this only works on architectures which do not have cache aliasing > > problems. > > Very good. And you made sure to have a comment to not enable it for > production systems. > > Hopefully people will even read it ;) > And not start thinking of it as a security hardening option. -- Mel Gorman SUSE Labs

[PATCH v2 0/4] Revisit NUMA imbalance tolerance and fork balancing

2020-11-19 Thread Mel Gorman
Changelog since v1 o Split out patch that moves imbalance calculation o Strongly connect fork imbalance considerations with adjust_numa_imbalance When NUMA and CPU balancing were reconciled, there was an attempt to allow a degree of imbalance but it caused more problems than it solved. Instead,

[PATCH 3/4] sched/numa: Allow a floating imbalance between NUMA nodes

2020-11-19 Thread Mel Gorman
Signed-off-by: Mel Gorman
---
 kernel/sched/fair.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9aded12aaa90..e17e6c5da1d5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1550,7 +1550,8 @@ struct

[PATCH 1/4] sched/numa: Rename nr_running and break out the magic number

2020-11-19 Thread Mel Gorman
This is simply a preparation patch to make the following patches easier to read. No functional change.

Signed-off-by: Mel Gorman
---
 kernel/sched/fair.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6d78b68847f9

[PATCH 4/4] sched: Limit the amount of NUMA imbalance that can exist at fork time

2020-11-19 Thread Mel Gorman
Generally performance was either neutral or better in the tests conducted. The main consideration with this patch is the point where fork stops spreading a task so some workloads may benefit from different balance points but it would be a risky tuning parameter. Signed-off-by: Mel Gorman --- kern

[PATCH 2/4] sched: Avoid unnecessary calculation of load imbalance at clone time

2020-11-19 Thread Mel Gorman
confusing one type of imbalance with another depending on the group_type in the next patch. No functional change.

Signed-off-by: Mel Gorman
---
 kernel/sched/fair.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index

Re: [RFC -V5] autonuma: Migrate on fault among multiple bound nodes

2020-11-18 Thread Mel Gorman
> Yes. > Needs to be documented so applications know they can recover. Also needs to be determined how numactl should behave if the flag does not exist. Likely it will simply fail in which case the error should be clear. > > In that case, a manual page is definitely needed to > > explain that an error can be returned if the flag is used and the kernel > > does not support it so the application can cover by falling back to a > > strict binding. If it fails silently, then that also needs to be documented > > because it'll lead to different behaviour depending on the running > > kernel. > > Sure. Will describe this in the manual page. > Thanks. -- Mel Gorman SUSE Labs

Re: [RFC PATCH 2/4] sched: make schedstats helpers not depend on cfs_rq

2020-11-18 Thread Mel Gorman
hit regardless of interest in schedstat. Regardless of the merit of adding schedstats for RT, the overhead of schedstats when stats are disabled should remain the same with the static branch check done in an inline function. -- Mel Gorman SUSE Labs

Re: [PATCH 3/3] sched/numa: Limit the amount of imbalance that can exist at fork time

2020-11-18 Thread Mel Gorman
previously, but this change seems unrelated to > rest of this patch and could deserve a dedicated patch > I can split it out as a preparation patch. It can be treated as a trivial micro-optimisation to avoid an unnecessary calculation of the imbalance for group_overloaded/group_fully_busy. The real motivation is to avoid confusing the group_overloaded/group_fully_busy imbalance with allow_numa_imbalance and thinking they are somehow directly related to each other. -- Mel Gorman SUSE Labs

Re: [RFC -V5] autonuma: Migrate on fault among multiple bound nodes

2020-11-18 Thread Mel Gorman
is supported by the current running kernel? It looks like they might receive -EINVAL (didn't check for sure). In that case, a manual page is definitely needed to explain that an error can be returned if the flag is used and the kernel does not support it so the application can cover by falling back to

Re: [PATCH 3/3] sched/numa: Limit the amount of imbalance that can exist at fork time

2020-11-17 Thread Mel Gorman
On Tue, Nov 17, 2020 at 04:53:10PM +0100, Vincent Guittot wrote: > On Tue, 17 Nov 2020 at 16:17, Mel Gorman wrote: > > > > On Tue, Nov 17, 2020 at 03:31:19PM +0100, Vincent Guittot wrote: > > > On Tue, 17 Nov 2020 at 15:18, Peter Zijlstra wrote: > > > > > &

Re: [PATCH 3/3] sched/numa: Limit the amount of imbalance that can exist at fork time

2020-11-17 Thread Mel Gorman
On Tue, Nov 17, 2020 at 03:31:19PM +0100, Vincent Guittot wrote: > On Tue, 17 Nov 2020 at 15:18, Peter Zijlstra wrote: > > > > On Tue, Nov 17, 2020 at 01:42:22PM +0000, Mel Gorman wrote: > > > - if (local_sgs.idle_cpus) > > > +

Re: [PATCH 2/3] sched/numa: Allow a floating imbalance between NUMA nodes

2020-11-17 Thread Mel Gorman
On Tue, Nov 17, 2020 at 03:24:56PM +0100, Vincent Guittot wrote: > On Tue, 17 Nov 2020 at 14:42, Mel Gorman wrote: > > > > Currently, an imbalance is only allowed when a destination node > > is almost completely idle. This solved one basic class of problems > > an

Re: [PATCH 2/3] sched/numa: Allow a floating imbalance between NUMA nodes

2020-11-17 Thread Mel Gorman
On Tue, Nov 17, 2020 at 03:16:03PM +0100, Peter Zijlstra wrote: > On Tue, Nov 17, 2020 at 01:42:21PM +0000, Mel Gorman wrote: > > This patch revisits the possibility that NUMA nodes can be imbalanced > > until 25% of the CPUs are occupied. The reasoning behind 25% is somewhat
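
For reference, the 25% check being discussed has roughly this shape (paraphrased from the version that eventually landed upstream; treat the helper name and exact threshold as subject to the review in this thread):

	static inline bool allow_numa_imbalance(int dst_running, int dst_weight)
	{
		/* Tolerate a NUMA imbalance while fewer than 25% of the
		 * destination's CPUs are running tasks
		 * (weight >> 2 == weight / 4). */
		return (dst_running < (dst_weight >> 2));
	}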

[PATCH 1/3] sched/numa: Rename nr_running and break out the magic number

2020-11-17 Thread Mel Gorman
This is simply a preparation patch to make the following patches easier to read. No functional change.

Signed-off-by: Mel Gorman
---
 kernel/sched/fair.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6d78b68847f9

[PATCH 3/3] sched/numa: Limit the amount of imbalance that can exist at fork time

2020-11-17 Thread Mel Gorman
ted. Note that the main consideration with this patch is the point where fork stops spreading a task. Some workloads may benefit from different balance points but it would be a risky tuning parameter.

Signed-off-by: Mel Gorman
---
 kernel/sched/fair.c | 17 ++---
 1 file changed, 10 insert

[PATCH 2/3] sched/numa: Allow a floating imbalance between NUMA nodes

2020-11-17 Thread Mel Gorman
Signed-off-by: Mel Gorman
---
 kernel/sched/fair.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5fbed29e4001..33709dfac24d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1550,7 +1550,8 @@ struct

[RFC PATCH 0/3] Revisit NUMA imbalance tolerance and fork balancing

2020-11-17 Thread Mel Gorman
When NUMA and CPU balancing were reconciled, there was an attempt to allow a degree of imbalance but it caused more problems than it solved. Instead, imbalance was only allowed with an almost idle NUMA domain. A lot of the problems have since been addressed so it's time for a revisit. There is

Re: [PATCH] sched: Fix data-race in wakeup

2020-11-17 Thread Mel Gorman
"sched/core: Optimize ttwu() spinning on p->on_cpu") > Reported-by: Mel Gorman > Signed-off-by: Peter Zijlstra (Intel) Thanks, testing completed successfully! With the suggested alternative comment above sched_remote_wakeup; Reviewed-by: Mel Gorman -- Mel Gorman SUSE Labs

Re: [PATCH] sched: Fix rq->nr_iowait ordering

2020-11-17 Thread Mel Gorman
> Reported-by: Tejun Heo > Signed-off-by: Peter Zijlstra (Intel) s/Fixes: Fixes:/Fixes:/ Ok, very minor hazard that the same logic gets duplicated that someone might try "fix" but git blame should help. Otherwise, it makes sense as I've received more than one "bug"

Re: Loadavg accounting error on arm64

2020-11-16 Thread Mel Gorman
k in the "anti-guarantees" section of memory-barriers.txt :( sched_psi_wake_requeue can probably stay with the other three fields given they are under the rq lock but sched_remote_wakeup needs to move out. -- Mel Gorman SUSE Labs

Re: Loadavg accounting error on arm64

2020-11-16 Thread Mel Gorman
So I'm now both confused and wondering why smp_wmb made a difference at all. -- Mel Gorman SUSE Labs

Re: Loadavg accounting error on arm64

2020-11-16 Thread Mel Gorman
865 3653
2.33 292.74 672.24 1/861 3709
0.85 239.46 630.17 1/859 3711
0.31 195.87 590.73 1/857 3713
0.11 160.22 553.76 1/853 3715

Without the patch, the load average stabilised at 244 on an otherwise idle machine.

Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")

Re: Loadavg accounting error on arm64

2020-11-16 Thread Mel Gorman
On Mon, Nov 16, 2020 at 03:29:46PM +0000, Mel Gorman wrote: > I did, it was the on_cpu ordering for the blocking case that had me > looking at the smp_store_release and smp_cond_load_acquire in arm64 in > the first place thinking that something in there must be breaking the > on_cpu o

Re: Loadavg accounting error on arm64

2020-11-16 Thread Mel Gorman
> 	unsigned			sched_migrated:1;
> +	unsigned			:0;
> 	unsigned			sched_remote_wakeup:1;
> +	unsigned			:0;
>  #ifdef CONFIG_PSI
> 	unsigned			sched_psi_wake_requeue:1;
>  #endif

I'll test this after the smp_wmb() test completes. While a clobbering may be the issue, I also think the gap between the rq->nr_uninterruptible++ and smp_store_release(prev->on_cpu, 0) is relevant and a better candidate. -- Mel Gorman SUSE Labs
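
As background, the zero-width bitfield trick works because it forces the next field into a new allocation unit; a minimal illustration (not the task_struct definition):

	struct flags {
		unsigned a:1;
		unsigned b:1;	/* shares an allocation unit with 'a'; storing
				 * to 'b' is a read-modify-write of the unit */
		unsigned :0;	/* zero-width field: next field starts a new unit */
		unsigned c:1;	/* a remote write to 'c' can no longer clobber
				 * concurrent updates to 'a' or 'b' */
	};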

Re: Loadavg accounting error on arm64

2020-11-16 Thread Mel Gorman
On Mon, Nov 16, 2020 at 01:58:03PM +0100, Peter Zijlstra wrote: > On Mon, Nov 16, 2020 at 01:53:55PM +0100, Peter Zijlstra wrote: > > On Mon, Nov 16, 2020 at 11:49:38AM +0000, Mel Gorman wrote: > > > On Mon, Nov 16, 2020 at 09:10:54AM +0000, Mel Gorman wrote: > > > >

Re: Loadavg accounting error on arm64

2020-11-16 Thread Mel Gorman
On Mon, Nov 16, 2020 at 01:11:03PM +0000, Will Deacon wrote: > On Mon, Nov 16, 2020 at 09:10:54AM +0000, Mel Gorman wrote: > > I got cc'd an internal bug report filed against 5.8 and 5.9 kernels > > that loadavg was "exploding" on arm64 on machines acting as a build

Re: Loadavg accounting error on arm64

2020-11-16 Thread Mel Gorman
On Mon, Nov 16, 2020 at 01:46:57PM +0100, Peter Zijlstra wrote: > On Mon, Nov 16, 2020 at 09:10:54AM +0000, Mel Gorman wrote: > > Similarly, it's not clear why the arm64 implementation > > does not call smp_acquire__after_ctrl_dep in the smp_load_acquire > > impl

Re: Loadavg accounting error on arm64

2020-11-16 Thread Mel Gorman
On Mon, Nov 16, 2020 at 11:49:38AM +0000, Mel Gorman wrote: > On Mon, Nov 16, 2020 at 09:10:54AM +0000, Mel Gorman wrote: > > I'll be looking again today to see can I find a mistake in the ordering for > > how sched_contributes_to_load is handled but again, the lack of knowledge

Re: Loadavg accounting error on arm64

2020-11-16 Thread Mel Gorman
On Mon, Nov 16, 2020 at 09:10:54AM +0000, Mel Gorman wrote: > I'll be looking again today to see can I find a mistake in the ordering for > how sched_contributes_to_load is handled but again, the lack of knowledge > on the arm64 memory model means I'm a bit stuck and a second set of eye

Re: [PATCH v41 10/24] mm: Add 'mprotect' hook to struct vm_operations_struct

2020-11-16 Thread Mel Gorman
n for it. Protections on the filesystem level are for the inode, I can't imagine what a filesystem would do with a protection change on the page table level but maybe I'm not particularly imaginative today. -- Mel Gorman SUSE Labs

Loadavg accounting error on arm64

2020-11-16 Thread Mel Gorman
mistake in the ordering for how sched_contributes_to_load is handled but again, the lack of knowledge on the arm64 memory model means I'm a bit stuck and a second set of eyes would be nice :( -- Mel Gorman SUSE Labs

Re: [PATCH v41 10/24] mm: Add 'mprotect' hook to struct vm_operations_struct

2020-11-13 Thread Mel Gorman
rm this > permission comparison at mmap() time. However, there is no corresponding > ->mprotect() hook. > > Solution > > > Add a vm_ops->mprotect() hook so that mprotect() operations which are > inconsistent with any page's stashed intent can be rejected by t

Re: [PATCH] sched/fair: ensure tasks spreading in LLC during LB

2020-11-12 Thread Mel Gorman
> 99.5th: 94
> 99.9th: 94
> min=0, max=94
>
> Fixes: 0b0695f2b34a ("sched/fair: Rework load_balance()")
> Reported-by: Chris Mason
> Suggested-by: Rik van Riel
> Signed-off-by: Vincent Guittot

Acked-by: Mel Gorman -- Mel Gorman SUSE Labs

Re: [PATCH] sched/topology: Warn when NUMA diameter > 2

2020-11-11 Thread Mel Gorman
dst=5,val=20 > > Link: https://lore.kernel.org/lkml/jhjtux5edo2.mog...@arm.com > Signed-off-by: Valentin Schneider Acked-by: Mel Gorman -- Mel Gorman SUSE Labs

Re: [PATCH v1] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal

2020-11-10 Thread Mel Gorman
ely. Assuming schedutil gets the default, it may be necessary to either have a tunable or a kconfig that affects the initial PELT signal as to whether it should start low and ramp up, pick a midpoint or start high and scale down. -- Mel Gorman SUSE Labs

Re: [PATCH v1] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal

2020-11-09 Thread Mel Gorman
On Mon, Nov 09, 2020 at 10:24:11AM -0500, Phil Auld wrote: > Hi, > > On Fri, Nov 06, 2020 at 04:00:10PM +0000 Mel Gorman wrote: > > On Fri, Nov 06, 2020 at 02:33:56PM +0100, Vincent Guittot wrote: > > > On Fri, 6 Nov 2020 at 13:03, Mel Gorman > > > wrote: >

Re: [PATCH v1] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal

2020-11-06 Thread Mel Gorman
I'm guessing "[PATCH v2] sched/fair: check for idle core" > > Yes, sorry I sent my answer before adding the link > Grand, that's added to the mix on top to see how both patches measure up versus a revert. No guarantee I'll have full results by Monday. As usual, the test grid is loaded up to the eyeballs. -- Mel Gorman SUSE Labs

Re: [PATCH v1] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal

2020-11-06 Thread Mel Gorman
On Fri, Nov 06, 2020 at 02:33:56PM +0100, Vincent Guittot wrote: > On Fri, 6 Nov 2020 at 13:03, Mel Gorman wrote: > > > > On Wed, Nov 04, 2020 at 09:42:05AM +0000, Mel Gorman wrote: > > > While it's possible that some other factor masked the impact of the patch, >

Re: [PATCH v1] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal

2020-11-06 Thread Mel Gorman
On Wed, Nov 04, 2020 at 09:42:05AM +, Mel Gorman wrote: > While it's possible that some other factor masked the impact of the patch, > the fact it's neutral for two workloads in 5.10-rc2 is suspicious as it > indicates that if the patch was implemented against 5.10-rc2, it would

Re: [PATCH v40 10/24] mm: Add 'mprotect' hook to struct vm_operations_struct

2020-11-06 Thread Mel Gorman
-specific permission
 * checks need to be made before the mprotect is finalised.
 * No modifications should be done to the VMA, returns 0
 * if the mprotect is permitted.
 */
int (*mprotect)(struct vm_area_struct *vma, unsigned long start,
		unsigned long end, unsigned long newflags);

If a future driver *does* need to poke deeper into the VM for mprotect then at least they'll have to explain why that's a good idea. -- Mel Gorman SUSE Labs

Re: [PATCH -V2 2/2] autonuma: Migrate on fault among multiple bound nodes

2020-11-05 Thread Mel Gorman
able performance as it'll have variable performance with limited control of the "important" applications that should use DRAM over PMEM. That's a long road but the step is not incompatible with the long-term goal. -- Mel Gorman SUSE Labs

Re: [PATCH v1] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal

2020-11-04 Thread Mel Gorman
so visible from page fault microbenchmarks that scale the number of threads. It's a vaguely similar class of problem but the patches are taking very different approaches. It'd been in my mind to consider reconciling that chunk with the adjust_numa_imbalance but had not gotten around to seeing how

Re: [PATCH v1] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal

2020-11-04 Thread Mel Gorman
ly not have been merged. I've queued the tests on the remaining machines to see if something more conclusive falls out. -- Mel Gorman SUSE Labs

Re: [PATCH v1] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal

2020-11-02 Thread Mel Gorman
sts, masking from cpufreq changes, phase of the moon and just general plain old bad luck. > I'll try to see if we can run some direct 5.8 - 5.9 tests like these. > That would be nice. While I often see false positive bisections for performance bugs, the number of identical reports and different machines made this more suspicious. -- Mel Gorman SUSE Labs

Re: [PATCH -V2 2/2] autonuma: Migrate on fault among multiple bound nodes

2020-11-02 Thread Mel Gorman
binding to one local node. If it shows up in regressions, it'll be interesting to get a detailed description of the workload. Pay particular attention to whether THP is disabled as I learned relatively recently that NUMA balancing with THP disabled has higher overhead (which is hardly surprising). Lacking data or a specific objection, Acked-by: Mel Gorman -- Mel Gorman SUSE Labs

Re: [sched/fair] d8fcb81f1a: netperf.Throughput_tps -16.9% regression

2020-11-02 Thread Mel Gorman
the patch is a loss. I had also run reaim but not specifically for one sub-test the way this does, and the generation of machines used was much older than Gold 5318H. Grid or no grid, complete coverage is a challenge. -- Mel Gorman SUSE Labs

Re: [PATCH v1] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal

2020-11-02 Thread Mel Gorman
ches so the sched domain groups it looks at are smaller than happens on other machines. Given the number of machines and workloads affected, can we revert and retry? I have not tested against current mainline as scheduler details have changed again but I can do so if desired. -- Mel Gorman SUSE Labs

Re: [PATCH v2] sched/fair: check for idle core

2020-10-27 Thread Mel Gorman
on the same machine were fine so overall; Acked-by: Mel Gorman Thanks. -- Mel Gorman SUSE Labs

Re: [PATCH v2] sched/fair: check for idle core

2020-10-23 Thread Mel Gorman
On Fri, Oct 23, 2020 at 11:21:50AM +0200, Julia Lawall wrote: > > > On Fri, 23 Oct 2020, Mel Gorman wrote: > > > On Thu, Oct 22, 2020 at 03:15:50PM +0200, Julia Lawall wrote: > > > In the case of a thread wakeup, wake_affine determines whether a core > >

Re: [PATCH v2] sched/fair: check for idle core

2020-10-23 Thread Mel Gorman
Vincent Guittot > In principle, I think the patch is ok after the recent discussion. I'm holding off an ack until a battery of tests on loads with different levels of utilisation and wakeup patterns makes its way through a test grid. It's based on Linus's tree mid-merge window that includes what is in the scheduler pipeline -- Mel Gorman SUSE Labs

Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core

2020-10-22 Thread Mel Gorman
ms had been nominated at that time. We have one, possibly two if Phil agrees. That's better than zero or unfairly placing the full responsibility on the Intel guys that have been testing it out. -- Mel Gorman SUSE Labs

Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core

2020-10-22 Thread Mel Gorman
n, but it's a useful safety net and a reasonable way to deprecate a feature. It's also useful for bug creation -- User X running whatever found that schedutil is worse than the old governor and had to temporarily switch back. Repeat until complaining stops and then tear out the old stuff. When/if there is a patch setting schedutil as the default, cc suitable distro people (Giovanni and myself for openSUSE). Other distros assuming they're watching can nominate their own victim. -- Mel Gorman SUSE Labs

Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core

2020-10-22 Thread Mel Gorman
On Thu, Oct 22, 2020 at 05:25:14PM +0200, Peter Zijlstra wrote: > On Thu, Oct 22, 2020 at 03:52:50PM +0100, Mel Gorman wrote: > > > There are some questions > > currently on whether schedutil is good enough when HWP is not available. > > Srinivas and Rafael will know be

Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core

2020-10-22 Thread Mel Gorman
0170803085115.r2jfz2lofy5sp...@techsingularity.net/) It's schedutil's turn :P -- Mel Gorman SUSE Labs

Re: [PATCH] sched/fair: check for idle core

2020-10-21 Thread Mel Gorman
On Wed, Oct 21, 2020 at 05:19:53PM +0200, Vincent Guittot wrote: > On Wed, 21 Oct 2020 at 17:08, Mel Gorman wrote: > > > > On Wed, Oct 21, 2020 at 03:24:48PM +0200, Julia Lawall wrote: > > > > I worry it's overkill because prev is always used if it is idle even > &

Re: [PATCH] sched/fair: check for idle core

2020-10-21 Thread Mel Gorman
The hit and miss rates are both higher but the miss rate is acceptable. As the hunk means that recent_used_cpu could have been set based on a previous wakeup, it makes it unreliable for making cross-node decisions. p->recent_used_cpu's primary purpose at the moment is to avoid select_idle_sibling searches. -- Mel Gorman SUSE Labs
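
For context, the fast path in question in select_idle_sibling() looked roughly like this at the time (paraphrased from kernel/sched/fair.c of that era; consult the tree for the exact condition):

	/* Check a recently used CPU as a potential idle candidate */
	recent_used_cpu = p->recent_used_cpu;
	if (recent_used_cpu != prev &&
	    recent_used_cpu != target &&
	    cpus_share_cache(recent_used_cpu, target) &&
	    (available_idle_cpu(recent_used_cpu) ||
	     sched_idle_cpu(recent_used_cpu)) &&
	    cpumask_test_cpu(recent_used_cpu, p->cpus_ptr)) {
		/* Replace recent_used_cpu with prev as it is a
		 * potential candidate for the next wake */
		p->recent_used_cpu = prev;
		return recent_used_cpu;
	}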

Re: [PATCH] sched/fair: check for idle core

2020-10-21 Thread Mel Gorman
g at prev_cpu and this_cpu is a crude approximation and the path is heavily limited in terms of how clever it can be. -- Mel Gorman SUSE Labs
