Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-10-02 Thread Jason Low
On Wed, Oct 2, 2013 at 12:19 PM, Waiman Long wrote: > On 09/26/2013 06:42 PM, Jason Low wrote: >> >> On Thu, 2013-09-26 at 14:41 -0700, Tim Chen wrote: >>> >>> Okay, that would make sense for consistency because we always >>> first set node->lock
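
The ordering point in this thread -- initialize node->locked before the node is published -- in a minimal userspace C11 sketch. This is an illustrative analogue, not the kernel's mcs_spin_lock():

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct mcs_node {
        struct mcs_node *_Atomic next;
        atomic_bool locked;          /* flipped to true by our predecessor */
    };

    void mcs_lock(struct mcs_node *_Atomic *tail, struct mcs_node *node)
    {
        /* Init the whole node, ->locked included, before anyone sees it. */
        atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
        atomic_store_explicit(&node->locked, false, memory_order_relaxed);

        struct mcs_node *prev =
            atomic_exchange_explicit(tail, node, memory_order_acq_rel);
        if (prev == NULL)
            return;                  /* queue was empty: lock acquired */

        /* Link in behind prev, then spin on our own cacheline. */
        atomic_store_explicit(&prev->next, node, memory_order_release);
        while (!atomic_load_explicit(&node->locked, memory_order_acquire))
            ;
    }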

Re: [RFC][PATCH v4 3/3] sched: Periodically decay max cost of idle balance

2013-09-03 Thread Jason Low
On Fri, 2013-08-30 at 12:29 +0200, Peter Zijlstra wrote: > rcu_read_lock(); > for_each_domain(cpu, sd) { > + /* > + * Decay the newidle max times here because this is a regular > + * visit to all the domains. Decay ~0.5% per second. > +
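
What the quoted comment describes, as a self-contained sketch -- a once-per-second multiplicative decay of the tracked max. The surrounding structure and names are illustrative; cost - cost/256 is roughly a 0.4% step, in the spirit of the ~0.5%/sec being discussed:

    #include <stdint.h>

    struct sd_cost {
        uint64_t max_newidle_cost;   /* max observed newidle balance cost */
        uint64_t next_decay_ns;      /* when the next decay step is due */
    };

    /* Called from the regular balance pass over all domains; steps the
     * decay at most once per second. */
    static void decay_max_cost(struct sd_cost *sd, uint64_t now_ns)
    {
        if (now_ns < sd->next_decay_ns)
            return;
        sd->max_newidle_cost -= sd->max_newidle_cost >> 8;  /* ~0.4% off */
        sd->next_decay_ns = now_ns + 1000000000ULL;         /* 1 second */
    }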

Re: [RFC][PATCH v4 3/3] sched: Periodically decay max cost of idle balance

2013-09-04 Thread Jason Low
On Fri, 2013-08-30 at 12:18 +0200, Peter Zijlstra wrote: > On Thu, Aug 29, 2013 at 01:05:36PM -0700, Jason Low wrote: > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index 58b0514..bba5a07 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/co

Re: [RFC][PATCH v4 3/3] sched: Periodically decay max cost of idle balance

2013-09-09 Thread Jason Low
On Mon, 2013-09-09 at 13:44 +0200, Peter Zijlstra wrote: > On Tue, Sep 03, 2013 at 11:02:59PM -0700, Jason Low wrote: > > On Fri, 2013-08-30 at 12:29 +0200, Peter Zijlstra wrote: > > > rcu_read_lock(); > > > for_each_domain(cpu, sd) { > > > + /* >

Re: [RFC][PATCH v4 3/3] sched: Periodically decay max cost of idle balance

2013-09-09 Thread Jason Low
On Mon, 2013-09-09 at 13:49 +0200, Peter Zijlstra wrote: > On Wed, Sep 04, 2013 at 12:10:01AM -0700, Jason Low wrote: > > On Fri, 2013-08-30 at 12:18 +0200, Peter Zijlstra wrote: > > > On Thu, Aug 29, 2013 at 01:05:36PM -0700, Jason Low wrote: > > > > diff --git

[PATCH v5 1/3] sched: Reduce overestimating rq->avg_idle

2013-09-13 Thread Jason Low
() first. Then, if avg_idle exceeds the max, we set it to the max. Signed-off-by: Jason Low Reviewed-by: Rik van Riel Reviewed-by: Srikar Dronamraju --- kernel/sched/core.c |7 --- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index
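
The ordering the changelog describes -- average the raw delta first, then cap the result -- sketched with the scheduler's 1/8-weight moving average. The cap constant here is a placeholder, not the value the patch derives:

    #include <stdint.h>

    #define AVG_IDLE_MAX_NS 1000000ULL    /* hypothetical cap */

    /* update_avg(): each sample moves the average by 1/8 of the difference. */
    static void update_avg(uint64_t *avg, uint64_t sample)
    {
        int64_t diff = (int64_t)sample - (int64_t)*avg;
        *avg += diff / 8;
    }

    void account_idle_exit(uint64_t *avg_idle, uint64_t idle_delta_ns)
    {
        update_avg(avg_idle, idle_delta_ns);  /* average the raw delta... */
        if (*avg_idle > AVG_IDLE_MAX_NS)      /* ...then clamp the result */
            *avg_idle = AVG_IDLE_MAX_NS;
    }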

[PATCH v5 2/3] sched: Consider max cost of idle balance per sched domain

2013-09-13 Thread Jason Low
e CPU is not idle for longer than the cost to balance. Signed-off-by: Jason Low --- arch/metag/include/asm/topology.h |1 + include/linux/sched.h |1 + include/linux/topology.h |3 +++ kernel/sched/core.c |3 ++- kernel/sched/fair.c

[PATCH v5 0/3] sched: Limiting idle balance

2013-09-13 Thread Jason Low
| +23.1% | +5.1% | +0.0% shared | +3.0% | +4.5% | +1.4% -------- Jason Low (3): sched: Reduce overestimating rq->avg_idle sched: Consider max

[PATCH v5 3/3] sched: Periodically decay max cost of idle balance

2013-09-13 Thread Jason Low
v4->v5 - Increase the decay to 1% per second. - Peter rewrote much of the logic. This patch builds on patch 2 and periodically decays that max value to do idle balancing per sched domain by approximately 1% per second. Also decay the rq's max_idle_balance_cost value. Signed-off-by: J

[PATCH v4 1/3] sched: Reduce overestimating rq->avg_idle

2013-08-29 Thread Jason Low
() first. Then, if avg_idle exceeds the max, we set it to the max. Signed-off-by: Jason Low Reviewed-by: Rik van Riel --- kernel/sched/core.c |7 --- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 05c39f0..93b18ef 100644 --- a

[PATCH v4 2/3] sched: Consider max cost of idle balance per sched domain

2013-08-29 Thread Jason Low
average. This further reduces the chance we attempt balancing when the CPU is not idle for longer than the cost to balance. I also limited the max cost of each domain to 5*sysctl_sched_migration_cost as a way to prevent the max from becoming too inflated. Signed-off-by: Jason Low --- ar

[PATCH v4 0/3] sched: Limiting idle balance

2013-08-29 Thread Jason Low
-1.2% shared | +9.0% | +13.0% | +6.5% ---- Jason Low (3): sched: Reduce overestimating rq->avg_idle sched: Consider max cost of idle balance p

[RFC][PATCH v4 3/3] sched: Periodically decay max cost of idle balance

2013-08-29 Thread Jason Low
e with max cost to do idle balancing + sched_migration_cost. While using the max cost helps reduce overestimating the average idle, the sched_migration_cost can help account for those additional costs of idle balancing. Signed-off-by: Jason Low --- arch/metag/include/asm/topology.h |

Re: [PATCH v4 2/3] sched: Consider max cost of idle balance per sched domain

2013-09-03 Thread Jason Low
On Mon, 2013-09-02 at 12:24 +0530, Srikar Dronamraju wrote: > If we face a runq lock contention, then domain_cost can go up. > The runq lock contention could be temporary, but we carry the domain > cost forever (i.e till the next reboot). How about averaging the cost + > penalty for unsuccessful b

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-26 Thread Jason Low
On Wed, Sep 25, 2013 at 3:10 PM, Tim Chen wrote: > We will need the MCS lock code for doing optimistic spinning for rwsem. > Extracting the MCS code from mutex.c and put into its own file allow us > to reuse this code easily for rwsem. > > Signed-off-by: Tim Chen > Signed-off-by: Davidlohr Bueso

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-26 Thread Jason Low
On Thu, 2013-09-26 at 13:06 -0700, Davidlohr Bueso wrote: > On Thu, 2013-09-26 at 12:27 -0700, Jason Low wrote: > > On Wed, Sep 25, 2013 at 3:10 PM, Tim Chen > > wrote: > > > We will need the MCS lock code for doing optimistic spinning for rwsem. > > > Extract

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-26 Thread Jason Low
On Thu, 2013-09-26 at 13:40 -0700, Davidlohr Bueso wrote: > On Thu, 2013-09-26 at 13:23 -0700, Jason Low wrote: > > On Thu, 2013-09-26 at 13:06 -0700, Davidlohr Bueso wrote: > > > On Thu, 2013-09-26 at 12:27 -0700, Jason Low wrote: > > > > On Wed, Sep 25, 2013 at 3:1

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-26 Thread Jason Low
On Thu, 2013-09-26 at 14:41 -0700, Tim Chen wrote: > On Thu, 2013-09-26 at 14:09 -0700, Jason Low wrote: > > On Thu, 2013-09-26 at 13:40 -0700, Davidlohr Bueso wrote: > > > On Thu, 2013-09-26 at 13:23 -0700, Jason Low wrote: > > > > On Thu, 2013-09-26 at 13:0

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-26 Thread Jason Low
On Fri, 2013-09-27 at 08:02 +0200, Ingo Molnar wrote: > * Tim Chen wrote: > > > > If we prefer to optimize this a bit though, perhaps we can first move > > > the node->lock = 0 so that it gets executed after the "if (likely(prev > > > == NULL)) {}" code block and then delete "node->lock = 1" in

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Jason Low
ssignment so that it occurs after the if (likely(prev == NULL)) check. This might also help make it clearer as to how the node->locked variable is used in MCS locks. Signed-off-by: Jason Low --- include/linux/mcslock.h |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/inc

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Jason Low
On Fri, Sep 27, 2013 at 12:38 PM, Tim Chen wrote: > BTW, is the above memory barrier necessary? It seems like the xchg > instruction already provided a memory barrier. > > Now if we made the changes that Jason suggested: > > > /* Init node */ > - node->locked = 0; > node->n
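
The question here -- whether the explicit barrier is redundant given xchg's implied full barrier -- in sketch form, continuing the C11 analogue from the first MCS entry above (struct mcs_node as defined there). Making the hand-off store a release store keeps the critical section ordered without a separate write barrier:

    void mcs_unlock(struct mcs_node *_Atomic *tail, struct mcs_node *node)
    {
        struct mcs_node *next =
            atomic_load_explicit(&node->next, memory_order_acquire);

        if (next == NULL) {
            struct mcs_node *expected = node;
            /* No successor visible: try to swing the tail back to empty. */
            if (atomic_compare_exchange_strong_explicit(
                    tail, &expected, NULL,
                    memory_order_acq_rel, memory_order_acquire))
                return;
            /* Someone lost the race and is linking in; wait for ->next. */
            while ((next = atomic_load_explicit(&node->next,
                                                memory_order_acquire)) == NULL)
                ;
        }
        /* Release store: critical-section writes are ordered before the
         * hand-off, so no separate write barrier is needed here. */
        atomic_store_explicit(&next->locked, true, memory_order_release);
    }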

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Jason Low
ry barrier so that it is before the "ACCESS_ONCE(next->locked) = 1;". Signed-off-by: Jason Low Signed-off-by: Paul E. McKenney Signed-off-by: Tim Chen --- include/linux/mcslock.h |7 +++ 1 files changed, 3 insertions(+), 4 deletions(-) diff --git a/include/linux/mcslock.h

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Jason Low
On Fri, Sep 27, 2013 at 7:19 PM, Paul E. McKenney wrote: > On Fri, Sep 27, 2013 at 04:54:06PM -0700, Jason Low wrote: >> On Fri, Sep 27, 2013 at 4:01 PM, Paul E. McKenney >> wrote: >> > Yep. The previous lock holder's smp_wmb() won't keep either the compil

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-30 Thread Jason Low
On Mon, 2013-09-30 at 11:51 -0400, Waiman Long wrote: > On 09/28/2013 12:34 AM, Jason Low wrote: > >> Also, below is what the mcs_spin_lock() and mcs_spin_unlock() > >> functions would look like after applying the proposed changes. > >> > >> static

Re: [PATCH v5 1/6] rwsem: check the lock before cpmxchg in down_write_trylock

2013-09-24 Thread Jason Low
Should we do something similar with __down_read_trylock, such as the following? Signed-off-by: Jason Low --- include/asm-generic/rwsem.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h index bb1e2cd..47990dc
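
What that could look like against a simplified count word (illustrative; the real asm-generic __down_read_trylock() works with RWSEM_ACTIVE_READ_BIAS on sem->count): read first, and only issue the cacheline-dirtying cmpxchg when it can plausibly succeed.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Simplified reader trylock: count >= 0 means no writer holds the lock. */
    static bool down_read_trylock(_Atomic long *count)
    {
        long c = atomic_load_explicit(count, memory_order_relaxed);

        while (c >= 0) {
            /* On failure, c is reloaded with the current value and the
             * c >= 0 test runs again. */
            if (atomic_compare_exchange_weak(count, &c, c + 1))
                return true;
        }
        return false;
    }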

Re: [patch v6 8/8] sched: remove blocked_load_avg in tg

2013-05-29 Thread Jason Low
ip kernel with no patches. When using a 3.10-rc2 tip kernel with just patches 1-7, the performance improvement of the workload over the vanilla 3.10-rc2 tip kernel was about 25%. Tested-by: Jason Low Thanks, Jason

Re: [PATCH RFC ticketlock] Auto-queued ticketlock

2013-06-11 Thread Jason Low
On Tue, Jun 11, 2013 at 12:49 PM, Paul E. McKenney wrote: > On Tue, Jun 11, 2013 at 02:41:59PM -0400, Waiman Long wrote: >> On 06/11/2013 12:36 PM, Paul E. McKenney wrote: >> > >> >>I am a bit concern about the size of the head queue table itself. >> >>RHEL6, for example, had defined CONFIG_NR_CPU

[RFC] sched: Limit idle_balance() when it is being used too frequently

2013-07-16 Thread Jason Low
--- All other % difference results were within a 2% noise range. Signed-off-by: Jason Low --- include/linux/sched.h |4 kernel/sched/core.c |3 +++ kernel/sched/fair.c | 26 ++ kernel/sched/sched.h |6 ++ kernel/sysctl.c | 11 ++

Re: [RFC] sched: Limit idle_balance() when it is being used too frequently

2013-07-16 Thread Jason Low
On Tue, 2013-07-16 at 22:20 +0200, Peter Zijlstra wrote: > On Tue, Jul 16, 2013 at 12:21:03PM -0700, Jason Low wrote: > > When running benchmarks on an 8 socket 80 core machine with a 3.10 kernel, > > there can be a lot of contention in idle_balance() and related functions. &

Re: [RFC] sched: Limit idle_balance() when it is being used too frequently

2013-07-17 Thread Jason Low
On Wed, 2013-07-17 at 09:25 +0200, Peter Zijlstra wrote: > On Tue, Jul 16, 2013 at 03:48:01PM -0700, Jason Low wrote: > > On Tue, 2013-07-16 at 22:20 +0200, Peter Zijlstra wrote: > > > On Tue, Jul 16, 2013 at 12:21:03PM -0700, Jason Low wrote: > > > > When running ben

Re: [RFC] sched: Limit idle_balance() when it is being used too frequently

2013-07-17 Thread Jason Low
Hi Peter, On Wed, 2013-07-17 at 11:39 +0200, Peter Zijlstra wrote: > On Wed, Jul 17, 2013 at 01:11:41AM -0700, Jason Low wrote: > > For the more complex model, are you suggesting that each completion time > > is the time it takes to complete 1 iteration of the for_each_do

Re: [RFC] sched: Limit idle_balance() when it is being used too frequently

2013-07-17 Thread Jason Low
On Wed, 2013-07-17 at 20:01 +0200, Peter Zijlstra wrote: > On Wed, Jul 17, 2013 at 01:51:51PM -0400, Rik van Riel wrote: > > On 07/17/2013 12:18 PM, Peter Zijlstra wrote: > > > >So the way I see things is that the only way newidle balance can slow down > > >things is if it runs when we could have

[RFC PATCH] sched: Reduce overestimating avg_idle

2013-07-31 Thread Jason Low
idle balance need to be the same as the migration_cost in task_hot()? Can we keep migration_cost default value used in task_hot() the same, but have a different default value or increase migration_cost only when comparing it with avg_idle in idle balance? Signed-off-by: Jason Low --- kernel/sched/c

Re: [RFC PATCH] sched: Reduce overestimating avg_idle

2013-08-01 Thread Jason Low
> I wonder if we could get even more conservative values > of avg_idle by clamping delta to max, before calling > update_avg... > > Or rather, I wonder if that would matter enough to make > a difference, and in what direction that difference would > be. > > In other words: > > if (rq->idl
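
The variant being floated, in the same shape as the earlier average-then-clamp sketch: clamp the sample before it feeds the average, so one very long idle stretch cannot lift the average past the cap at all.

    #include <stdint.h>

    #define AVG_IDLE_MAX_NS 1000000ULL    /* same placeholder cap as before */

    static void update_avg(uint64_t *avg, uint64_t sample)
    {
        int64_t diff = (int64_t)sample - (int64_t)*avg;
        *avg += diff / 8;
    }

    void account_idle_exit_clamped(uint64_t *avg_idle, uint64_t delta_ns)
    {
        if (delta_ns > AVG_IDLE_MAX_NS)   /* clamp the sample first... */
            delta_ns = AVG_IDLE_MAX_NS;
        update_avg(avg_idle, delta_ns);   /* ...then average */
    }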

Re: [RFC PATCH] sched: Reduce overestimating avg_idle

2013-08-02 Thread Jason Low
On Wed, 2013-07-31 at 11:53 +0200, Peter Zijlstra wrote: > No they're quite unrelated. I think you can measure the max time we've > ever spend in newidle balance and use that to clip the values. So I tried using the rq's max newidle balance cost to compare with the average and used sysctl_migrati

[RFC PATCH v3] sched: Limit idle balance based on max cost per sched domain

2013-08-20 Thread Jason Low
hat avg_idle and max_cost are) if the previous attempt on the rq or domain succeeded in moving tasks. I was also wondering if we should periodically reset the max cost. Both would require an extra field to be added to either the rq or domain structure though. Signed-off-by: Jason Low --- arch/

Re: [RFC PATCH v3] sched: Limit idle balance based on max cost per sched domain

2013-08-22 Thread Jason Low
On Thu, 2013-08-22 at 13:10 +0200, Peter Zijlstra wrote: > Fully agreed, this is something we should do regardless -- for as long > as we preserve the avg_idle() machinery anyway :-) Okay, I'll have the avg_idle fix as part 1 of the v4 patchset. > The thing you 'forgot' to mention is if this patc

Re: [RFC] sched: Limit idle_balance() when it is being used too frequently

2013-07-18 Thread Jason Low
On Thu, 2013-07-18 at 17:42 +0530, Srikar Dronamraju wrote: > > > > > > idle_balance(u64 idle_duration) > > > { > > > u64 cost = 0; > > > > > > for_each_domain(sd) { > > > if (cost + sd->cost > idle_duration/N) > > > break; > > > > > > ... > > > > > > sd->cost = (sd->cost
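
The quoted pseudocode, filled out just enough to compile (N and the domain list are schematic; the real walk is for_each_domain(cpu, sd) and sd->cost would be the tracked balance cost):

    #include <stdint.h>
    #include <stddef.h>

    #define N 10   /* spend at most 1/N of the expected idle time balancing */

    struct domain {
        uint64_t cost;            /* recent cost of balancing this domain */
        struct domain *parent;    /* next-wider domain */
    };

    void idle_balance(struct domain *sd, uint64_t idle_duration)
    {
        uint64_t cost = 0;

        for (; sd != NULL; sd = sd->parent) {
            if (cost + sd->cost > idle_duration / N)
                break;            /* budget exhausted: stay cheap, go idle */

            /* ... try to pull tasks within this domain ... */

            cost += sd->cost;
        }
    }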

Re: [RFC] sched: Limit idle_balance() when it is being used too frequently

2013-07-18 Thread Jason Low
On Thu, 2013-07-18 at 07:59 -0400, Rik van Riel wrote: > On 07/18/2013 05:32 AM, Peter Zijlstra wrote: > > On Wed, Jul 17, 2013 at 09:02:24PM -0700, Jason Low wrote: > > > >> I ran a few AIM7 workloads for the 8 socket HT enabled case and I needed > >> to set N to

[RFC PATCH v2] sched: Limit idle_balance()

2013-07-19 Thread Jason Low
balancing gets skipped if the approximate cost of load balancing will be greater than N% of the approximate time the CPU remains idle. Currently, N is set to 10% though I'm searching for a more "ideal" way to compute this. Suggested-by: Peter Zijlstra Suggested-by: Rik van Riel Signe

Re: [RFC] sched: Limit idle_balance() when it is being used too frequently

2013-07-19 Thread Jason Low
On Fri, 2013-07-19 at 20:37 +0200, Peter Zijlstra wrote: > On Thu, Jul 18, 2013 at 12:06:39PM -0700, Jason Low wrote: > > > N = 1 > > - > > 19.21% reaim [k] __read_lock_failed > > 14.79% reaim [k] mspin_lock

Re: [RFC PATCH v2] sched: Limit idle_balance()

2013-07-19 Thread Jason Low
On Fri, 2013-07-19 at 16:54 +0530, Preeti U Murthy wrote: > Hi Jason, > > I ran ebizzy and kernbench benchmarks on your 3.11-rc1 + your "V1 > patch" on a 1 socket, 16 core powerpc machine. I thought I would let you > know the results before I try your V2. > > Ebizzy: 30 seconds run. The tab

Re: [RFC PATCH v2] sched: Limit idle_balance()

2013-07-22 Thread Jason Low
On Sun, 2013-07-21 at 23:02 +0530, Preeti U Murthy wrote: > Hi Jason, > > With V2 of your patch here are the results for the ebizzy run on > 3.11-rc1 + patch on a 1 socket, 16 core powerpc machine. Each ebizzy > run was for 30 seconds. > > Number_of_threads %improvement_with_patch > 4

Re: [RFC PATCH v2] sched: Limit idle_balance()

2013-07-22 Thread Jason Low
On Mon, 2013-07-22 at 12:31 +0530, Srikar Dronamraju wrote: > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index e8b3350..da2cb3e 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -1348,6 +1348,8 @@ ttwu_do_wakeup(struct rq *rq, struct task_struct *p, >

Re: [RFC PATCH v2] sched: Limit idle_balance()

2013-07-23 Thread Jason Low
On Tue, 2013-07-23 at 16:36 +0530, Srikar Dronamraju wrote: > > > > A potential issue I have found with avg_idle is that it may sometimes be > > not quite as accurate for the purposes of this patch, because it is > > always given a max value (default is 100 ns). For example, a CPU > > could ha

Re: [RFC PATCH v2] sched: Limit idle_balance()

2013-07-24 Thread Jason Low
> > > Should we take into consideration whether an idle_balance was > > > successful or not? > > > > I recently ran fserver on the 8 socket machine with HT-enabled and found > > that load balance was succeeding at a higher than average rate, but idle > > balance was still lowering performance of

[PATCH] sched: Give idle_balance() a break when it does not move tasks.

2013-08-12 Thread Jason Low
+10.7% Signed-off-by: Jason Low --- kernel/sched/core.c |1 + kernel/sched/fair.c | 10 +- kernel/sched/sched.h |5 + 3 files changed, 15 insertions(+), 1 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c

Re: [PATCH] sched: Give idle_balance() a break when it does not move tasks.

2013-08-12 Thread Jason Low
On Mon, 2013-08-12 at 16:30 +0530, Srikar Dronamraju wrote: > > /* > > @@ -5298,6 +5300,8 @@ void idle_balance(int this_cpu, struct rq *this_rq) > > continue; > > > > if (sd->flags & SD_BALANCE_NEWIDLE) { > > + load_balance_attempted = true; >

Re: [RFC] locking/mutex: Fix starvation of sleeping waiters

2016-07-19 Thread Jason Low
On Tue, 2016-07-19 at 19:53 +0300, Imre Deak wrote: > On ma, 2016-07-18 at 10:47 -0700, Jason Low wrote: > > On Mon, 2016-07-18 at 19:15 +0200, Peter Zijlstra wrote: > > > I think we went over this before, that will also completely destroy > > > performance

[RFC] Avoid mutex starvation when optimistic spinning is disabled

2016-07-19 Thread Jason Low
s disabled? Thanks. --- Signed-off-by: Jason Low --- include/linux/mutex.h | 2 ++ kernel/locking/mutex.c | 61 +- 2 files changed, 58 insertions(+), 5 deletions(-) diff --git a/include/linux/mutex.h b/include/linux/mutex.h index 2cb7531..c1ca68d 10
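
The gist of the RFC in a stripped-down form (all names illustrative; the actual patch works on struct mutex and flags the waiter after it has waited past a threshold): once the flag is up, the fastpath stops stealing the lock from the sleeping waiter.

    #include <stdatomic.h>
    #include <stdbool.h>

    struct simple_mutex {
        atomic_bool held;
        atomic_bool yield_to_waiter;  /* set once a sleeper has waited too long */
    };

    /* Fastpath: barred from stealing while a starved waiter is flagged. */
    static bool mutex_try_acquire(struct simple_mutex *m)
    {
        bool expected = false;

        if (atomic_load(&m->yield_to_waiter))
            return false;
        return atomic_compare_exchange_strong(&m->held, &expected, true);
    }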

Re: [RFC] Avoid mutex starvation when optimistic spinning is disabled

2016-07-19 Thread Jason Low
On Tue, 2016-07-19 at 16:04 -0700, Jason Low wrote: > Hi Imre, > > Here is a patch which prevents a thread from spending too much "time" > waiting for a mutex in the !CONFIG_MUTEX_SPIN_ON_OWNER case. > > Would you like to try this out and see if this addresses the

Re: [RFC] Avoid mutex starvation when optimistic spinning is disabled

2016-07-21 Thread Jason Low
On Wed, 2016-07-20 at 16:29 +0300, Imre Deak wrote: > On ti, 2016-07-19 at 21:39 -0700, Jason Low wrote: > > On Tue, 2016-07-19 at 16:04 -0700, Jason Low wrote: > > > Hi Imre, > > > > > > Here is a patch which prevents a thread from spending too much "

Re: [RFC] Avoid mutex starvation when optimistic spinning is disabled

2016-07-21 Thread Jason Low
On Wed, 2016-07-20 at 14:37 -0400, Waiman Long wrote: > On 07/20/2016 12:39 AM, Jason Low wrote: > > On Tue, 2016-07-19 at 16:04 -0700, Jason Low wrote: > >> Hi Imre, > >> > >> Here is a patch which prevents a thread from spending too much "

Re: [RFC] Avoid mutex starvation when optimistic spinning is disabled

2016-07-22 Thread Jason Low
On Fri, 2016-07-22 at 12:34 +0300, Imre Deak wrote: > On to, 2016-07-21 at 15:29 -0700, Jason Low wrote: > > On Wed, 2016-07-20 at 14:37 -0400, Waiman Long wrote: > > > On 07/20/2016 12:39 AM, Jason Low wrote: > > > > On Tue, 2016-07-19 at 16:04 -0700, Jaso

Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex

2016-08-23 Thread Jason Low
On Tue, 2016-08-23 at 09:35 -0700, Jason Low wrote: > On Tue, 2016-08-23 at 09:17 -0700, Davidlohr Bueso wrote: > > I have not looked at the patches yet, but are there any performance minutia > > to be aware of? > > This would remove all of the mutex architecture specific o

Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

2015-08-27 Thread Jason Low
On Thu, 2015-08-27 at 18:43 -0400, George Spelvin wrote: > Jason Low wrote: > > Frederic suggested that we just use a single "status" variable and > > access the bits for the running and checking field. I am leaning towards > > that method, so I might not include the

Re: [PATCH 1/3] timer: Optimize fastpath_timer_check()

2015-08-31 Thread Jason Low
On Mon, 2015-08-31 at 08:15 -0700, Davidlohr Bueso wrote: > On Tue, 2015-08-25 at 20:17 -0700, Jason Low wrote: > > In fastpath_timer_check(), the task_cputime() function is always > > called to compute the utime and stime values. However, this is not > > necessary if th

[PATCH v2 1/4] timer: Optimize fastpath_timer_check()

2015-10-14 Thread Jason Low
timers set. Signed-off-by: Jason Low Reviewed-by: Oleg Nesterov Reviewed-by: Frederic Weisbecker Reviewed-by: Davidlohr Bueso --- kernel/time/posix-cpu-timers.c | 11 +++ 1 files changed, 3 insertions(+), 8 deletions(-) diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix

[PATCH v2 0/4] timer: Improve itimers scalability

2015-10-14 Thread Jason Low
throughput by more than 30%. With this patch set (along with commit 1018016c706f mentioned above), the performance hit of itimers almost completely goes away on the 16 socket system. Jason Low (4): timer: Optimize fastpath_timer_check() timer: Check thread timers only when there are active thread

[PATCH v2 3/4] timer: Convert cputimer->running to bool

2015-10-14 Thread Jason Low
oleans. This is a preparatory patch to convert the existing running integer field to a boolean. Suggested-by: George Spelvin Signed-off-by: Jason Low --- include/linux/init_task.h |2 +- include/linux/sched.h |6 +++--- kernel/fork.c |2 +- kernel/time/pos

[PATCH v2 2/4] timer: Check thread timers only when there are active thread timers

2015-10-14 Thread Jason Low
there are no per-thread timers. As suggested by George, we can put the task_cputime_zero() check in check_thread_timers(), since that is more of an optimization to the function. Similarly, we move the existing check of cputimer->running to check_process_timers(). Signed-off-by: Jason Low Revie

[PATCH v2 4/4] timer: Reduce unnecessary sighand lock contention

2015-10-14 Thread Jason Low
the thread_group_cputimer structure maintain a boolean to signify when a thread in the group is already checking for process wide timers, and adds extra logic in the fastpath to check the boolean. Signed-off-by: Jason Low Reviewed-by: Oleg Nesterov --- include/linux/init_task.h |1
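
The mechanism the changelog describes, reduced to its two flags (field names follow the description; the rest is illustrative): the fastpath bails if another thread in the group is already doing the process-wide check, so only one thread at a time contends for the sighand lock.

    #include <stdatomic.h>
    #include <stdbool.h>

    struct group_timer_state {
        atomic_bool running;          /* group has active cpu timers */
        atomic_bool checking_timer;   /* a thread is already checking them */
    };

    /* Fastpath: report work only if timers run and nobody is on it yet. */
    static bool process_timers_need_check(struct group_timer_state *st)
    {
        return atomic_load(&st->running) &&
               !atomic_load(&st->checking_timer);
    }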

Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-15 Thread Jason Low
On Wed, 2015-10-14 at 17:18 -0400, George Spelvin wrote: > I'm going to give 4/4 a closer look to see if the races with timer > expiration make more sense to me than last time around. > (E.g. do CPU time signals even work in CONFIG_NO_HZ_FULL?) > > But although I haven't yet convinced myself the c

Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-15 Thread Jason Low
On Thu, 2015-10-15 at 10:47 +0200, Ingo Molnar wrote: > * Jason Low wrote: > > > While running a database workload on a 16 socket machine, there were > > scalability issues related to itimers. The following link contains a > > more detailed summary of the issues

Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-16 Thread Jason Low
On Fri, 2015-10-16 at 09:12 +0200, Ingo Molnar wrote: > * Jason Low wrote: > > > > > With this patch set (along with commit 1018016c706f mentioned above), > > > > the performance hit of itimers almost completely goes away on the > > > > 16 so

Re: [PATCH] posix-cpu-timers: Merge running and checking_timer state in one field

2015-10-20 Thread Jason Low
On Tue, 2015-10-20 at 02:18 +0200, Frederic Weisbecker wrote: > This way we might consume less space in the signal struct (well, > depending on bool size or padding) and we don't need to worry about > ordering between the running and checking_timers fields. This looks fine to me. I ended up going

[RFC PATCH] sched, timer: Use atomics for thread_group_cputimer stats

2015-01-22 Thread Jason Low
spent updating thread group cputimer timers was reduced from 30% down to less than 1%. Signed-off-by: Jason Low --- include/linux/init_task.h |7 +++-- include/linux/sched.h | 12 +++-- kernel/fork.c |5 +--- kernel/sched/stats.h |
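
The shape of the conversion, as a userspace sketch (the patch itself changes struct thread_group_cputimer and its helpers): the accumulated times become atomics, so hot accounting paths add lock-free instead of taking the shared spinlock.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    struct group_cputimer {
        _Atomic uint64_t utime;
        _Atomic uint64_t stime;
        _Atomic uint64_t sum_exec_runtime;
        atomic_bool running;           /* any active process-wide timers? */
    };

    /* Tick-path accounting: lock-free add instead of spinlock + plain add. */
    static void account_group_exec(struct group_cputimer *ct, uint64_t delta_ns)
    {
        if (!atomic_load_explicit(&ct->running, memory_order_relaxed))
            return;                    /* no sampler active: skip shared write */
        atomic_fetch_add_explicit(&ct->sum_exec_runtime, delta_ns,
                                  memory_order_relaxed);
    }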

Re: [RFC PATCH] sched, timer: Use atomics for thread_group_cputimer stats

2015-01-23 Thread Jason Low
On Fri, 2015-01-23 at 10:33 +0100, Peter Zijlstra wrote: > > + .running = ATOMIC_INIT(0), \ > > + atomic_t running; > > + atomic_set(&sig->cputimer.running, 1); > > @@ -174,7 +174,7 @@ static inline bool cputimer_running(struct task_struct > > *ts

Re: [RFC PATCH] sched, timer: Use atomics for thread_group_cputimer stats

2015-01-23 Thread Jason Low
On Fri, 2015-01-23 at 10:25 +0100, Peter Zijlstra wrote: > On Thu, Jan 22, 2015 at 07:31:53PM -0800, Jason Low wrote: > > +static void update_gt_cputime(struct thread_group_cputimer *a, struct > > task_cputime *b) > > { > > + if (b->u

Re: [RFC PATCH] sched, timer: Use atomics for thread_group_cputimer stats

2015-01-23 Thread Jason Low
On Fri, 2015-01-23 at 21:08 +0100, Peter Zijlstra wrote: > On Fri, Jan 23, 2015 at 11:23:36AM -0800, Jason Low wrote: > > On Fri, 2015-01-23 at 10:25 +0100, Peter Zijlstra wrote: > > > On Thu, Jan 22, 2015 at 07:31:53PM -0800, Jason Low wrote: > > > > +static

[PATCH] cgroup: Initialize root in cgroup_mount

2015-01-26 Thread Jason Low
that we can focus on catching warnings that can potentially cause bigger issues. Signed-off-by: Jason Low --- kernel/cgroup.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index bb263d0..66684f3 100644 --- a/kernel/cgroup.c +++ b/k

Re: [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners

2015-01-27 Thread Jason Low
On Sun, 2015-01-25 at 23:36 -0800, Davidlohr Bueso wrote: > When readers hold the semaphore, the ->owner is nil. As such, > and unlike mutexes, '!owner' does not necessarily imply that > the lock is free. This will cause writer spinners to potentially > spin excessively as they've been misled to t

Re: [PATCH] cgroup: Initialize root in cgroup_mount

2015-01-27 Thread Jason Low
On Tue, 2015-01-27 at 11:10 -0500, Tejun Heo wrote: > On Mon, Jan 26, 2015 at 04:21:39PM -0800, Jason Low wrote: > > Compiling kernel/ causes warnings: > > > > ... ‘root’ may be used uninitialized in this function > > ... ‘root’ was declared here > > >

Re: [PATCH 6/6] locking/rwsem: Check for active lock before bailing on spinning

2015-01-27 Thread Jason Low
possibility reader(s) may have the lock. > - * To be safe, avoid spinning in these situations. > - */ > - return on_cpu; > + ret = owner->on_cpu; > +done: > + rcu_read_unlock(); > + return ret; > } Acked-by: Jason Low

[RFC][PATCH] cpuset, sched: Fix cpuset sched_relax_domain_level

2015-01-28 Thread Jason Low
) in the cpuset traversal. Signed-off-by: Jason Low --- kernel/cpuset.c | 12 +++- 1 files changed, 7 insertions(+), 5 deletions(-) diff --git a/kernel/cpuset.c b/kernel/cpuset.c index 64b257f..0f58c54 100644 --- a/kernel/cpuset.c +++ b/kernel/cpuset.c @@ -541,15 +541,17 @@ update_dom

Re: [PATCH 4/6] locking/rwsem: Avoid deceiving lock spinners

2015-01-28 Thread Jason Low
On Tue, 2015-01-27 at 19:54 -0800, Davidlohr Bueso wrote: > On Tue, 2015-01-27 at 09:23 -0800, Jason Low wrote: > > On Sun, 2015-01-25 at 23:36 -0800, Davidlohr Bueso wrote: > > > When readers hold the semaphore, the ->owner is nil. As such, > > > and unlike mutexes,

Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability

2015-03-19 Thread Jason Low
On Mon, 2015-03-02 at 13:49 -0800, Jason Low wrote: > On Mon, 2015-03-02 at 11:03 -0800, Linus Torvalds wrote: > > On Mon, Mar 2, 2015 at 10:42 AM, Jason Low wrote: > > > > > > This patch converts the timers to 64 bit atomic variables and use > > > atomic add

Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability

2015-03-19 Thread Jason Low
On Thu, 2015-03-19 at 10:59 -0700, Linus Torvalds wrote: > On Thu, Mar 19, 2015 at 10:21 AM, Jason Low wrote: > > > > I tested this patch on a 32 bit ARM system with 4 cores. Using the > > generic 64 bit atomics, I did not see any performance change with this > > patch,

Re: [PATCH V2] sched: Improve load balancing in the presence of idle CPUs

2015-04-01 Thread Jason Low
On Wed, 2015-04-01 at 14:03 +0100, Morten Rasmussen wrote: Hi Morten, > > Alright I see. But it is one additional wake up. And the wake up will be > > within the cluster. We will not wake up any CPU in the neighboring > > cluster unless there are tasks to be pulled. So, we can wake up a core > >

Re: sched: Improve load balancing in the presence of idle CPUs

2015-04-01 Thread Jason Low
On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote: > On 03/31/2015 12:25 AM, Jason Low wrote: > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index fdae26e..ba8ec1a 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > >

Re: sched: Improve load balancing in the presence of idle CPUs

2015-04-01 Thread Jason Low
On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote: > On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote: > > > > On 04/01/2015 12:24 AM, Jason Low wrote: > > > On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote: > > >> Hi Jason,

Re: sched: Improve load balancing in the presence of idle CPUs

2015-04-01 Thread Jason Low
On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote: > On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote: > > I am sorry I don't quite get this. Can you please elaborate? > > I think the scenario is that we are in nohz_idle_balance() and decide to > bail out because we have pu

Re: sched: Improve load balancing in the presence of idle CPUs

2015-04-02 Thread Jason Low
On Thu, 2015-04-02 at 10:17 +0100, Morten Rasmussen wrote: > On Thu, Apr 02, 2015 at 06:59:07AM +0100, Jason Low wrote: > > Also, below is an example patch. > > > > (Without the conversion to idle_cpu(), the check for rq->idle_balance > > would not be accurate a

Re: [PATCH v2] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write

2015-04-23 Thread Jason Low
On Thu, 2015-04-23 at 14:24 -0400, Waiman Long wrote: > The table below shows the % improvement in throughput (1100-2000 users) > in the various AIM7's workloads: > > Workload% increase in throughput Missing table here? :) > --- > include/linux/osq_lock.h|5 +++ > kernel/

[PATCH 0/3] sched, timer: Improve scalability of itimers

2015-04-14 Thread Jason Low
This patchset improves the scalability of itimers, thread_group_cputimer and addresses a performance issue we found while running a database workload where more than 30% of total time is spent in the kernel trying to acquire the thread_group_cputimer spinlock. While we're modifying sched and timer

[PATCH 2/3] sched, timer: Use atomics for thread_group_cputimer to improve scalability

2015-04-14 Thread Jason Low
neric atomics and did not find the overhead to be much of an issue. An explanation for why this isn't an issue is that 32 bit systems usually have small numbers of CPUs, and cacheline contention from extra spinlocks called periodically is not really apparent on smaller systems. Signed-off-by:

[PATCH 3/3] sched, timer: Use cmpxchg to do updates in update_gt_cputime()

2015-04-14 Thread Jason Low
3, enables it after thread 1 checks !cputimer->running in thread_group_cputimer(), then there is a possibility that update_gt_cputime() is updating the cputimers while the cputimer is running. This patch uses cmpxchg and retry logic to ensure that update_gt_cputime() is making its updates atomically
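
The retry idiom the changelog describes, in isolation: only ever move the stored value forward, re-reading whenever a racing CPU wins the compare-and-swap.

    #include <stdatomic.h>
    #include <stdint.h>

    static void update_gt(_Atomic uint64_t *val, uint64_t sample)
    {
        uint64_t cur = atomic_load(val);

        while (sample > cur) {
            /* On failure, cur is refreshed and the comparison re-runs. */
            if (atomic_compare_exchange_weak(val, &cur, sample))
                break;
        }
    }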

[PATCH 1/3] sched, timer: Remove usages of ACCESS_ONCE in the scheduler

2015-04-14 Thread Jason Low
ACCESS_ONCE doesn't work reliably on non-scalar types. This patch removes the rest of the existing usages of ACCESS_ONCE in the scheduler, and uses the new READ_ONCE and WRITE_ONCE APIs. Signed-off-by: Jason Low --- include/linux/sched.h |4 ++-- kernel/f
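
For reference, a simplified scalar-only rendering of what READ_ONCE()/WRITE_ONCE() provide -- a volatile access the compiler cannot elide, fuse, or tear (the kernel's real versions also cover non-scalar sizes):

    #define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
    #define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))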

Re: [PATCH 1/3] sched, timer: Remove usages of ACCESS_ONCE in the scheduler

2015-04-14 Thread Jason Low
Hi Steven, On Tue, 2015-04-14 at 19:59 -0400, Steven Rostedt wrote: > On Tue, 14 Apr 2015 16:09:44 -0700 > Jason Low wrote: > > > > @@ -2088,7 +2088,7 @@ void task_numa_fault(int last_cpupid, int mem_node, > > int pages, int flags) > > > > static void r

Re: [PATCH 2/3] sched, timer: Use atomics for thread_group_cputimer to improve scalability

2015-04-15 Thread Jason Low
On Wed, 2015-04-15 at 09:35 +0200, Ingo Molnar wrote: > * Ingo Molnar wrote: > > > So after your changes we still have a separate: > > > > struct task_cputime { > > cputime_t utime; > > cputime_t stime; > > unsigned long long sum_exec_runtime; > > }; > > > > Which then w

Re: [PATCH 1/3] sched, timer: Remove usages of ACCESS_ONCE in the scheduler

2015-04-15 Thread Jason Low
On Wed, 2015-04-15 at 09:46 +0200, Ingo Molnar wrote: > * Steven Rostedt wrote: > > You are correct. Now I'm thinking that the WRITE_ONCE() is not needed, > > and just a: > > > > p->mm->numa_scan_seq = READ_ONCE(p->numa_scan_seq) + 1; > > > > Can be done. But I'm still trying to wrap my hea

Re: [PATCH 2/3] sched, timer: Use atomics for thread_group_cputimer to improve scalability

2015-04-15 Thread Jason Low
On Wed, 2015-04-15 at 16:07 +0530, Preeti U Murthy wrote: > On 04/15/2015 04:39 AM, Jason Low wrote: > > /* > > @@ -885,11 +890,8 @@ static void check_thread_timers(struct task_struct > > *tsk, > > static void stop_process_timers(struct signal_struct

Re: [PATCH 2/3] sched, timer: Use atomics for thread_group_cputimer to improve scalability

2015-04-15 Thread Jason Low
On Wed, 2015-04-15 at 15:32 +0200, Peter Zijlstra wrote: > On Wed, Apr 15, 2015 at 03:25:36PM +0200, Frederic Weisbecker wrote: > > On Tue, Apr 14, 2015 at 04:09:45PM -0700, Jason Low wrote: > > > void thread_group_cputimer(struct task_struct *tsk, struct task_cpu

Re: [PATCH 2/3] sched, timer: Use atomics for thread_group_cputimer to improve scalability

2015-04-15 Thread Jason Low
On Wed, 2015-04-15 at 07:23 -0700, Davidlohr Bueso wrote: > On Tue, 2015-04-14 at 16:09 -0700, Jason Low wrote: > > While running a database workload, we found a scalability issue with > > itimers. > > > > Much of the problem was caused by the thread_group_cputimer

Re: [PATCH 1/3] sched, timer: Remove usages of ACCESS_ONCE in the scheduler

2015-04-15 Thread Jason Low
On Tue, 2015-04-14 at 22:40 -0400, Steven Rostedt wrote: > You are correct. Now I'm thinking that the WRITE_ONCE() is not needed, > and just a: > > p->mm->numa_scan_seq = READ_ONCE(p->numa_scan_seq) + 1; Just to confirm, is this a typo? Because there really is a numa_scan_seq in the task_st

Re: [PATCH 1/3] sched, timer: Remove usages of ACCESS_ONCE in the scheduler

2015-04-15 Thread Jason Low
Hi Ingo, On Wed, 2015-04-15 at 09:46 +0200, Ingo Molnar wrote: > * Steven Rostedt wrote: > > You are correct. Now I'm thinking that the WRITE_ONCE() is not needed, > > and just a: > > > > p->mm->numa_scan_seq = READ_ONCE(p->numa_scan_seq) + 1; > > > > Can be done. But I'm still trying to wr

Re: [PATCH 1/3] sched, timer: Remove usages of ACCESS_ONCE in the scheduler

2015-04-16 Thread Jason Low
On Thu, 2015-04-16 at 20:15 +0200, Peter Zijlstra wrote: > On Thu, Apr 16, 2015 at 08:02:27PM +0200, Ingo Molnar wrote: > > > ACCESS_ONCE() is not a compiler barrier > > > > It's not a general compiler barrier (and I didn't claim so) but it is > > still a compiler barrier: it's documented as a we

Re: [PATCH 1/3] sched, timer: Remove usages of ACCESS_ONCE in the scheduler

2015-04-16 Thread Jason Low
On Thu, 2015-04-16 at 20:24 +0200, Ingo Molnar wrote: > Would it make sense to add a few comments to the seq field definition > site(s), about how it's supposed to be accessed - or to the > READ_ONCE()/WRITE_ONCE() sites, to keep people from wondering? How about this: --- diff --git a/kernel/sc
