Re: weird loadavg on idle machine post 5.7

2020-07-07 Thread Peter Zijlstra
On Tue, Jul 07, 2020 at 10:17:19AM +0200, Peter Zijlstra wrote: > Anyway, let me now endeavour to write a coherent Changelog for this mess I'll go stick this in sched/urgent and update that other documentation patch (again).. --- Subject: sched: Fix loadavg accounting race From: Peter Zijlstra

Re: weird loadavg on idle machine post 5.7

2020-07-07 Thread Valentin Schneider
On 07/07/20 09:17, Peter Zijlstra wrote: > On Tue, Jul 07, 2020 at 12:56:04AM +0100, Valentin Schneider wrote: > >> > @@ -2605,8 +2596,20 @@ try_to_wake_up(struct task_struct *p, unsigned int >> > state, int wake_flags) >> >* >> >* Pairs with the LOCK+smp_mb__after_spinlock() on

Re: weird loadavg on idle machine post 5.7

2020-07-07 Thread Peter Zijlstra
On Tue, Jul 07, 2020 at 10:20:05AM +0100, Qais Yousef wrote: > On 07/06/20 16:59, Peter Zijlstra wrote: > > + if (!preempt && prev_state && prev_state == prev->state) { > > I think the compiler won't optimize `prev_state == prev->state` out because of > the smp_mb__after_spinlock() which

Re: weird loadavg on idle machine post 5.7

2020-07-07 Thread Qais Yousef
On 07/06/20 16:59, Peter Zijlstra wrote: [...] > @@ -4104,12 +4108,19 @@ static void __sched notrace __schedule(bool preempt) > local_irq_disable(); > rcu_note_context_switch(preempt); > > + prev_state = prev->state; > + > /* > - * Make sure that

Re: weird loadavg on idle machine post 5.7

2020-07-07 Thread Peter Zijlstra
On Tue, Jul 07, 2020 at 12:56:04AM +0100, Valentin Schneider wrote: > > @@ -2605,8 +2596,20 @@ try_to_wake_up(struct task_struct *p, unsigned int > > state, int wake_flags) > >* > >* Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in > >* __schedule(). See the

Re: weird loadavg on idle machine post 5.7

2020-07-07 Thread Peter Zijlstra
On Mon, Jul 06, 2020 at 05:20:57PM -0400, Dave Jones wrote: > On Mon, Jul 06, 2020 at 04:59:52PM +0200, Peter Zijlstra wrote: > > On Fri, Jul 03, 2020 at 04:51:53PM -0400, Dave Jones wrote: > > > On Fri, Jul 03, 2020 at 12:40:33PM +0200, Peter Zijlstra wrote: > > > > > > looked promising the

Re: weird loadavg on idle machine post 5.7

2020-07-06 Thread Valentin Schneider
On 06/07/20 15:59, Peter Zijlstra wrote: > OK, lots of cursing later, I now have the below... > > The TL;DR is that while schedule() doesn't change p->state once it > starts, it does read it quite a bit, and ttwu() will actually change it > to TASK_WAKING. So if ttwu() changes it to WAKING
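
To make the TL;DR above concrete, here is a minimal user-space sketch of the "snapshot the state once, then compare" idea. It borrows only the shape of the condition quoted elsewhere in this thread (`!preempt && prev_state && prev_state == prev->state`). Everything named `toy_*` / `TOY_*` is hypothetical, the state values are illustrative, and the model is single-threaded: it does not reproduce the rq->lock / smp_mb__after_spinlock() ordering the real code depends on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative state values for this toy model only. */
enum toy_state {
    TOY_RUNNING         = 0,
    TOY_UNINTERRUPTIBLE = 0x0002,
    TOY_WAKING          = 0x0200,
};

/* Hypothetical stand-in for the one task field this sketch cares about. */
struct toy_task {
    volatile long state;
};

/* Read the state exactly once, so later decisions use a single value. */
static long toy_snapshot_state(const struct toy_task *prev)
{
    assert(prev != NULL);
    return prev->state;
}

/*
 * Same shape as the quoted condition: only treat the task as going to
 * sleep if this is not a preemption, the snapshot was a sleeping state,
 * and the state has not been changed (e.g. to WAKING) since the snapshot.
 */
static bool toy_should_deactivate(bool preempt, long prev_state,
                                  const struct toy_task *prev)
{
    assert(prev != NULL);
    return !preempt && prev_state && prev_state == prev->state;
}
```

In this model, a task whose state is flipped to `TOY_WAKING` between `toy_snapshot_state()` and `toy_should_deactivate()` is not deactivated, which is the situation the message describes a waker creating. Whether the final kernel patch is structured this way is not shown in the truncated excerpts here.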

Re: weird loadavg on idle machine post 5.7

2020-07-06 Thread Dave Jones
On Mon, Jul 06, 2020 at 04:59:52PM +0200, Peter Zijlstra wrote: > On Fri, Jul 03, 2020 at 04:51:53PM -0400, Dave Jones wrote: > > On Fri, Jul 03, 2020 at 12:40:33PM +0200, Peter Zijlstra wrote: > > > > looked promising the first few hours, but as soon as it hit four hours > > of uptime,

Re: weird loadavg on idle machine post 5.7

2020-07-06 Thread Peter Zijlstra
On Fri, Jul 03, 2020 at 04:51:53PM -0400, Dave Jones wrote: > On Fri, Jul 03, 2020 at 12:40:33PM +0200, Peter Zijlstra wrote: > > > So ARM/Power/etc.. can speculate the load such that the > > task_contributes_to_load() value is from before ->on_rq. > > > > The compiler might similar

Re: weird loadavg on idle machine post 5.7

2020-07-03 Thread Dave Jones
On Fri, Jul 03, 2020 at 12:40:33PM +0200, Peter Zijlstra wrote: > So ARM/Power/etc.. can speculate the load such that the > task_contributes_to_load() value is from before ->on_rq. > > The compiler might similar re-order things -- although I've not found it > doing so with the few builds I

Re: weird loadavg on idle machine post 5.7

2020-07-03 Thread Paul Gortmaker
[Re: weird loadavg on idle machine post 5.7] On 02/07/2020 (Thu 17:15) Paul Gortmaker wrote: > [weird loadavg on idle machine post 5.7] On 02/07/2020 (Thu 13:15) Dave Jones > wrote: [...] > > both implicated this commit: > > > > commit c6e7bd7afaeb3af55ffac12282803

Re: weird loadavg on idle machine post 5.7

2020-07-03 Thread Peter Zijlstra
On Fri, Jul 03, 2020 at 11:02:26AM +0200, Peter Zijlstra wrote: > On Thu, Jul 02, 2020 at 10:36:27PM +0100, Mel Gorman wrote: > > > > commit c6e7bd7afaeb3af55ffac122828035f1c01d1d7b (refs/bisect/bad) > > > Author: Peter Zijlstra > > > Peter, I'm not supremely confident about this but could it

Re: weird loadavg on idle machine post 5.7

2020-07-03 Thread Peter Zijlstra
On Thu, Jul 02, 2020 at 10:36:27PM +0100, Mel Gorman wrote: > > commit c6e7bd7afaeb3af55ffac122828035f1c01d1d7b (refs/bisect/bad) > > Author: Peter Zijlstra > Peter, I'm not supremely confident about this but could it be because > "p->sched_contributes_to_load = !!task_contributes_to_load(p)"

Re: weird loadavg on idle machine post 5.7

2020-07-02 Thread Dave Jones
On Thu, Jul 02, 2020 at 10:36:27PM +0100, Mel Gorman wrote: > I'm thinking that the !!task_contributes_to_load(p) should still happen > after smp_cond_load_acquire() when on_cpu is stable and the pi_lock is > held to stabilised p->state against a parallel wakeup or updating the > task rq. I

Re: weird loadavg on idle machine post 5.7

2020-07-02 Thread Michal Kubecek
On Thu, Jul 02, 2020 at 10:36:27PM +0100, Mel Gorman wrote: > > It builds, not booted, it's for discussion but maybe Dave is feeling brave! > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index ca5db40392d4..52c73598b18a 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c

Re: weird loadavg on idle machine post 5.7

2020-07-02 Thread Mel Gorman
On Thu, Jul 02, 2020 at 01:15:48PM -0400, Dave Jones wrote: > When I upgraded my firewall to 5.7-rc2 I noticed that on a mostly > idle machine (that usually sees loadavg hover in the 0.xx range) > that it was consistently above 1.00 even when there was nothing running. > All that perf showed was

Re: weird loadavg on idle machine post 5.7

2020-07-02 Thread Paul Gortmaker
[weird loadavg on idle machine post 5.7] On 02/07/2020 (Thu 13:15) Dave Jones wrote: > When I upgraded my firewall to 5.7-rc2 I noticed that on a mostly > idle machine (that usually sees loadavg hover in the 0.xx range) > that it was consistently above 1.00 even when there was nothin

Re: weird loadavg on idle machine post 5.7

2020-07-02 Thread Dave Jones
On Thu, Jul 02, 2020 at 01:15:48PM -0400, Dave Jones wrote: > When I upgraded my firewall to 5.7-rc2 I noticed that on a mostly > idle machine (that usually sees loadavg hover in the 0.xx range) > that it was consistently above 1.00 even when there was nothing running. > All that perf showed

weird loadavg on idle machine post 5.7

2020-07-02 Thread Dave Jones
When I upgraded my firewall to 5.7-rc2 I noticed that on a mostly idle machine (that usually sees loadavg hover in the 0.xx range) that it was consistently above 1.00 even when there was nothing running. All that perf showed was the kernel was spending time in the idle loop (and running perf).
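
For anyone wanting to watch for the symptom described above, a small sketch of parsing a `/proc/loadavg`-style line follows. The helper names `parse_loadavg` and `one_minute_at_least` are made up for illustration, and the 1.00 threshold is just the figure from the report, not a recommended alert level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Parse the three load averages (1, 5 and 15 minutes) from a line in
 * /proc/loadavg format. Returns 0 on success, -1 if the line does not
 * begin with three numbers.
 */
static int parse_loadavg(const char *line, double avg[3])
{
    assert(line != NULL && avg != NULL);
    if (sscanf(line, "%lf %lf %lf", &avg[0], &avg[1], &avg[2]) != 3)
        return -1;
    return 0;
}

/*
 * Hypothetical check: returns 1 if the 1-minute average is at or above
 * the threshold, 0 if it is below, -1 if the line could not be parsed.
 */
static int one_minute_at_least(const char *line, double threshold)
{
    double avg[3];

    if (parse_loadavg(line, avg) != 0)
        return -1;
    return avg[0] >= threshold;
}
```

A caller would typically read one line from `/proc/loadavg` with `fopen()` and `fgets()` and pass it to `one_minute_at_least(line, 1.0)`; sampling that periodically on an otherwise idle machine should show whether the average stays pinned at or above 1.00.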