Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-07-12 Thread Greg KH
On Tue, Jul 12, 2016 at 03:12:35PM +0200, Peter Zijlstra wrote: > On Mon, Jul 11, 2016 at 03:33:53PM -0700, Greg KH wrote: > > > Oops, this commit does not apply cleanly to 4.6 or 4.4-stable trees. > > Can someone send me the backported version that they have tested to > > work properly so I can

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-07-12 Thread Peter Zijlstra
On Mon, Jul 11, 2016 at 03:33:53PM -0700, Greg KH wrote: > Oops, this commit does not apply cleanly to 4.6 or 4.4-stable trees. > Can someone send me the backported version that they have tested to > work properly so I can queue it up? I've never actually been able to reproduce, but the attached

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-07-11 Thread Greg KH
On Thu, Jul 07, 2016 at 09:42:32AM +0200, Peter Zijlstra wrote: > On Thu, Jul 07, 2016 at 11:20:36AM +1200, Campbell Steven wrote: > > > > commit 8974189222159154c55f24ddad33e3613960521a > > > Author: Peter Zijlstra > > > Date: Thu Jun 16 10:50:40 2016 +0200 > > > Since these early reports fro

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-07-08 Thread Greg KH
On Thu, Jul 07, 2016 at 09:42:32AM +0200, Peter Zijlstra wrote: > On Thu, Jul 07, 2016 at 11:20:36AM +1200, Campbell Steven wrote: > > > > commit 8974189222159154c55f24ddad33e3613960521a > > > Author: Peter Zijlstra > > > Date: Thu Jun 16 10:50:40 2016 +0200 > > > Since these early reports fro

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-07-07 Thread Peter Zijlstra
On Thu, Jul 07, 2016 at 11:20:36AM +1200, Campbell Steven wrote: > > commit 8974189222159154c55f24ddad33e3613960521a > > Author: Peter Zijlstra > > Date: Thu Jun 16 10:50:40 2016 +0200 > Since these early reports from Stefan and me, it looks like it's been > hit by a lot more folks now so I'd li

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-07-06 Thread Campbell Steven
On 22 June 2016 at 18:13, Peter Zijlstra wrote: > On Wed, Jun 22, 2016 at 01:19:54PM +1200, Campbell Steven wrote: >> >>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load: >> >>> >> >>> update_cfs_rq_h_load(cfs_rq); >> >>> return div64_ul(p->se.avg.l

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-06-22 Thread Yannis Aribaud
On 21 June 2016 at 14:13, "Yannis Aribaud" wrote: > Hi everyone, > > I recently hit this bug in the kernel using a vanilla 4.6.2 release. > It seems that somewhere in the load average calculation a division by 0 > occurs (see the stack trace > at the end). > > [snipped] > > I'm not an expert at all b

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-06-22 Thread Peter Zijlstra
On Wed, Jun 22, 2016 at 01:19:54PM +1200, Campbell Steven wrote: > >>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load: > >>> > >>> update_cfs_rq_h_load(cfs_rq); > >>> return div64_ul(p->se.avg.load_avg * cfs_rq->h_load, > >>>

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-06-21 Thread Campbell Steven
On 17 May 2016 at 21:21, Campbell Steven wrote: > Thanks Stefan, > > I am seeing this on 4.5.0 and 4.5.4; both are compiled from mainline, and > neither includes any patches over and above the tree. I ran for well > over a month in production on 4.5.0 with no issues at all on a single > socket server (E5

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-06-21 Thread Yannis Aribaud
Hi everyone, I recently hit this bug in the kernel using a vanilla 4.6.2 release. It seems that somewhere in the load average calculation a division by 0 occurs (see the stack trace at the end). After digging a bit (to be fair, it's my first time) in the kernel sources, I found that we "recently" ad

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-05-17 Thread Campbell Steven
Thanks Stefan, I am seeing this on 4.5.0 and 4.5.4; both are compiled from mainline, and neither includes any patches over and above the tree. I ran for well over a month in production on 4.5.0 with no issues at all on a single-socket server (E5-2670 v3 @ 2.30GHz) but as soon as we try to run either 4.5.

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-05-16 Thread Stefan Priebe - Profihost AG
On 21.03.2016 at 14:38, Greg KH wrote: > On Mon, Mar 21, 2016 at 11:52:23AM +0100, Stefan Priebe - Profihost AG wrote: >> >> On 20.03.2016 at 22:41, Greg KH wrote: >>> On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote: On 19.03.2016 at 23:26, Vlastimil Babka wrote: > On

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-03-21 Thread Greg KH
On Mon, Mar 21, 2016 at 11:52:23AM +0100, Stefan Priebe - Profihost AG wrote: > > On 20.03.2016 at 22:41, Greg KH wrote: > > On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote: > >> > >> On 19.03.2016 at 23:26, Vlastimil Babka wrote: > >>> On 03/17/2016 07:45 PM, Greg KH wrote: >

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-03-21 Thread Stefan Priebe - Profihost AG
On 20.03.2016 at 22:41, Greg KH wrote: > On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote: >> >> On 19.03.2016 at 23:26, Vlastimil Babka wrote: >>> On 03/17/2016 07:45 PM, Greg KH wrote: On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: > Hi, > > while

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-03-20 Thread Greg KH
On Sun, Mar 20, 2016 at 10:27:23PM +0100, Stefan Priebe wrote: > > On 19.03.2016 at 23:26, Vlastimil Babka wrote: > >On 03/17/2016 07:45 PM, Greg KH wrote: > >>On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: > >>>Hi, > >>> > >>>while running qemu 2.5 on a host running 4.4.6 the hos

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-03-20 Thread Stefan Priebe
On 19.03.2016 at 23:26, Vlastimil Babka wrote: On 03/17/2016 07:45 PM, Greg KH wrote: On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: Hi, while running qemu 2.5 on a host running 4.4.6 the host system has crashed (load > 200) 3 times in the last 3 days. Always with this stack

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-03-19 Thread Greg KH
On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: > Hi, > > while running qemu 2.5 on a host running 4.4.6 the host system has crashed > (load > 200) 3 times in the last 3 days. > > Always with this stack trace: (copy left here: > http://pastebin.com/raw/bCWTLKyt) > > [69068.874268]

Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-03-19 Thread Vlastimil Babka
On 03/17/2016 07:45 PM, Greg KH wrote: On Thu, Mar 17, 2016 at 07:38:03PM +0100, Stefan Priebe wrote: Hi, while running qemu 2.5 on a host running 4.4.6 the host system has crashed (load > 200) 3 times in the last 3 days. Always with this stack trace: (copy left here: http://pastebin.com/raw/b

divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

2016-03-19 Thread Stefan Priebe
Hi, while running qemu 2.5 on a host running 4.4.6 the host system has crashed (load > 200) 3 times in the last 3 days. Always with this stack trace: (copy left here: http://pastebin.com/raw/bCWTLKyt) [69068.874268] divide error: [#1] SMP [69068.875242] Modules linked in: ebtable_filte