Re: Yet more softlockups.

2013-07-12 Thread Vince Weaver
On Fri, 12 Jul 2013, Dave Jones wrote: > > Here's a fun trick: > > trinity -c perf_event_open -C4 -q -l off > > Within about a minute, that brings any of my boxes to its knees. > The softlockup detector starts going nuts, and then the box wedges solid. are you running with the patch [PATCH 1

Re: Yet more softlockups.

2013-07-12 Thread Dave Hansen
I added the WARN_ONCE() the first time we enable a perf event: The watchdog code looks to use perf these days: > [1.003260] ------------[ cut here ]------------ > [1.007943] WARNING: at > /home/davehans/linux.git/arch/x86/kernel/cpu/perf_event.c:471 > x86_pmu_event_init+0x249/0x430() >

Re: Yet more softlockups.

2013-07-12 Thread Dave Hansen
On 07/12/2013 11:07 AM, David Ahern wrote: > And Dave Hansen: I think nmi.c has the same do_div problem as > kernel/events/core.c that Stephane fixed. Your patch has: > > whole_msecs = do_div(delta, (1000 * 1000)); > decimal_msecs = do_div(delta, 1000) % 1000; Yup. There should b
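For readers following the do_div point in the message above: the kernel's do_div(n, base) divides n in place, leaving the quotient in n and returning the remainder. The quoted lines assign the return value as if it were the quotient. The sketch below is a userspace illustration only, not the patch that was merged. The names do_div_standin, struct msecs, split_buggy and split_fixed are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's do_div():
 * divides *n in place (the quotient is left in *n) and returns
 * the remainder.
 */
static uint32_t do_div_standin(uint64_t *n, uint32_t base)
{
    assert(base != 0);
    uint32_t rem = (uint32_t)(*n % base);
    *n /= base;
    return rem;
}

/* Illustration-only result type: whole milliseconds plus a 3-digit fraction. */
struct msecs {
    uint64_t whole;
    uint32_t decimal;
};

/* Pattern from the quoted lines: the return value is used as the quotient. */
static struct msecs split_buggy(uint64_t delta_ns)
{
    struct msecs r;
    /* This stores the remainder in ns, not the number of milliseconds. */
    r.whole = do_div_standin(&delta_ns, 1000 * 1000);
    /* delta_ns now holds the quotient, so this is the remainder of that. */
    r.decimal = do_div_standin(&delta_ns, 1000) % 1000;
    return r;
}

/* One possible correction, assuming the input is a duration in nanoseconds. */
static struct msecs split_fixed(uint64_t delta_ns)
{
    struct msecs r;
    uint32_t rem_ns = do_div_standin(&delta_ns, 1000 * 1000);
    r.whole = delta_ns;        /* quotient: whole milliseconds */
    r.decimal = rem_ns / 1000; /* leftover nanoseconds, as microseconds */
    return r;
}
```

With this stand-in, a delta of 2238147 ns comes out of split_buggy() as 238147 whole and 2 decimal, which a "%llu.%03u" style format would render as 238147.002, while split_fixed() gives 2 and 238. That is one input consistent with the "238147.002 msecs" figure reported elsewhere in this thread. The actual delta on Dave Jones's machine is not known from these snippets.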

Re: Yet more softlockups.

2013-07-12 Thread David Ahern
On 7/12/13 11:50 AM, Dave Jones wrote: Given you can run trinity long enough that you hit this however, makes me think you won't be able to trigger the bug I'm talking about. Perhaps virtualised perf counters are somehow immune to this problem, because on bare-metal, it literally takes seconds.

Re: Yet more softlockups.

2013-07-12 Thread Dave Jones
On Fri, Jul 12, 2013 at 11:40:06AM -0600, David Ahern wrote: > On 7/12/13 11:18 AM, Dave Jones wrote: > > On Fri, Jul 12, 2013 at 11:12:13AM -0600, David Ahern wrote: > > > On 7/12/13 9:45 AM, Dave Jones wrote: > > > > Here's a fun trick: > > > > > > > > trinity -c perf_event_open -C4

Re: Yet more softlockups.

2013-07-12 Thread David Ahern
On 7/12/13 11:18 AM, Dave Jones wrote: On Fri, Jul 12, 2013 at 11:12:13AM -0600, David Ahern wrote: > On 7/12/13 9:45 AM, Dave Jones wrote: > > Here's a fun trick: > > > > trinity -c perf_event_open -C4 -q -l off > > > > Within about a minute, that brings any of my boxes to its knees.

Re: Yet more softlockups.

2013-07-12 Thread Dave Jones
On Fri, Jul 12, 2013 at 11:12:13AM -0600, David Ahern wrote: > On 7/12/13 9:45 AM, Dave Jones wrote: > > Here's a fun trick: > > > > trinity -c perf_event_open -C4 -q -l off > > > > Within about a minute, that brings any of my boxes to its knees. > > The softlockup detector starts going nuts

Re: Yet more softlockups.

2013-07-12 Thread David Ahern
On 7/12/13 9:45 AM, Dave Jones wrote: Here's a fun trick: trinity -c perf_event_open -C4 -q -l off Within about a minute, that brings any of my boxes to its knees. The softlockup detector starts going nuts, and then the box wedges solid. I tried that in a VM running latest Linus tree. I see t

Re: Yet more softlockups.

2013-07-12 Thread Dave Jones
On Fri, Jul 12, 2013 at 08:55:31AM -0700, Dave Hansen wrote: > I mean that somebody turned 'active_events' on without actually wanting > perf to be on. I'd be curious how it got set to something nonzero. > Could you stick a WARN_ONCE() or printk_ratelimit() on the three sites > that modify it

Re: Yet more softlockups.

2013-07-12 Thread Dave Hansen
On 07/12/2013 08:45 AM, Dave Jones wrote: > On Fri, Jul 12, 2013 at 08:38:52AM -0700, Dave Hansen wrote: > > Dave, for your case, my suspicion would be that it got turned on > > inadvertently, or that we somehow have a bug which bumped up > > perf_event.c's 'active_events' and we're running some

Re: Yet more softlockups.

2013-07-12 Thread Dave Jones
On Fri, Jul 12, 2013 at 08:38:52AM -0700, Dave Hansen wrote: > The warning comes from calling perf_sample_event_took(), which is only > called from one place: perf_event_nmi_handler(). > > So we can be pretty sure that the perf NMI is firing, or at least that > this handler code is running.

Re: Yet more softlockups.

2013-07-12 Thread Dave Hansen
On 07/12/2013 03:31 AM, Ingo Molnar wrote: > * Dave Jones wrote: >> On Wed, Jul 10, 2013 at 05:20:15PM +0200, Markus Trippelsdorf wrote: >> > On 2013.07.10 at 11:13 -0400, Dave Jones wrote: >> > > I get this right after booting.. >> > > >> > > [ 114.516619] perf samples too long (4262 > 2500

Re: Yet more softlockups.

2013-07-12 Thread Ingo Molnar
* Dave Jones wrote: > On Wed, Jul 10, 2013 at 05:20:15PM +0200, Markus Trippelsdorf wrote: > > On 2013.07.10 at 11:13 -0400, Dave Jones wrote: > > > I get this right after booting.. > > > > > > [ 114.516619] perf samples too long (4262 > 2500), lowering > kernel.perf_event_max_sample_rate

Re: Yet more softlockups.

2013-07-10 Thread Dave Jones
On Wed, Jul 10, 2013 at 11:39:50AM -0400, Vince Weaver wrote: > On Wed, 10 Jul 2013, Dave Jones wrote: > > > Something is really fucked up in the kernel side of perf. > > I get this right after booting.. > > > > [ 114.516619] perf samples too long (4262 > 2500), lowering > > kernel.perf_

Re: Yet more softlockups.

2013-07-10 Thread Dave Jones
On Wed, Jul 10, 2013 at 05:20:15PM +0200, Markus Trippelsdorf wrote: > On 2013.07.10 at 11:13 -0400, Dave Jones wrote: > > I get this right after booting.. > > > > [ 114.516619] perf samples too long (4262 > 2500), lowering > > kernel.perf_event_max_sample_rate to 5 > > You can disab

Re: Yet more softlockups.

2013-07-10 Thread Vince Weaver
On Wed, 10 Jul 2013, Dave Jones wrote: > Something is really fucked up in the kernel side of perf. > I get this right after booting.. > > [ 114.516619] perf samples too long (4262 > 2500), lowering > kernel.perf_event_max_sample_rate to 5 > > That's before I even get a chance to log in, so

Re: Yet more softlockups.

2013-07-10 Thread Markus Trippelsdorf
On 2013.07.10 at 11:13 -0400, Dave Jones wrote: > On Sat, Jul 06, 2013 at 09:24:08AM +0200, Ingo Molnar wrote: > > > > * Dave Jones wrote: > > > > > On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote: > > > > On Fri, 5 Jul 2013, Dave Jones wrote: > > > > > > > > > BUG: so

Re: Yet more softlockups.

2013-07-10 Thread Dave Jones
On Sat, Jul 06, 2013 at 09:24:08AM +0200, Ingo Molnar wrote: > > * Dave Jones wrote: > > > On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote: > > > On Fri, 5 Jul 2013, Dave Jones wrote: > > > > > > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565] > > >

Re: Yet more softlockups.

2013-07-06 Thread Dave Jones
On Sat, Jul 06, 2013 at 09:24:08AM +0200, Ingo Molnar wrote: > > * Dave Jones wrote: > > > On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote: > > > On Fri, 5 Jul 2013, Dave Jones wrote: > > > > > > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565] > > >

Re: Yet more softlockups.

2013-07-06 Thread Ingo Molnar
* Dave Jones wrote: > On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote: > > On Fri, 5 Jul 2013, Dave Jones wrote: > > > > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565] > > > perf samples too long (2519 > 2500), lowering > kernel.perf_event_max_sample_rate

RE: Yet more softlockups.

2013-07-05 Thread Thomas Gleixner
On Fri, 5 Jul 2013, Seiji Aguchi wrote: > > -----Original Message----- > > Hmmm... this makes me wonder if the interrupt tracepoint stuff is at > > fault here, as it changes the IDT handling for NMI context. > > This softlockup happens while disabling the interrupt tracepoints, > Because if it is

RE: Yet more softlockups.

2013-07-05 Thread Seiji Aguchi
> -----Original Message----- > From: H. Peter Anvin [mailto:h...@zytor.com] > Sent: Friday, July 05, 2013 12:41 PM > To: Thomas Gleixner > Cc: Dave Jones; Linus Torvalds; Linux Kernel; Ingo Molnar; Peter Zijlstra; > Seiji Aguchi > Subject: Re: Yet more softlockups. >

Re: Yet more softlockups.

2013-07-05 Thread H. Peter Anvin
On 07/05/2013 09:02 AM, Thomas Gleixner wrote: On Fri, 5 Jul 2013, Dave Jones wrote: On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote: > On Fri, 5 Jul 2013, Dave Jones wrote: > > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565] > > perf samples too long (25

Re: Yet more softlockups.

2013-07-05 Thread Thomas Gleixner
On Fri, 5 Jul 2013, Dave Jones wrote: > On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote: > > On Fri, 5 Jul 2013, Dave Jones wrote: > > > > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565] > > > perf samples too long (2519 > 2500), lowering > kernel.perf_event_m

Re: Yet more softlockups.

2013-07-05 Thread Dave Jones
On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote: > On Fri, 5 Jul 2013, Dave Jones wrote: > > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565] > > perf samples too long (2519 > 2500), lowering > > kernel.perf_event_max_sample_rate to 5 > > INFO: NMI handle

Re: Yet more softlockups.

2013-07-05 Thread Thomas Gleixner
On Fri, 5 Jul 2013, Dave Jones wrote: > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565] > perf samples too long (2519 > 2500), lowering > kernel.perf_event_max_sample_rate to 5 > INFO: NMI handler (perf_event_nmi_handler) took too long to run: 238147.002 > msecs So we see a s

Yet more softlockups.

2013-07-05 Thread Dave Jones
On Wed, Jul 03, 2013 at 07:49:18PM -0700, Linus Torvalds wrote: > Does trinity save enough pseudo-random state that it can be > repeatable, because if it's something repeatable it might be > interesting to see what the last few system calls and traps were... > > > Box is wedged, and I won't