Re: perf/tracepoint: another fuzzer generated lockup

2013-11-16 Thread Masami Hiramatsu
(2013/11/15 23:28), Frederic Weisbecker wrote: > On Fri, Nov 15, 2013 at 09:15:21AM -0500, Steven Rostedt wrote: >> On Fri, 15 Nov 2013 13:28:33 +0100 >> Peter Zijlstra wrote: >> >>> On Fri, Nov 15, 2013 at 10:16:18AM +0900, Masami Hiramatsu wrote: Kprobes itself can detect nested call by usi

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-15 Thread Frederic Weisbecker
On Fri, Nov 15, 2013 at 09:15:21AM -0500, Steven Rostedt wrote: > On Fri, 15 Nov 2013 13:28:33 +0100 > Peter Zijlstra wrote: > > > On Fri, Nov 15, 2013 at 10:16:18AM +0900, Masami Hiramatsu wrote: > > > Kprobes itself can detect nested call by using per-cpu current-running > > > kprobe pointer. A

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-15 Thread Steven Rostedt
On Fri, 15 Nov 2013 13:28:33 +0100 Peter Zijlstra wrote: > On Fri, Nov 15, 2013 at 10:16:18AM +0900, Masami Hiramatsu wrote: > > Kprobes itself can detect nested call by using per-cpu current-running > > kprobe pointer. And if it is nested, it just skips calling handlers. > > Anyway, I don't reco

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-15 Thread Peter Zijlstra
On Fri, Nov 15, 2013 at 10:16:18AM +0900, Masami Hiramatsu wrote: > Kprobes itself can detect nested call by using per-cpu current-running > kprobe pointer. And if it is nested, it just skips calling handlers. > Anyway, I don't recommend to probe inside the handlers, but yes, > you can trace perf-h

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-14 Thread Masami Hiramatsu
(2013/11/15 0:33), Peter Zijlstra wrote: > On Thu, Nov 14, 2013 at 04:23:04PM +0100, Peter Zijlstra wrote: >> /* >> + * We must dis-allow sampling irq_work_exit() because perf event sampling >> + * itself can cause irq_work, which would lead to an infinite loop; >> + * >> + * 1) irq_work_exit hap

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-14 Thread Vince Weaver
On Thu, 14 Nov 2013, Peter Zijlstra wrote: > On Thu, Nov 14, 2013 at 12:20:35PM -0500, Vince Weaver wrote: > > > > possibly not an issue with your patch, but found in nearby code: > > > > in perf_trace_init: > > > > int event_id = p_event->attr.config; > > > > ... > > > >

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-14 Thread Steven Rostedt
On Thu, 14 Nov 2013 18:14:05 +0100 Peter Zijlstra wrote: > Can't blame you; took me a fair chunk of the day to come up with this. > That trace event stuff is horrid. I feel honored ;-) -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-14 Thread Peter Zijlstra
On Thu, Nov 14, 2013 at 12:20:35PM -0500, Vince Weaver wrote: > > possibly not an issue with your patch, but found in nearby code: > > in perf_trace_init: > > int event_id = p_event->attr.config; > > ... > > if (tp_event->event.type == event_id && > > > note tha

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-14 Thread Vince Weaver
possibly not an issue with your patch, but found in nearby code: in perf_trace_init: int event_id = p_event->attr.config; ... if (tp_event->event.type == event_id && note that "attr.config" is 64-bit but "event_id" is only 32-bit, so things like 0x

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-14 Thread Frederic Weisbecker
On Thu, Nov 14, 2013 at 04:23:04PM +0100, Peter Zijlstra wrote: > On Sat, Nov 09, 2013 at 04:10:14PM +0100, Peter Zijlstra wrote: > > Cute.. so what appears to happen is that: > > > > 1) we trace irq_work_exit > > 2) we generate event > > 3) event needs to deliver signal > > 4) we queue irq_work t

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-14 Thread Peter Zijlstra
On Thu, Nov 14, 2013 at 04:23:04PM +0100, Peter Zijlstra wrote: > /* > + * We must dis-allow sampling irq_work_exit() because perf event sampling > + * itself can cause irq_work, which would lead to an infinite loop; > + * > + * 1) irq_work_exit happens > + * 2) generates perf sample > + * 3) g

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-14 Thread Frederic Weisbecker
On Thu, Nov 14, 2013 at 04:33:01PM +0100, Peter Zijlstra wrote: > On Thu, Nov 14, 2013 at 04:23:04PM +0100, Peter Zijlstra wrote: > > /* > > + * We must dis-allow sampling irq_work_exit() because perf event sampling > > + * itself can cause irq_work, which would lead to an infinite loop; > > + * >

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-14 Thread Peter Zijlstra
On Sat, Nov 09, 2013 at 04:10:14PM +0100, Peter Zijlstra wrote: > Cute.. so what appears to happen is that: > > 1) we trace irq_work_exit > 2) we generate event > 3) event needs to deliver signal > 4) we queue irq_work to send signal > 5) goto 1 --- arch/x86/include/asm/trace/irq_vectors.h | 11

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-11 Thread Ingo Molnar
* Peter Zijlstra wrote: > On Mon, Nov 11, 2013 at 01:44:19PM +0100, Ingo Molnar wrote: > > > > * Frederic Weisbecker wrote: > > > > > > That said, I'm not sure what kernel you're running, but there were > > > > some issues with time-keeping hereabouts, but more importantly that > > > > seco

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-11 Thread Peter Zijlstra
On Mon, Nov 11, 2013 at 01:44:19PM +0100, Ingo Molnar wrote: > > * Frederic Weisbecker wrote: > > > > That said, I'm not sure what kernel you're running, but there were > > > some issues with time-keeping hereabouts, but more importantly that > > > second timing includes the printk() call of t

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-11 Thread Ingo Molnar
* Frederic Weisbecker wrote: > > That said, I'm not sure what kernel you're running, but there were > > some issues with time-keeping hereabouts, but more importantly that > > second timing includes the printk() call of the first -- so that's > > always going to be fucked. > > It's a recent

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-09 Thread Frederic Weisbecker
On Sat, Nov 09, 2013 at 04:59:38PM +0100, Peter Zijlstra wrote: > On Sat, Nov 09, 2013 at 04:27:01PM +0100, Frederic Weisbecker wrote: > > In fact, raising an irq work from an irq work should simply be prohibited. > > That's not a sane > > behaviour. > > Well, it is because as you raised on IRC w

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-09 Thread Peter Zijlstra
On Sat, Nov 09, 2013 at 04:27:01PM +0100, Frederic Weisbecker wrote: > In fact, raising an irq work from an irq work should simply be prohibited. > That's not a sane > behaviour. Well, it is because as you raised on IRC we could be holding locks and trying to avoid deadlocks. This is the very rea

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-09 Thread Peter Zijlstra
On Sat, Nov 09, 2013 at 04:22:57PM +0100, Frederic Weisbecker wrote: > > --- > > kernel/events/core.c | 14 -- > > 1 file changed, 12 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > index 4dc078d18929..a3ad40f347c4 100644 > > --- a/ke

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-09 Thread Frederic Weisbecker
On Sat, Nov 09, 2013 at 04:13:56PM +0100, Peter Zijlstra wrote: > On Sat, Nov 09, 2013 at 03:52:59PM +0100, Frederic Weisbecker wrote: > > So, an idea of what may be happening: an event overflows while FASYNC flag > > is set so it triggers an irq work > > to send the signal (kill_fasync). > > Afte

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-09 Thread Frederic Weisbecker
On Sat, Nov 09, 2013 at 04:11:01PM +0100, Peter Zijlstra wrote: > On Fri, Nov 08, 2013 at 11:36:58PM +0100, Frederic Weisbecker wrote: > > [ 237.359091] [] perf_callchain_kernel+0x51/0x70 > > [ 237.365155] [] perf_callchain+0x256/0x2c0 > > [ 237.370783] [] perf_prepare_sample+0x27b/0x300 > >

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-09 Thread Peter Zijlstra
On Sat, Nov 09, 2013 at 03:52:59PM +0100, Frederic Weisbecker wrote: > So, an idea of what may be happening: an event overflows while FASYNC flag is > set so it triggers an irq work > to send the signal (kill_fasync). > After the irq work triggers, it generates an irq_work_exit event, which in >

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-09 Thread Peter Zijlstra
On Fri, Nov 08, 2013 at 11:36:58PM +0100, Frederic Weisbecker wrote: > [ 237.359091] [] perf_callchain_kernel+0x51/0x70 > [ 237.365155] [] perf_callchain+0x256/0x2c0 > [ 237.370783] [] perf_prepare_sample+0x27b/0x300 > [ 237.376849] [] ? __rcu_is_watching+0x1a/0x30 > [ 237.382736] [] __pe

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-09 Thread Frederic Weisbecker
On Sat, Nov 09, 2013 at 03:10:39PM +0100, Peter Zijlstra wrote: > On Fri, Nov 08, 2013 at 11:36:58PM +0100, Frederic Weisbecker wrote: > > [ 237.627769] perf samples too long (3397569 > 2500), lowering > > kernel.perf_event_max_sample_rate to 5 > > [ 237.637124] INFO: NMI handler (perf_event

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-09 Thread Frederic Weisbecker
On Sat, Nov 09, 2013 at 03:10:39PM +0100, Peter Zijlstra wrote: > On Fri, Nov 08, 2013 at 11:36:58PM +0100, Frederic Weisbecker wrote: > > [ 237.627769] perf samples too long (3397569 > 2500), lowering > > kernel.perf_event_max_sample_rate to 5 > > [ 237.637124] INFO: NMI handler (perf_event

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-09 Thread Peter Zijlstra
On Fri, Nov 08, 2013 at 11:36:58PM +0100, Frederic Weisbecker wrote: > [ 237.627769] perf samples too long (3397569 > 2500), lowering > kernel.perf_event_max_sample_rate to 5 > [ 237.637124] INFO: NMI handler (perf_event_nmi_handler) took too long to > run: 444.233 msecs > > 444 msecs is h

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-08 Thread Steven Rostedt
On Fri, 8 Nov 2013 23:36:58 +0100 Frederic Weisbecker wrote: > [ 237.623178] ---[ end trace 40cda30d05d0ffa6 ]--- > [ 237.627769] perf samples too long (3397569 > 2500), lowering > kernel.perf_event_max_sample_rate to 5 > [ 237.637124] INFO: NMI handler (perf_event_nmi_handler) took too

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-08 Thread Frederic Weisbecker
On Fri, Nov 08, 2013 at 04:15:21PM -0500, Vince Weaver wrote: > > int main(int argc, char **argv) { > /* 1 */ > /* fd = 82 */ > > memset(&pe[82],0,sizeof(struct perf_event_attr)); > pe[82].type=PERF_TYPE_TRACEPOINT; > pe[82].size=80; > pe[82].config=0x18; I did some more

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-08 Thread Frederic Weisbecker
On Fri, Nov 08, 2013 at 04:15:21PM -0500, Vince Weaver wrote: > On Fri, 8 Nov 2013, Frederic Weisbecker wrote: > > > On Fri, Nov 08, 2013 at 03:23:07PM -0500, Vince Weaver wrote: > > > On Fri, 8 Nov 2013, Frederic Weisbecker wrote: > > > > > > > > There seem to be a loop that takes too long in in

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-08 Thread Frederic Weisbecker
On Fri, Nov 08, 2013 at 04:15:21PM -0500, Vince Weaver wrote: > On Fri, 8 Nov 2013, Frederic Weisbecker wrote: > > > On Fri, Nov 08, 2013 at 03:23:07PM -0500, Vince Weaver wrote: > > > On Fri, 8 Nov 2013, Frederic Weisbecker wrote: > > > > > > > > There seem to be a loop that takes too long in in

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-08 Thread Vince Weaver
On Fri, 8 Nov 2013, Frederic Weisbecker wrote: > On Fri, Nov 08, 2013 at 03:23:07PM -0500, Vince Weaver wrote: > > On Fri, 8 Nov 2013, Frederic Weisbecker wrote: > > > > > > There seem to be a loop that takes too long in intel_pmu_handle_irq(). > > > Your two > > > previous reports seemed to sug

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-08 Thread Frederic Weisbecker
On Fri, Nov 08, 2013 at 03:23:07PM -0500, Vince Weaver wrote: > On Fri, 8 Nov 2013, Frederic Weisbecker wrote: > > > > There seem to be a loop that takes too long in intel_pmu_handle_irq(). Your > > two > > previous reports seemed to suggest that lbr is involved, but not this one. > > I may be w

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-08 Thread Vince Weaver
On Fri, 8 Nov 2013, Frederic Weisbecker wrote: > > There seem to be a loop that takes too long in intel_pmu_handle_irq(). Your > two > previous reports seemed to suggest that lbr is involved, but not this one. I may be wrong but I think everything between and is just noise from the NMI perf-ev

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-08 Thread Frederic Weisbecker
On Fri, Nov 08, 2013 at 03:06:11PM -0500, Vince Weaver wrote: > > and again, this time after 600k successful syscalls or so. > > This is on a core2 machine. > > [ 1020.396002] [ cut here ] > [ 1020.396002] WARNING: CPU: 1 PID: 3036 at kernel/watchdog.c:245 > watchdog_ove

Re: perf/tracepoint: another fuzzer generated lockup

2013-11-08 Thread Vince Weaver
and again, this time after 600k successful syscalls or so. This is on a core2 machine. [ 1020.396002] [ cut here ] [ 1020.396002] WARNING: CPU: 1 PID: 3036 at kernel/watchdog.c:245 watchdog_over) [ 1020.396002] Watchdog detected hard LOCKUP on cpu 1 [ 1020.396002] Modules

perf/tracepoint: another fuzzer generated lockup

2013-11-08 Thread Vince Weaver
The perf_fuzzer hits this bug pretty reliably within about an hour, but unfortunately after a large number of syscalls so not easy to isolate. This is against stock 3.12 with the no-ftrace-function-tracing-unless-root patch applied. I can trigger this is a normal user, no root involved. [10665