On Thu, 2 Oct 2014, Vince Weaver wrote:
> It looks like this is easily reproducible (just wedged the machine again)
> so let me check back after testing the patch.
no, can still wedge the machine even with this patch applied.
Will try messing with ftrace to see if I can figure out what's going o
On Wed, 1 Oct 2014, Sasha Levin wrote:
> On 09/30/2014 01:23 PM, Peter Zijlstra wrote:
> > How about this then?
> >
> > ---
> > Subject: perf: Fix unclone_ctx() vs locking
> >
> > The idiot who did 4a1c0f262f88 forgot to pay attention and fix all
> > similar cases. Do so now.
> >
> > In particu
On 09/30/2014 01:23 PM, Peter Zijlstra wrote:
> How about this then?
>
> ---
> Subject: perf: Fix unclone_ctx() vs locking
>
> The idiot who did 4a1c0f262f88 forgot to pay attention and fix all
> similar cases. Do so now.
>
> In particular, unclone_ctx() must be called while holding ctx->lock,
>
On Sun, Sep 28, 2014 at 10:21 PM, Vince Weaver wrote:
> On Thu, 25 Sep 2014, Cong Wang wrote:
>
>> On Wed, Sep 24, 2014 at 9:59 PM, Vince Weaver wrote:
>
>> > Now that just might mean the patch pushed the code around enough so my
>> > test doesn't trigger, but there is hope that maybe this fi
How about this then?
---
Subject: perf: Fix unclone_ctx() vs locking
The idiot who did 4a1c0f262f88 forgot to pay attention and fix all
similar cases. Do so now.
In particular, unclone_ctx() must be called while holding ctx->lock,
therefore all such sites are broken for the same reason. Pull th
On Mon, Sep 29, 2014 at 01:01:33PM -0400, Sasha Levin wrote:
> On 09/29/2014 07:11 AM, Peter Zijlstra wrote:
> > On Sun, Sep 28, 2014 at 12:09:09AM -0400, Sasha Levin wrote:
> >
> >> > [ 690.801720] 2 locks held by trinity-c95/17888:
> >> > [ 690.801738] #0: (cpu_hotplug.lock){++}, at: get_o
On 09/29/2014 07:11 AM, Peter Zijlstra wrote:
> On Sun, Sep 28, 2014 at 12:09:09AM -0400, Sasha Levin wrote:
>
>> > [ 690.801720] 2 locks held by trinity-c95/17888:
>> > [ 690.801738] #0: (cpu_hotplug.lock){++}, at: get_online_cpus
>> > (kernel/cpu.c:92)
>> > [ 690.801754] #1: (&ctx->lock)
On Sun, Sep 28, 2014 at 12:09:09AM -0400, Sasha Levin wrote:
> [ 690.801720] 2 locks held by trinity-c95/17888:
> [ 690.801738] #0: (cpu_hotplug.lock){++}, at: get_online_cpus
> (kernel/cpu.c:92)
> [ 690.801754] #1: (&ctx->lock){-.-...}, at: perf_lock_task_context
> (kernel/events/core.c:
On Thu, 25 Sep 2014, Cong Wang wrote:
> On Wed, Sep 24, 2014 at 9:59 PM, Vince Weaver wrote:
> > Now that just might mean the patch pushed the code around enough so my
> > test doesn't trigger, but there is hope that maybe this fixes things.
>
> I read this as it fixes your crash as well?
I
On 09/25/2014 12:38 PM, Cong Wang wrote:
> On Wed, Sep 24, 2014 at 9:59 PM, Vince Weaver wrote:
>> >
>> > So I noticed Cong Wang's patch (3577af70a2ce4853d58e57d832e687d739281479)
>> > perf: Fix a race condition in perf_remove_from_context()
>> >
>> > and that sounds a lot like the weir
On Wed, Sep 24, 2014 at 9:59 PM, Vince Weaver wrote:
>
> So I noticed Cong Wang's patch (3577af70a2ce4853d58e57d832e687d739281479)
> perf: Fix a race condition in perf_remove_from_context()
>
> and that sounds a lot like the weird fork()/memory-corruption bug that the
> fuzzer has been tri
So I noticed Cong Wang's patch (3577af70a2ce4853d58e57d832e687d739281479)
perf: Fix a race condition in perf_remove_from_context()
and that sounds a lot like the weird fork()/memory-corruption bug that the
fuzzer has been triggering.
So I applied that patch alone on top of the 3.17-rc4
On Wed, 10 Sep 2014, Peter Zijlstra wrote:
> > I've been trying for months now to make progress on these but this type of
> > bug is really hard to debug.
>
> Did we actually fix some at least? I had the idea we did get a few
> sorted. But yes, this is tedious and hard going :/
we did fix some
On Wed, Sep 10, 2014 at 10:30:31AM -0400, Vince Weaver wrote:
> On Wed, 10 Sep 2014, Sasha Levin wrote:
>
> > On 09/10/2014 09:18 AM, Vince Weaver wrote:
> > > that's what got me looking at things again, the trinity reports. Though I
> > > think those involve CPU hotplugging which my fuzz
On Wed, 10 Sep 2014, Sasha Levin wrote:
> On 09/10/2014 09:18 AM, Vince Weaver wrote:
> > that's what got me looking at things again, the trinity reports. Though I
> > think those involve CPU hotplugging which my fuzzer shouldn't trigger.
> >
> > I do think this is the same memory corruption/re
On 09/10/2014 09:18 AM, Vince Weaver wrote:
> that's what got me looking at things again, the trinity reports. Though I
> think those involve CPU hotplugging which my fuzzer shouldn't trigger.
>
> I do think this is the same memory corruption/reboot bug that I reported
> back in February (the t
On Wed, Sep 10, 2014 at 09:18:35AM -0400, Vince Weaver wrote:
> Somehow something is stomping over memory with a forking workload (likely
> an improper free with RCU like we've seen before) but the fact that it
> causes a reboot immediately makes it *really* hard to debug this.
Yes, the insta r
On Wed, 10 Sep 2014, Peter Zijlstra wrote:
>
> Sasha reported something from his KVM based fuzzing, maybe that's the
> same. But that x86_exceptions thing is interesting, lemme go look at
> that first.
that's what got me looking at things again, the trinity reports. Though I
think those involve
On Tue, Sep 09, 2014 at 01:53:50PM -0400, Vince Weaver wrote:
>
> OK so trying to use ftrace to track this issue, and this happens (on
> core2, 3.17-rc4)
>
> [ 295.992012] PANIC: double fault, error_code: 0x0
> [ 295.992012] CPU: 1 PID: 2916 Comm: trace-cmd Not tainted 3.17.0-rc4+ #82
> [ 295
OK so trying to use ftrace to track this issue, and this happens (on
core2, 3.17-rc4)
[ 295.992012] PANIC: double fault, error_code: 0x0
[ 295.992012] CPU: 1 PID: 2916 Comm: trace-cmd Not tainted 3.17.0-rc4+ #82
[ 295.992012] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012,
BIOS
On Tue, 9 Sep 2014, Vince Weaver wrote:
>
> [ 751.656861] NOHZ: local_softirq_pending 100
> [ 756.009271] traps: perf_fuzzer[4236] trap invalid opcode ip:4044c0
> sp:7fffb0ed6f90 error:0
> [ 756.017590] BUG: unable to handle kernel paging request
so it turns out that while it seems reproducib
On Mon, 8 Sep 2014, Peter Zijlstra wrote:
> On Mon, Sep 08, 2014 at 01:47:11PM -0400, Vince Weaver wrote:
> > Hello
> >
> > so I finally had time to run my perf_fuzzer again and it has rapidly
> > turned up an alarming crash that instant-reboots my core2 test machine.
>
> Urgh, of course :/
So
On Mon, 8 Sep 2014, Peter Zijlstra wrote:
> On Mon, Sep 08, 2014 at 01:47:11PM -0400, Vince Weaver wrote:
> > Hello
> >
> > so I finally had time to run my perf_fuzzer again and it has rapidly
> > turned up an alarming crash that instant-reboots my core2 test machine.
>
> Urgh, of course :/
I
On Mon, Sep 08, 2014 at 01:47:11PM -0400, Vince Weaver wrote:
> Hello
>
> so I finally had time to run my perf_fuzzer again and it has rapidly
> turned up an alarming crash that instant-reboots my core2 test machine.
Urgh, of course :/
Hello
so I finally had time to run my perf_fuzzer again and it has rapidly
turned up an alarming crash that instant-reboots my core2 test machine.
This is on 3.17-rc4. It is reproducible.
The first time all that appeared on the serial console was:
[ 2616.535995] Kernel panic - not syncing: Lo