Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-05 Thread Ben Greear
On 06/05/2013 11:48 AM, Tejun Heo wrote: Hello, Ben. On Wed, Jun 05, 2013 at 09:59:00AM -0700, Ben Greear wrote: One pattern I notice repeating for at least most of the hangs is that all but one CPU thread has irqs disabled and is in state 2. But, there will be one thread in state 1 that

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-05 Thread Tejun Heo
Hello, Ben. On Wed, Jun 05, 2013 at 09:59:00AM -0700, Ben Greear wrote: > One pattern I notice repeating for at least most of the hangs is that all but > one > CPU thread has irqs disabled and is in state 2. But, there will be one thread > in state 1 that still has IRQs enabled and it is

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-05 Thread Rusty Russell
Greg KH writes: > On Mon, Jun 03, 2013 at 10:17:17AM -0400, Joe Lawrence wrote: >> [Cc: sta...@vger.kernel.org] >> >> Third time is a charm? The stable address was incorrect from the first >> msg in this thread, but the relevant bits remain quoted below... > > Really? I'm totally confused...

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-05 Thread Tejun Heo
Hello, On Wed, Jun 05, 2013 at 01:47:43PM +0930, Rusty Russell wrote: > > I have some printk debugging in (see bottom of email) and was using a > > serial console, so things > > were probably running a bit slower than on most systems. Here is trace > > from my kernel with local patches and not

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-05 Thread Tejun Heo
Hello, On Wed, Jun 05, 2013 at 01:47:43PM +0930, Rusty Russell wrote: I have some printk debugging in (see bottom of email) and was using a serial console, so things were probably running a bit slower than on most systems. Here is trace from my kernel with local patches and not so much

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-05 Thread Rusty Russell
Greg KH gre...@linuxfoundation.org writes: On Mon, Jun 03, 2013 at 10:17:17AM -0400, Joe Lawrence wrote: [Cc: sta...@vger.kernel.org] Third time is a charm? The stable address was incorrect from the first msg in this thread, but the relevant bits remain quoted below... Really? I'm

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-05 Thread Tejun Heo
Hello, Ben. On Wed, Jun 05, 2013 at 09:59:00AM -0700, Ben Greear wrote: One pattern I notice repeating for at least most of the hangs is that all but one CPU thread has irqs disabled and is in state 2. But, there will be one thread in state 1 that still has IRQs enabled and it is reported

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Greg KH
On Mon, Jun 03, 2013 at 10:17:17AM -0400, Joe Lawrence wrote: > [Cc: sta...@vger.kernel.org] > > Third time is a charm? The stable address was incorrect from the first > msg in this thread, but the relevant bits remain quoted below... Really? I'm totally confused... > On Mon, 3 Jun 2013, Joe

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Rusty Russell
Joe Lawrence writes: > On Tue, 04 Jun 2013 15:26:28 +0930 > Rusty Russell wrote: > >> Do you have a backtrace of the 3.9.4 crash? You can add "CFLAGS_module.o >> = -O0" to get a clearer backtrace if you want... > > Hi Rusty, > > See my 3.9 stack traces below, which may or may not be what Ben

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Rusty Russell
Ben Greear writes: > On 06/04/2013 09:53 AM, Ben Greear wrote: >> On 06/04/2013 07:07 AM, Joe Lawrence wrote: >>> On Tue, 04 Jun 2013 15:26:28 +0930 >>> Rusty Russell wrote: >>> Do you have a backtrace of the 3.9.4 crash? You can add "CFLAGS_module.o = -O0" to get a clearer backtrace

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Ben Greear
On 06/04/2013 09:53 AM, Ben Greear wrote: On 06/04/2013 07:07 AM, Joe Lawrence wrote: On Tue, 04 Jun 2013 15:26:28 +0930 Rusty Russell wrote: Do you have a backtrace of the 3.9.4 crash? You can add "CFLAGS_module.o = -O0" to get a clearer backtrace if you want... Hi Rusty, See my 3.9

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Joe Lawrence
On Tue, 4 Jun 2013, Joe Lawrence wrote: > Hi Rusty, > > See my 3.9 stack traces below, which may or may not be what Ben had > been seeing. If you like, I can try a similar loop as the one you were > testing in the other email. With a modified version of your module load/unload loop (only

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Ben Greear
On 06/04/2013 07:07 AM, Joe Lawrence wrote: On Tue, 04 Jun 2013 15:26:28 +0930 Rusty Russell wrote: Do you have a backtrace of the 3.9.4 crash? You can add "CFLAGS_module.o = -O0" to get a clearer backtrace if you want... Hi Rusty, See my 3.9 stack traces below, which may or may not be

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Joe Lawrence
On Tue, 04 Jun 2013 15:26:28 +0930 Rusty Russell wrote: > Do you have a backtrace of the 3.9.4 crash? You can add "CFLAGS_module.o > = -O0" to get a clearer backtrace if you want... Hi Rusty, See my 3.9 stack traces below, which may or may not be what Ben had been seeing. If you like, I can

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Rusty Russell
Ben Greear writes: > On 06/03/2013 08:59 AM, Ben Greear wrote: >> On 06/03/2013 07:17 AM, Joe Lawrence wrote: >> > Hi Rusty, > > I had pointed Ben (offlist) to that bugzilla entry without realizing > there were other earlier related fixes in this space. Re-viewing bz- >

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Rusty Russell
Ben Greear writes: >> It at least works around the problem for me as well. But, a more rare >> migration/[0-3] (I think) related lockup still exists in 3.9.4 for me, >> so I will also try applying that other kobject patch and continue testing >> today... > > Well, that other kobject patch is

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Rusty Russell
Ben Greear gree...@candelatech.com writes: It at least works around the problem for me as well. But, a more rare migration/[0-3] (I think) related lockup still exists in 3.9.4 for me, so I will also try applying that other kobject patch and continue testing today... Well, that other kobject

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Rusty Russell
Ben Greear gree...@candelatech.com writes: On 06/03/2013 08:59 AM, Ben Greear wrote: On 06/03/2013 07:17 AM, Joe Lawrence wrote: Hi Rusty, I had pointed Ben (offlist) to that bugzilla entry without realizing there were other earlier related fixes in this space. Re-viewing bz-58011, it

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Joe Lawrence
On Tue, 04 Jun 2013 15:26:28 +0930 Rusty Russell ru...@rustcorp.com.au wrote: Do you have a backtrace of the 3.9.4 crash? You can add CFLAGS_module.o = -O0 to get a clearer backtrace if you want... Hi Rusty, See my 3.9 stack traces below, which may or may not be what Ben had been seeing. If

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Ben Greear
On 06/04/2013 07:07 AM, Joe Lawrence wrote: On Tue, 04 Jun 2013 15:26:28 +0930 Rusty Russell ru...@rustcorp.com.au wrote: Do you have a backtrace of the 3.9.4 crash? You can add CFLAGS_module.o = -O0 to get a clearer backtrace if you want... Hi Rusty, See my 3.9 stack traces below, which

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Joe Lawrence
On Tue, 4 Jun 2013, Joe Lawrence wrote: Hi Rusty, See my 3.9 stack traces below, which may or may not be what Ben had been seeing. If you like, I can try a similar loop as the one you were testing in the other email. With a modified version of your module load/unload loop (only needed

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Ben Greear
On 06/04/2013 09:53 AM, Ben Greear wrote: On 06/04/2013 07:07 AM, Joe Lawrence wrote: On Tue, 04 Jun 2013 15:26:28 +0930 Rusty Russell ru...@rustcorp.com.au wrote: Do you have a backtrace of the 3.9.4 crash? You can add CFLAGS_module.o = -O0 to get a clearer backtrace if you want... Hi

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Rusty Russell
Joe Lawrence joe.lawre...@stratus.com writes: On Tue, 04 Jun 2013 15:26:28 +0930 Rusty Russell ru...@rustcorp.com.au wrote: Do you have a backtrace of the 3.9.4 crash? You can add CFLAGS_module.o = -O0 to get a clearer backtrace if you want... Hi Rusty, See my 3.9 stack traces below,

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Rusty Russell
Ben Greear gree...@candelatech.com writes: On 06/04/2013 09:53 AM, Ben Greear wrote: On 06/04/2013 07:07 AM, Joe Lawrence wrote: On Tue, 04 Jun 2013 15:26:28 +0930 Rusty Russell ru...@rustcorp.com.au wrote: Do you have a backtrace of the 3.9.4 crash? You can add CFLAGS_module.o = -O0 to

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-04 Thread Greg KH
On Mon, Jun 03, 2013 at 10:17:17AM -0400, Joe Lawrence wrote: [Cc: sta...@vger.kernel.org] Third time is a charm? The stable address was incorrect from the first msg in this thread, but the relevant bits remain quoted below... Really? I'm totally confused... On Mon, 3 Jun 2013, Joe

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-03 Thread Ben Greear
On 06/03/2013 08:59 AM, Ben Greear wrote: On 06/03/2013 07:17 AM, Joe Lawrence wrote: Hi Rusty, I had pointed Ben (offlist) to that bugzilla entry without realizing there were other earlier related fixes in this space. Re-viewing bz-58011, it looks like it was opened against 3.8.12, while

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-03 Thread Ben Greear
On 06/03/2013 07:17 AM, Joe Lawrence wrote: Hi Rusty, I had pointed Ben (offlist) to that bugzilla entry without realizing there were other earlier related fixes in this space. Re-viewing bz-58011, it looks like it was opened against 3.8.12, while Ben and myself had encountered module

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-03 Thread Joe Lawrence
[Cc: sta...@vger.kernel.org] Third time is a charm? The stable address was incorrect from the first msg in this thread, but the relevant bits remain quoted below... On Mon, 3 Jun 2013, Joe Lawrence wrote: > [fixing Cc: sta...@kernel.org address] > > On Sun, 2 Jun 2013, Joe Lawrence wrote: >

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-03 Thread Joe Lawrence
[fixing Cc: sta...@kernel.org address] On Sun, 2 Jun 2013, Joe Lawrence wrote: > On Sun, 2 Jun 2013, Rusty Russell wrote: > > > Ben Greear writes: > > > > > It turns out, the bug I spent yesterday chasing in various 3.9 kernels is > > > apparently > > > fixed by the commit in the title > >

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-03 Thread Joe Lawrence
[fixing Cc: sta...@kernel.org address] On Sun, 2 Jun 2013, Joe Lawrence wrote: On Sun, 2 Jun 2013, Rusty Russell wrote: Ben Greear gree...@candelatech.com writes: It turns out, the bug I spent yesterday chasing in various 3.9 kernels is apparently fixed by the commit in the

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-03 Thread Joe Lawrence
[Cc: sta...@vger.kernel.org] Third time is a charm? The stable address was incorrect from the first msg in this thread, but the relevant bits remain quoted below... On Mon, 3 Jun 2013, Joe Lawrence wrote: [fixing Cc: sta...@kernel.org address] On Sun, 2 Jun 2013, Joe Lawrence wrote:

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-02 Thread Joe Lawrence
On Sun, 2 Jun 2013, Rusty Russell wrote: > Ben Greear writes: > > > It turns out, the bug I spent yesterday chasing in various 3.9 kernels is > > apparently > > fixed by the commit in the title (c9c390bb5535380d40614571894ef0c00bc026ff). > > Apparently being the operative word. > > This

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-02 Thread Rusty Russell
Ben Greear writes: > It turns out, the bug I spent yesterday chasing in various 3.9 kernels is > apparently > fixed by the commit in the title (c9c390bb5535380d40614571894ef0c00bc026ff). Apparently being the operative word. This commit avoids the entire "module insert failed due to sysfs

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-02 Thread Rusty Russell
Ben Greear gree...@candelatech.com writes: It turns out, the bug I spent yesterday chasing in various 3.9 kernels is apparently fixed by the commit in the title (c9c390bb5535380d40614571894ef0c00bc026ff). Apparently being the operative word. This commit avoids the entire module insert

Re: Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-06-02 Thread Joe Lawrence
On Sun, 2 Jun 2013, Rusty Russell wrote: Ben Greear gree...@candelatech.com writes: It turns out, the bug I spent yesterday chasing in various 3.9 kernels is apparently fixed by the commit in the title (c9c390bb5535380d40614571894ef0c00bc026ff). Apparently being the operative word.

Please add to stable: module: don't unlink the module until we've removed all exposure.

2013-05-31 Thread Ben Greear
It turns out, the bug I spent yesterday chasing in various 3.9 kernels is apparently fixed by the commit in the title (c9c390bb5535380d40614571894ef0c00bc026ff). Fortunately, Joe Lawrence somehow saw my email to lkml and pointed me to the bug report below, which mentions the commit...
