Re: workqueue panic in 3.4 kernel

2013-03-12 Thread Lei Wen
On Tue, Mar 12, 2013 at 2:13 PM, Tejun Heo wrote:
> On Tue, Mar 12, 2013 at 02:01:16PM +0800, Lei Wen wrote:
>> I see...
>> How about only check those workqueue structure not on stack?
>> For current onstack usage is rare, and should be easier to check with.
>
> No, kzalloc is not required. The

Re: workqueue panic in 3.4 kernel

2013-03-12 Thread Tejun Heo
On Tue, Mar 12, 2013 at 02:01:16PM +0800, Lei Wen wrote:
> I see...
> How about only check those workqueue structure not on stack?
> For current onstack usage is rare, and should be easier to check with.
No, kzalloc is not required. The memory area can come from any source. If you're interested in

Re: workqueue panic in 3.4 kernel

2013-03-12 Thread Lei Wen
On Tue, Mar 12, 2013 at 1:40 PM, Tejun Heo wrote:
> On Tue, Mar 12, 2013 at 01:34:56PM +0800, Lei Wen wrote:
>> > Memory areas aren't always zero on allocation.
>>
>> Shouldn't work structure be allocated with kzalloc?
>
> It's not required to. work_struct can also be on stack. It's "init"

Re: workqueue panic in 3.4 kernel

2013-03-11 Thread Tejun Heo
On Tue, Mar 12, 2013 at 01:34:56PM +0800, Lei Wen wrote:
> > Memory areas aren't always zero on allocation.
>
> Shouldn't work structure be allocated with kzalloc?
It's not required to. work_struct can also be on stack. It's "init" after all. Also, if you require clearing the memory before initing,

Re: workqueue panic in 3.4 kernel

2013-03-11 Thread Lei Wen
On Tue, Mar 12, 2013 at 1:24 PM, Tejun Heo wrote:
> On Tue, Mar 12, 2013 at 01:18:01PM +0800, Lei Wen wrote:
>> > You're initializing random piece of memory which may contain any
>> > garbage and triggering BUG if some bit is set on it. No, you can't do
>> > that. debugobj is the right tool for

Re: workqueue panic in 3.4 kernel

2013-03-11 Thread Tejun Heo
On Tue, Mar 12, 2013 at 01:18:01PM +0800, Lei Wen wrote:
> > You're initializing random piece of memory which may contain any
> > garbage and triggering BUG if some bit is set on it. No, you can't do
> > that. debugobj is the right tool for debugging object lifetime issues
> > and is already supported.
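The objection quoted above can be illustrated with a small userspace sketch. It is not kernel code: `struct fake_work`, `FAKE_PENDING`, `checked_init()`, `plain_init()` and `fill_garbage()` are hypothetical names invented for this illustration. It only shows why an init routine cannot treat a set bit in uninitialized memory as evidence that the item is still in use.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a work item; NOT the kernel's struct work_struct. */
struct fake_work {
    unsigned long data;                  /* bit 0 models a "pending" flag */
    void (*func)(struct fake_work *);
};

#define FAKE_PENDING 0x1UL

/*
 * Models the rejected idea: refuse to initialize an item whose pending
 * bit is already set. Returns -1 where the proposed check would trigger.
 */
static int checked_init(struct fake_work *w, void (*fn)(struct fake_work *))
{
    if (w->data & FAKE_PENDING)
        return -1;
    w->data = 0;
    w->func = fn;
    return 0;
}

/* Models what an init routine normally does: overwrite whatever was there. */
static void plain_init(struct fake_work *w, void (*fn)(struct fake_work *))
{
    w->data = 0;
    w->func = fn;
}

/* Placeholder callback; never invoked in this sketch. */
static void noop(struct fake_work *w)
{
    (void)w;
}

/* Fill the item with a garbage pattern, as stack or non-zeroed heap memory might hold. */
static void fill_garbage(struct fake_work *w)
{
    memset(w, 0xff, sizeof(*w));
}
```

Calling `fill_garbage()` and then `checked_init()` makes the check fire on an item that was never queued, whereas `plain_init()` leaves it in a clean state regardless of the previous memory contents.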

Re: workqueue panic in 3.4 kernel

2013-03-11 Thread Lei Wen
Tejun,
On Tue, Mar 12, 2013 at 1:12 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Mar 12, 2013 at 01:08:15PM +0800, Lei Wen wrote:
>> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
>> index 8afab27..425d5a2 100644
>> --- a/include/linux/workqueue.h
>> +++

Re: workqueue panic in 3.4 kernel

2013-03-11 Thread Tejun Heo
Hello,
On Tue, Mar 12, 2013 at 01:08:15PM +0800, Lei Wen wrote:
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 8afab27..425d5a2 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -189,12 +189,16 @@ static inline unsigned int work_static(struct

Re: workqueue panic in 3.4 kernel

2013-03-07 Thread Thomas Gleixner
On Thu, 7 Mar 2013, Tejun Heo wrote:
> I can't see how something like that would happen and still find it
> quite unlikely this would be a generic problem in either timer or
> workqueue given how widely those are used and your case is the only
> similar case that came up till now (and 3.4 is a long

Re: workqueue panic in 3.4 kernel

2013-03-07 Thread Tejun Heo
(cc'ing Thomas, hi!)
Hello,
Lei is seeing a problem where a delayed_work item gets corrupted (its work->data gets cleared while still queued on the timer). He thinks what's going on is that del_timer() is returning 1 but the timer function still gets executed.
On Thu, Mar 07, 2013 at 11:22:40PM
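The expectation behind this report, that a timer whose deletion returned 1 must not run its function afterwards, can be written down as a tiny single-threaded model. This is only a sketch: `struct fake_timer`, `fake_add_timer()`, `fake_del_timer()` and `fake_run_timer()` are hypothetical names, and the model has no locking or second CPU, so it shows the contract being relied on rather than the suspected race.

```c
#include <assert.h>

/* Hypothetical userspace model of a one-shot timer; NOT the kernel's timer_list. */
struct fake_timer {
    int pending;    /* 1 while the timer is queued */
    int fired;      /* how many times the callback has run */
};

/* Queue the timer. */
static void fake_add_timer(struct fake_timer *t)
{
    t->pending = 1;
}

/*
 * Return 1 if a pending timer was deactivated, 0 if it was not pending.
 * This mirrors the return convention described in the message above.
 */
static int fake_del_timer(struct fake_timer *t)
{
    if (!t->pending)
        return 0;
    t->pending = 0;
    return 1;
}

/* Model expiry: detach the timer, then run the callback, only if still pending. */
static void fake_run_timer(struct fake_timer *t)
{
    if (!t->pending)
        return;
    t->pending = 0;
    t->fired++;
}
```

In this model a return value of 1 from `fake_del_timer()` guarantees that a later `fake_run_timer()` does nothing. The behaviour reported in the thread would correspond to `fired` increasing after such a successful delete.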

Re: workqueue panic in 3.4 kernel

2013-03-07 Thread Lei Wen
Tejun,
On Thu, Mar 7, 2013 at 9:15 AM, Lei Wen wrote:
> Hi Tejun,
>
> On Thu, Mar 7, 2013 at 3:14 AM, Tejun Heo wrote:
>> Hello, Lei.
>>
>> On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
>>> We find a race condition as below:
>>> CPU0

Re: workqueue panic in 3.4 kernel

2013-03-06 Thread Lei Wen
Hi Tejun,
On Thu, Mar 7, 2013 at 3:14 AM, Tejun Heo wrote:
> Hello, Lei.
>
> On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
>> We find a race condition as below:
>> CPU0 CPU1
>> timer interrupt happen
>> __run_timers

Re: workqueue panic in 3.4 kernel

2013-03-06 Thread Tejun Heo
Hello, Lei.
On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
> We find a race condition as below:
> CPU0 CPU1
> timer interrupt happen
> __run_timers
> __run_timers::spin_lock_irq(&base->lock)

Re: workqueue panic in 3.4 kernel

2013-03-06 Thread Lei Wen
Hi Tejun
On Wed, Mar 6, 2013 at 12:32 AM, Tejun Heo wrote:
> Hello,
>
> On Tue, Mar 05, 2013 at 03:31:45PM +0800, Lei Wen wrote:
>> With checking memory, we find work->data becomes 0x300, when it try
>> to call get_work_cwq
>
> Why would that become 0x300? Who's writing to that memory? Nobody

Re: workqueue panic in 3.4 kernel

2013-03-05 Thread Tejun Heo
Hello,
On Tue, Mar 05, 2013 at 03:31:45PM +0800, Lei Wen wrote:
> With checking memory, we find work->data becomes 0x300, when it try
> to call get_work_cwq
Why would that become 0x300? Who's writing to that memory? Nobody should be.
> in delayed_work_timer_fn. Thus cwq becomes NULL before calls
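To see how a data word of 0x300 can lead to a NULL lookup result, here is a userspace sketch of a pointer-plus-flags encoding. The layout is an assumption made for illustration (8 low flag bits, with bit 2 meaning "the upper bits hold a pointer") and is not taken from the thread; `FAKE_FLAG_BITS`, `FAKE_CWQ_FLAG`, `fake_pack()` and `fake_get_cwq()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, for illustration only: the low 8 bits of the word are flags. */
#define FAKE_FLAG_BITS 8
#define FAKE_FLAG_MASK (((uintptr_t)1 << FAKE_FLAG_BITS) - 1)
/* Assumed flag: when set, the bits above the flags hold a pointer. */
#define FAKE_CWQ_FLAG  ((uintptr_t)1 << 2)

/* Pack a 256-byte-aligned pointer together with flag bits into one word. */
static uintptr_t fake_pack(void *cwq, uintptr_t flags)
{
    return (uintptr_t)cwq | FAKE_CWQ_FLAG | (flags & FAKE_FLAG_MASK);
}

/* Return the packed pointer, or NULL when the pointer flag is not set. */
static void *fake_get_cwq(uintptr_t data)
{
    if (data & FAKE_CWQ_FLAG)
        return (void *)(data & ~FAKE_FLAG_MASK);
    return NULL;
}
```

Under this assumed layout 0x300 has no flag bits set at all, so `fake_get_cwq(0x300)` returns NULL, and any caller that dereferences the result without checking would fault at a small offset from address zero.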

workqueue panic in 3.4 kernel

2013-03-04 Thread Lei Wen
Hi Tejun,
We met one panic issue related workqueue based over 3.4.5 Linux kernel. Panic log as:
[153587.035369] Unable to handle kernel NULL pointer dereference at virtual address 0004
[153587.043731] pgd = e1e74000
[153587.046691] [0004] *pgd=
[153587.050567] Internal error:
