Re: corruption causing crash in __queue_work

2015-12-21 Tejun Heo
On Mon, Dec 21, 2015 at 4:44 PM, Tejun Heo wrote:
> On Sat, Dec 19, 2015 at 03:34:45PM +0200, Nikolay Borisov wrote:
>> Ping as Tejun might have missed this email. I'm also interested in knowing
>> the logic behind the comment.
>
> Didn't I already reply to that?
>
> http://thread.gmane.org/gmane.

Re: corruption causing crash in __queue_work

2015-12-21 Tejun Heo
On Sat, Dec 19, 2015 at 03:34:45PM +0200, Nikolay Borisov wrote:
> Ping as Tejun might have missed this email. I'm also interested in knowing
> the logic behind the comment.

Didn't I already reply to that?

http://thread.gmane.org/gmane.linux.kernel/2104051

--
tejun

Re: corruption causing crash in __queue_work

2015-12-17 Mike Snitzer
On Thu, Dec 17 2015 at 10:50am -0500,
Tejun Heo wrote:

> Hello, Nikolay.
>
> On Thu, Dec 17, 2015 at 05:43:12PM +0200, Nikolay Borisov wrote:
> > Right, but my initial understanding was that when canceling the delayed
> > work and then issuing flush_workqueue would act the same way as if
> > can

Re: corruption causing crash in __queue_work

2015-12-17 Tejun Heo
Hello, Nikolay.

On Thu, Dec 17, 2015 at 05:43:12PM +0200, Nikolay Borisov wrote:
> Right, but my initial understanding was that when canceling the delayed
> work and then issuing flush_workqueue would act the same way as if
> cancel_delayed_work_sync is called wrt to this particular delayed item,
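The replies in this exchange are truncated here, so what follows is only a toy illustration of the distinction Nikolay asks about, based on the documented workqueue semantics. cancel_delayed_work() deletes a pending timer but does not wait for a callback that is already running, and flush_workqueue() waits only for work items already queued, not for timers that have yet to fire. Assuming the delayed work re-arms itself from its own callback (as a periodic waker typically does), cancel followed by flush can leave a freshly armed timer behind, which could later fire against a destroyed workqueue. cancel_delayed_work_sync() avoids this by claiming the PENDING bit first, so the callback's re-arm attempt fails. The sketch is a single-threaded userspace model, not kernel code, and every toy_* name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy, single-threaded model of a delayed work item that re-arms itself
 * from its own callback. NOT kernel code: every toy_* name is hypothetical
 * and only mimics the documented semantics of cancel_delayed_work(),
 * cancel_delayed_work_sync() and flush_workqueue().
 */
struct toy_dwork {
	bool pending;     /* models the PENDING bit; whoever holds it owns the item */
	bool timer_armed; /* a timer will later queue the item on its workqueue */
	bool running;     /* the callback is executing right now */
};

/* Re-arm attempt made by the callback: succeeds only if PENDING is free. */
static void toy_rearm(struct toy_dwork *w)
{
	if (!w->pending) {
		w->pending = true;
		w->timer_armed = true;
	}
}

/* Let an in-flight callback finish; its last action is to re-arm. */
static void toy_finish_callback(struct toy_dwork *w)
{
	if (w->running) {
		w->running = false;
		toy_rearm(w);
	}
}

/* Like cancel_delayed_work(): deletes an armed timer but never waits. */
static bool toy_cancel(struct toy_dwork *w)
{
	if (w->pending && w->timer_armed) {
		w->pending = false;
		w->timer_armed = false;
		return true;
	}
	return false; /* a running callback is left alone */
}

/* Like flush_workqueue(): waits for the running callback, not for timers. */
static void toy_flush_workqueue(struct toy_dwork *w)
{
	toy_finish_callback(w);
}

/* Like cancel_delayed_work_sync(): claim PENDING first so the callback's
 * re-arm attempt fails, wait for the callback, then release PENDING. */
static bool toy_cancel_sync(struct toy_dwork *w)
{
	bool was_pending = w->pending;

	w->pending = true;       /* the canceler now owns the item */
	w->timer_armed = false;  /* any armed timer is deleted */
	toy_finish_callback(w);  /* the re-arm inside is a no-op */
	assert(!w->timer_armed);
	w->pending = false;
	return was_pending;
}

/* Tear down while the callback is mid-execution. Returns true if a timer
 * is still armed afterwards, i.e. it would later try to queue the item on
 * a workqueue the caller may already have destroyed. */
bool toy_teardown_leaves_timer(bool use_sync)
{
	struct toy_dwork w = { .pending = false, .timer_armed = false,
			       .running = true };

	if (use_sync) {
		toy_cancel_sync(&w);
	} else {
		toy_cancel(&w);          /* returns false: no timer armed yet */
		toy_flush_workqueue(&w); /* callback finishes and re-arms */
	}
	return w.timer_armed;
}
```

With this model, toy_teardown_leaves_timer(false) reports a leftover armed timer while toy_teardown_leaves_timer(true) does not. The model deliberately ignores concurrency and locking, so it captures only the ordering argument, not how the kernel actually implements it.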

Re: corruption causing crash in __queue_work

2015-12-17 Nikolay Borisov
On 12/17/2015 05:33 PM, Tejun Heo wrote:
> Hello, Nikolay.
>
> On Thu, Dec 17, 2015 at 12:46:10PM +0200, Nikolay Borisov wrote:
>> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
>> index 493c38e08bd2..ccbbf7823cf3 100644
>> --- a/drivers/md/dm-thin.c
>> +++ b/drivers/md/dm-thin.c
>> @@

Re: corruption causing crash in __queue_work

2015-12-17 Tejun Heo
Hello, Nikolay.

On Thu, Dec 17, 2015 at 12:46:10PM +0200, Nikolay Borisov wrote:
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index 493c38e08bd2..ccbbf7823cf3 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -3506,8 +3506,8 @@ static void pool_postsuspend(struc

Re: corruption causing crash in __queue_work

2015-12-17 Nikolay Borisov
On 12/14/2015 10:31 PM, Mike Snitzer wrote:
> On Mon, Dec 14 2015 at 3:11pm -0500,
> Nikolay Borisov wrote:
>
>> On Mon, Dec 14, 2015 at 5:31 PM, Mike Snitzer wrote:
>>> On Mon, Dec 14 2015 at 3:41P -0500,
>>> Nikolay Borisov wrote:
>>> Had another poke at the backtrace that is produce

Re: corruption causing crash in __queue_work

2015-12-14 Mike Snitzer
On Mon, Dec 14 2015 at 3:11pm -0500,
Nikolay Borisov wrote:

> On Mon, Dec 14, 2015 at 5:31 PM, Mike Snitzer wrote:
> > On Mon, Dec 14 2015 at 3:41P -0500,
> > Nikolay Borisov wrote:
> >
> >> Had another poke at the backtrace that is produced and here what the
> >> delayed_work looks like:
> >

Re: corruption causing crash in __queue_work

2015-12-14 Nikolay Borisov
On Mon, Dec 14, 2015 at 5:31 PM, Mike Snitzer wrote:
> On Mon, Dec 14 2015 at 3:41P -0500,
> Nikolay Borisov wrote:
>
>> Had another poke at the backtrace that is produced and here what the
>> delayed_work looks like:
>>
>> crash> struct delayed_work 88036772c8c0
>> struct delayed_work {
>>

Re: corruption causing crash in __queue_work

2015-12-14 Mike Snitzer
On Mon, Dec 14 2015 at 3:41P -0500,
Nikolay Borisov wrote:

> Had another poke at the backtrace that is produced and here what the
> delayed_work looks like:
>
> crash> struct delayed_work 88036772c8c0
> struct delayed_work {
>   work = {
>     data = {
>       counter = 1537
>     },
>

Re: corruption causing crash in __queue_work

2015-12-14 Nikolay Borisov
On 12/11/2015 07:08 PM, Tejun Heo wrote:
> Hello, Nikolay.
>
> On Fri, Dec 11, 2015 at 05:57:22PM +0200, Nikolay Borisov wrote:
>> So I had a server with the patch just crash on me:
>>
>> Here is how the queue looks like:
>> crash> struct workqueue_struct 0x8802420a4a00
>> struct workqueue_

Re: corruption causing crash in __queue_work

2015-12-12 Nikolay Borisov
On 12/11/2015 09:14 PM, Mike Snitzer wrote:
> On Fri, Dec 11 2015 at 1:00pm -0500,
> Nikolay Borisov wrote:
>
>> On Fri, Dec 11, 2015 at 7:08 PM, Tejun Heo wrote:
>>>
>>> Hmmm... No idea why it didn't show up in the debug log but the only
>>> way a workqueue could be in the above state is eit

Re: corruption causing crash in __queue_work

2015-12-11 Mike Snitzer
On Fri, Dec 11 2015 at 1:00pm -0500,
Nikolay Borisov wrote:

> On Fri, Dec 11, 2015 at 7:08 PM, Tejun Heo wrote:
> >
> > Hmmm... No idea why it didn't show up in the debug log but the only
> > way a workqueue could be in the above state is either it got
> > explicitly destroyed or somehow pwq re

Re: corruption causing crash in __queue_work

2015-12-11 Nikolay Borisov
On Fri, Dec 11, 2015 at 7:08 PM, Tejun Heo wrote:
> Hello, Nikolay.
>
> On Fri, Dec 11, 2015 at 05:57:22PM +0200, Nikolay Borisov wrote:
>> So I had a server with the patch just crash on me:
>>
>> Here is how the queue looks like:
>> crash> struct workqueue_struct 0x8802420a4a00
>> struct wor

Re: corruption causing crash in __queue_work

2015-12-11 Tejun Heo
Hello, Nikolay.

On Fri, Dec 11, 2015 at 05:57:22PM +0200, Nikolay Borisov wrote:
> So I had a server with the patch just crash on me:
>
> Here is how the queue looks like:
> crash> struct workqueue_struct 0x8802420a4a00
> struct workqueue_struct {
>   pwqs = {
>     next = 0x8802420a4c00

Re: corruption causing crash in __queue_work

2015-12-11 Nikolay Borisov
On 12/10/2015 05:29 PM, Tejun Heo wrote:
> On Thu, Dec 10, 2015 at 11:28:02AM +0200, Nikolay Borisov wrote:
>> On 12/09/2015 06:27 PM, Tejun Heo wrote:
>>> Hello,
>>>
>>> On Wed, Dec 09, 2015 at 06:23:15PM +0200, Nikolay Borisov wrote: I think we are seeing this at least daily on at least 1

Re: corruption causing crash in __queue_work

2015-12-10 Tejun Heo
On Thu, Dec 10, 2015 at 11:28:02AM +0200, Nikolay Borisov wrote:
> On 12/09/2015 06:27 PM, Tejun Heo wrote:
> > Hello,
> >
> > On Wed, Dec 09, 2015 at 06:23:15PM +0200, Nikolay Borisov wrote:
> >> I think we are seeing this at least daily on at least 1 server (we have
> >> multiple servers like th

Re: corruption causing crash in __queue_work

2015-12-10 Nikolay Borisov
On 12/09/2015 06:27 PM, Tejun Heo wrote:
> Hello,
>
> On Wed, Dec 09, 2015 at 06:23:15PM +0200, Nikolay Borisov wrote:
>> I think we are seeing this at least daily on at least 1 server (we have
>> multiple servers like that). So adding printk's would likely be the way
>> to go, anything in parti

Re: corruption causing crash in __queue_work

2015-12-09 Nikolay Borisov
On 12/09/2015 06:08 PM, Tejun Heo wrote:
> Hello, Nikolay.
>
> On Wed, Dec 09, 2015 at 02:08:56PM +0200, Nikolay Borisov wrote:
>> 73309.529940] BUG: unable to handle kernel NULL pointer dereference at
>> (null)
>> [73309.530238] IP: [] __queue_work+0xb3/0x390
> ...
>> [73309.537319]

Re: corruption causing crash in __queue_work

2015-12-09 Tejun Heo
Hello,

On Wed, Dec 09, 2015 at 06:23:15PM +0200, Nikolay Borisov wrote:
> I think we are seeing this at least daily on at least 1 server (we have
> multiple servers like that). So adding printk's would likely be the way
> to go, anything in particular you might be interested in knowing? I see
> RC

Re: corruption causing crash in __queue_work

2015-12-09 Tejun Heo
Hello, Nikolay.

On Wed, Dec 09, 2015 at 02:08:56PM +0200, Nikolay Borisov wrote:
> 73309.529940] BUG: unable to handle kernel NULL pointer dereference at
> (null)
> [73309.530238] IP: [] __queue_work+0xb3/0x390
...
> [73309.537319]
> [73309.537373] [] ? __queue_work+0x390/0x390
> [7

corruption causing crash in __queue_work

2015-12-09 Nikolay Borisov
Hello Tejun,

I've been observing the following crashes on kernel 4.2.6:

73309.529940] BUG: unable to handle kernel NULL pointer dereference at (null)
[73309.530238] IP: [] __queue_work+0xb3/0x390
[73309.530466] PGD 0
[73309.530681] Oops: [#1] SMP
[73309.530947] Modules linked