Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-27 Thread Tejun Heo
On Wed, Apr 27, 2016 at 1:50 AM, Hannes Reinecke wrote: >> that, I'd be happy to apply it to wq/for-3.7. >> ^^^ > Ah. Time warp. > I knew it would happen eventually :-) lol wrong tree too. Sorry about that. -- tejun

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-26 Thread Hannes Reinecke
On 04/26/2016 07:45 PM, Tejun Heo wrote: > Hello, Peter. > > On Tue, Apr 26, 2016 at 10:27:59AM -0700, Peter Hurley wrote: >>> It's unlikely to make any measureable difference. Is xchg() actually >>> cheaper than store + rmb? >> >> store + mfence (full barrier), yes. Roughly 2x faster. >> >>

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-26 Thread Peter Hurley
On 04/26/2016 10:45 AM, Tejun Heo wrote: > As long as what's happening is clearly documented, I think either is > fine. I'm gonna go with Roman's mb patch for -stable fix but think > it'd be nice to have a separate patch to consolidate the paths which > clear PENDING and make them use xchg. If

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-26 Thread Tejun Heo
Hello, Peter. On Tue, Apr 26, 2016 at 10:27:59AM -0700, Peter Hurley wrote: > > It's unlikely to make any measureable difference. Is xchg() actually > > cheaper than store + rmb? > > store + mfence (full barrier), yes. Roughly 2x faster. > > https://lkml.org/lkml/2015/11/2/607 Ah, didn't know
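The two options being compared above can be sketched in userspace C11 as follows. This is only an illustration under stated assumptions: `clear_with_fence()`, `clear_with_xchg()` and `PENDING` are hypothetical names, C11 atomics stand in for the kernel's own primitives, and the "roughly 2x" figure is Peter's x86 measurement, not something this sketch demonstrates.

```c
#include <assert.h>
#include <stdatomic.h>

#define PENDING 1UL /* hypothetical stand-in for the PENDING flag bit */

/* Variant 1: plain store followed by a separate full fence
 * (the "store + mfence" shape on x86). */
static void clear_with_fence(atomic_ulong *data, unsigned long new_val)
{
    assert(!(new_val & PENDING)); /* the new word must not carry PENDING */
    atomic_store_explicit(data, new_val, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Variant 2: one sequentially consistent exchange; the read-modify-write
 * itself provides the full ordering. Returns the previous word. */
static unsigned long clear_with_xchg(atomic_ulong *data, unsigned long new_val)
{
    assert(!(new_val & PENDING));
    return atomic_exchange(data, new_val);
}
```

Both variants leave the same value in the word and order the store before any later load; they differ only in how that ordering is obtained and in the exchange also handing back the old word.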

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-26 Thread Peter Hurley
Hi Tejun, On 04/26/2016 08:15 AM, Tejun Heo wrote: > Hello, Peter. > > On Mon, Apr 25, 2016 at 06:22:01PM -0700, Peter Hurley wrote: >> This is the same bug I wrote about 2 yrs ago (but with the wrong fix). >> >> http://lkml.iu.edu/hypermail/linux/kernel/1402.2/04697.html >> >> Unfortunately I

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-26 Thread Tejun Heo
Hello, Peter. On Mon, Apr 25, 2016 at 06:22:01PM -0700, Peter Hurley wrote: > This is the same bug I wrote about 2 yrs ago (but with the wrong fix). > > http://lkml.iu.edu/hypermail/linux/kernel/1402.2/04697.html > > Unfortunately I didn't have a reproducer at all :/ Ah, bummer. > The

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-25 Thread Peter Hurley
On 04/25/2016 08:48 AM, Tejun Heo wrote: > Hello, Roman. > > On Mon, Apr 25, 2016 at 05:22:51PM +0200, Roman Pen wrote: > ... >> CPU#6 CPU#2 >> reqeust 884000343600 inserted >> hctx marked as pended >> kblockd_schedule...() returns 1 >> >> ***

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-25 Thread Tejun Heo
Hello, Roman. On Mon, Apr 25, 2016 at 07:39:52PM +0200, Roman Penyaev wrote: > Ok, that's clear now. Thanks. I was confused also by a spin lock, which > is being released just after clear pending: > > set_work_pool_and_clear_pending(work, pool->id); > spin_unlock_irq(&pool->lock); > ... >

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-25 Thread Roman Penyaev
On Mon, Apr 25, 2016 at 7:03 PM, Tejun Heo wrote: > Hello, Roman. > > On Mon, Apr 25, 2016 at 06:34:45PM +0200, Roman Penyaev wrote: >> I can assure you that smp_mb() helps (at least running for 30 minutes >> under IO). That was my first variant, but I did not like it because I >> could not

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-25 Thread Tejun Heo
Hello, Roman. On Mon, Apr 25, 2016 at 06:34:45PM +0200, Roman Penyaev wrote: > I can assure you that smp_mb() helps (at least running for 30 minutes > under IO). That was my first variant, but I did not like it because I > could not explain myself why: > > 1. not smp_wmb()? We need to do flush

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-25 Thread Roman Penyaev
On Mon, Apr 25, 2016 at 6:00 PM, Tejun Heo wrote: > On Mon, Apr 25, 2016 at 11:48:47AM -0400, Tejun Heo wrote: >> Heh, excellent debugging. I wonder how old this bug is. cc'ing David > > I just went through the history. So, we used to have clear_bit() > instead of atomic_long_set() but

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-25 Thread Roman Penyaev
Hello, Tejun, On Mon, Apr 25, 2016 at 5:48 PM, Tejun Heo wrote: > Hello, Roman. > > On Mon, Apr 25, 2016 at 05:22:51PM +0200, Roman Pen wrote: > ... >> CPU#6 CPU#2 >> reqeust 884000343600 inserted >> hctx marked as pended >> kblockd_schedule...()

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-25 Thread Tejun Heo
On Mon, Apr 25, 2016 at 11:48:47AM -0400, Tejun Heo wrote: > Heh, excellent debugging. I wonder how old this bug is. cc'ing David I just went through the history. So, we used to have clear_bit() instead of atomic_long_set() but clear_bit() doesn't imply any barrier either, so kudos to you.
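The ordering problem described here (a plain store or clear_bit() that clears PENDING implies no barrier before the worker's later loads) can be modelled in userspace C11. This is a minimal sketch, not the kernel code: `struct fake_work`, `fake_queue()` and `fake_run()` are hypothetical names, `atomic_fetch_or()` stands in for test_and_set_bit(), and the seq_cst fence stands in for the smp_mb() that Roman reports as fixing the hang.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PENDING_BIT 1UL /* hypothetical stand-in for WORK_STRUCT_PENDING_BIT */

struct fake_work {
    atomic_ulong data;    /* flag word, playing the role of work->data */
    atomic_int   payload; /* state the work function consumes (e.g. "hctx pended") */
};

/* Queue side: publish the state, then try to claim PENDING.
 * Returns true if this call queued the work, false if it was already pending. */
static bool fake_queue(struct fake_work *w, int value)
{
    atomic_store_explicit(&w->payload, value, memory_order_relaxed);
    /* Fully ordered read-modify-write: the payload store cannot pass it. */
    unsigned long old = atomic_fetch_or(&w->data, PENDING_BIT);
    return !(old & PENDING_BIT);
}

/* Worker side: clear PENDING, then read the state.
 * Without the fence, the payload load could be satisfied before the clear
 * becomes visible, so a concurrent fake_queue() might still see PENDING set,
 * skip queueing, and leave its update unseen by this run. */
static int fake_run(struct fake_work *w)
{
    assert(atomic_load_explicit(&w->data, memory_order_relaxed) & PENDING_BIT);
    atomic_store_explicit(&w->data, 0, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* plays the role of smp_mb() */
    return atomic_load_explicit(&w->payload, memory_order_relaxed);
}
```

With the fence in place, either the queuer observes PENDING cleared and queues again, or the worker's load observes the queuer's payload; the lost-update window needs both to fail, which a full barrier on each side rules out.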

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-25 Thread Tejun Heo
Hello, Roman. On Mon, Apr 25, 2016 at 05:22:51PM +0200, Roman Pen wrote: ... > CPU#6 CPU#2 > reqeust 884000343600 inserted > hctx marked as pended > kblockd_schedule...() returns 1 > > *** WORK_STRUCT_PENDING_BIT is cleared *** > flush_busy_ctxs()

[PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-25 Thread Roman Pen
Hi, This is RFC, because for last couple of days I hunt a mystery bug and since now I do not have a strong feeling that the following story is nothing but bug's trick and attempt to cheat me. [Sorry, the whole story and explanation are quite long] The bug is reproduced quite often on our server
