Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-21 Thread Jeff Layton
On Fri, 21 Mar 2014 12:32:12 +0400 Pavel Shilovsky wrote: > 2014-03-21 6:23 GMT+04:00 Steven Rostedt : > > On Thu, 20 Mar 2014 17:02:39 -0400 > > Jeff Layton wrote: > > > >> Eventually the server should just allow the read to complete even if > >> the client doesn't respond to the oplock break.

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-21 Thread Steven Rostedt
On Fri, 21 Mar 2014 08:41:28 -0400 Jeff Layton wrote: > That'd probably work fine too. The main point is to make sure oplock > breaks run on a different workqueue from where read or write completion > jobs run since they are operating on the lock_sem. The other jobs that > get queued to cifsiod_

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-21 Thread Jeff Layton
On Fri, 21 Mar 2014 08:17:06 -0400 Steven Rostedt wrote: > On Fri, 21 Mar 2014 12:32:12 +0400 > Pavel Shilovsky wrote: > > > > Read and write codepaths both obtain lock_sem for read and then wait > > for cifsiod_wq to complete and release lock_sem. They don't do any > > lock_sem operations ins

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-21 Thread Steven Rostedt
On Fri, 21 Mar 2014 12:32:12 +0400 Pavel Shilovsky wrote: > Read and write codepaths both obtain lock_sem for read and then wait > for cifsiod_wq to complete and release lock_sem. They don't do any > lock_sem operations inside their work task queued to cifsiod_wq. But > oplock code can obtain/re

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-21 Thread Pavel Shilovsky
2014-03-21 6:23 GMT+04:00 Steven Rostedt : > On Thu, 20 Mar 2014 17:02:39 -0400 > Jeff Layton wrote: > >> Eventually the server should just allow the read to complete even if >> the client doesn't respond to the oplock break. It has to since clients >> can suddenly drop off the net while holding a

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-20 Thread Steven Rostedt
On Thu, 20 Mar 2014 17:02:39 -0400 Jeff Layton wrote: > Eventually the server should just allow the read to complete even if > the client doesn't respond to the oplock break. It has to since clients > can suddenly drop off the net while holding an oplock. That should > allow everything to unwedge

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-20 Thread Steven Rostedt
On Thu, 20 Mar 2014 19:53:46 -0400 Jeff Layton wrote: > Wait...why does the work running on CPU1 end up blocked on down_read()? > Is it really getting stuck on the down_write you mention? > rwsems are fair locks. Readers will not block on a reader lock unless there's a writer waiting. That's t

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-20 Thread Jeff Layton
On Wed, 19 Mar 2014 15:12:52 -0400 Steven Rostedt wrote: > We just had a customer report a deadlock in their 3.8-rt kernel. > Looking into this, it is very possible to have the same deadlock in > mainline in Linus's tree as it stands today. > > It is much easier to deadlock in the -rt kernel bec

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-20 Thread Jeff Layton
On Thu, 20 Mar 2014 16:57:03 -0400 Steven Rostedt wrote: > On Thu, 20 Mar 2014 15:28:33 -0400 > Jeffrey Layton wrote: > > > > Nice analysis! I think eventually we'll need to overhaul this code not > > Note, Ulrich Obergfell helped a bit in the initial analysis. He found > from a customer cor

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-20 Thread Steven Rostedt
On Thu, 20 Mar 2014 15:28:33 -0400 Jeffrey Layton wrote: > Nice analysis! I think eventually we'll need to overhaul this code not Note, Ulrich Obergfell helped a bit in the initial analysis. He found from a customer core dump that the kworker thread was blocked on the cinode->lock_sem, and the

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-20 Thread Jeffrey Layton
On Wed, 19 Mar 2014 15:12:52 -0400 Steven Rostedt wrote: > We just had a customer report a deadlock in their 3.8-rt kernel. > Looking into this, it is very possible to have the same deadlock in > mainline in Linus's tree as it stands today. > > It is much easier to deadlock in the -rt kernel bec

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-19 Thread Tejun Heo
Hello, Steven, Peter. On Wed, Mar 19, 2014 at 08:34:07PM +0100, Peter Zijlstra wrote: > The way I understand workqueues is that we cannot guarantee concurrency > like this. It tries, but there's no guarantee. So, the guarantee is that if a workqueue has WQ_MEM_RECLAIM, it'll always have at least

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-19 Thread Steven Rostedt
On Wed, 19 Mar 2014 20:34:07 +0100 Peter Zijlstra wrote: > On Wed, Mar 19, 2014 at 03:12:52PM -0400, Steven Rostedt wrote: > > My question to Tejun is, if we create another workqueue, to add the > > rdata->work to, would that prevent the above problem? Or what other > > fixes can we do? > > The

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-19 Thread Peter Zijlstra
On Wed, Mar 19, 2014 at 03:43:39PM -0400, Steven Rostedt wrote: > On Wed, 19 Mar 2014 20:34:07 +0100 > Peter Zijlstra wrote: > > > On Wed, Mar 19, 2014 at 03:12:52PM -0400, Steven Rostedt wrote: > > > My question to Tejun is, if we create another workqueue, to add the > > > rdata->work to, would

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-19 Thread Steven Rostedt
On Wed, 19 Mar 2014 15:43:39 -0400 Steven Rostedt wrote: > On Wed, 19 Mar 2014 20:34:07 +0100 > Peter Zijlstra wrote: > > > On Wed, Mar 19, 2014 at 03:12:52PM -0400, Steven Rostedt wrote: > > > My question to Tejun is, if we create another workqueue, to add the > > > rdata->work to, would that

Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-19 Thread Peter Zijlstra
On Wed, Mar 19, 2014 at 03:12:52PM -0400, Steven Rostedt wrote: > My question to Tejun is, if we create another workqueue, to add the > rdata->work to, would that prevent the above problem? Or what other > fixes can we do? The way I understand workqueues is that we cannot guarantee concurrency lik

[RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-19 Thread Steven Rostedt
We just had a customer report a deadlock in their 3.8-rt kernel. Looking into this, it is very possible to have the same deadlock in mainline in Linus's tree as it stands today. It is much easier to deadlock in the -rt kernel because reader locks are serialized, where a down_read() can block anoth