Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-10 Thread Alex G.
I've observed a very nasty NULL pointer dereference with NVME drives on hot-unplug [1]. While I haven't gone in the details of this change, it does fix the NULL dereference I've been seeing. Alex [1] http://lists.infradead.org/pipermail/linux-nvme/2018-April/016623.html

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread t...@kernel.org
Hello, Bart. On Mon, Apr 09, 2018 at 09:30:27PM +, Bart Van Assche wrote: > On Mon, 2018-04-09 at 11:56 -0700, t...@kernel.org wrote: > > On Mon, Apr 09, 2018 at 05:03:05PM +, Bart Van Assche wrote: > > > exist today in the blk-mq timeout handling code cannot be fixed completely > > >

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 11:56 -0700, t...@kernel.org wrote: > On Mon, Apr 09, 2018 at 05:03:05PM +, Bart Van Assche wrote: > > exist today in the blk-mq timeout handling code cannot be fixed completely > > using RCU only. > > I really don't think that is that complicated. Let's first confirm >

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 09:47 -0700, Tejun Heo wrote: > On Sun, Apr 08, 2018 at 10:20:38PM -0700, Bart Van Assche wrote: > > If a completion occurs after blk_mq_rq_timed_out() has reset > > rq->aborted_gstate and the request is again in flight when the timeout > > expires then a request will be

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Tejun Heo
Hello, Sagi. On Mon, Apr 09, 2018 at 11:37:15AM +0300, Sagi Grimberg wrote: > > >If a completion occurs after blk_mq_rq_timed_out() has reset > >rq->aborted_gstate and the request is again in flight when the timeout > >expires then a request will be completed twice: a first time by the >

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Tejun Heo
Hey, Bart. On Sun, Apr 08, 2018 at 10:20:38PM -0700, Bart Van Assche wrote: > If a completion occurs after blk_mq_rq_timed_out() has reset > rq->aborted_gstate and the request is again in flight when the timeout > expires then a request will be completed twice: a first time by the > timeout

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Israel Rukshin
On 4/9/2018 11:37 AM, Sagi Grimberg wrote: If a completion occurs after blk_mq_rq_timed_out() has reset rq->aborted_gstate and the request is again in flight when the timeout expires then a request will be completed twice: a first time by the timeout handler and a second time when the regular

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Jens Axboe
On 4/9/18 8:58 AM, Bart Van Assche wrote: > On Mon, 2018-04-09 at 11:37 +0200, Christoph Hellwig wrote: >> This looks sensible, but I'm worried about taking a whole spinlock >> for every request completion, including irq disabling. However it seems >> like your new updated pattern would fit use

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 11:37 +0200, Christoph Hellwig wrote: > This looks sensible, but I'm worried about taking a whole spinlock > for every request completion, including irq disabling. However it seems > like your new updated pattern would fit use of cmpxchg() very nicely. Hello Christoph,

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Bart Van Assche
On Mon, 2018-04-09 at 11:37 +0300, Sagi Grimberg wrote: > > If a completion occurs after blk_mq_rq_timed_out() has reset > > rq->aborted_gstate and the request is again in flight when the timeout > > expires then a request will be completed twice: a first time by the > > timeout handler and a

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Christoph Hellwig
This looks sensible, but I'm worried about taking a whole spinlock for every request completion, including irq disabling. However it seems like your new updated pattern would fit use of cmpxchg() very nicely.

Re: [PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-09 Thread Sagi Grimberg
If a completion occurs after blk_mq_rq_timed_out() has reset rq->aborted_gstate and the request is again in flight when the timeout expires then a request will be completed twice: a first time by the timeout handler and a second time when the regular completion occurs. Additionally, the blk-mq

[PATCH] blk-mq: Fix recently introduced races in the timeout handling code

2018-04-08 Thread Bart Van Assche
If a completion occurs after blk_mq_rq_timed_out() has reset rq->aborted_gstate and the request is again in flight when the timeout expires then a request will be completed twice: a first time by the timeout handler and a second time when the regular completion occurs. Additionally, the blk-mq