Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-19 Thread Keith Busch
On Thu, Jul 19, 2018 at 06:29:24PM +0200, h...@lst.de wrote: > How do we get a fix into 4.18 at this part of the cycle? I think that > is the most important prirority right now. Even if you were okay at this point to incorporate the concepts from Bart's patch, it still looks like trouble for

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-19 Thread h...@lst.de
On Thu, Jul 19, 2018 at 10:22:40AM -0600, Keith Busch wrote: > On Thu, Jul 19, 2018 at 04:04:46PM +, Bart Van Assche wrote: > > On Thu, 2018-07-19 at 09:56 -0600, Keith Busch wrote: > > > Even some scsi drivers are susceptible to losing their requests with the > > > reverted behavior: take

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-19 Thread Keith Busch
On Thu, Jul 19, 2018 at 04:04:46PM +, Bart Van Assche wrote: > On Thu, 2018-07-19 at 09:56 -0600, Keith Busch wrote: > > Even some scsi drivers are susceptible to losing their requests with the > > reverted behavior: take virtio-scsi for example, which returns RESET_TIMER > > from it's timeout

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-19 Thread Bart Van Assche
On Thu, 2018-07-19 at 09:56 -0600, Keith Busch wrote: > Even some scsi drivers are susceptible to losing their requests with the > reverted behavior: take virtio-scsi for example, which returns RESET_TIMER > from it's timeout handler. With the behavior everyone seems to want, > a natural

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-19 Thread Keith Busch
On Thu, Jul 19, 2018 at 08:59:31AM -0600, Keith Busch wrote: > > And we should never even hit the timeout handler for those as it > > is rather pointless (although it looks we currently do..). > > I don't see why we'd expect to never hit timeout for at least some of > these. It's not a stretch to

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-19 Thread Keith Busch
On Thu, Jul 19, 2018 at 03:19:04PM +0200, h...@lst.de wrote: > On Wed, Jul 18, 2018 at 03:17:11PM -0600, Keith Busch wrote: > > And there may be other drivers that don't want their completions > > ignored, so breaking them again is also a step in the wrong direction. > > > > There are not that

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-19 Thread h...@lst.de
On Wed, Jul 18, 2018 at 03:17:11PM -0600, Keith Busch wrote: > And there may be other drivers that don't want their completions > ignored, so breaking them again is also a step in the wrong direction. > > There are not that many blk-mq drivers, so we can go through them all. I think the point is

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-18 Thread Keith Busch
On Wed, Jul 18, 2018 at 10:39:36PM +0200, h...@lst.de wrote: > On Wed, Jul 18, 2018 at 09:56:50PM +0200, h...@lst.de wrote: > > On Fri, Jul 13, 2018 at 05:58:08PM -0600, Keith Busch wrote: > > > Of the two you mentioned, yours is preferable IMO. While I appreciate > > > Jianchao's detailed

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-18 Thread Keith Busch
On Wed, Jul 18, 2018 at 09:30:11PM +, Bart Van Assche wrote: > On Wed, 2018-07-18 at 15:17 -0600, Keith Busch wrote: > > There are not that many blk-mq drivers, so we can go through them all. > > I'm not sure that's the right approach. I think it is the responsibility of > the block layer to

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-18 Thread Bart Van Assche
On Wed, 2018-07-18 at 15:17 -0600, Keith Busch wrote: > There are not that many blk-mq drivers, so we can go through them all. I'm not sure that's the right approach. I think it is the responsibility of the block layer to handle races between completion handling and timeout handling and that this

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-18 Thread Keith Busch
On Wed, Jul 18, 2018 at 08:58:48PM +, Bart Van Assche wrote: > On Wed, 2018-07-18 at 14:53 -0600, Keith Busch wrote: > > If scsi needs this behavior, why not just put that behavior in scsi? It > > can set the state to complete and then everything can play out as > > before. > > [ ... ] > >

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-18 Thread Bart Van Assche
On Wed, 2018-07-18 at 22:39 +0200, h...@lst.de wrote: > On Wed, Jul 18, 2018 at 09:56:50PM +0200, h...@lst.de wrote: > > The important bit is that we need to fix this issue quickly. We are > > past -rc5 so I'm rather concerned about anything too complicated. > > > > I'm not even sure SCSI has a

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-18 Thread Bart Van Assche
On Wed, 2018-07-18 at 14:53 -0600, Keith Busch wrote: > If scsi needs this behavior, why not just put that behavior in scsi? It > can set the state to complete and then everything can play out as > before. > [ ... ] There may be other drivers that need the same protection the SCSI core needs so I

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-18 Thread Keith Busch
On Wed, Jul 18, 2018 at 09:56:50PM +0200, h...@lst.de wrote: > On Fri, Jul 13, 2018 at 05:58:08PM -0600, Keith Busch wrote: > > Of the two you mentioned, yours is preferable IMO. While I appreciate > > Jianchao's detailed analysis, it's hard to take a proposal seriously > > that so colourfully

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-18 Thread h...@lst.de
On Wed, Jul 18, 2018 at 09:56:50PM +0200, h...@lst.de wrote: > On Fri, Jul 13, 2018 at 05:58:08PM -0600, Keith Busch wrote: > > Of the two you mentioned, yours is preferable IMO. While I appreciate > > Jianchao's detailed analysis, it's hard to take a proposal seriously > > that so colourfully

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-18 Thread h...@lst.de
On Fri, Jul 13, 2018 at 05:58:08PM -0600, Keith Busch wrote: > Of the two you mentioned, yours is preferable IMO. While I appreciate > Jianchao's detailed analysis, it's hard to take a proposal seriously > that so colourfully calls everyone else "dangerous" while advocating > for silently losing

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-13 Thread Keith Busch
On Fri, Jul 13, 2018 at 11:03:18PM +, Bart Van Assche wrote: > How do you want to go forward from here? Do you prefer the approach of the > patch I had posted (https://www.spinics.net/lists/linux-block/msg26489.html), > Jianchao's approach (https://marc.info/?l=linux-block=152950093831738) or

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-13 Thread Bart Van Assche
On Fri, 2018-07-13 at 12:47 -0600, Keith Busch wrote: > On Fri, Jul 13, 2018 at 03:52:38PM +, Bart Van Assche wrote: > > I think that behavior change even can trigger a kernel crash. > > I think we are past acknowledging issues exist with timeouts. Hello Keith, How do you want to go forward

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-13 Thread Keith Busch
On Fri, Jul 13, 2018 at 03:52:38PM +, Bart Van Assche wrote: > No. What I'm saying is that a behavior change has been introduced in the block > layer that was not documented in the patch description. Did you read the changelog? > I think that behavior change even can trigger a kernel crash.

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-13 Thread Bart Van Assche
On Fri, 2018-07-13 at 09:43 -0600, Keith Busch wrote: > On Thu, Jul 12, 2018 at 10:24:42PM +, Bart Van Assche wrote: > > Before commit 12f5b9314545 ("blk-mq: Remove generation seqeunce"), if a > > request completion was reported after request timeout processing had > > started, completion

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-13 Thread Keith Busch
On Thu, Jul 12, 2018 at 10:24:42PM +, Bart Van Assche wrote: > Before commit 12f5b9314545 ("blk-mq: Remove generation seqeunce"), if a > request completion was reported after request timeout processing had > started, completion handling was skipped. The following code in >

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-12 Thread jianchao.wang
On 07/13/2018 06:24 AM, Bart Van Assche wrote: > Hello Keith, > > Before commit 12f5b9314545 ("blk-mq: Remove generation seqeunce"), if a > request completion was reported after request timeout processing had > started, completion handling was skipped. The following code in >

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-12 Thread jianchao.wang
On 07/13/2018 06:24 AM, Bart Van Assche wrote: > On Thu, 2018-07-12 at 13:24 -0600, Keith Busch wrote: >> On Thu, Jul 12, 2018 at 06:16:12PM +, Bart Van Assche wrote: >>> What prevents that a request finishes and gets reused after the >>> blk_mq_req_expired() call has finished and before

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-12 Thread Bart Van Assche
On Thu, 2018-07-12 at 13:24 -0600, Keith Busch wrote: > On Thu, Jul 12, 2018 at 06:16:12PM +, Bart Van Assche wrote: > > What prevents that a request finishes and gets reused after the > > blk_mq_req_expired() call has finished and before kref_get_unless_zero() is > > called? Is this perhaps a

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-12 Thread Keith Busch
On Thu, Jul 12, 2018 at 06:16:12PM +, Bart Van Assche wrote: > On Mon, 2018-05-21 at 17:11 -0600, Keith Busch wrote: > > /* > > -* We marked @rq->aborted_gstate and waited for RCU. If there were > > -* completions that we lost to, they would have finished and > > -* updated

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-12 Thread Bart Van Assche
On Mon, 2018-05-21 at 17:11 -0600, Keith Busch wrote: > /* > - * We marked @rq->aborted_gstate and waited for RCU. If there were > - * completions that we lost to, they would have finished and > - * updated @rq->gstate by now; otherwise, the completion path is > - * now

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-23 Thread Ming Lei
On Wed, May 23, 2018 at 08:35:40AM -0600, Keith Busch wrote: > On Wed, May 23, 2018 at 08:34:48AM +0800, Ming Lei wrote: > > Let's consider the normal NVMe timeout code path: > > > > 1) one request is timed out; > > > > 2) controller is shutdown, this timed-out request is requeued from > >

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-23 Thread Keith Busch
On Wed, May 23, 2018 at 08:34:48AM +0800, Ming Lei wrote: > Let's consider the normal NVMe timeout code path: > > 1) one request is timed out; > > 2) controller is shutdown, this timed-out request is requeued from > nvme_cancel_request(), but can't dispatch because queues are quiesced > > 3)

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Hannes Reinecke
On 05/22/2018 06:17 PM, Christoph Hellwig wrote: Hi Keith, I like this series a lot. One comment that is probably close to the big discussion in the thread: switch (ret) { case BLK_EH_HANDLED: /* +* If the request is still in flight, the driver

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Ming Lei
On Tue, May 22, 2018 at 06:17:04PM +0200, Christoph Hellwig wrote: > Hi Keith, > > I like this series a lot. One comment that is probably close > to the big discussion in the thread: > > > switch (ret) { > > case BLK_EH_HANDLED: > > /* > > +* If the request is

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Bart Van Assche
On Tue, 2018-05-22 at 10:34 -0600, Keith Busch wrote: > On Tue, May 22, 2018 at 04:29:07PM +, Bart Van Assche wrote: > > Please have another look at the current code that handles request timeouts > > and completions. The current implementation guarantees that no double > > completions can

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 04:29:07PM +, Bart Van Assche wrote: > Please have another look at the current code that handles request timeouts > and completions. The current implementation guarantees that no double > completions can occur but your patch removes essential aspects of that >

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Bart Van Assche
On Tue, 2018-05-22 at 08:15 -0600, Keith Busch wrote: > This shouldn't be introducing any new concorrent calls to > __blk_mq_complete_request if they don't already exist. The timeout work > calls it only if the driver's timeout returns BLK_EH_HANDLED, which means > the driver is claiming the

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Christoph Hellwig
Hi Keith, I like this series a lot. One comment that is probably close to the big discussion in the thread: > switch (ret) { > case BLK_EH_HANDLED: > /* > + * If the request is still in flight, the driver is requesting > + * blk-mq complete

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Ming Lei
On Tue, May 22, 2018 at 11:17 PM, Keith Busch wrote: > On Tue, May 22, 2018 at 11:07:07PM +0800, Ming Lei wrote: >> > At approximately the same time, you're saying the driver that returned >> > EH_HANDLED may then call blk_mq_complete_request() in reference to the >>

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 11:07:07PM +0800, Ming Lei wrote: > > At approximately the same time, you're saying the driver that returned > > EH_HANDLED may then call blk_mq_complete_request() in reference to the > > *old* request that it returned EH_HANDLED for, incorrectly completing > > Because

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Ming Lei
On Tue, May 22, 2018 at 11:01 PM, Keith Busch wrote: > On Tue, May 22, 2018 at 10:57:40PM +0800, Ming Lei wrote: >> OK, that still depends on driver's behaviour, even though it is true >> for NVMe, you still have to audit other drivers about this sync >> because it

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 10:57:40PM +0800, Ming Lei wrote: > OK, that still depends on driver's behaviour, even though it is true > for NVMe, you still have to audit other drivers about this sync > because it is required by your patch. Okay, forget about nvme for a moment here. Please run this

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Ming Lei
On Tue, May 22, 2018 at 10:46 PM, Keith Busch wrote: > On Tue, May 22, 2018 at 10:37:32PM +0800, Ming Lei wrote: >> On Tue, May 22, 2018 at 10:20 PM, Keith Busch >> wrote: >> > In the event the driver requests a normal completion, the

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 10:37:32PM +0800, Ming Lei wrote: > On Tue, May 22, 2018 at 10:20 PM, Keith Busch > wrote: > > In the event the driver requests a normal completion, the timeout work > > releasing the last reference doesn't do a second completion: it only > >

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Ming Lei
On Tue, May 22, 2018 at 10:20 PM, Keith Busch wrote: > On Tue, May 22, 2018 at 10:49:11AM +0800, Ming Lei wrote: >> On Mon, May 21, 2018 at 05:11:31PM -0600, Keith Busch wrote: >> > -static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx, >> > +static void

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Jens Axboe
On 5/22/18 2:51 AM, Ming Lei wrote: > On Mon, May 21, 2018 at 09:51:19PM -0600, Jens Axboe wrote: >> On May 21, 2018, at 9:47 PM, Ming Lei wrote: >>> On Mon, May 21, 2018 at 09:16:33PM -0600, Jens Axboe wrote: > On 5/21/18 8:49 PM, Ming Lei wrote: >> On Mon, May

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 10:49:11AM +0800, Ming Lei wrote: > On Mon, May 21, 2018 at 05:11:31PM -0600, Keith Busch wrote: > > -static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx, > > +static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx, > > struct request *rq, void

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Keith Busch
On Mon, May 21, 2018 at 11:29:06PM +, Bart Van Assche wrote: > On Mon, 2018-05-21 at 17:11 -0600, Keith Busch wrote: > > switch (ret) { > > case BLK_EH_HANDLED: > > - __blk_mq_complete_request(req); > > - break; > > - case BLK_EH_RESET_TIMER: > > /* >

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Ming Lei
On Mon, May 21, 2018 at 09:51:19PM -0600, Jens Axboe wrote: > On May 21, 2018, at 9:47 PM, Ming Lei wrote: > > > >> On Mon, May 21, 2018 at 09:16:33PM -0600, Jens Axboe wrote: > >>> On 5/21/18 8:49 PM, Ming Lei wrote: > On Mon, May 21, 2018 at 05:11:31PM -0600, Keith

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-21 Thread Jens Axboe
On May 21, 2018, at 9:47 PM, Ming Lei wrote: > >> On Mon, May 21, 2018 at 09:16:33PM -0600, Jens Axboe wrote: >>> On 5/21/18 8:49 PM, Ming Lei wrote: On Mon, May 21, 2018 at 05:11:31PM -0600, Keith Busch wrote: This patch simplifies the timeout handling by relying

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-21 Thread Ming Lei
On Mon, May 21, 2018 at 09:16:33PM -0600, Jens Axboe wrote: > On 5/21/18 8:49 PM, Ming Lei wrote: > > On Mon, May 21, 2018 at 05:11:31PM -0600, Keith Busch wrote: > >> This patch simplifies the timeout handling by relying on the request > >> reference counting to ensure the iterator is operating

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-21 Thread Jens Axboe
On 5/21/18 8:49 PM, Ming Lei wrote: > On Mon, May 21, 2018 at 05:11:31PM -0600, Keith Busch wrote: >> This patch simplifies the timeout handling by relying on the request >> reference counting to ensure the iterator is operating on an inflight > > The reference counting isn't free, what is the

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-21 Thread Ming Lei
On Mon, May 21, 2018 at 05:11:31PM -0600, Keith Busch wrote: > This patch simplifies the timeout handling by relying on the request > reference counting to ensure the iterator is operating on an inflight The reference counting isn't free, what is the real benefit in this way? > and truly timed

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-21 Thread Bart Van Assche
On Mon, 2018-05-21 at 17:11 -0600, Keith Busch wrote: > @@ -650,27 +600,10 @@ static u64 blk_mq_rq_aborted_gstate(struct request *rq) > void blk_mq_complete_request(struct request *rq) > { > struct request_queue *q = rq->q; > - struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q,

[RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-21 Thread Keith Busch
This patch simplifies the timeout handling by relying on the request reference counting to ensure the iterator is operating on an inflight and truly timed out request. Since the reference counting prevents the tag from being reallocated, the block layer no longer needs to prevent drivers from