Re: [PATCH 10/33] iomap: add an iomap-based bmap implementation

2018-05-11 Thread Darrick J. Wong
On Fri, May 11, 2018 at 08:25:27AM +0200, Christoph Hellwig wrote: > On Thu, May 10, 2018 at 08:08:38AM -0700, Darrick J. Wong wrote: > > > > > + sector_t *bno = data; > > > > > + > > > > > + if (iomap->type == IOMAP_MAPPED) > > > > > + *bno = (iomap->addr + pos -

Re: [PATCH V5 0/9] nvme: pci: fix & improve timeout handling

2018-05-11 Thread Ming Lei
Hi Keith, On Fri, May 11, 2018 at 02:50:28PM -0600, Keith Busch wrote: > On Fri, May 11, 2018 at 08:29:24PM +0800, Ming Lei wrote: > > Hi, > > > > The 1st patch introduces blk_quiesce_timeout() and blk_unquiesce_timeout() > > for NVMe, meantime fixes blk_sync_queue(). > > > > The 2nd patch

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-11 Thread Logan Gunthorpe
On 5/11/2018 4:24 PM, Stephen Bates wrote: All  Alex (or anyone else) can you point to where IOVA addresses are generated? A case of RTFM perhaps (though a pointer to the code would still be appreciated). https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt Some exceptions to IOVA

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-11 Thread Stephen Bates
All > Alex (or anyone else) can you point to where IOVA addresses are generated? A case of RTFM perhaps (though a pointer to the code would still be appreciated). https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt Some exceptions to IOVA --- Interrupt ranges are not

Re: [PATCH v8] blk-mq: Rework blk-mq timeout handling again

2018-05-11 Thread Bart Van Assche
On Fri, 2018-05-11 at 15:21 -0600, Jens Axboe wrote: > On 5/11/18 3:08 PM, Bart Van Assche wrote: > blk_mq_rq_update_aborted_gstate(rq, gstate); > > + union blk_deadline_and_state das = READ_ONCE(rq->das); > > + unsigned long now = jiffies; > > + int32_t diff_jiffies = das.deadline - now; >

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-11 Thread Stephen Bates
>I find this hard to believe. There's always the possibility that some >part of the system doesn't support ACS so if the PCI bus addresses and >IOVA overlap there's a good chance that P2P and ATS won't work at all on >some hardware. I tend to agree but this comes down to how

Re: [PATCH v8] blk-mq: Rework blk-mq timeout handling again

2018-05-11 Thread Jens Axboe
On 5/11/18 3:08 PM, Bart Van Assche wrote: blk_mq_rq_update_aborted_gstate(rq, gstate); > + union blk_deadline_and_state das = READ_ONCE(rq->das); > + unsigned long now = jiffies; > + int32_t diff_jiffies = das.deadline - now; > + int32_t diff_next = das.deadline - data->next; > +
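The quoted hunk stores differences between a deadline and the current jiffies value in int32_t variables. That looks like the usual wraparound-safe way to compare free-running counters, the same idea as the kernel's time_after() family. Below is a minimal userspace sketch of that idea, not the patch's code: deadline_expired() is a hypothetical helper, and both arguments are assumed to be 32-bit tick counts.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): report whether a 32-bit
 * deadline has been reached as of 'now'. Both values are free-running
 * tick counters that may wrap around.
 */
static bool deadline_expired(uint32_t deadline, uint32_t now)
{
	/*
	 * Unsigned subtraction wraps modulo 2^32. Reinterpreting the result
	 * as signed yields the distance between the two counters, which is
	 * meaningful as long as they are less than 2^31 ticks apart. The
	 * conversion to int32_t is implementation-defined in C, but wraps
	 * as two's complement on GCC and Clang.
	 */
	int32_t diff = (int32_t)(deadline - now);

	return diff <= 0;
}
```

A plain `deadline <= now` would give the wrong answer once the counter wraps; the signed difference stays correct across the wrap.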

Re: [PATCH 00/10] Misc block layer patches for bcachefs

2018-05-11 Thread Jens Axboe
On 5/8/18 7:33 PM, Kent Overstreet wrote: > - Add separately allowed mempools, biosets: bcachefs uses both all over the >place > > - Bit of utility code - bio_copy_data_iter(), zero_fill_bio_iter() > > - bio_list_copy_data(), the bi_next check - defensiveness because of a bug I >had

Re: [PATCH 01/10] mempool: Add mempool_init()/mempool_exit()

2018-05-11 Thread Jens Axboe
On 5/8/18 7:33 PM, Kent Overstreet wrote: > Allows mempools to be embedded in other structs, getting rid of a > pointer indirection from allocation fastpaths. > > mempool_exit() is safe to call on an uninitialized but zeroed mempool. Looks fine to me. I'd like to carry it through the block

[PATCH v8] blk-mq: Rework blk-mq timeout handling again

2018-05-11 Thread Bart Van Assche
Recently the blk-mq timeout handling code was reworked. See also Tejun Heo, "[PATCHSET v4] blk-mq: reimplement timeout handling", 08 Jan 2018 (https://www.mail-archive.com/linux-block@vger.kernel.org/msg16985.html). This patch reworks the blk-mq timeout handling code again. The timeout handling

Re: make a few block drivers highmem safe

2018-05-11 Thread Jens Axboe
On 5/9/18 7:59 AM, Christoph Hellwig wrote: > Hi all, > > this series converts a few random block drivers to be highmem safe, > in preparation of eventually getting rid of the block layer bounce > buffering support. Applied, thanks. -- Jens Axboe

Re: [PATCH V5 0/9] nvme: pci: fix & improve timeout handling

2018-05-11 Thread Keith Busch
On Fri, May 11, 2018 at 08:29:24PM +0800, Ming Lei wrote: > Hi, > > The 1st patch introduces blk_quiesce_timeout() and blk_unquiesce_timeout() > for NVMe, meantime fixes blk_sync_queue(). > > The 2nd patch covers timeout for admin commands for recovering controller > for avoiding possible

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-11 Thread Logan Gunthorpe
On 5/11/2018 2:52 AM, Christian König wrote: This only works when the IOVA and the PCI bus addresses never overlap. I'm not sure how the IOVA allocation works but I don't think we guarantee that on Linux. I find this hard to believe. There's always the possibility that some part of the

Re: [PATCH] blk-mq: Rework blk-mq timeout handling again

2018-05-11 Thread Bart Van Assche
On Fri, 2018-05-11 at 14:35 +0200, Christoph Hellwig wrote: > > It should be due to union blk_deadline_and_state. > > +union blk_deadline_and_state { > > + struct { > > + uint32_t generation:30; > > + uint32_t state:2; > > + uint32_t deadline; > > + }; > > +

Re: [PATCH] blk-mq: Rework blk-mq timeout handling again

2018-05-11 Thread Bart Van Assche
On Fri, 2018-05-11 at 20:06 +0800, jianchao.wang wrote: > Hi bart > > I add debug log in blk_mq_add_timer as following > > void blk_mq_add_timer(struct request *req, enum mq_rq_state old, > enum mq_rq_state new) > { >struct request_queue *q = req->q; > >if

Re: [PATCH] blk-mq: Rework blk-mq timeout handling again

2018-05-11 Thread Christoph Hellwig
> It should be due to union blk_deadline_and_state. > +union blk_deadline_and_state { > + struct { > + uint32_t generation:30; > + uint32_t state:2; > + uint32_t deadline; > + }; > + unsigned long legacy_deadline; > + uint64_t das; > +}; Yikes.
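For readers following the discussion, the quoted union overlays a 30-bit generation, a 2-bit state and a 32-bit deadline on one 64-bit word, presumably so that all three can be read or written as a single value. The userspace sketch below copies the union from the quoted patch. The pack_das() and das_*() helpers are hypothetical and exist only to illustrate the round trip. The size check assumes a compiler that packs both bit-fields into one 32-bit unit, as GCC and Clang do.

```c
#include <stdint.h>

/* Union as quoted in the thread; uses a C11 anonymous struct. */
union blk_deadline_and_state {
	struct {
		uint32_t generation:30;
		uint32_t state:2;
		uint32_t deadline;
	};
	unsigned long legacy_deadline;
	uint64_t das;
};

/* The whole union is expected to fit in one 64-bit word. */
_Static_assert(sizeof(union blk_deadline_and_state) == sizeof(uint64_t),
	       "union should be exactly 64 bits wide");

/* Hypothetical helper: combine the three fields into one 64-bit value. */
static uint64_t pack_das(uint32_t generation, uint32_t state, uint32_t deadline)
{
	union blk_deadline_and_state u = { .das = 0 };

	u.generation = generation;	/* truncated to 30 bits */
	u.state = state;		/* truncated to 2 bits */
	u.deadline = deadline;
	return u.das;
}

/* Hypothetical helpers: extract each field from the combined value. */
static uint32_t das_generation(uint64_t das)
{
	union blk_deadline_and_state u = { .das = das };
	return u.generation;
}

static uint32_t das_state(uint64_t das)
{
	union blk_deadline_and_state u = { .das = das };
	return u.state;
}

static uint32_t das_deadline(uint64_t das)
{
	union blk_deadline_and_state u = { .das = das };
	return u.deadline;
}
```

The exact bit positions inside das depend on the ABI and endianness, so the sketch only relies on round-tripping through the same union. Note that legacy_deadline aliases different fields depending on the width of unsigned long.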

[PATCH V5 8/9] nvme: core: introduce nvme_force_change_ctrl_state()

2018-05-11 Thread Ming Lei
While the controller is being reset, a timeout may still be triggered. To handle this situation, the controller state has to be changed to NVME_CTRL_RESETTING first, so introduce nvme_force_change_ctrl_state() for this purpose. Cc: James Smart Cc: Jianchao Wang

[PATCH V5 7/9] nvme: pci: don't unfreeze queue until controller state updating succeeds

2018-05-11 Thread Ming Lei
If updating the controller state to LIVE or ADMIN_ONLY fails, the controller will be removed, so it is no longer necessary to unfreeze the queue. This makes it easier for the following patch to avoid leaking the freeze counter. Cc: James Smart Cc: Jianchao Wang

[PATCH V5 9/9] nvme: pci: support nested EH

2018-05-11 Thread Ming Lei
When one req times out, nvme_timeout() currently handles it in the following way: nvme_dev_disable(dev, false); nvme_reset_ctrl(&dev->ctrl); return BLK_EH_HANDLED. There are several issues with the above approach: 1) IO may fail during resetting Admin IO timeout may be

[PATCH V5 6/9] nvme: pci: move error handling out of nvme_reset_dev()

2018-05-11 Thread Ming Lei
Once nested EH is introduced, we may not need to handle error in the inner EH, so move error handling out of nvme_reset_dev(). Meantime return the reset result to caller. Cc: James Smart Cc: Jianchao Wang Cc: Christoph Hellwig

[PATCH V5 2/9] nvme: pci: cover timeout for admin commands running in EH

2018-05-11 Thread Ming Lei
When admin commands are used in EH for recovering controller, we have to cover their timeout and can't depend on block's timeout since deadlock may be caused when these commands are timed-out by block layer again. Cc: James Smart Cc: Jianchao Wang

[PATCH V5 1/9] block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()

2018-05-11 Thread Ming Lei
It turns out the current way can't drain timeouts completely because mod_timer() can be triggered in the work func, which can be run just inside the synced timeout work: del_timer_sync(&q->timeout); cancel_work_sync(&q->timeout_work); This patch introduces one flag of 'timeout_off' for

[PATCH V5 4/9] nvme: pci: freeze queue in nvme_dev_disable() in case of error recovery

2018-05-11 Thread Ming Lei
When nvme_dev_disable() is used for error recovery, we should always freeze queues before shutdown controller: - reset handler supposes queues are frozen, and will wait_freeze & unfreeze them explicitly, if queues aren't frozen during nvme_dev_disable(), reset handler may wait forever even though

[PATCH V5 3/9] nvme: pci: only wait freezing if queue is frozen

2018-05-11 Thread Ming Lei
In nvme_dev_disable(), called while shutting down the controller, nvme_wait_freeze_timeout() may be called on a controller that is not frozen yet, so add a check to avoid that case. Cc: James Smart Cc: Jianchao Wang Cc: Christoph Hellwig

[PATCH V5 0/9] nvme: pci: fix & improve timeout handling

2018-05-11 Thread Ming Lei
Hi, The 1st patch introduces blk_quiesce_timeout() and blk_unquiesce_timeout() for NVMe, meantime fixes blk_sync_queue(). The 2nd patch covers timeout for admin commands for recovering controller for avoiding possible deadlock. The 3rd and 4th patches avoid to wait_freeze on queues which aren't

[PATCH V5 5/9] nvme: pci: prepare for supporting error recovery from resetting context

2018-05-11 Thread Ming Lei
Either the admin or normal IO in reset context may be timed out because controller error happens. When this timeout happens, we may have to start controller recovery again. This patch introduces 'reset_lock' and holds this lock when running reset, so that we may support nested reset in the

Re: [PATCH] blk-mq: Rework blk-mq timeout handling again

2018-05-11 Thread jianchao.wang
Hi bart I add debug log in blk_mq_add_timer as following void blk_mq_add_timer(struct request *req, enum mq_rq_state old, enum mq_rq_state new) { struct request_queue *q = req->q; if (!req->timeout) req->timeout = q->rq_timeout; if

Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-05-11 Thread Christian König
Am 10.05.2018 um 19:15 schrieb Logan Gunthorpe: On 10/05/18 11:11 AM, Stephen Bates wrote: Not to me. In the p2pdma code we specifically program DMA engines with the PCI bus address. Ah yes of course. Brain fart on my part. We are not programming the P2PDMA initiator with an IOVA but with

Re: stop using buffer heads in xfs and iomap

2018-05-11 Thread Darrick J. Wong
On Fri, May 11, 2018 at 08:22:08AM +0200, Christoph Hellwig wrote: > On Thu, May 10, 2018 at 08:13:03AM -0700, Darrick J. Wong wrote: > > I ran xfstests on this for fun last night but hung in g/095: > > > > FSTYP -- xfs (debug) > > PLATFORM -- Linux/x86_64 submarine-djwong-mtr01

Re: [PATCH 01/33] block: add a lower-level bio_add_page interface

2018-05-11 Thread Christoph Hellwig
On Thu, May 10, 2018 at 03:49:53PM -0600, Andreas Dilger wrote: > Would it make sense to change the bio_add_page() and bio_add_pc_page() > to use the more common convention instead of continuing the spread of > this non-standard calling convention? This is doubly problematic since > "off" and

Re: [PATCH 10/33] iomap: add an iomap-based bmap implementation

2018-05-11 Thread Christoph Hellwig
On Thu, May 10, 2018 at 08:08:38AM -0700, Darrick J. Wong wrote: > > > > + sector_t *bno = data; > > > > + > > > > + if (iomap->type == IOMAP_MAPPED) > > > > + *bno = (iomap->addr + pos - iomap->offset) >> > > > > inode->i_blkbits; > > > > > > Does this need to be
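The quoted expression converts a file position into a block number by taking the mapping's on-disk byte address, adding the position's offset within the mapping, and shifting by the block-size bits. Below is a minimal userspace sketch of that arithmetic. The struct mapping type and bmap_block() are simplified, hypothetical stand-ins and not the kernel's struct iomap or the patch's code.

```c
#include <stdint.h>

/* Simplified, hypothetical stand-in for the two iomap fields used here. */
struct mapping {
	uint64_t addr;		/* on-disk byte address of the mapping start */
	int64_t offset;		/* file offset (bytes) of the mapping start */
};

/*
 * Mirror the quoted expression:
 *   (iomap->addr + pos - iomap->offset) >> inode->i_blkbits
 * 'pos' is assumed to lie inside the mapping, so pos >= m->offset.
 */
static uint64_t bmap_block(const struct mapping *m, int64_t pos,
			   unsigned int blkbits)
{
	return (m->addr + (uint64_t)(pos - m->offset)) >> blkbits;
}
```

For example, with 4 KiB blocks (blkbits = 12), a mapping that starts at file offset 8192 and disk address 1 MiB places file position 12288 one block past the mapping's first block.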

Re: [PATCH 01/33] block: add a lower-level bio_add_page interface

2018-05-11 Thread Christoph Hellwig
On Thu, May 10, 2018 at 04:52:00PM +0800, Ming Lei wrote: > On Wed, May 9, 2018 at 3:47 PM, Christoph Hellwig wrote: > > For the upcoming removal of buffer heads in XFS we need to keep track of > > the number of outstanding writeback requests per page. For this we need > > to know

Re: stop using buffer heads in xfs and iomap

2018-05-11 Thread Christoph Hellwig
On Thu, May 10, 2018 at 08:13:03AM -0700, Darrick J. Wong wrote: > I ran xfstests on this for fun last night but hung in g/095: > > FSTYP -- xfs (debug) > PLATFORM -- Linux/x86_64 submarine-djwong-mtr01 4.17.0-rc4-djw > MKFS_OPTIONS -- -f -m reflink=1,rmapbt=1, -i sparse=1, -b