Re: [PATCH 05/14] block: add a max_user_discard_sectors queue limit

2024-01-31 Thread Keith Busch
max_discard_bytes >> SECTOR_SHIFT; > + q->limits.max_discard_sectors = > + min_not_zero(q->limits.max_hw_discard_sectors, > + q->limits.max_user_discard_sectors); s/min_not_zero/min Otherwise the whole series looks pretty good! And with that: Reviewed-by: Keith Busch
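The semantics behind that substitution: min_not_zero() ignores a zero operand, so with it a user writing 0 to discard_max_bytes could never win over the hardware limit, while plain min() lets the zero through. A minimal userspace model of the two helpers, with made-up limit values (the real macros live in include/linux/minmax.h):

#include <stdio.h>

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* min_not_zero(): smaller of the two, but a zero operand is ignored */
static unsigned int min_not_zero_u(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return min_u(a, b);
}

int main(void)
{
	unsigned int hw = 1 << 16;	/* stand-in for max_hw_discard_sectors */
	unsigned int user = 0;		/* user wrote 0 to discard_max_bytes */

	printf("min_not_zero: %u\n", min_not_zero_u(hw, user));	/* keeps 65536 */
	printf("min:          %u\n", min_u(hw, user));		/* honours the 0 */
	return 0;
}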

Re: [PATCH 1/2] blk-mq: introduce blk_mq_tagset_wait_request_completed()

2024-01-24 Thread Keith Busch
On Wed, Jan 24, 2024 at 07:22:21PM +0800, yi sun wrote: > In my case, I want all hw queues owned by this device to be clean. > Because in the virtio device, each hw queue corresponds to a virtqueue, > and all virtqueues will be deleted when vdev suspends. > > The blk_mq_tagset_wait_request_complet

Re: [PATCH 05/15] block: add a max_user_discard_sectors queue limit

2024-01-24 Thread Keith Busch
On Mon, Jan 22, 2024 at 07:38:57PM +0100, Christoph Hellwig wrote: > On Mon, Jan 22, 2024 at 11:27:15AM -0700, Keith Busch wrote: > > > + q->limits.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT; > > > + q->limits.max_discard_sectors

Re: [Report] requests are submitted to hardware in reverse order from nvme/virtio-blk queue_rqs()

2024-01-24 Thread Keith Busch
On Wed, Jan 24, 2024 at 07:59:54PM +0800, Ming Lei wrote: > Requests are added to plug list in reverse order, and both virtio-blk > and nvme retrieves request from plug list in order, so finally requests > are submitted to hardware in reverse order via nvme_queue_rqs() or > virtio_queue_rqs, see: >
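A toy model of the ordering described in the report: requests are pushed onto the plug list at the head (LIFO), and a queue_rqs()-style walk then hands them to the hardware front to back, i.e. in reverse of the order they were issued. The list handling below is a simplification of the blk-mq rq_list helpers:

#include <stdio.h>
#include <stdlib.h>

struct req {
	int id;
	struct req *next;
};

int main(void)
{
	struct req *plug = NULL;

	/* "Issue" requests 0..3: each one is pushed at the list head */
	for (int i = 0; i < 4; i++) {
		struct req *rq = malloc(sizeof(*rq));

		rq->id = i;
		rq->next = plug;
		plug = rq;
	}

	/* A queue_rqs()-style walk sees 3, 2, 1, 0 */
	for (struct req *rq = plug; rq; rq = rq->next)
		printf("dispatch req %d\n", rq->id);
	return 0;
}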

Re: [PATCH 1/2] blk-mq: introduce blk_mq_tagset_wait_request_completed()

2024-01-23 Thread Keith Busch
On Mon, Jan 22, 2024 at 07:07:21PM +0800, Yi Sun wrote: > In some cases, it is necessary to wait for all requests to become complete > status before performing other operations. Otherwise, these requests will > never > be processed successfully. > > For example, when the virtio device is in hiber
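A toy model of the behaviour the new helper is meant to provide: before tearing down the virtqueues, block until every outstanding request has actually reached the completed state. The atomic counter and polling loop below are illustrative stand-ins, not the blk-mq tag-iteration machinery:

#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int inflight = 3;		/* pretend three requests are outstanding */

static void complete_request(void)
{
	atomic_fetch_sub(&inflight, 1);
}

static void wait_requests_completed(void)
{
	/* a real implementation would sleep/reschedule rather than spin */
	while (atomic_load(&inflight) > 0)
		usleep(1000);
}

int main(void)
{
	for (int i = 0; i < 3; i++)
		complete_request();	/* completions arriving from the device */
	wait_requests_completed();
	printf("all requests completed; safe to delete the virtqueues\n");
	return 0;
}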

Re: [PATCH 05/15] block: add a max_user_discard_sectors queue limit

2024-01-22 Thread Keith Busch
On Mon, Jan 22, 2024 at 06:36:35PM +0100, Christoph Hellwig wrote: > @@ -174,23 +174,23 @@ static ssize_t queue_discard_max_show(struct > request_queue *q, char *page) > static ssize_t queue_discard_max_store(struct request_queue *q, > const char *page, size_t

Re: [PATCH] virtio_blk: set the default scheduler to none

2023-12-07 Thread Keith Busch
On Fri, Dec 08, 2023 at 10:00:36AM +0800, Ming Lei wrote: > On Thu, Dec 07, 2023 at 12:31:05PM +0800, Li Feng wrote: > > virtio-blk is generally used in cloud computing scenarios, where the > > performance of virtual disks is very important. The mq-deadline scheduler > > has a big performance drop

Re: [PATCH 7/7] block: remove ->rw_page

2023-01-25 Thread Keith Busch
On Wed, Jan 25, 2023 at 02:34:36PM +0100, Christoph Hellwig wrote: > @@ -363,8 +384,10 @@ void __swap_writepage(struct page *page, struct > writeback_control *wbc) >*/ > if (data_race(sis->flags & SWP_FS_OPS)) > swap_writepage_fs(page, wbc); > + else if (sis->flags

Re: [PATCH 2/7] mm: remove the swap_readpage return value

2023-01-25 Thread Keith Busch
On Wed, Jan 25, 2023 at 02:34:31PM +0100, Christoph Hellwig wrote: > -static inline int swap_readpage(struct page *page, bool do_poll, > - struct swap_iocb **plug) > +static inline void swap_readpage(struct page *page, bool do_poll, > + struct swap_iocb **plu

Re: [PATCH 02/13] nvme-multipath: add error handling support for add_disk()

2021-10-15 Thread Keith Busch
mespace head. Looks good, thank you. Reviewed-by: Keith Busch

Re: [PATCH v2 03/10] nvme-multipath: add error handling support for add_disk()

2021-09-27 Thread Keith Busch
On Mon, Sep 27, 2021 at 03:00:32PM -0700, Luis Chamberlain wrote: > + /* > + * test_and_set_bit() is used because it is protecting against two nvme > + * paths simultaneously calling device_add_disk() on the same namespace > + * head. > + */ > if (!test_and_set_bit(NVM
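A userspace model of the guard described in that comment: of two paths racing to register the same namespace head, only the one that flips the bit first proceeds, because test_and_set_bit() returns the previous value. pthreads and a C11 atomic_flag stand in for the kernel primitives (compile with -pthread):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_flag head_live = ATOMIC_FLAG_INIT;

static void *path_scan(void *arg)
{
	/* like test_and_set_bit(), returns whether the bit was already set */
	if (!atomic_flag_test_and_set(&head_live))
		printf("path %ld: adding the shared disk\n", (long)arg);
	else
		printf("path %ld: disk already added, skipping\n", (long)arg);
	return NULL;
}

int main(void)
{
	pthread_t t[2];

	for (long i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, path_scan, (void *)i);
	for (int i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	return 0;
}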

Re: [PATCH 03/10] nvme-multipath: add error handling support for add_disk()

2021-08-27 Thread Keith Busch
On Fri, Aug 27, 2021 at 12:18:02PM -0700, Luis Chamberlain wrote: > @@ -479,13 +479,17 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, > struct nvme_ns_head *head) > static void nvme_mpath_set_live(struct nvme_ns *ns) > { > struct nvme_ns_head *head = ns->head; > + int rc; > >

Re: [PATCH 0/2 v2] Small cleanups

2019-10-21 Thread Keith Busch
On Mon, Oct 21, 2019 at 12:40:02PM +0900, Damien Le Moal wrote: > This is series is a couple of cleanup patches. The first one cleans up > the helper function nvme_block_nr() using SECTOR_SHIFT instead of the > hard coded value 9 and clarifies the helper role by renaming it to > nvme_sect_to_lba().
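A sketch of the conversion the renamed helper expresses: block layer sectors are fixed at 512 bytes (SECTOR_SHIFT == 9), while a namespace LBA may be 512, 4096, ... bytes, so the sector count is shifted by the difference. The struct and values below are stand-ins rather than the driver's actual definitions:

#include <stdint.h>
#include <stdio.h>

#define SECTOR_SHIFT	9	/* 512-byte block layer sectors */

struct ns {
	unsigned int lba_shift;	/* e.g. 12 for a 4K-formatted namespace */
};

static uint64_t sect_to_lba(const struct ns *ns, uint64_t sector)
{
	return sector >> (ns->lba_shift - SECTOR_SHIFT);
}

int main(void)
{
	struct ns ns = { .lba_shift = 12 };

	/* 512-byte sector 4096 maps to 4K LBA 512 */
	printf("%llu\n", (unsigned long long)sect_to_lba(&ns, 4096));
	return 0;
}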

Re: [PATCH v9 04/12] nvmet: make nvmet_copy_ns_identifier() non-static

2019-10-09 Thread Keith Busch
the request SGL. > > [chaitanya.kulka...@wdc.com: this was factored out of a patch > originally authored by Chaitanya] > Signed-off-by: Chaitanya Kulkarni > Signed-off-by: Logan Gunthorpe > Reviewed-by: Sagi Grimberg > --- Looks fine Reviewed-by: Keith Busch

Re: [PATCH v9 03/12] nvmet: add return value to nvmet_add_async_event()

2019-10-09 Thread Keith Busch
berg > --- Looks fine, but let's remove the version comments out of commit log if we're applying this one. Reviewed-by: Keith Busch > drivers/nvme/target/core.c | 6 -- > drivers/nvme/target/nvmet.h | 2 +- > 2 files changed, 5 insertions(+), 3 deletions(-) > > dif

Re: [PATCH v9 02/12] nvme-core: export existing ctrl and ns interfaces

2019-10-09 Thread Keith Busch
target > passthru feature. > > Signed-off-by: Chaitanya Kulkarni > Signed-off-by: Logan Gunthorpe > Reviewed-by: Sagi Grimberg Looks fine. Reviewed-by: Keith Busch

Re: [PATCH v9 01/12] nvme-core: introduce nvme_ctrl_get_by_path()

2019-10-09 Thread Keith Busch
to obtain a pointer to the struct nvme_ctrl. If the fops of the > file do not match, -EINVAL is returned. > > The purpose of this function is to support NVMe-OF target passthru. > > Signed-off-by: Logan Gunthorpe > Reviewed-by: Max Gurtovoy > Reviewed-by: Sagi Grimberg Looks fine. Reviewed-by: Keith Busch

Re: [PATCH v4 2/3] block: don't remap ref tag for T10 PI type 0

2019-09-08 Thread Keith Busch
On Sun, Sep 08, 2019 at 10:22:50PM -0400, Martin K. Petersen wrote: > > Max, > > > Only type 1 and type 2 have a reference tag by definition. > > DIX Type 0 needs remapping so this assertion is not correct. At least for nvme, type 0 means you have meta data but not for protection information, s

Re: [PATCH v2 1/1] block: centralize PI remapping logic to the block layer

2019-09-04 Thread Keith Busch
On Wed, Sep 04, 2019 at 07:27:32PM +0300, Max Gurtovoy wrote: > + if (blk_integrity_rq(req) && req_op(req) == REQ_OP_READ && > + error == BLK_STS_OK) > + t10_pi_complete(req, > + nr_bytes >> blk_integrity_interval_shift(req->q)); This is not created by y

Re: [PATCH] nvme: Use first ctrl->instance id as subsystem id

2019-08-14 Thread Keith Busch
On Wed, Aug 14, 2019 at 11:29:17AM -0700, Guilherme G. Piccoli wrote: > It is a suggestion from my colleague Dan (CCed here), something like: > for non-multipath nvme, keep nvmeC and nvmeCnN (C=controller ida, > N=namespace); for multipath nvme, use nvmeScCnN (S=subsystem ida). This will inevitabl

Re: [PATCH] nvme: Use first ctrl->instance id as subsystem id

2019-08-14 Thread Keith Busch
On Wed, Aug 14, 2019 at 09:18:22AM -0700, Guilherme G. Piccoli wrote: > On 14/08/2019 13:06, Keith Busch wrote: > > On Wed, Aug 14, 2019 at 07:28:36AM -0700, Guilherme G. Piccoli wrote: > >>[...] > > > > The subsystem lifetime is not tied to a single controlle

Re: [PATCH] nvme: Use first ctrl->instance id as subsystem id

2019-08-14 Thread Keith Busch
On Wed, Aug 14, 2019 at 07:28:36AM -0700, Guilherme G. Piccoli wrote: > Since after the introduction of NVMe multipath, we have a struct to > track subsystems, and more important, we have now the nvme block device > name bound to the subsystem id instead of ctrl->instance as before. > This is not a

Re: [PATCH] genirq/affinity: create affinity mask for single vector

2019-08-08 Thread Keith Busch
case. > > So still create affinity mask for single vector, since > irq_create_affinity_masks() > is capable of handling that. Hi Ming, Looks good to me. Reviewed-by: Keith Busch > --- > kernel/irq/affinity.c | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(

Re: WARNING: refcount bug in blk_mq_free_request (2)

2019-08-06 Thread Keith Busch
On Mon, Aug 05, 2019 at 10:52:07AM -0700, syzbot wrote: > Hello, > > syzbot found the following crash on: > > HEAD commit:e21a712a Linux 5.3-rc3 > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=10cf349a60 > kernel config: https://syzkaller.appspot.co

Re: [PATCH 26/34] mm/gup_benchmark.c: convert put_page() to put_user_page*()

2019-08-02 Thread Keith Busch
s is part a tree-wide conversion, as described in commit fc1d8e7cca2d > ("mm: introduce put_user_page*(), placeholder versions"). > > Cc: Dan Carpenter > Cc: Greg Kroah-Hartman > Cc: Keith Busch > Cc: Kirill A. Shutemov > Cc: Michael S. Tsirkin > Cc: YueHaibing &

Re: [PATCH v6 04/16] nvme-core: introduce nvme_get_by_path()

2019-07-25 Thread Keith Busch
On Thu, Jul 25, 2019 at 02:28:28PM -0600, Logan Gunthorpe wrote: > > > On 2019-07-25 1:58 p.m., Keith Busch wrote: > > On Thu, Jul 25, 2019 at 11:54:18AM -0600, Logan Gunthorpe wrote: > >> > >> > >> On 2019-07-25 11:50 a.m., Matthew Wilcox wrote: >

Re: [PATCH v6 04/16] nvme-core: introduce nvme_get_by_path()

2019-07-25 Thread Keith Busch
On Thu, Jul 25, 2019 at 11:54:18AM -0600, Logan Gunthorpe wrote: > > > On 2019-07-25 11:50 a.m., Matthew Wilcox wrote: > > On Thu, Jul 25, 2019 at 11:23:23AM -0600, Logan Gunthorpe wrote: > >> nvme_get_by_path() is analagous to blkdev_get_by_path() except it > >> gets a struct nvme_ctrl from the

Re: fstrim error - AORUS NVMe Gen4 SSD

2019-07-24 Thread Keith Busch
On Tue, Jul 23, 2019 at 12:38:04PM +0800, Ming Lei wrote: > From the IO trace, discard command(nvme_cmd_dsm) is failed: > > kworker/15:1H-462 [015] 91814.342452: nvme_setup_cmd: nvme0: > disk=nvme0n1, qid=7, cmdid=552, nsid=1, flags=0x0, meta=0x0, > cmd=(nvme_cmd_dsm nr=0, attributes=4)

Re: [PATCH 3/5] nvme: don't abort completed request in nvme_cancel_request

2019-07-22 Thread Keith Busch
PI. > > > > So don't abort one request if it is marked as completed, otherwise > > we may abort one normal completed request. > > > > Cc: Max Gurtovoy > > Cc: Sagi Grimberg > > Cc: Keith Busch > > Cc: Christoph Hellwig > > Signed-off-

Re: [PATCH 0/2] Reset timeout for paused hardware

2019-05-23 Thread Keith Busch
On Thu, May 23, 2019 at 04:10:54PM +0200, Christoph Hellwig wrote: > On Thu, May 23, 2019 at 07:23:04AM -0600, Keith Busch wrote: > > > Figure 49: Asynchronous Event Information - Notice > > > > > > 1hFirmware Activation Starting: The controller

Re: [PATCH 2/2] nvme: reset request timeouts during fw activation

2019-05-23 Thread Keith Busch
On Thu, May 23, 2019 at 03:19:54AM -0700, Ming Lei wrote: > On Wed, May 22, 2019 at 11:48:12AM -0600, Keith Busch wrote: > > @@ -3605,6 +3606,11 @@ static void nvme_fw_act_work(struct work_struct > > *work) > > msecs_to_jiffies

Re: [PATCH 0/2] Reset timeout for paused hardware

2019-05-23 Thread Keith Busch
On Thu, May 23, 2019 at 03:13:11AM -0700, Christoph Hellwig wrote: > On Wed, May 22, 2019 at 09:48:10PM -0600, Keith Busch wrote: > > Yeah, that's a good question. A FW update may have been initiated out > > of band or from another host entirely. The driver can't c

Re: [PATCH 0/2] Reset timeout for paused hardware

2019-05-22 Thread Keith Busch
On Wed, May 22, 2019, 9:29 PM Ming Lei wrote: > > On Wed, May 22, 2019 at 11:48:10AM -0600, Keith Busch wrote: > > Hardware may temporarily stop processing commands that have > > been dispatched to it while activating new firmware. Some target > > implementation's

Re: [PATCH 0/2] Reset timeout for paused hardware

2019-05-22 Thread Keith Busch
On Wed, May 22, 2019 at 10:20:45PM +0200, Bart Van Assche wrote: > On 5/22/19 7:48 PM, Keith Busch wrote: > > Hardware may temporarily stop processing commands that have > > been dispatched to it while activating new firmware. Some target > > implementation's paused stat

[PATCH 2/2] nvme: reset request timeouts during fw activation

2019-05-22 Thread Keith Busch
when the hardware exits that state. This action applies to IO and admin queues. Signed-off-by: Keith Busch --- drivers/nvme/host/core.c | 20 1 file changed, 20 insertions(+) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 1b7c2afd84cb..37a9a66ada22 1

[PATCH 1/2] blk-mq: provide way to reset rq deadline

2019-05-22 Thread Keith Busch
jiffies so that time accrued during the paused state doesn't count against that request. Signed-off-by: Keith Busch --- block/blk-mq.c | 30 ++ include/linux/blk-mq.h | 1 + 2 files changed, 31 insertions(+) diff --git a/block/blk-mq.c b/block/blk-mq.c
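A toy model of the idea: when the hardware resumes from the paused state, each dispatched request's deadline is pushed back out to now + timeout so the paused interval doesn't count against it. Wall-clock time and the names below stand in for jiffies and the blk-mq internals:

#include <stdio.h>
#include <time.h>

struct rq {
	time_t deadline;
};

static void reset_deadline(struct rq *rq, unsigned int timeout_sec)
{
	rq->deadline = time(NULL) + timeout_sec;
}

int main(void)
{
	struct rq rq = { .deadline = time(NULL) + 30 };

	/* ... hardware pauses for firmware activation, then resumes ... */
	reset_deadline(&rq, 30);
	printf("request now times out in %ld seconds\n",
	       (long)(rq.deadline - time(NULL)));
	return 0;
}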

[PATCH 0/2] Reset timeout for paused hardware

2019-05-22 Thread Keith Busch
will time out, and handling this may interrupt the firmware activation. This two-part series provides a way for drivers to reset dispatched requests' timeout deadline, then uses this new mechanism from the nvme driver's fw activation work. Keith Busch (2): blk-mq: provide way to rese

[PATCHv2] fio: Add advise THP option to mmap engine

2019-04-18 Thread Keith Busch
uring configure. If the option is set, fio can test THP when used with private anonymous memory (i.e. mmap /dev/zero). Signed-off-by: Keith Busch --- v1 -> v2: Added a 'configure' check for MADV_HUGEPAGE support rather than just consider only OS Linux Fixed cases when MADV_HUGEPAG
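The core of what the option exercises, sketched in plain C: back the I/O buffer with private anonymous memory and advise the kernel to use transparent huge pages for it. The mapping size and error handling here are simplified, and MAP_ANONYMOUS is used in place of mapping /dev/zero:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 4UL << 20;		/* multiple of the 2 MiB THP size */
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
#ifdef MADV_HUGEPAGE
	if (madvise(buf, len, MADV_HUGEPAGE))
		perror("madvise(MADV_HUGEPAGE)");	/* non-fatal, as in the fio check */
#endif
	memset(buf, 0, len);	/* fault the pages in so THP can actually back them */
	munmap(buf, len);
	return 0;
}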

Re: [PATCH] fio: Add advise THP option to mmap engine

2019-04-17 Thread Keith Busch
On Wed, Apr 17, 2019 at 04:01:28PM -0600, Jens Axboe wrote: > On 4/17/19 3:28 PM, Keith Busch wrote: > > The transparent hugepage Linux-specific memory advisory has potentially > > significant implications for how the memory management behaves. Add a > > new mmap specifi

[PATCH] fio: Add advise THP option to mmap engine

2019-04-17 Thread Keith Busch
: Keith Busch --- engines/mmap.c | 39 --- optgroup.h | 2 ++ 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/engines/mmap.c b/engines/mmap.c index 308b4665..7bca6c20 100644 --- a/engines/mmap.c +++ b/engines/mmap.c @@ -11,6 +11,7

Re: [PATCH v3] block: fix use-after-free on gendisk

2019-04-15 Thread Keith Busch
to Jan Kara for providing the solution and more clear comments > for the code. > > Fixes: 2da78092dda1 ("block: Fix dev_t minor allocation lifetime") > Cc: Al Viro > Cc: Bart Van Assche > Cc: Keith Busch > Suggested-by: Jan Kara > Signed-off-by: Yufen Yu Looks good to me. Reviewed-by: Keith Busch

Re: [PATCH V4 0/2] blk-mq/nvme: cancel request synchronously

2019-04-10 Thread Keith Busch
+++ > > drivers/nvme/host/core.c | 2 +- > > include/linux/blk-mq.h | 1 + > > 3 files changed, 9 insertions(+), 1 deletion(-) > > > > Cc: Keith Busch > > Cc: Sagi Grimberg > > Cc: Bart Van Assche > > Cc: James Smart > > Cc: Christoph

Re: [PATCH V3 1/2] blk-mq: introduce blk_mq_complete_request_sync()

2019-04-08 Thread Keith Busch
q_ops->complete(rq) remotelly and asynchronously, and > ->complete(rq) may be run after #4. > > This patch introduces blk_mq_complete_request_sync() for fixing the > above race. > > Cc: Keith Busch > Cc: Sagi Grimberg > Cc: Bart Van Assche > Cc: James Smart &

Re: [PATCH] blk-mq: Wait for for hctx requests on CPU unplug

2019-04-08 Thread Keith Busch
get that by registering a single callback for the request_queue and loop only the affected hctx's. But this patch looks good to me too. Reviewed-by: Keith Busch > Signed-off-by: Dongli Zhang > --- > block/blk-mq.c | 4 > 1 file changed, 4 insertions(+) > >

Re: [PATCH] blk-mq: Wait for for hctx requests on CPU unplug

2019-04-08 Thread Keith Busch
On Sun, Apr 07, 2019 at 12:51:23AM -0700, Christoph Hellwig wrote: > On Fri, Apr 05, 2019 at 05:36:32PM -0600, Keith Busch wrote: > > On Fri, Apr 5, 2019 at 5:04 PM Jens Axboe wrote: > > > Looking at current peak testing, I've got around 1.2% in queue enter > > &g

Re: [PATCH] blk-mq: Wait for for hctx requests on CPU unplug

2019-04-08 Thread Keith Busch
On Sat, Apr 06, 2019 at 02:27:10PM -0700, Ming Lei wrote: > On Fri, Apr 05, 2019 at 05:36:32PM -0600, Keith Busch wrote: > > On Fri, Apr 5, 2019 at 5:04 PM Jens Axboe wrote: > > > Looking at current peak testing, I've got around 1.2% in queue enter > > > and exit.

Re: [PATCH] blk-mq: Wait for for hctx requests on CPU unplug

2019-04-05 Thread Keith Busch
On Fri, Apr 5, 2019 at 5:04 PM Jens Axboe wrote: > Looking at current peak testing, I've got around 1.2% in queue enter > and exit. It's definitely not free, hence my question. Probably safe > to assume that we'll double that cycle counter, per IO. Okay, that's not negligible at all. I don't know

Re: [PATCH] blk-mq: Wait for for hctx requests on CPU unplug

2019-04-05 Thread Keith Busch
On Fri, Apr 05, 2019 at 04:23:27PM -0600, Jens Axboe wrote: > On 4/5/19 3:59 PM, Keith Busch wrote: > > Managed interrupts can not migrate affinity when their CPUs are offline. > > If the CPU is allowed to shutdown before they're returned, commands > > dispatched to manag

[PATCH] blk-mq: Wait for for hctx requests on CPU unplug

2019-04-05 Thread Keith Busch
e CPU dead notification for all allocated requests to complete if an hctx's last CPU is being taken offline. Cc: Ming Lei Cc: Thomas Gleixner Signed-off-by: Keith Busch --- block/blk-mq-sched.c | 2 ++ block/blk-mq-sysfs.c | 1 + block/blk-mq-tag.c | 1 + block/blk-mq

Re: [PATCH 5/5] nvme/pci: Remove queue IO flushing hack

2019-03-27 Thread Keith Busch
On Thu, Mar 28, 2019 at 09:42:51AM +0800, jianchao.wang wrote: > On 3/27/19 9:21 PM, Keith Busch wrote: > > +void blk_mq_terminate_queued_requests(struct request_queue *q, int > > hctx_idx) > > +{ > > + if (WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth)))

Re: [PATCH V2 1/2] blk-mq: introduce blk_mq_complete_request_sync()

2019-03-27 Thread Keith Busch
On Wed, Mar 27, 2019 at 04:51:13PM +0800, Ming Lei wrote: > @@ -594,8 +594,11 @@ static void __blk_mq_complete_request(struct request *rq) > /* >* For a polled request, always complete locallly, it's pointless >* to redirect the completion. > + * > + * If driver requ

Re: [PATCH 5/5] nvme/pci: Remove queue IO flushing hack

2019-03-27 Thread Keith Busch
t it through the proper tests (I no longer have a hotplug machine), but this is what I'd written if you can give it a quick look: >From 5afd8e3765eabf859100fda84e646a96683d7751 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Tue, 12 Mar 2019 13:58:12 -0600 Subject: [PATCH] blk-mq: Provide re

Re: [PATCH] block: don't call blk_mq_run_hw_queues() for dead or dying queues

2019-03-26 Thread Keith Busch
On Tue, Mar 26, 2019 at 01:07:12PM +0100, Hannes Reinecke wrote: > When a queue is dying or dead there is no point in calling > blk_mq_run_hw_queues() in blk_mq_unquiesce_queue(); in fact, doing > so might crash the machine as the queue structures are in the > process of being deleted. > > Signed-

Re: [PATCH v2 3/3] block: bio: introduce BIO_ALLOCED flag and check it in bio_free

2019-03-22 Thread Keith Busch
On Fri, Mar 22, 2019 at 03:05:46PM +0100, Hannes Reinecke wrote: > On 3/22/19 3:02 PM, Christoph Hellwig wrote: > > But how do you manage to get the tiny on-stack bios split? What kind > > of setup is this? > > > It's not tiny if you send a 2M file via direct-io, _and_ have a non-zero > MDTS sett

Re: [RFC] optimize nvme single segment I/O

2019-03-22 Thread Keith Busch
ic > testing because I've been a bit busy, but I thought it might be > worthwhile to get it out for feedback. Tests well here with a measurable IOPs improvement at lower queue depths. Series looks good to me, especially patch 5! :p Reviewed-by: Keith Busch

Re: Error while enabling io_poll for NVMe SSD

2019-03-18 Thread Keith Busch
On Mon, Mar 18, 2019 at 06:14:01PM -0400, Nikhil Sambhus wrote: > Hi, > > On a Linux Kernel 5.0.0+ machine (Ubuntu 16.04) I am using the > following command as a root user to enable polling for a NVMe SSD > device. > > # echo 1 > /sys/block/nvme2n1/queue/io_poll > > I get the following error: >

Re: [PATCH 1/2] blk-mq: introduce blk_mq_complete_request_sync()

2019-03-18 Thread Keith Busch
On Sun, Mar 17, 2019 at 09:09:09PM -0700, Bart Van Assche wrote: > On 3/17/19 8:29 PM, Ming Lei wrote: > > In NVMe's error handler, follows the typical steps for tearing down > > hardware: > > > > 1) stop blk_mq hw queues > > 2) stop the real hw queues > > 3) cancel in-flight requests via > >

Re: [PATCH 5/5] nvme/pci: Remove queue IO flushing hack

2019-03-11 Thread Keith Busch
On Mon, Mar 11, 2019 at 07:40:31PM +0100, Christoph Hellwig wrote: > From a quick look the code seems reasonably sensible here, > but any chance we could have this in common code? > > > +static bool nvme_fail_queue_request(struct request *req, void *data, bool > > reserved) > > +{ > > + struct

Re: [PATCH fio] t/io_uring: add depth options

2019-03-11 Thread Keith Busch
On Fri, Mar 08, 2019 at 09:31:02PM -0700, Jens Axboe wrote: > On 3/8/19 2:59 PM, Keith Busch wrote: > > Make depth options command line parameters so a recompile isn't > > required to see how it affects performance. > > Thanks, everything really should be command

Re: [PATCH 4/5] nvme: Fail dead namespace's entered requests

2019-03-11 Thread Keith Busch
On Sun, Mar 10, 2019 at 08:58:21PM -0700, jianchao.wang wrote: > Hi Keith > > How about introducing a per hctx queue_rq callback, then install a > separate .queue_rq callback for the dead hctx. Then we just need to > start and complete the request there. That sounds like it could work, though I t

Re: NVMe: Regression: write zeros corrupts ext4 file system

2019-03-11 Thread Keith Busch
On Mon, Mar 11, 2019 at 10:24:42AM +0800, Ming Lei wrote: > Hi, > > It is observed that ext4 is corrupted easily by running some workloads > on QEMU NVMe, such as: > > 1) mkfs.ext4 /dev/nvme0n1 > > 2) mount /dev/nvme0n1 /mnt > > 3) cd /mnt; git clone > git://git.kernel.org/pub/scm/linux/kernel

Re: [PATCH 4/5] nvme: Fail dead namespace's entered requests

2019-03-08 Thread Keith Busch
On Fri, Mar 08, 2019 at 01:54:06PM -0800, Bart Van Assche wrote: > On Fri, 2019-03-08 at 11:19 -0700, Keith Busch wrote: > > On Fri, Mar 08, 2019 at 10:15:27AM -0800, Bart Van Assche wrote: > > > On Fri, 2019-03-08 at 10:40 -0700, Keith Busch wrote: > > > > End the

[PATCH fio] t/io_uring: add depth options

2019-03-08 Thread Keith Busch
Make depth options command line parameters so a recompile isn't required to see how it affects performance. Signed-off-by: Keith Busch --- t/io_uring.c | 70 1 file changed, 52 insertions(+), 18 deletions(-) diff --git a/t/io_ur

Re: [PATCH 1/5] blk-mq: Export reading mq request state

2019-03-08 Thread Keith Busch
On Fri, Mar 08, 2019 at 01:25:16PM -0800, Bart Van Assche wrote: > On Fri, 2019-03-08 at 14:14 -0700, Keith Busch wrote: > > On Fri, Mar 08, 2019 at 12:47:10PM -0800, Bart Van Assche wrote: > > > If no such mechanism has been defined in the NVMe spec: have you > > > con

Re: [PATCH 1/5] blk-mq: Export reading mq request state

2019-03-08 Thread Keith Busch
On Fri, Mar 08, 2019 at 12:47:10PM -0800, Bart Van Assche wrote: > Thanks for the clarification. Are you aware of any mechanism in the NVMe spec > that causes all outstanding requests to fail? With RDMA this is easy - all > one has to do is to change the queue pair state into IB_QPS_ERR. See also >

Re: [PATCH 1/5] blk-mq: Export reading mq request state

2019-03-08 Thread Keith Busch
On Fri, Mar 08, 2019 at 12:21:16PM -0800, Sagi Grimberg wrote: > For some reason I didn't get patches 2/5 and 3/5... Unreliable 'git send-email'?! :) They're copied to patchwork too: https://patchwork.kernel.org/patch/10845225/ https://patchwork.kernel.org/patch/10845229/

Re: [PATCH 1/5] blk-mq: Export reading mq request state

2019-03-08 Thread Keith Busch
On Fri, Mar 08, 2019 at 10:42:17AM -0800, Bart Van Assche wrote: > On Fri, 2019-03-08 at 11:15 -0700, Keith Busch wrote: > > On Fri, Mar 08, 2019 at 10:07:23AM -0800, Bart Van Assche wrote: > > > On Fri, 2019-03-08 at 10:40 -0700, Keith Busch wrote: > > > > Drivers

Re: [PATCH 4/5] nvme: Fail dead namespace's entered requests

2019-03-08 Thread Keith Busch
On Fri, Mar 08, 2019 at 10:15:27AM -0800, Bart Van Assche wrote: > On Fri, 2019-03-08 at 10:40 -0700, Keith Busch wrote: > > End the entered requests on a quieced queue directly rather than flush > > them through the low level driver's queue_rq(). > > >

Re: [PATCH 1/5] blk-mq: Export reading mq request state

2019-03-08 Thread Keith Busch
On Fri, Mar 08, 2019 at 10:07:23AM -0800, Bart Van Assche wrote: > On Fri, 2019-03-08 at 10:40 -0700, Keith Busch wrote: > > Drivers may need to know the state of their requets. > > Hi Keith, > > What makes you think that drivers should be able to check the state of the

Re: [PATCH 2/5] blk-mq: Export iterating queue requests

2019-03-08 Thread Keith Busch
On Fri, Mar 08, 2019 at 10:08:47AM -0800, Bart Van Assche wrote: > On Fri, 2019-03-08 at 10:40 -0700, Keith Busch wrote: > > A driver may need to iterate a particular queue's tagged request rather > > than the whole tagset. > > Since iterating over requests triggers ra

[PATCH 2/5] blk-mq: Export iterating queue requests

2019-03-08 Thread Keith Busch
A driver may need to iterate a particular queue's tagged request rather than the whole tagset. Signed-off-by: Keith Busch --- block/blk-mq-tag.c | 1 + block/blk-mq-tag.h | 2 -- include/linux/blk-mq.h | 2 ++ 3 files changed, 3 insertions(+), 2 deletions(-) diff --git a/block/b

[PATCH 1/5] blk-mq: Export reading mq request state

2019-03-08 Thread Keith Busch
Drivers may need to know the state of their requests. Signed-off-by: Keith Busch --- block/blk-mq.h | 9 - include/linux/blkdev.h | 9 + 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/block/blk-mq.h b/block/blk-mq.h index c11353a3749d..99ab7e472e62 100644

[PATCH 4/5] nvme: Fail dead namespace's entered requests

2019-03-08 Thread Keith Busch
End the entered requests on a quiesced queue directly rather than flush them through the low level driver's queue_rq(). Signed-off-by: Keith Busch --- drivers/nvme/host/core.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers

[PATCH 3/5] blk-mq: Iterate tagset over all requests

2019-03-08 Thread Keith Busch
o see COMPLETED requests that were being returned before, so this also fixes that for all existing callback handlers. Signed-off-by: Keith Busch --- block/blk-mq-tag.c| 12 ++-- drivers/block/mtip32xx/mtip32xx.c | 6 ++ drivers/block/nbd.c | 2 ++ dr

[PATCH 5/5] nvme/pci: Remove queue IO flushing hack

2019-03-08 Thread Keith Busch
when the queue isn't going to be restarted so the IO path doesn't have to deal with these conditions. Signed-off-by: Keith Busch --- drivers/nvme/host/pci.c | 45 + 1 file changed, 29 insertions(+), 16 deletions(-) diff --git a/drivers/nvme/

Re: Interesting 'list _add double add' with nvme drives

2019-03-06 Thread Keith Busch
On Wed, Mar 06, 2019 at 06:48:28PM +, alex_gagn...@dellteam.com wrote: > Hi, > > I'm seeing a list error when we take away, then add back a bunch of nvme > drives. It's not very easy to repro, and the one surviving log is pasted > below. This looks like a double completion coming from the b

Re: [LSF/MM TOPIC] More async operations for file systems - async discard?

2019-02-22 Thread Keith Busch
On Thu, Feb 21, 2019 at 09:51:12PM -0500, Martin K. Petersen wrote: > > Keith, > > > With respect to fs block sizes, one thing making discards suck is that > > many high capacity SSDs' physical page sizes are larger than the fs > > block size, and a sub-page discard is worse than doing nothing. >

Re: [LSF/MM TOPIC] More async operations for file systems - async discard?

2019-02-20 Thread Keith Busch
On Sun, Feb 17, 2019 at 06:42:59PM -0500, Ric Wheeler wrote: > I think the variability makes life really miserable for layers above it. > > Might be worth constructing some tooling that we can use to validate or > shame vendors over - testing things like a full device discard, discard of > fs bloc

Re: Read-only Mapping of Program Text using Large THP Pages

2019-02-20 Thread Keith Busch
On Wed, Feb 20, 2019 at 06:43:46AM -0800, Matthew Wilcox wrote: > What NVMe doesn't have is a way for the host to tell the controller > "Here's a 2MB sized I/O; bytes 40960 to 45056 are most important to > me; please give me a completion event once those bytes are valid and > then another completio

Re: [PATCH V3 1/5] genirq/affinity: don't mark 'affd' as const

2019-02-19 Thread Keith Busch
On Mon, Feb 18, 2019 at 04:42:27PM -0800, 陈华才 wrote: > I've tested, this patch can fix the nvme problem, but it can't be applied > to 4.19 because of different context. And, I still think my original solution > (genirq/affinity: Assign default affinity to pre/post vectors) is correct. > There may b

Re: [LSF/MM TOPIC] NVMe Performance: Userspace vs Kernel

2019-02-15 Thread Keith Busch
On Fri, Feb 15, 2019 at 09:19:02PM +, Felipe Franciosi wrote: > Over the last year or two, I have done extensive experimentation comparing > applications using libaio to those using SDPK. Try the io_uring interface instead. It's queued up in the linux-block for-next tree. > For hypervisors,

Re: [LSF/MM TOPIC] improving storage testing

2019-02-15 Thread Keith Busch
On Thu, Feb 14, 2019 at 10:02:02PM -0500, Theodore Y. Ts'o wrote: > > My (undocumented) rule of thumb has been that blktests shouldn't assume > > anything newer than whatever ships on Debian oldstable. I can document > > that requirement. > > That's definitely not true for the nvme tests; the nvme

Re: [PATCH V3 1/5] genirq/affinity: don't mark 'affd' as const

2019-02-13 Thread Keith Busch
On Wed, Feb 13, 2019 at 10:41:55PM +0100, Thomas Gleixner wrote: > Btw, while I have your attention. There popped up an issue recently related > to that affinity logic. > > The current implementation fails when: > > /* > * If there aren't any vectors left after applying the pre/p

Re: [PATCH V3 1/5] genirq/affinity: don't mark 'affd' as const

2019-02-13 Thread Keith Busch
On Wed, Feb 13, 2019 at 09:56:36PM +0100, Thomas Gleixner wrote: > On Wed, 13 Feb 2019, Bjorn Helgaas wrote: > > On Wed, Feb 13, 2019 at 06:50:37PM +0800, Ming Lei wrote: > > > We have to ask driver to re-caculate set vectors after the whole IRQ > > > vectors are allocated later, and the result nee

Re: [PATCH V2 3/4] nvme-pci: avoid irq allocation retrying via .calc_sets

2019-02-12 Thread Keith Busch
ew for the whole series if you spin a v3 for the other minor comments. Reviewed-by: Keith Busch > +static void nvme_calc_irq_sets(struct irq_affinity *affd, int nvecs) > +{ > + struct nvme_dev *dev = affd->priv; > + > + nvme_calc_io_queues(dev, nvecs); > + > +

Re: [PATCH 05/18] Add io_uring IO interface

2019-02-07 Thread Keith Busch
On Thu, Feb 07, 2019 at 12:55:39PM -0700, Jens Axboe wrote: > IO submissions use the io_uring_sqe data structure, and completions > are generated in the form of io_uring_sqe data structures. ^^^ Completions use _cqe, right?

Re: Question on handling managed IRQs when hotplugging CPUs

2019-02-05 Thread Keith Busch
On Tue, Feb 05, 2019 at 04:10:47PM +0100, Hannes Reinecke wrote: > On 2/5/19 3:52 PM, Keith Busch wrote: > > Whichever layer dispatched the IO to a CPU specific context should > > be the one to wait for its completion. That should be blk-mq for most > > block drivers. > >

Re: Question on handling managed IRQs when hotplugging CPUs

2019-02-05 Thread Keith Busch
On Tue, Feb 05, 2019 at 03:09:28PM +, John Garry wrote: > On 05/02/2019 14:52, Keith Busch wrote: > > On Tue, Feb 05, 2019 at 05:24:11AM -0800, John Garry wrote: > > > On 04/02/2019 07:12, Hannes Reinecke wrote: > > > > > > Hi Hannes, > > > >

Re: Question on handling managed IRQs when hotplugging CPUs

2019-02-05 Thread Keith Busch
On Tue, Feb 05, 2019 at 05:24:11AM -0800, John Garry wrote: > On 04/02/2019 07:12, Hannes Reinecke wrote: > > Hi Hannes, > > > > > So, as the user then has to wait for the system to declars 'ready for > > CPU remove', why can't we just disable the SQ and wait for all I/O to > > complete? > > We c

Re: [PATCH v2 0/4] Write-hint for FS journal

2019-01-28 Thread Keith Busch
On Mon, Jan 28, 2019 at 04:47:09AM -0800, Jan Kara wrote: > On Fri 25-01-19 09:23:53, Keith Busch wrote: > > On Wed, Jan 09, 2019 at 09:00:57PM +0530, Kanchan Joshi wrote: > > > Towards supporing write-hints/streams for

Re: [PATCH v2 0/4] Write-hint for FS journal

2019-01-25 Thread Keith Busch
On Wed, Jan 09, 2019 at 09:00:57PM +0530, Kanchan Joshi wrote: > Towards supporing write-hints/streams for filesystem journal. > > > > Here is the v1 patch for background -

Re: [PATCH v4 2/2] trace nvme submit queue status

2018-12-18 Thread Keith Busch
On Tue, Dec 18, 2018 at 06:47:50PM +0100, h...@lst.de wrote: > On Tue, Dec 18, 2018 at 10:26:46AM -0700, Keith Busch wrote: > > No need for a space after the %s. __print_disk_name already appends a > > space if there's a disk name, and we don't want the extra space if

Re: [PATCH v4 2/2] trace nvme submit queue status

2018-12-18 Thread Keith Busch
On Mon, Dec 17, 2018 at 08:51:38PM -0800, yupeng wrote: > +TRACE_EVENT(nvme_sq, > + TP_PROTO(void *rq_disk, int qid, int sq_head, int sq_tail), > + TP_ARGS(rq_disk, qid, sq_head, sq_tail), > + TP_STRUCT__entry( > + __array(char, disk, DISK_NAME_LEN) > + __field(i

Re: [PATCH v2] nvme: provide fallback for discard alloc failure

2018-12-12 Thread Keith Busch
On Wed, Dec 12, 2018 at 09:36:36AM -0700, Jens Axboe wrote: > On 12/12/18 9:28 AM, Keith Busch wrote: > > On Wed, Dec 12, 2018 at 09:18:11AM -0700, Jens Axboe wrote: > >> When boxes are run near (or to) OOM, we have a problem with the discard > >> page allocation in nvme

Re: [PATCH v2] nvme: provide fallback for discard alloc failure

2018-12-12 Thread Keith Busch
On Wed, Dec 12, 2018 at 09:18:11AM -0700, Jens Axboe wrote: > When boxes are run near (or to) OOM, we have a problem with the discard > page allocation in nvme. If we fail allocating the special page, we > return busy, and it'll get retried. But since ordering is honored for > dispatch requests, we
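A userspace sketch of the fallback pattern under discussion: try the normal allocation first, and when it fails under memory pressure fall back to a single preallocated buffer guarded by a busy flag so the request can still make forward progress instead of being requeued forever. All names here are illustrative, not the driver's:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static char fallback_page[4096];
static atomic_flag fallback_busy = ATOMIC_FLAG_INIT;

static void *get_discard_buffer(bool *from_fallback)
{
	void *buf = malloc(4096);

	if (buf) {
		*from_fallback = false;
		return buf;
	}
	/* allocation failed: claim the preallocated page if it is free */
	if (!atomic_flag_test_and_set(&fallback_busy)) {
		*from_fallback = true;
		return fallback_page;
	}
	return NULL;		/* truly out of options: caller retries later */
}

static void put_discard_buffer(void *buf, bool from_fallback)
{
	if (from_fallback)
		atomic_flag_clear(&fallback_busy);
	else
		free(buf);
}

int main(void)
{
	bool from_fallback = false;
	void *buf = get_discard_buffer(&from_fallback);

	if (!buf)
		return 1;
	printf("got %s buffer\n", from_fallback ? "fallback" : "heap");
	put_discard_buffer(buf, from_fallback);
	return 0;
}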

Re: [PATCH nvme-cli 6/5] fabrics: pass in nr_write_queues

2018-12-11 Thread Keith Busch
On Tue, Dec 11, 2018 at 02:49:36AM -0800, Sagi Grimberg wrote: > if (cfg.host_traddr) { > len = sprintf(p, ",host_traddr=%s", cfg.host_traddr); > if (len < 0) > @@ -1009,6 +1019,7 @@ int connect(const char *desc, int argc, char **argv) > {"hostnqn",

Re: [PATCH] block: Restore tape support

2018-12-10 Thread Keith Busch
On Mon, Dec 10, 2018 at 03:06:52PM -0500, Laurence Oberman wrote: > Tested and works fine. > Thanks All > > Tested-by: Laurence Oberman Cool, thank you for confirming.

[PATCH] block/bio: Do not zero user pages

2018-12-10 Thread Keith Busch
We don't need to zero fill the bio if not using kernel allocated pages. Fixes: f3587d76da05 ("block: Clear kernel memory before copying to user") # v4.20-rc2 Reported-by: Todd Aiken Cc: Laurence Oberman Cc: sta...@vger.kernel.org Cc: Bart Van Assche Signed-off-by: Keith Bu

Re: [PATCH] block: Restore tape support

2018-12-10 Thread Keith Busch
On Sun, Dec 09, 2018 at 07:08:14PM -0800, Bart Van Assche wrote: > According to what I found in > https://bugzilla.kernel.org/show_bug.cgi?id=201935 patch "block: Clear > kernel memory before copying to user" broke tape access. Hence revert > that patch. Instead of reverting back to the leaking ar

Re: [PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-12-04 Thread Keith Busch
On Tue, Dec 04, 2018 at 02:21:17PM -0700, Keith Busch wrote: > On Tue, Dec 04, 2018 at 11:33:33AM -0800, James Smart wrote: > > On 12/4/2018 9:48 AM, Keith Busch wrote: > > > Once quiesced, the proposed iterator can handle the final termination > > > of the request, perf
