Re: [PATCH 3/6] nvme: Move all IO out of controller reset

2018-05-21 Thread Keith Busch
On Tue, May 22, 2018 at 12:04:53AM +0800, Ming Lei wrote: > On Mon, May 21, 2018 at 09:44:33AM -0600, Keith Busch wrote: > > On Mon, May 21, 2018 at 11:34:27PM +0800, Ming Lei wrote: > > > nvme_dev_disable() quiesces queues first before killing queues. > > > > >

Re: [PATCH 1/6] nvme: Sync request queues on reset

2018-05-21 Thread Keith Busch
On Mon, May 21, 2018 at 11:25:43PM +0800, Ming Lei wrote: > On Mon, May 21, 2018 at 08:04:13AM -0600, Keith Busch wrote: > > On Sat, May 19, 2018 at 08:01:42AM +0800, Ming Lei wrote: > > > > You keep saying that, but the controller state is global to the > > > > co

Re: [PATCH 3/6] nvme: Move all IO out of controller reset

2018-05-21 Thread Keith Busch
On Mon, May 21, 2018 at 11:34:27PM +0800, Ming Lei wrote: > nvme_dev_disable() quiesces queues first before killing queues. > > If queues are quiesced during or before nvme_wait_freeze() is run > from the 2nd part of reset, the 2nd part can't move on, and IO hang > is caused. Finally no reset can

Re: [PATCH 3/6] nvme: Move all IO out of controller reset

2018-05-21 Thread Keith Busch
On Mon, May 21, 2018 at 10:58:51PM +0800, Ming Lei wrote: > On Mon, May 21, 2018 at 08:22:19AM -0600, Keith Busch wrote: > > On Sat, May 19, 2018 at 07:03:58AM +0800, Ming Lei wrote: > > > On Fri, May 18, 2018 at 10:38:20AM -0600, Keith Busch wrote: > > > >

Re: [PATCH 3/6] nvme: Move all IO out of controller reset

2018-05-21 Thread Keith Busch
On Sat, May 19, 2018 at 07:03:58AM +0800, Ming Lei wrote: > On Fri, May 18, 2018 at 10:38:20AM -0600, Keith Busch wrote: > > + > > + if (unfreeze) > > + nvme_wait_freeze(&dev->ctrl); > > + > > timeout may come just before blk_mq_update_nr_hw_queu
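
For context, the contested ordering lives in the tail end of nvme_reset_work(). A minimal sketch of that tail, reconstructed from mainline of this era (not the exact patch under review):

    /*
     * Sketch: second half of reset after the controller is re-enabled,
     * assuming nvme_dev_disable() already called nvme_start_freeze().
     * nvme_wait_freeze() blocks until every namespace queue drains,
     * which cannot happen while those queues are still quiesced --
     * hence the IO hang described above.
     */
    nvme_start_queues(&dev->ctrl);    /* unquiesce so requests can flow */
    nvme_wait_freeze(&dev->ctrl);     /* drain requests already entered */
    blk_mq_update_nr_hw_queues(&dev->tagset, dev->online_queues - 1);
    nvme_unfreeze(&dev->ctrl);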

Re: [PATCH blktests] Fix block/011 to not use sysfs for device disabling

2018-05-21 Thread Keith Busch
On Mon, May 21, 2018 at 02:37:56AM -0400, Yi Zhang wrote: > Hi Keith > I tried this patch on my R730 Server, but it led to a system hang after > setpci, could you help check it, thanks. > > Console log: > storageqe-62 login: > Kernel 4.17.0-rc5 on an x86_64 > > storageqe-62 login: [

Re: [PATCH 1/6] nvme: Sync request queues on reset

2018-05-21 Thread Keith Busch
On Sat, May 19, 2018 at 08:01:42AM +0800, Ming Lei wrote: > > You keep saying that, but the controller state is global to the > > controller. It doesn't matter which namespace request_queue started the > > reset: every namespaces request queue sees the RESETTING controller state > > When timeouts

Re: [PATCH V6 11/11] nvme: pci: support nested EH

2018-05-18 Thread Keith Busch
On Sat, May 19, 2018 at 06:26:28AM +0800, Ming Lei wrote: > So could we please face to the real issue instead of working around > test case? Yes, that's why I want you to stop referencing the broken test.

Re: [PATCH 1/6] nvme: Sync request queues on reset

2018-05-18 Thread Keith Busch
On Sat, May 19, 2018 at 06:32:11AM +0800, Ming Lei wrote: > This way can't sync timeout reliably, since timeout events can > come from two NS at the same time, and one may be handled as > RESET_TIMER, and another one can be handled as EH_HANDLED. You keep saying that, but the controller state is

[PATCH blktests] Fix block/011 to not use sysfs for device disabling

2018-05-18 Thread Keith Busch
to bring it online. This can look like a permanent device failure from the driver's perspective. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- tests/block/011 | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/tests/block/011 b/tests/block/011 index 62e89f7..2

[PATCH 2/6] nvme-pci: Fix queue freeze criteria on reset

2018-05-18 Thread Keith Busch
, and is not harmful to do a second time. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- drivers/nvme/host/pci.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 8da63402d474..2bd9d84f58d0 100644 --- a/d

[PATCH 5/6] nvme-pci: Attempt reset retry for IO failures

2018-05-18 Thread Keith Busch
If the reset failed due to a non-fatal error, this patch will attempt to reset the controller again, with a maximum of 4 attempts. Since the failed reset case has changed purpose, this patch provides a more appropriate name and warning message for the reset failure. Signed-off-by: Keith Busch
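
A hedged sketch of what a bounded retry can look like; the helper name nvme_reset_failed and the field nr_reset_attempts are illustrative, not lifted from the patch — only the limit of 4 comes from the changelog:

    #define NVME_MAX_RESET_ATTEMPTS 4   /* limit stated in the changelog */

    static void nvme_reset_failed(struct nvme_dev *dev, int status)
    {
        /* Retry only non-fatal failures, up to the attempt limit. */
        if (status != -ENODEV &&
            ++dev->nr_reset_attempts < NVME_MAX_RESET_ATTEMPTS) {
            dev_warn(dev->ctrl.device, "reset failed (%d), retrying\n",
                     status);
            nvme_reset_ctrl(&dev->ctrl);   /* queue another reset */
            return;
        }
        dev_warn(dev->ctrl.device, "removing after failed reset\n");
        nvme_remove_dead_ctrl(dev, status);
    }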

[PATCH 3/6] nvme: Move all IO out of controller reset

2018-05-18 Thread Keith Busch
renaming the function 'nvme_dev_add' to a more appropriate name that describes what it's actually doing: nvme_alloc_io_tags. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- drivers/nvme/host/core.c | 3 +++ drivers/nvme/host/nvme.h | 1 + drivers/nvme/host/pci.c

[PATCH 6/6] nvme-pci: Rate limit the nvme timeout warnings

2018-05-18 Thread Keith Busch
each of those messages, so this patch adds a ratelimit on them. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- drivers/nvme/host/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index ddfeb186d129..e4b91c
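
The mechanics are a one-line substitution in the timeout handler; a sketch (the exact message text may differ from the patch):

    /* Was dev_warn(); now rate-limited so a flood of simultaneous
     * timeouts doesn't spam the log: */
    dev_warn_ratelimited(dev->ctrl.device,
             "I/O %d QID %d timeout, reset controller\n",
             req->tag, nvmeq->qid);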

[PATCH 1/6] nvme: Sync request queues on reset

2018-05-18 Thread Keith Busch
This patch fixes races that occur with simultaneous controller resets by synchronizing request queues prior to initializing the controller. Without this, a thread may attempt disabling a controller at the same time as we're trying to enable it. Signed-off-by: Keith Busch <keith.bu...@intel.
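
A sketch of the synchronization, assuming a core helper that walks every namespace queue; blk_sync_queue() flushes the queue's timeout work so it cannot race with the reset that follows:

    void nvme_sync_queues(struct nvme_ctrl *ctrl)
    {
        struct nvme_ns *ns;

        down_read(&ctrl->namespaces_rwsem);
        list_for_each_entry(ns, &ctrl->namespaces, list)
            blk_sync_queue(ns->queue);   /* cancel/flush timeout work */
        up_read(&ctrl->namespaces_rwsem);
    }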

[PATCH 4/6] nvme: Allow reset from CONNECTING state

2018-05-18 Thread Keith Busch
A failed connection may be retryable. This patch allows the connecting state to initiate a reset so that it may try to connect again. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- drivers/nvme/host/core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/nvme/host/co
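
The single added line lands in nvme_change_ctrl_state()'s transition table; roughly (sketch, state names per mainline of the era):

    case NVME_CTRL_RESETTING:
        switch (old_state) {
        case NVME_CTRL_NEW:
        case NVME_CTRL_LIVE:
        case NVME_CTRL_CONNECTING:  /* the addition: retry a failed connect */
            changed = true;
            /* FALLTHRU */
        default:
            break;
        }
        break;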

Re: [PATCH V6 11/11] nvme: pci: support nested EH

2018-05-18 Thread Keith Busch
On Thu, May 17, 2018 at 04:23:45PM +0200, Johannes Thumshirn wrote: > > Agreed. Alternatively possibly call the driver's reset_prepare/done > > callbacks. > > Exactly, but as long as we can issue the reset via sysfs the test-case > is still valid. I disagree the test case is valid. The test

Re: [PATCH V6 11/11] nvme: pci: support nested EH

2018-05-18 Thread Keith Busch
On Fri, May 18, 2018 at 08:20:05AM +0800, Ming Lei wrote: > What I think block/011 is helpful is that it can trigger IO timeout > during reset, which can be triggered in reality too. As I mentioned earlier, there is nothing wrong with the spirit of the test. What's wrong with it is the misguided

Re: [PATCH V6 11/11] nvme: pci: support nested EH

2018-05-17 Thread Keith Busch
On Thu, May 17, 2018 at 10:41:29AM +0200, Christoph Hellwig wrote: > On Wed, May 16, 2018 at 08:20:31PM -0600, Keith Busch wrote: > > On Thu, May 17, 2018 at 07:10:59AM +0800, Ming Lei wrote: > > > All simulation in block/011 may happen in reality. > > > > If this t

Re: [PATCH V6 11/11] nvme: pci: support nested EH

2018-05-16 Thread Keith Busch
Hi Ming, I'm developing the answers in code to the issues you raised. It will just take a moment to finish fleshing those out. In the meantime I just want to point out why I think block/011 isn't a real test. On Thu, May 17, 2018 at 07:10:59AM +0800, Ming Lei wrote: > All simulation in block/011

Re: [PATCH V5 0/9] nvme: pci: fix & improve timeout handling

2018-05-16 Thread Keith Busch
On Wed, May 16, 2018 at 12:31:28PM +0800, Ming Lei wrote: > Hi Keith, > > This issue may probably be fixed by Jianchao's patch of 'nvme: pci: set > nvmeq->cq_vector > after alloc cq/sq'[1] and my another patch of 'nvme: pci: unquiesce admin > queue after controller is shutdown'[2], and both two

Re: [PATCH V6 11/11] nvme: pci: support nested EH

2018-05-16 Thread Keith Busch
Hi Ming, I'm sorry, but I am frankly not on board with introducing yet another work-queue into this driver for handling this situation. The fact you missed syncing with this queue in the surprise remove case, introducing various use-after-free conditions, just demonstrates exactly how

Re: [PATCH V5 0/9] nvme: pci: fix & improve timeout handling

2018-05-14 Thread Keith Busch
On Tue, May 15, 2018 at 07:47:07AM +0800, Ming Lei wrote: > > > > [ 760.727485] nvme nvme1: EH 0: after recovery -19 > > > > [ 760.727488] nvme nvme1: EH: fail controller > > > > > > The above issue(hang in nvme_remove()) is still an old issue, which > > > is because queues are kept as quiesce

Re: [PATCHv2 blktests] nvme: Add command timeout injection test

2018-05-14 Thread Keith Busch
On Mon, May 14, 2018 at 02:01:36PM -0700, Omar Sandoval wrote: > On Mon, May 14, 2018 at 02:42:41PM -0600, Keith Busch wrote: > > This test will run a background IO process and inject an admin command > > with a very short timeout that is all but guaranteed to expire without &

[PATCHv2 blktests] nvme: Add command timeout injection test

2018-05-14 Thread Keith Busch
This test will run a background IO process and inject an admin command with a very short timeout that is all but guaranteed to expire without a completion: the async event request. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- v1 -> v2: Changed description since its n

[PATCH blktests] nvme: Add command timeout injection test

2018-05-14 Thread Keith Busch
This test will run a background IO process and inject an admin command with a very short timeout that is all but guaranteed to expire without a completion: the async event request. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- tests/nvme/005

Re: [PATCH V5 0/9] nvme: pci: fix & improve timeout handling

2018-05-14 Thread Keith Busch
Hi Ming, On Sat, May 12, 2018 at 08:21:22AM +0800, Ming Lei wrote: > > [ 760.679960] nvme nvme1: controller is down; will reset: CSTS=0x, > > PCI_STATUS=0x > > [ 760.701468] nvme nvme1: EH 0: after shutdown, top eh: 1 > > [ 760.727099] pci_raw_set_power_state: 62 callbacks

Re: [PATCH V5 0/9] nvme: pci: fix & improve timeout handling

2018-05-11 Thread Keith Busch
On Fri, May 11, 2018 at 08:29:24PM +0800, Ming Lei wrote: > Hi, > > The 1st patch introduces blk_quiesce_timeout() and blk_unquiesce_timeout() > for NVMe, meantime fixes blk_sync_queue(). > > The 2nd patch covers timeout for admin commands for recovering controller > for avoiding possible

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-05-10 Thread Keith Busch
On Fri, May 11, 2018 at 06:03:59AM +0800, Ming Lei wrote: > Sorry, forget to mention, it isn't enough to simply sync timeout inside > reset(). > > Another tricky thing is about freeze & unfreeze, now freeze is done in > nvme_dev_disable(), and unfreeze is done in nvme_reset_work. That means > we

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-05-10 Thread Keith Busch
On Fri, May 11, 2018 at 05:24:46AM +0800, Ming Lei wrote: > Could you share me the link? The diff was in this reply here: http://lists.infradead.org/pipermail/linux-nvme/2018-April/017019.html > Firstly, the previous nvme_sync_queues() won't work reliably, so this > patch introduces

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-05-10 Thread Keith Busch
On Fri, May 11, 2018 at 05:10:40AM +0800, Ming Lei wrote: > On Fri, May 11, 2018 at 5:05 AM, Keith Busch > <keith.bu...@linux.intel.com> wrote: > > On Fri, May 11, 2018 at 04:52:11AM +0800, Ming Lei wrote: > >> Hi Keith, > >> > >> On Tue, May 8, 2018 at

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-05-10 Thread Keith Busch
On Fri, May 11, 2018 at 04:52:11AM +0800, Ming Lei wrote: > Hi Keith, > > On Tue, May 8, 2018 at 11:30 PM, Keith Busch <keith.bu...@intel.com> wrote: > > On Sat, Apr 28, 2018 at 11:50:17AM +0800, Ming Lei wrote: > >> This sync may be raced with one timed-out

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-05-08 Thread Keith Busch
On Sat, Apr 28, 2018 at 11:50:17AM +0800, Ming Lei wrote: > This sync may be raced with one timed-out request, which may be handled > as BLK_EH_HANDLED or BLK_EH_RESET_TIMER, so the above sync queues can't > work reliably. Ming, As proposed, that scenario is impossible to encounter. Resetting

Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling

2018-05-08 Thread Keith Busch
On Sat, May 05, 2018 at 07:51:22PM -0400, Laurence Oberman wrote: > 3rd and 4th attempts slightly better, but clearly not dependable > > [root@segstorage1 blktests]# ./check block/011 > block/011 => nvme0n1 (disable PCI device while doing I/O)[failed] > runtime...  81.188s > ---

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-04-30 Thread Keith Busch
On Sun, Apr 29, 2018 at 05:39:52AM +0800, Ming Lei wrote: > On Sat, Apr 28, 2018 at 9:35 PM, Keith Busch > <keith.bu...@linux.intel.com> wrote: > > On Sat, Apr 28, 2018 at 11:50:17AM +0800, Ming Lei wrote: > >> > I understand how the problems are happe

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-04-28 Thread Keith Busch
On Sat, Apr 28, 2018 at 11:50:17AM +0800, Ming Lei wrote: > > I understand how the problems are happening a bit better now. It used > > to be that blk-mq would lock an expired command one at a time, so when > > we had a batch of IO timeouts, the driver was able to complete all of > > them inside a

Re: [PATCH 1/2] nvme: pci: simplify timeout handling

2018-04-27 Thread Keith Busch
On Thu, Apr 26, 2018 at 08:39:55PM +0800, Ming Lei wrote: > +/* > + * This one is called after queues are quiesced, and no in-flight timeout > + * and nvme interrupt handling. > + */ > +static void nvme_pci_cancel_request(struct request *req, void *data, > + bool reserved) > +{ > +

[PATCHv2] Add surprise removal block test

2018-04-26 Thread Keith Busch
is reenabled at the end of the test. Note, this is currently incompatible with NVMe Subsystems when CONFIG_NVME_MULTIPATH since the /dev/nvme*n* names don't have a pci parent in sysfs. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- v1 -> v2: Incorporated feedback from Omar and Johannes

Re: [PATCH 2/2] nvme: pci: guarantee EH can make progress

2018-04-26 Thread Keith Busch
On Thu, Apr 26, 2018 at 08:39:56PM +0800, Ming Lei wrote: > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c > index 5d05a04f8e72..1e058deb4718 100644 > --- a/drivers/nvme/host/pci.c > +++ b/drivers/nvme/host/pci.c > @@ -1265,6 +1265,20 @@ static enum blk_eh_timer_return

Re: [PATCH blktests] Add surprise removal block test

2018-04-25 Thread Keith Busch
On Wed, Apr 25, 2018 at 08:56:02AM -0700, Omar Sandoval wrote: > On Tue, Apr 24, 2018 at 03:41:46PM -0600, Keith Busch wrote: > > +_test_hotplug_slot() { > > I'd call this _test_dev_in_hotplug_slot(). Sounds good. > > + parent="$(_get_pci_parent_from_blkdev)"

Re: [PATCH blktests] Add surprise removal block test

2018-04-24 Thread Keith Busch
On Tue, Apr 24, 2018 at 04:04:22PM -0600, Johannes Thumshirn wrote: > On Tue, Apr 24, 2018 at 03:41:46PM -0600, Keith Busch wrote: > > [...] > > > +DESCRIPTION="break PCI link while doing I/O" > > +TIMED=1 > > I _think_ we can set QUICK=1 here, Omar? >

[PATCH blktests] Add surprise removal block test

2018-04-24 Thread Keith Busch
Signed-off-by: Keith Busch <keith.bu...@intel.com> --- common/rc | 17 +++++++++++++++++ tests/block/016 | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100755 tests/block/016 diff --git a/common/rc b/common/rc index 1

Re: [PATCH] nvme: lightnvm: add granby support

2018-04-17 Thread Keith Busch
On Tue, Apr 17, 2018 at 04:15:38PM -0600, Jens Axboe wrote: > >> Looks good to me. > >> > >> Reviewed-by: Matias Bjørling > >> > >> Keith, when convenient can you pick this up for 4.18? > > > > This looks safe for 4.17-rc2, no? Unless you want to wait for the next > > release.

Re: [PATCH] nvme: lightnvm: add granby support

2018-04-17 Thread Keith Busch
On Tue, Apr 17, 2018 at 08:16:25AM +0200, Matias Bjørling wrote: > On 4/17/18 3:55 AM, Wei Xu wrote: > > Add a new lightnvm quirk to identify CNEX's Granby controller. > > > > Signed-off-by: Wei Xu > > --- > > drivers/nvme/host/pci.c | 2 ++ > > 1 file changed, 2

Re: [PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors

2018-03-28 Thread Keith Busch
On Wed, Mar 28, 2018 at 09:32:14AM +0200, Christoph Hellwig wrote: > On Tue, Mar 27, 2018 at 09:39:08AM -0600, Keith Busch wrote: > > - return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0); > > + return blk_mq_pci_map_queues(set, t

[PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues

2018-03-27 Thread Keith Busch
.br...@microsemi.com> Cc: <qla2xxx-upstr...@qlogic.com> Cc: <linux-s...@vger.kernel.org> Signed-off-by: Keith Busch <keith.bu...@intel.com> --- v1 -> v2: Update blk-mq API directly instead of chaining a default parameter to a new API, and update all drivers accord

[PATCH 2/3] nvme-pci: Remove unused queue parameter

2018-03-27 Thread Keith Busch
All the queue memory is allocated up front. We don't take the node into consideration when creating queues anymore, so removing the unused parameter. Signed-off-by: Keith Busch <keith.bu...@intel.com> Reviewed-by: Christoph Hellwig <h...@lst.de> --- v1 -> v2: Added review. dr

[PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors

2018-03-27 Thread Keith Busch
will share the pre_vector with all CPUs assigned. Cc: Jianchao Wang <jianchao.w.w...@oracle.com> Cc: Ming Lei <ming@redhat.com> Signed-off-by: Keith Busch <keith.bu...@intel.com> --- v1 -> v2: Update to use new blk-mq API. Removed unnecessary braces, inline function
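
The vector split itself is expressed through the irq_affinity descriptor when allocating vectors; a sketch of the relevant lines in nvme_setup_io_queues() (variable names assumed from mainline):

    /* Reserve vector 0 for the admin queue; affinity spreading then
     * starts at vector 1 for the IO queues. */
    struct irq_affinity affd = {
        .pre_vectors = 1,
    };
    int nr;

    nr = pci_alloc_irq_vectors_affinity(pdev, 1, nr_io_queues + 1,
            PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
    if (nr <= 0)
        return -EIO;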

Re: [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues

2018-03-26 Thread Keith Busch
On Mon, Mar 26, 2018 at 09:50:38AM +0800, Ming Lei wrote: > > Given the few callers of blk_mq_pci_map_queues(), I suggest adding the > 'offset' parameter to this API directly; then people may keep the > '.pre_vectors' stuff in mind and avoid misusing it. Yeah, I think I have to agree. I

Re: [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues

2018-03-26 Thread Keith Busch
On Sat, Mar 24, 2018 at 09:55:49PM +0800, jianchao.wang wrote: > Maybe we could provide a callback parameter for __blk_mq_pci_map_queues which > give the mapping from hctx queue num to device-relative interrupt vector > index. If a driver's mapping is so complicated as to require a special

Re: [PATCH 2/3] nvme-pci: Remove unused queue parameter

2018-03-26 Thread Keith Busch
On Mon, Mar 26, 2018 at 09:47:07AM +0800, Ming Lei wrote: > On Fri, Mar 23, 2018 at 04:19:22PM -0600, Keith Busch wrote: > > @@ -1629,9 +1627,7 @@ static int nvme_create_io_queues(struct nvme_dev *dev) > > int ret = 0; > > > > for (i = dev->ctrl.que

[PATCH 2/3] nvme-pci: Remove unused queue parameter

2018-03-23 Thread Keith Busch
All nvme queue memory is allocated up front. We don't take the node into consideration when creating queues anymore, so removing the unused parameter. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- drivers/nvme/host/pci.c | 10 +++--- 1 file changed, 3 insertions(+), 7 del

[PATCH 3/3] nvme-pci: Separate IO and admin queue IRQ vectors

2018-03-23 Thread Keith Busch
ly one interrupt vector available, the admin and IO queues will share the pre_vector with all CPUs assigned. Signed-off-by: Jianchao Wang <jianchao.w.w...@oracle.com> Reviewed-by: Ming Lei <ming@redhat.com> [changelog, code comments, merge, and blk-mq pci vector offset] Signed-o

[PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues

2018-03-23 Thread Keith Busch
The PCI interrupt vectors intended to be associated with a queue may not start at 0. This patch adds an offset parameter so blk-mq may find the intended affinity mask. The default value is 0 so existing drivers that don't care about this parameter don't need to change. Signed-off-by: Keith Busch
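
With the offset parameter, nvme-pci's map callback can skip the admin vector; a sketch of a caller after the API change:

    static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
    {
        struct nvme_dev *dev = set->driver_data;

        /* IO queues start at vector 1; vector 0 is the admin queue's. */
        return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 1);
    }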

Re: [PATCH 08/12] lightnvm: implement get log report chunk helpers

2018-03-21 Thread Keith Busch
On Wed, Mar 21, 2018 at 03:06:05AM -0700, Matias Bjørling wrote: > > outside of nvme core so that we can use it from lightnvm. > > > > Signed-off-by: Javier González > > --- > > drivers/lightnvm/core.c | 11 +++ > > drivers/nvme/host/core.c | 6 ++-- > >

Re: [PATCH v2 07/10] nvme-pci: Use PCI p2pmem subsystem to manage the CMB

2018-03-05 Thread Keith Busch
On Mon, Mar 05, 2018 at 01:10:53PM -0700, Jason Gunthorpe wrote: > So when reading the above mlx code, we see the first wmb() being used > to ensure that CPU stores to cachable memory are visible to the DMA > triggered by the doorbell ring. IIUC, we don't need a similar barrier for NVMe to ensure

Re: [PATCH v2 07/10] nvme-pci: Use PCI p2pmem subsystem to manage the CMB

2018-03-05 Thread Keith Busch
On Mon, Mar 05, 2018 at 12:33:29PM +1100, Oliver wrote: > On Thu, Mar 1, 2018 at 10:40 AM, Logan Gunthorpe wrote: > > @@ -429,10 +429,7 @@ static void __nvme_submit_cmd(struct nvme_queue *nvmeq, > > { > > u16 tail = nvmeq->sq_tail; > > > - if
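
For reference, the function being trimmed looked roughly like this before the series (reconstructed from mainline of the era, not the posted diff); the barrier question in this thread comes down to writel() already ordering prior cacheable stores ahead of the doorbell write:

    static void __nvme_submit_cmd(struct nvme_queue *nvmeq,
                                  struct nvme_command *cmd)
    {
        u16 tail = nvmeq->sq_tail;

        if (nvmeq->sq_cmds_io)      /* SQ lives in the controller's CMB */
            memcpy_toio(&nvmeq->sq_cmds_io[tail], cmd, sizeof(*cmd));
        else
            memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));

        if (++tail == nvmeq->q_depth)
            tail = 0;
        writel(tail, nvmeq->q_db);  /* doorbell; writel orders prior stores */
        nvmeq->sq_tail = tail;
    }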

Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory

2018-03-01 Thread Keith Busch
On Thu, Mar 01, 2018 at 11:00:51PM +, Stephen Bates wrote: > > P2P is about offloading the memory and PCI subsystem of the host CPU > and this is achieved no matter which p2p_dev is used. Even within a device, memory attributes for its various regions may not be the same. There's a

Re: [PATCH 5/5] nvme: pci: pass max vectors as num_possible_cpus() to pci_alloc_irq_vectors

2018-03-01 Thread Keith Busch
On Thu, Mar 01, 2018 at 01:52:20AM +0100, Christoph Hellwig wrote: > Looks fine, > > and we should pick this up for 4.16 independent of the rest, which > I might need a little more review time for. > > Reviewed-by: Christoph Hellwig Thanks, queued up for 4.16.

Re: [PATCH] blk: optimization for classic polling

2018-02-20 Thread Keith Busch
On Tue, Feb 20, 2018 at 02:21:37PM +0100, Peter Zijlstra wrote: > Also, set_current_state(TASK_RUNNING) is dodgy (similarly in > __blk_mq_poll), why do you need that memory barrier? You're right. The subsequent revision that was committed removed the barrier. The commit is here:

Re: [PATCH] blk: optimization for classic polling

2018-02-08 Thread Keith Busch
On Sun, May 30, 2083 at 09:51:06AM +0530, Nitesh Shetty wrote: > This removes the dependency on interrupts to wake up task. Set task > state as TASK_RUNNING, if need_resched() returns true, > while polling for IO completion. > Earlier, polling task used to sleep, relying on interrupt to wake it
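
A sketch of the idea applied to the polling loop in __blkdev_direct_IO_simple (reconstructed; not the committed diff): leaving the task in TASK_RUNNING makes io_schedule() act as a preemption point rather than a sleep, so the poller no longer depends on the completion interrupt to become runnable again.

    for (;;) {
        set_current_state(TASK_UNINTERRUPTIBLE);
        if (!READ_ONCE(bio.bi_private))          /* completion seen */
            break;
        if (!(iocb->ki_flags & IOCB_HIPRI) ||
            !blk_poll(bdev_get_queue(bdev), qc)) {
            if (need_resched())
                set_current_state(TASK_RUNNING); /* proposed change */
            io_schedule();
        }
    }
    __set_current_state(TASK_RUNNING);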

[PATCH] blk-mq-sched: Enable merging discard bio into request

2018-02-01 Thread Keith Busch
Signed-off-by: Keith Busch <keith.bu...@intel.com> --- block/blk-mq-sched.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 55c0a745b427..25c14c58385c 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -259,6 +259,8 @
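
The two inserted lines extend the blk_try_merge() switch in blk_mq_sched_try_merge() with a discard case:

    case ELEVATOR_DISCARD_MERGE:
        return bio_attempt_discard_merge(q, rq, bio);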

Re: [PATCHv2 2/2] block: Handle merging discards in IO schedulers

2018-02-01 Thread Keith Busch
On Thu, Feb 01, 2018 at 02:02:14PM -0700, Jens Axboe wrote: > On 2/1/18 1:31 PM, Keith Busch wrote: > > This adds support for merging discard requests in all IO schedulers. > > As per my last email, some of these aren't correct. Only the > blk-mq-sched change is correct/useful,

[PATCHv2 2/2] block: Handle merging discards in IO schedulers

2018-02-01 Thread Keith Busch
This adds support for merging discard requests in all IO schedulers. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- block/bfq-iosched.c | 2 +- block/blk-core.c | 4 block/blk-mq-sched.c | 2 ++ block/cfq-iosched.c | 2 +- block/deadline-iosched

[PATCHv2 1/2] block: Merge discard requests as a special case

2018-02-01 Thread Keith Busch
Discard requests operate under different constraints than other operations and have different rules for merging. This patch will handle such requests as a special case, using the same criteria and segment accounting used for merging a discard bio into a request. Signed-off-by: Keith Busch

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-02-01 Thread Keith Busch
On Thu, Feb 01, 2018 at 10:58:23AM -0700, Jens Axboe wrote: > I was able to reproduce on a test box, pretty trivially in fact: > > # echo mq-deadline > /sys/block/nvme2n1/queue/scheduler > # mkfs.ext4 /dev/nvme2n1 > # mount /dev/nvme2n1 /data -o discard > # dd if=/dev/zero of=/data/10g bs=1M

[PATCH] block: Merge discard requests as a special case

2018-02-01 Thread Keith Busch
Discard requests operate under different constraints than other operations and have different rules for merging. This patch will treat such requests as a special case, using the same criteria and segment accounting used for merging a discard bio into a request. Signed-off-by: Keith Busch

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-02-01 Thread Keith Busch
On Thu, Feb 01, 2018 at 08:26:11AM -0700, Jens Axboe wrote: > On 1/31/18 9:56 PM, Keith Busch wrote: > > > diff --git a/block/blk-merge.c b/block/blk-merge.c > > index 8452fc7164cc..01671e1373ff 100644 > > --- a/block/blk-merge.c > > +++ b/block/blk-merge.c > &

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Keith Busch
On Wed, Jan 31, 2018 at 08:07:41PM -0700, Jens Axboe wrote: > if (total_phys_segments > queue_max_segments(q)) > - return 0; > + return 0; This perhaps unintended change happens to point out another problem: queue_max_segments is the wrong limit for discards,
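
In other words, a discard merge should be bounded by the discard-specific limit, along the lines of the check in bio_attempt_discard_merge() (sketch):

    /* Discards have their own segment limit, distinct from
     * queue_max_segments(). */
    if (blk_rq_nr_discard_segments(req) >= queue_max_discard_segments(q))
        goto no_merge;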

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Keith Busch
On Wed, Jan 31, 2018 at 08:29:37AM -0700, Jens Axboe wrote: > > How about something like the below? > > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 8452fc7164cc..cee102fb060e 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -574,8 +574,13 @@ static int

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-30 Thread Keith Busch
On Tue, Jan 30, 2018 at 01:32:25PM -0700, Jens Axboe wrote: > On 1/30/18 1:30 PM, Keith Busch wrote: > > On Tue, Jan 30, 2018 at 08:57:49AM -0700, Jens Axboe wrote: > >> > >> Looking at the disassembly, 'n' is 2 and 'segments' is 0x. > > > > Is this

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-30 Thread Keith Busch
On Tue, Jan 30, 2018 at 08:57:49AM -0700, Jens Axboe wrote: > > Looking at the disassembly, 'n' is 2 and 'segments' is 0x. Is this still a problem if you don't use an IO scheduler? With deadline, I'm not finding any path to bio_attempt_discard_merge which is where the nr_phys_segments is

Re: [PATCH 1/3] nvme/pci: Start request after doorbell ring

2018-01-22 Thread Keith Busch
On Wed, Jan 03, 2018 at 01:21:05PM -0700, Keith Busch wrote: > > I've removed the submission side poll in a local build, and amazingly I > am observing a not insignificant increase in latency without it on some > workloads with certain hardware. I will have to delay recommending/posti

[LSF/MM TOPIC] blk-mq priority based hctx selection

2018-01-15 Thread Keith Busch
For the storage track, I would like to propose a topic for differentiated blk-mq hardware contexts. Today, blk-mq considers all hardware contexts equal, and they are selected based on the software's CPU context. There are use cases that benefit from having hardware context selection criteria beyond

[PATCHv2 2/5] nvme/multipath: Consult blk_status_t for failover

2018-01-09 Thread Keith Busch
Hannes Reinecke <h...@suse.com> Signed-off-by: Keith Busch <keith.bu...@intel.com> --- drivers/nvme/host/core.c | 9 + drivers/nvme/host/multipath.c | 44 --- drivers/nvme/host/nvme.h | 5 +++-- 3 files changed, 16 insertion

[PATCHv2 5/5] dm mpath: Use blk_path_error

2018-01-09 Thread Keith Busch
Uses common code for determining if an error should be retried on alternate path. Acked-by: Mike Snitzer <snit...@redhat.com> Reviewed-by: Hannes Reinecke <h...@suse.com> Signed-off-by: Keith Busch <keith.bu...@intel.com> --- drivers/md/dm-mpath.c | 19 ++-

[PATCHv2 4/5] nvme/multipath: Use blk_path_error

2018-01-09 Thread Keith Busch
Uses common code for determining if an error should be retried on alternate path. Acked-by: Mike Snitzer <snit...@redhat.com> Reviewed-by: Hannes Reinecke <h...@suse.com> Signed-off-by: Keith Busch <keith.bu...@intel.com> --- drivers/nvme/host/multipath.c | 14 +-

[PATCHv2 3/5] block: Provide blk_status_t decoding for path errors

2018-01-09 Thread Keith Busch
This patch provides a common decoder for block status path related errors that may be retried so various entities wishing to consult this do not have to duplicate this decision. Acked-by: Mike Snitzer <snit...@redhat.com> Reviewed-by: Hannes Reinecke <h...@suse.com> Signed-off-by:
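
The decoder itself is small; a sketch matching the helper as merged into blk_types.h:

    static inline bool blk_path_error(blk_status_t error)
    {
        switch (error) {
        case BLK_STS_NOTSUPP:
        case BLK_STS_NOSPC:
        case BLK_STS_TARGET:
        case BLK_STS_NEXUS:
        case BLK_STS_MEDIUM:
        case BLK_STS_PROTECTION:
            return false;  /* target-side error; another path won't help */
        }
        /* Anything else may be a path failure; retry an alternate path. */
        return true;
    }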

[PATCHv2 0/5] nvme/dm failover unification

2018-01-09 Thread Keith Busch
nel doc for it. Added reviews and acks. Keith Busch (5): nvme: Add more command status translation nvme/multipath: Consult blk_status_t for failover block: Provide blk_status_t decoding for path errors nvme/multipath: Use blk_path_error dm mpath: Use blk_path_error drivers/md/dm-mpat

[PATCHv2 1/5] nvme: Add more command status translation

2018-01-09 Thread Keith Busch
This adds more NVMe status code translations to blk_status_t values, and captures all the current status codes NVMe multipath uses. Acked-by: Mike Snitzer <snit...@redhat.com> Reviewed-by: Hannes Reinecke <h...@suse.com> Signed-off-by: Keith Busch <keith.bu...@intel.com> ---

Re: [PATCH 2/5] nvme/multipath: Consult blk_status_t for failover

2018-01-09 Thread Keith Busch
On Mon, Jan 08, 2018 at 01:57:07AM -0800, Christoph Hellwig wrote: > > - if (unlikely(nvme_req(req)->status && nvme_req_needs_retry(req))) { > > - if (nvme_req_needs_failover(req)) { > > + blk_status_t status = nvme_error_status(req); > > + > > + if (unlikely(status != BLK_STS_OK

Re: [PATCH 1/5] nvme: Add more command status translation

2018-01-08 Thread Keith Busch
On Mon, Jan 08, 2018 at 04:34:36PM +0100, Christoph Hellwig wrote: > It's basically a kernel bug as it tries to access lbas that do not > exist. BLK_STS_TARGET should be fine. Okay, I'll fix this and address your other comments, and resend. Thanks for the feedback.

Re: [PATCH 09/12] nvme-pci: Use PCI p2pmem subsystem to manage the CMB

2018-01-05 Thread Keith Busch
On Fri, Jan 05, 2018 at 11:19:28AM -0700, Logan Gunthorpe wrote: > Although it is not explicitly stated anywhere, pci_alloc_p2pmem() should > always be at least 4k aligned. This is because the gen_pool that implements > it is created with PAGE_SHIFT for its min_alloc_order. Ah, I see that now.

Re: [PATCH 09/12] nvme-pci: Use PCI p2pmem subsystem to manage the CMB

2018-01-05 Thread Keith Busch
On Thu, Jan 04, 2018 at 12:01:34PM -0700, Logan Gunthorpe wrote: > Register the CMB buffer as p2pmem and use the appropriate allocation > functions to create and destroy the IO SQ. > > If the CMB supports WDS and RDS, publish it for use as p2p memory > by other devices. <> > + if (qid &&
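
A heavily hedged sketch of the proposed usage, with API names taken from the p2pmem series under review (not yet merged at this point):

    /* Register the CMB BAR region as p2p memory. */
    if (pci_p2pdma_add_resource(pdev, NVME_CMB_BIR(dev->cmbloc),
                                size, offset))
        return;

    /* Publish for use by other devices only when the CMB reports
     * WDS and RDS support. */
    pci_p2pmem_publish(pdev, true);

    /* IO SQs can then be carved out of the CMB: */
    nvmeq->sq_cmds = pci_alloc_p2pmem(pdev, SQ_SIZE(depth));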

Re: [PATCH 0/5] Failover criteria unification

2018-01-04 Thread Keith Busch
On Thu, Jan 04, 2018 at 06:36:27PM -0500, Mike Snitzer wrote: > Right, I dropped that patch since it'd have only resulted in conflicts > come merge time. As such, this series can easily go through the nvme > tree to Jens. It looks like you can also touch up dm to allow it to multipath nvme even

[PATCH 3/5] block: Provide blk_status_t decoding for retryable errors

2018-01-04 Thread Keith Busch
This patch provides a common decoder for block status that may be retried so various entities wishing to consult this do not have to duplicate this decision. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- include/linux/blk_types.h | 16 1 file changed, 16 inse

[PATCH 5/5] dm mpath: Use blk_retryable

2018-01-04 Thread Keith Busch
Uses common code for determining if an error should be retried on alternate path. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- drivers/md/dm-mpath.c | 19 ++- 1 file changed, 2 insertions(+), 17 deletions(-) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-m

[PATCH 2/5] nvme/multipath: Consult blk_status_t for failover

2018-01-04 Thread Keith Busch
This removes nvme multipath's specific status decoding to see if failover is needed, using the generic blk_status_t that was translated earlier. This abstraction from the raw NVMe status means nvme status decoding exists in just one place. Signed-off-by: Keith Busch <keith.bu...@intel.

[PATCH 1/5] nvme: Add more command status translation

2018-01-04 Thread Keith Busch
This adds more NVMe status code translations to blk_status_t values, and captures all the current status codes NVMe multipath uses. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- drivers/nvme/host/core.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/nvme/host/co
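
The translation lives in nvme_error_status(); an excerpt-style sketch of its shape (the exact set of codes added is in the patch itself):

    switch (nvme_req(req)->status & 0x7ff) {
    case NVME_SC_SUCCESS:
        return BLK_STS_OK;
    case NVME_SC_CAP_EXCEEDED:
        return BLK_STS_NOSPC;
    case NVME_SC_LBA_RANGE:
        return BLK_STS_TARGET;   /* see the review exchange above */
    default:
        return BLK_STS_IOERR;
    }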

[PATCH 0/5] Failover criteria unification

2018-01-04 Thread Keith Busch
-4.16 without that patch, as I'm not seeing it in the most current branch. Keith Busch (5): nvme: Add more command status translation nvme/multipath: Consult blk_status_t for failover block: Provide blk_status_t decoding for retryable errors nvme/multipath: Use blk_retryable dm mpath: Use

Re: [PATCH 1/3] nvme/pci: Start request after doorbell ring

2018-01-03 Thread Keith Busch
On Thu, Dec 21, 2017 at 02:01:44PM -0700, Jens Axboe wrote: > I haven't either, but curious if others had. It's mostly just extra > overhead, I haven't seen it provide a latency reduction of any kind. I've removed the submission side poll in a local build, and amazingly I am observing a not

Re: [for-4.16 PATCH v2 0/5] block, nvme, dm: allow DM multipath to use NVMe's error handler

2018-01-02 Thread Keith Busch
Instead of hiding NVMe path related errors, the NVMe driver needs to code an appropriate generic block status from an NVMe status. We already do this translation whether or not CONFIG_NVME_MULTIPATH is set, so I think it's silly NVMe native multipathing has a second status decoder. This just

Re: [PATCH 2/3] nvme/pci: Remove cq_vector check in IO path

2018-01-02 Thread Keith Busch
On Sun, Dec 31, 2017 at 02:30:09PM +0200, Sagi Grimberg wrote: > Not sure if stealing bios from requests is a better design. Note that > we do exactly this in other transport (nvme_[rdma|loop|fc]_is_ready). Well, we're currently failing requests that may succeed if we could back them out for

Re: [for-4.16 PATCH 0/5] block, nvme, dm: allow DM multipath to use NVMe's error handler

2017-12-26 Thread Keith Busch
On Tue, Dec 19, 2017 at 04:05:41PM -0500, Mike Snitzer wrote: > These patches enable DM multipath to work well on NVMe over Fabrics > devices. Currently that implies CONFIG_NVME_MULTIPATH is _not_ set. > > But follow-on work will be to make it so that native NVMe multipath > and DM multipath can

Re: [PATCH 1/3] nvme/pci: Start request after doorbell ring

2017-12-26 Thread Keith Busch
On Mon, Dec 25, 2017 at 12:11:47PM +0200, Sagi Grimberg wrote: > > This is a performance optimization that allows the hardware to work on > > a command in parallel with the kernel's stats and timeout tracking. > > Curious to know what does this buy us? blk_mq_start_request doesn't do anything to

Re: [PATCH 3/3] block: Polling completion performance optimization

2017-12-21 Thread Keith Busch
On Thu, Dec 21, 2017 at 03:17:41PM -0700, Jens Axboe wrote: > On 12/21/17 2:34 PM, Keith Busch wrote: > > It would be nice, but the driver doesn't know a request's completion > > is going to be a polled. > > That's trivially solvable though, since the information is avai

Re: [PATCH 3/3] block: Polling completion performance optimization

2017-12-21 Thread Keith Busch
On Thu, Dec 21, 2017 at 02:00:04PM -0700, Jens Axboe wrote: > On 12/21/17 1:56 PM, Scott Bauer wrote: > > On 12/21/2017 01:46 PM, Keith Busch wrote: > >> @@ -181,7 +181,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio) > >>struct task_struc

Re: [PATCH 1/3] nvme/pci: Start request after doorbell ring

2017-12-21 Thread Keith Busch
On Thu, Dec 21, 2017 at 01:53:44PM -0700, Jens Axboe wrote: > Turns out that wasn't what patch 2 was. And the code is right there > above as well, and under the q_lock, so I guess that race doesn't > exist. > > But that does bring up the fact if we should always be doing the >

[PATCH 0/3] Performance enhancements

2017-12-21 Thread Keith Busch
/queue/io_poll_delay nvme set-feature /dev/nvme0 -f 8 -v 0x4ff fio profile: [global] ioengine=pvsync2 rw=randread norandommap direct=1 bs=4k hipri [hi-pri] filename=/dev/nvme0n1 cpus_allowed=2 Keith Busch (3): nvme/pci: Start request after doorbell ring nvme/pci: Remove cq_

[PATCH 2/3] nvme/pci: Remove cq_vector check in IO path

2017-12-21 Thread Keith Busch
This is a micro-optimization removing unnecessary check for a disabled queue. We no longer need this check because blk-mq provides the ability to quiesce queues that nvme uses, and the doorbell registers are never unmapped as long as requests are active. Signed-off-by: Keith Busch <keith
