On Tue, May 22, 2018 at 12:04:53AM +0800, Ming Lei wrote:
> On Mon, May 21, 2018 at 09:44:33AM -0600, Keith Busch wrote:
> > On Mon, May 21, 2018 at 11:34:27PM +0800, Ming Lei wrote:
> > > nvme_dev_disable() quiesces queues first before killing queues.
> > >
> >
On Mon, May 21, 2018 at 11:25:43PM +0800, Ming Lei wrote:
> On Mon, May 21, 2018 at 08:04:13AM -0600, Keith Busch wrote:
> > On Sat, May 19, 2018 at 08:01:42AM +0800, Ming Lei wrote:
> > > > You keep saying that, but the controller state is global to the
> > > > co
On Mon, May 21, 2018 at 11:34:27PM +0800, Ming Lei wrote:
> nvme_dev_disable() quiesces queues first before killing queues.
>
> If queues are quiesced during or before nvme_wait_freeze() is run
> from the 2nd part of reset, the 2nd part can't move on, and IO hang
> is caused. Finally no reset can
On Mon, May 21, 2018 at 10:58:51PM +0800, Ming Lei wrote:
> On Mon, May 21, 2018 at 08:22:19AM -0600, Keith Busch wrote:
> > On Sat, May 19, 2018 at 07:03:58AM +0800, Ming Lei wrote:
> > > On Fri, May 18, 2018 at 10:38:20AM -0600, Keith Busch wrote:
> > > >
On Sat, May 19, 2018 at 07:03:58AM +0800, Ming Lei wrote:
> On Fri, May 18, 2018 at 10:38:20AM -0600, Keith Busch wrote:
> > +
> > + if (unfreeze)
> > + nvme_wait_freeze(&dev->ctrl);
> > +
>
> timeout may come just before blk_mq_update_nr_hw_queu
On Mon, May 21, 2018 at 02:37:56AM -0400, Yi Zhang wrote:
> Hi Keith
> I tried this patch on my R730 Server, but it lead to system hang after
> setpci, could you help check it, thanks.
>
> Console log:
> storageqe-62 login:
> Kernel 4.17.0-rc5 on an x86_64
>
> storageqe-62 login: [
On Sat, May 19, 2018 at 08:01:42AM +0800, Ming Lei wrote:
> > You keep saying that, but the controller state is global to the
> > controller. It doesn't matter which namespace request_queue started the
> > reset: every namespaces request queue sees the RESETTING controller state
>
> When timeouts
On Sat, May 19, 2018 at 06:26:28AM +0800, Ming Lei wrote:
> So could we please face to the real issue instead of working around
> test case?
Yes, that's why I want you to stop referencing the broken test.
On Sat, May 19, 2018 at 06:32:11AM +0800, Ming Lei wrote:
> This way can't sync timeout reliably, since timeout events can
> come from two NS at the same time, and one may be handled as
> RESET_TIMER, and another one can be handled as EH_HANDLED.
You keep saying that, but the controller state is
to bring it online. This can look like
a permanent device failure from the driver's perspective.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
tests/block/011 | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/tests/block/011 b/tests/block/011
index 62e89f7..2
, and is not harmful to do a second time.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
drivers/nvme/host/pci.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8da63402d474..2bd9d84f58d0 100644
--- a/d
If the reset failed due to a non-fatal error, this patch will attempt
to reset the controller again, with a maximum of 4 attempts.
Since the failed reset case has changed purpose, this patch provides a
more appropriate name and warning message for the reset failure.
Signed-off-by: Keith Busch
renaming the function 'nvme_dev_add' to a
more appropriate name that describes what it's actually doing:
nvme_alloc_io_tags.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
drivers/nvme/host/core.c | 3 +++
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/pci.c
each of those messages, so this patch adds a ratelimit on them.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
drivers/nvme/host/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ddfeb186d129..e4b91c
This patch fixes races that occur with simultaneous controller
resets by synchronizing request queues prior to initializing the
controller. Without this, a thread may attempt to disable a controller
at the same time as we're trying to enable it.
Signed-off-by: Keith Busch <keith.bu...@intel.
A failed connection may be retryable. This patch allows the connecting
state to initiate a reset so that it may try to connect again.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
drivers/nvme/host/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/nvme/host/co
On Thu, May 17, 2018 at 04:23:45PM +0200, Johannes Thumshirn wrote:
> > Agreed. Alternatively possibly call the driver's reset_preparei/done
> > callbacks.
>
> Exactly, but as long as we can issue the reset via sysfs the test-case
> is still valid.
I disagree the test case is valid. The test
On Fri, May 18, 2018 at 08:20:05AM +0800, Ming Lei wrote:
> What I think block/011 is helpful is that it can trigger IO timeout
> during reset, which can be triggered in reality too.
As I mentioned earlier, there is nothing wrong with the spirit of
the test. What's wrong with it is the misguided
On Thu, May 17, 2018 at 10:41:29AM +0200, Christoph Hellwig wrote:
> On Wed, May 16, 2018 at 08:20:31PM -0600, Keith Busch wrote:
> > On Thu, May 17, 2018 at 07:10:59AM +0800, Ming Lei wrote:
> > > All simulation in block/011 may happen in reality.
> >
> > If this t
Hi Ming,
I'm working out the answers in code to the issues you raised. It will
just take a moment to finish fleshing those out. In the meantime I just want
to point out why I think block/011 isn't a real test.
On Thu, May 17, 2018 at 07:10:59AM +0800, Ming Lei wrote:
> All simulation in block/011
On Wed, May 16, 2018 at 12:31:28PM +0800, Ming Lei wrote:
> Hi Keith,
>
> This issue may probably be fixed by Jianchao's patch of 'nvme: pci: set
> nvmeq->cq_vector
> after alloc cq/sq'[1] and my another patch of 'nvme: pci: unquiesce admin
> queue after controller is shutdown'[2], and both two
Hi Ming,
I'm sorry, but I am frankly not on board with introducing yet
another work-queue into this driver for handling this situation. The
fact you missed syncing with this queue in the surprise remove case,
introducing various use-after-free conditions, just demonstrates exactly
how
On Tue, May 15, 2018 at 07:47:07AM +0800, Ming Lei wrote:
> > > > [ 760.727485] nvme nvme1: EH 0: after recovery -19
> > > > [ 760.727488] nvme nvme1: EH: fail controller
> > >
> > > The above issue(hang in nvme_remove()) is still an old issue, which
> > > is because queues are kept as quiesce
On Mon, May 14, 2018 at 02:01:36PM -0700, Omar Sandoval wrote:
> On Mon, May 14, 2018 at 02:42:41PM -0600, Keith Busch wrote:
> > This test will run a background IO process and inject an admin command
> > with a very short timeout that is all but guaranteed to expire without
&
This test will run a background IO process and inject an admin command
with a very short timeout that is all but guaranteed to expire without
a completion: the async event request.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
v1 -> v2:
Changed description since its n
This test will run a background IO process and inject an admin command
with a very short timeout that is all but guaranteed to expire without
a completion: the async event request.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
tests/nvme/005
Hi Ming,
On Sat, May 12, 2018 at 08:21:22AM +0800, Ming Lei wrote:
> > [ 760.679960] nvme nvme1: controller is down; will reset: CSTS=0x,
> > PCI_STATUS=0x
> > [ 760.701468] nvme nvme1: EH 0: after shutdown, top eh: 1
> > [ 760.727099] pci_raw_set_power_state: 62 callbacks
On Fri, May 11, 2018 at 08:29:24PM +0800, Ming Lei wrote:
> Hi,
>
> The 1st patch introduces blk_quiesce_timeout() and blk_unquiesce_timeout()
> for NVMe, meantime fixes blk_sync_queue().
>
> The 2nd patch covers timeout for admin commands for recovering controller
> for avoiding possible
On Fri, May 11, 2018 at 06:03:59AM +0800, Ming Lei wrote:
> Sorry, forget to mention, it isn't enough to simply sync timeout inside
> reset().
>
> Another tricky thing is about freeze & unfreeze, now freeze is done in
> nvme_dev_disable(), and unfreeze is done in nvme_reset_work. That means
> we
On Fri, May 11, 2018 at 05:24:46AM +0800, Ming Lei wrote:
> Could you share me the link?
The diff was in this reply here:
http://lists.infradead.org/pipermail/linux-nvme/2018-April/017019.html
> Firstly, the previous nvme_sync_queues() won't work reliably, so this
> patch introduces
On Fri, May 11, 2018 at 05:10:40AM +0800, Ming Lei wrote:
> On Fri, May 11, 2018 at 5:05 AM, Keith Busch
> <keith.bu...@linux.intel.com> wrote:
> > On Fri, May 11, 2018 at 04:52:11AM +0800, Ming Lei wrote:
> >> Hi Keith,
> >>
> >> On Tue, May 8, 2018 at
On Fri, May 11, 2018 at 04:52:11AM +0800, Ming Lei wrote:
> Hi Keith,
>
> On Tue, May 8, 2018 at 11:30 PM, Keith Busch <keith.bu...@intel.com> wrote:
> > On Sat, Apr 28, 2018 at 11:50:17AM +0800, Ming Lei wrote:
> >> This sync may be raced with one timed-out
On Sat, Apr 28, 2018 at 11:50:17AM +0800, Ming Lei wrote:
> This sync may be raced with one timed-out request, which may be handled
> as BLK_EH_HANDLED or BLK_EH_RESET_TIMER, so the above sync queues can't
> work reliably.
Ming,
As proposed, that scenario is impossible to encounter. Resetting
On Sat, May 05, 2018 at 07:51:22PM -0400, Laurence Oberman wrote:
> 3rd and 4th attempts slightly better, but clearly not dependable
>
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)[failed]
> runtime... 81.188s
> ---
On Sun, Apr 29, 2018 at 05:39:52AM +0800, Ming Lei wrote:
> On Sat, Apr 28, 2018 at 9:35 PM, Keith Busch
> <keith.bu...@linux.intel.com> wrote:
> > On Sat, Apr 28, 2018 at 11:50:17AM +0800, Ming Lei wrote:
> >> > I understand how the problems are happe
On Sat, Apr 28, 2018 at 11:50:17AM +0800, Ming Lei wrote:
> > I understand how the problems are happening a bit better now. It used
> > to be that blk-mq would lock an expired command one at a time, so when
> > we had a batch of IO timeouts, the driver was able to complete all of
> > them inside a
On Thu, Apr 26, 2018 at 08:39:55PM +0800, Ming Lei wrote:
> +/*
> + * This one is called after queues are quiesced, and no in-flight timeout
> + * and nvme interrupt handling.
> + */
> +static void nvme_pci_cancel_request(struct request *req, void *data,
> + bool reserved)
> +{
> +
is reenabled at the end of the test.
Note, this is currently incompatible with NVMe subsystems when
CONFIG_NVME_MULTIPATH is set, since the /dev/nvme*n* names don't have
a pci parent in sysfs.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
v1 -> v2:
Incorporated feedback from Omar and Johannes
On Thu, Apr 26, 2018 at 08:39:56PM +0800, Ming Lei wrote:
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 5d05a04f8e72..1e058deb4718 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1265,6 +1265,20 @@ static enum blk_eh_timer_return
On Wed, Apr 25, 2018 at 08:56:02AM -0700, Omar Sandoval wrote:
> On Tue, Apr 24, 2018 at 03:41:46PM -0600, Keith Busch wrote:
> > +_test_hotplug_slot() {
>
> I'd call this _test_dev_in_hotplug_slot().
Sounds good.
> > + parent="$(_get_pci_parent_from_blkdev)"
On Tue, Apr 24, 2018 at 04:04:22PM -0600, Johannes Thumshirn wrote:
> On Tue, Apr 24, 2018 at 03:41:46PM -0600, Keith Busch wrote:
>
> [...]
>
> > +DESCRIPTION="break PCI link while doing I/O"
> > +TIMED=1
>
> I _think_ we can set QUICK=1 here, Omar?
>
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
common/rc | 17 +
tests/block/016 | 52
2 files changed, 69 insertions(+)
create mode 100755 tests/block/016
diff --git a/common/rc b/common/rc
index 1
On Tue, Apr 17, 2018 at 04:15:38PM -0600, Jens Axboe wrote:
> >> Looks good to me.
> >>
> >> Reviewed-by: Matias Bjørling
> >>
> >> Keith, when convenient can you pick this up for 4.18?
> >
> > This looks safe for 4.17-rc2, no? Unless you want to wait for the next
> > release.
On Tue, Apr 17, 2018 at 08:16:25AM +0200, Matias Bjørling wrote:
> On 4/17/18 3:55 AM, Wei Xu wrote:
> > Add a new lightnvm quirk to identify CNEX’s Granby controller.
> >
> > Signed-off-by: Wei Xu
> > ---
> > drivers/nvme/host/pci.c | 2 ++
> > 1 file changed, 2
On Wed, Mar 28, 2018 at 09:32:14AM +0200, Christoph Hellwig wrote:
> On Tue, Mar 27, 2018 at 09:39:08AM -0600, Keith Busch wrote:
> > - return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
> > + return blk_mq_pci_map_queues(set, t
.br...@microsemi.com>
Cc: <qla2xxx-upstr...@qlogic.com>
Cc: <linux-s...@vger.kernel.org>
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
v1 -> v2:
Update blk-mq API directly instead of chaining a default parameter to
a new API, and update all drivers accord
All the queue memory is allocated up front. We don't take the node
into consideration when creating queues anymore, so remove the unused
parameter.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
---
v1 -> v2:
Added review.
dr
will share the pre_vector with all CPUs assigned.
Cc: Jianchao Wang <jianchao.w.w...@oracle.com>
Cc: Ming Lei <ming@redhat.com>
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
v1 -> v2:
Update to use new blk-mq API.
Removed unnecessary braces, inline function
On Mon, Mar 26, 2018 at 09:50:38AM +0800, Ming Lei wrote:
>
> Given no many callers of blk_mq_pci_map_queues(), I suggest to add the
> parameter of 'offset' to this API directly, then people may keep the
> '.pre_vectors' stuff in mind, and avoid to misuse it.
Yeah, I think I have to agree. I
On Sat, Mar 24, 2018 at 09:55:49PM +0800, jianchao.wang wrote:
> Maybe we could provide a callback parameter for __blk_mq_pci_map_queues which
> give the mapping from hctx queue num to device-relative interrupt vector
> index.
If a driver's mapping is so complicated as to require a special
On Mon, Mar 26, 2018 at 09:47:07AM +0800, Ming Lei wrote:
> On Fri, Mar 23, 2018 at 04:19:22PM -0600, Keith Busch wrote:
> > @@ -1629,9 +1627,7 @@ static int nvme_create_io_queues(struct nvme_dev *dev)
> > int ret = 0;
> >
> > for (i = dev->ctrl.que
All nvme queue memory is allocated up front. We don't take the node
into consideration when creating queues anymore, so remove the unused
parameter.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
drivers/nvme/host/pci.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
ly one interrupt vector available, the admin
and IO queues will share the pre_vector with all CPUs assigned.
Signed-off-by: Jianchao Wang <jianchao.w.w...@oracle.com>
Reviewed-by: Ming Lei <ming@redhat.com>
[changelog, code comments, merge, and blk-mq pci vector offset]
Signed-o
The PCI interrupt vectors intended to be associated with a queue may
not start at 0. This patch adds an offset parameter so blk-mq may find
the intended affinity mask. The default value is 0 so existing drivers
that don't care about this parameter don't need to change.
Signed-off-by: Keith Busch
On Wed, Mar 21, 2018 at 03:06:05AM -0700, Matias Bjørling wrote:
> > outside of nvme core so that we can use it form lightnvm.
> >
> > Signed-off-by: Javier González
> > ---
> > drivers/lightnvm/core.c | 11 +++
> > drivers/nvme/host/core.c | 6 ++--
> >
On Mon, Mar 05, 2018 at 01:10:53PM -0700, Jason Gunthorpe wrote:
> So when reading the above mlx code, we see the first wmb() being used
> to ensure that CPU stores to cachable memory are visible to the DMA
> triggered by the doorbell ring.
IIUC, we don't need a similar barrier for NVMe to ensure
On Mon, Mar 05, 2018 at 12:33:29PM +1100, Oliver wrote:
> On Thu, Mar 1, 2018 at 10:40 AM, Logan Gunthorpe wrote:
> > @@ -429,10 +429,7 @@ static void __nvme_submit_cmd(struct nvme_queue *nvmeq,
> > {
> > u16 tail = nvmeq->sq_tail;
>
> > - if
On Thu, Mar 01, 2018 at 11:00:51PM +, Stephen Bates wrote:
>
> P2P is about offloading the memory and PCI subsystem of the host CPU
> and this is achieved no matter which p2p_dev is used.
Even within a device, memory attributes for its various regions may not be
the same. There's a
On Thu, Mar 01, 2018 at 01:52:20AM +0100, Christoph Hellwig wrote:
> Looks fine,
>
> and we should pick this up for 4.16 independent of the rest, which
> I might need a little more review time for.
>
> Reviewed-by: Christoph Hellwig
Thanks, queued up for 4.16.
On Tue, Feb 20, 2018 at 02:21:37PM +0100, Peter Zijlstra wrote:
> Also, set_current_state(TASK_RUNNING) is dodgy (similarly in
> __blk_mq_poll), why do you need that memory barrier?
You're right. The subsequent revision that was committed removed the
barrier. The commit is here:
On Sun, May 30, 2083 at 09:51:06AM +0530, Nitesh Shetty wrote:
> This removes the dependency on interrupts to wake up task. Set task
> state as TASK_RUNNING, if need_resched() returns true,
> while polling for IO completion.
> Earlier, polling task used to sleep, relying on interrupt to wake it
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
block/blk-mq-sched.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 55c0a745b427..25c14c58385c 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -259,6 +259,8 @
On Thu, Feb 01, 2018 at 02:02:14PM -0700, Jens Axboe wrote:
> On 2/1/18 1:31 PM, Keith Busch wrote:
> > This adds support for merging discard requests in all IO schedulers.
>
> As per my last email, some of these aren't correct. Only the
> blk-mq-sched change is correct/useful,
This adds support for merging discard requests in all IO schedulers.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
block/bfq-iosched.c | 2 +-
block/blk-core.c | 4
block/blk-mq-sched.c | 2 ++
block/cfq-iosched.c | 2 +-
block/deadline-iosched
Discard requests operate under different constraints than other operations
and have different rules for merging. This patch will handle such requests
as a special case, using the same criteria and segment accounting used
for merging a discard bio into a request.
Signed-off-by: Keith Busch
On Thu, Feb 01, 2018 at 10:58:23AM -0700, Jens Axboe wrote:
> I was able to reproduce on a test box, pretty trivially in fact:
>
> # echo mq-deadline > /sys/block/nvme2n1/queue/scheduler
> # mkfs.ext4 /dev/nvme2n1
> # mount /dev/nvme2n1 /data -o discard
> # dd if=/dev/zero of=/data/10g bs=1M
Discard requests operate under different constraints than other operations
and have different rules for merging. This patch will treat such requests
as a special case, using the same criteria and segment accounting used
for merging a discard bio into a request.
Signed-off-by: Keith Busch
On Thu, Feb 01, 2018 at 08:26:11AM -0700, Jens Axboe wrote:
> On 1/31/18 9:56 PM, Keith Busch wrote:
>
> > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > index 8452fc7164cc..01671e1373ff 100644
> > --- a/block/blk-merge.c
> > +++ b/block/blk-merge.c
> &
On Wed, Jan 31, 2018 at 08:07:41PM -0700, Jens Axboe wrote:
> if (total_phys_segments > queue_max_segments(q))
> - return 0;
> + return 0;
This perhaps unintended change happens to point out another problem:
queue_max_segments is the wrong limit for discards,
On Wed, Jan 31, 2018 at 08:29:37AM -0700, Jens Axboe wrote:
>
> How about something like the below?
>
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8452fc7164cc..cee102fb060e 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -574,8 +574,13 @@ static int
On Tue, Jan 30, 2018 at 01:32:25PM -0700, Jens Axboe wrote:
> On 1/30/18 1:30 PM, Keith Busch wrote:
> > On Tue, Jan 30, 2018 at 08:57:49AM -0700, Jens Axboe wrote:
> >>
> >> Looking at the disassembly, 'n' is 2 and 'segments' is 0x.
> >
> > Is this
On Tue, Jan 30, 2018 at 08:57:49AM -0700, Jens Axboe wrote:
>
> Looking at the disassembly, 'n' is 2 and 'segments' is 0x.
Is this still a problem if you don't use an IO scheduler? With deadline,
I'm not finding any path to bio_attempt_discard_merge which is where the
nr_phys_segments is
On Wed, Jan 03, 2018 at 01:21:05PM -0700, Keith Busch wrote:
>
> I've removed the submission side poll in a local build, and amazingly I
> am observing a not insignificant increase in latency without it on some
> workloads with certain hardware. I will have to delay recommending/posti
For the storage track, I would like to propose a topic for differentiated
blk-mq hardware contexts. Today, blk-mq considers all hardware contexts
equal, and are selected based on the software's CPU context. There are
use cases that benefit from having hardware context selection criteria
beyond
Hannes Reinecke <h...@suse.com>
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
drivers/nvme/host/core.c | 9 +
drivers/nvme/host/multipath.c | 44 ---
drivers/nvme/host/nvme.h | 5 +++--
3 files changed, 16 insertion
Uses common code for determining if an error should be retried on
alternate path.
Acked-by: Mike Snitzer <snit...@redhat.com>
Reviewed-by: Hannes Reinecke <h...@suse.com>
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
drivers/md/dm-mpath.c | 19 ++-
Uses common code for determining if an error should be retried on
alternate path.
Acked-by: Mike Snitzer <snit...@redhat.com>
Reviewed-by: Hannes Reinecke <h...@suse.com>
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
drivers/nvme/host/multipath.c | 14 +-
This patch provides a common decoder for block status path related errors
that may be retried so various entities wishing to consult this do not
have to duplicate this decision.
Acked-by: Mike Snitzer <snit...@redhat.com>
Reviewed-by: Hannes Reinecke <h...@suse.com>
Signed-off-by:
nel
doc for it.
Added reviews and acks.
Keith Busch (5):
nvme: Add more command status translation
nvme/multipath: Consult blk_status_t for failover
block: Provide blk_status_t decoding for path errors
nvme/multipath: Use blk_path_error
dm mpath: Use blk_path_error
drivers/md/dm-mpat
This adds more NVMe status code translations to blk_status_t values,
and captures all the current status codes NVMe multipath uses.
Acked-by: Mike Snitzer <snit...@redhat.com>
Reviewed-by: Hannes Reinecke <h...@suse.com>
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
On Mon, Jan 08, 2018 at 01:57:07AM -0800, Christoph Hellwig wrote:
> > - if (unlikely(nvme_req(req)->status && nvme_req_needs_retry(req))) {
> > - if (nvme_req_needs_failover(req)) {
> > + blk_status_t status = nvme_error_status(req);
> > +
> > + if (unlikely(status != BLK_STS_OK
On Mon, Jan 08, 2018 at 04:34:36PM +0100, Christoph Hellwig wrote:
> It's basically a kernel bug as it tries to access lbas that do not
> exist. BLK_STS_TARGET should be fine.
Okay, I'll fix this and address your other comments, and resend. Thanks
for the feedback.
On Fri, Jan 05, 2018 at 11:19:28AM -0700, Logan Gunthorpe wrote:
> Although it is not explicitly stated anywhere, pci_alloc_p2pmem() should
> always be at least 4k aligned. This is because the gen_pool that implements
> it is created with PAGE_SHIFT for its min_alloc_order.
Ah, I see that now.
On Thu, Jan 04, 2018 at 12:01:34PM -0700, Logan Gunthorpe wrote:
> Register the CMB buffer as p2pmem and use the appropriate allocation
> functions to create and destroy the IO SQ.
>
> If the CMB supports WDS and RDS, publish it for use as p2p memory
> by other devices.
<>
> + if (qid &&
On Thu, Jan 04, 2018 at 06:36:27PM -0500, Mike Snitzer wrote:
> Right, I dropped that patch since it'd have only resulted in conflicts
> come merge time. As such, this series can easily go through the nvme
> tree to Jens.
It looks like you can also touch up dm to allow it to multipath nvme
even
This patch provides a common decoder for block status that may be retried
so various entities wishing to consult this do not have to duplicate
this decision.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
include/linux/blk_types.h | 16
1 file changed, 16 inse
Uses common code for determining if an error should be retried on
alternate path.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
drivers/md/dm-mpath.c | 19 ++-
1 file changed, 2 insertions(+), 17 deletions(-)
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-m
This removes nvme multipath's specific status decoding to see if failover
is needed, using the generic blk_status_t that was translated earlier. This
abstraction from the raw NVMe status means nvme status decoding exists
in just one place.
Signed-off-by: Keith Busch <keith.bu...@intel.
This adds more NVMe status code translations to blk_status_t values,
and captures all the current status codes NVMe multipath uses.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
drivers/nvme/host/core.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/drivers/nvme/host/co
-4.16 without that patch, as I'm not seeing it in the most
current branch.
Keith Busch (5):
nvme: Add more command status translation
nvme/multipath: Consult blk_status_t for failover
block: Provide blk_status_t decoding for retryable errors
nvme/multipath: Use blk_retryable
dm mpath: Use
On Thu, Dec 21, 2017 at 02:01:44PM -0700, Jens Axboe wrote:
> I haven't either, but curious if others had. It's mostly just extra
> overhead, I haven't seen it provide a latency reduction of any kind.
I've removed the submission side poll in a local build, and amazingly I
am observing a not
Instead of hiding NVMe path related errors, the NVMe driver needs to
code an appropriate generic block status from an NVMe status.
We already do this translation whether or not CONFIG_NVME_MULTIPATH is
set, so I think it's silly NVMe native multipathing has a second status
decoder. This just
On Sun, Dec 31, 2017 at 02:30:09PM +0200, Sagi Grimberg wrote:
> Not sure if stealing bios from requests is a better design. Note that
> we do exactly this in other transport (nvme_[rdma|loop|fc]_is_ready).
Well, we're currently failing requests that may succeed if we could
back them out for
On Tue, Dec 19, 2017 at 04:05:41PM -0500, Mike Snitzer wrote:
> These patches enable DM multipath to work well on NVMe over Fabrics
> devices. Currently that implies CONFIG_NVME_MULTIPATH is _not_ set.
>
> But follow-on work will be to make it so that native NVMe multipath
> and DM multipath can
On Mon, Dec 25, 2017 at 12:11:47PM +0200, Sagi Grimberg wrote:
> > This is a performance optimization that allows the hardware to work on
> > a command in parallel with the kernel's stats and timeout tracking.
>
> Curious to know what does this buy us?
blk_mq_start_request doesn't do anything to
On Thu, Dec 21, 2017 at 03:17:41PM -0700, Jens Axboe wrote:
> On 12/21/17 2:34 PM, Keith Busch wrote:
> > It would be nice, but the driver doesn't know a request's completion
> > is going to be a polled.
>
> That's trivially solvable though, since the information is avai
On Thu, Dec 21, 2017 at 02:00:04PM -0700, Jens Axboe wrote:
> On 12/21/17 1:56 PM, Scott Bauer wrote:
> > On 12/21/2017 01:46 PM, Keith Busch wrote:
> >> @@ -181,7 +181,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
> >>struct task_struc
On Thu, Dec 21, 2017 at 01:53:44PM -0700, Jens Axboe wrote:
> Turns out that wasn't what patch 2 was. And the code is right there
> above as well, and under the q_lock, so I guess that race doesn't
> exist.
>
> But that does bring up the fact if we should always be doing the
>
/queue/io_poll_delay
nvme set-feature /dev/nvme0 -f 8 -v 0x4ff
fio profile:
[global]
ioengine=pvsync2
rw=randread
norandommap
direct=1
bs=4k
hipri
[hi-pri]
filename=/dev/nvme0n1
cpus_allowed=2
Keith Busch (3):
nvme/pci: Start request after doorbell ring
nvme/pci: Remove cq_
This is a micro-optimization removing unnecessary check for a disabled
queue. We no longer need this check because blk-mq provides queue
quiescing, which nvme uses, and the doorbell registers are never
unmapped as long as requests are active.
Signed-off-by: Keith Busch <keith