On 05/02/2018 04:46 PM, Coly Li wrote:
It is possible that multiple I/O requests hit a failed cache device or
backing device, therefore it is quite common that CACHE_SET_IO_DISABLE is
set already when a task tries to set the bit from bch_cached_dev_error()
or bch_cache_set_error(). Currently th
On 05/02/2018 04:46 PM, Coly Li wrote:
Commit 7e027ca4b534b ("bcache: add stop_when_cache_set_failed option to
backing device") adds stop_when_cache_set_failed option and stops bcache
device if stop_when_cache_set_failed is auto and there is dirty data on
broken cache device. There might exist a
On 05/02/2018 04:46 PM, Coly Li wrote:
When CACHE_SET_IO_DISABLE is set on cache set flags, bcache allocator
thread routine bch_allocator_thread() may stop the while-loops and
exit. Then it is possible to observe the following kernel oops message,
[ 631.068366] bcache: bch_btree_insert() error
On 05/02/2018 04:46 PM, Coly Li wrote:
Commit c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev")
counts backing device I/O requests and sets dc->io_disable to true if the
error counter exceeds dc->io_error_limit. But it only counts I/O errors for
regular I/O requests and neglects errors of writ
On 05/02/2018 04:46 PM, Coly Li wrote:
Commit c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev") tries
to stop the bcache device by calling bcache_device_stop() when too many I/O
errors have happened on the backing device. But if there is internal I/O
happening on the cache device (writeback scan, garb
On 05/02/2018 04:46 PM, Coly Li wrote:
Current code uses bdevname() or bio_devname() to reference the gendisk
disk name when bcache needs to display disk names in kernel messages.
This was safe before the bcache device failure handling patch set was merged,
because when devices failed, there was dead
When a request times out, nvme_timeout() currently handles it in the
following way:
nvme_dev_disable(dev, false);
nvme_reset_ctrl(&dev->ctrl);
return BLK_EH_HANDLED;
The block layer's timeout handler is per-request-queue, which means each
namespace's error handling has to shut down & r
While draining IO during controller reset, errors may still happen
and timeouts can be triggered; the current implementation can't recover
the controller any more in this situation. This patch fixes the issue
with the following approach:
- introduces eh_reset_work, and moves draining IO and updating con
This patch splits controller resetting into the following two parts:
1) the real resetting part
2) draining IO and updating controller state
This patch prepares for supporting reliable controller recovery; for
example, an IO timeout may still be triggered when running the above part
In nvme_dev_disable(), called while shutting down the controller,
nvme_wait_freeze_timeout() may be run on a controller that is not
frozen yet, so add a check to avoid that case.
Cc: Jianchao Wang
Cc: Christoph Hellwig
Cc: Sagi Grimberg
Cc: linux-n...@lists.infradead.org
Cc: Laurence Oberman
Signed
When nvme_dev_disable() is used for error recovery, we should always
freeze queues before shutting down the controller:
- the reset handler assumes queues are frozen, and will wait_freeze &
unfreeze them explicitly; if queues aren't frozen during nvme_dev_disable(),
the reset handler may wait forever even though
When admin commands are used in EH to recover the controller, we have to
cover their timeouts ourselves and can't depend on the block layer's timeout
handling, since a deadlock may be caused when these commands are timed out
by the block layer again.
Cc: Jianchao Wang
Cc: Christoph Hellwig
Cc: Sagi Grimberg
Cc: linux-n...@lists.in
nvme_dev_disable() and resetting the controller are both required for
recovering the controller, but the two run from different contexts.
nvme_start_freeze() is run from nvme_dev_disable(), and nvme_unfreeze()
is run from the resetting context. Unfortunately a timeout may be triggered
when draining IO from reset
Hi,
The 1st patch introduces blk_quiesce_timeout() and blk_unquiesce_timeout()
for NVMe, and meanwhile fixes blk_sync_queue().
The 2nd patch covers timeouts for admin commands used to recover the
controller, avoiding a possible deadlock.
Patches 3~5 fix a race between nvme_start_freeze() and
nvme_unfr
It turns out the current approach can't drain timeouts completely, because
mod_timer() can be triggered in the work function, which can be running
inside the very timeout work being synced:
del_timer_sync(&q->timeout);
cancel_work_sync(&q->timeout_work);
This patch introduces a 'timeout_off' flag for
From: Omar Sandoval
A few helpers are only used from blk-wbt.c, so move them there, and put
wbt_track() behind the CONFIG_BLK_WBT ifdef. This is in preparation
for changing how the wbt flags are tracked.
Signed-off-by: Omar Sandoval
---
block/blk-wbt.c | 20
block/blk-wb
From: Omar Sandoval
struct blk_issue_stat squashes three things into one u64:
- The time the driver started working on a request
- The original size of the request (for the io.low controller)
- Flags for writeback throttling
It turns out that on x86_64, we have a 4 byte hole in struct request
w
From: Omar Sandoval
struct blk_issue_stat is going away, and bio->bi_issue_stat doesn't even
use the blk-stats interface, so we can provide a separate implementation
specific to bios. The helpers work the same way as the blk-stats
helpers.
Signed-off-by: Omar Sandoval
---
block/blk-throttle.c
From: Omar Sandoval
issue_stat is going to go away, so first make writeback throttling take
the containing request, update the internal wbt helpers accordingly, and
change rwb->sync_cookie to be the request pointer instead of the
issue_stat pointer. No functional change.
Signed-off-by: Omar Sand
From: Omar Sandoval
Josef mentioned that his upcoming cgroups io controller uses
blk_issue_stat, so moving it to be private to blk-throtl would cause him
some pain. v2 changes patch 3 to replace bi_issue_stat with a new type
with the same helpers (I didn't want to keep the naming because it no
lo
From: Omar Sandoval
cfq and bfq have some internal fields that use sched_clock() which can
trivially use ktime_get_ns() instead. Their timestamp fields in struct
request can also use ktime_get_ns(), which resolves the 8 year old
comment added by commit 28f4197e5d47 ("block: disable preemption bef
From: Omar Sandoval
Currently, struct request has four timestamp fields:
- A start time, set at get_request time, in jiffies, used for iostats
- An I/O start time, set at start_request time, in ktime nanoseconds,
used for blk-stats (i.e., wbt, kyber, hybrid polling)
- Another start time and an
From: Omar Sandoval
We want this next to blk_account_io_done() for the next change so that
we can call ktime_get() only once for both.
Signed-off-by: Omar Sandoval
---
block/blk-mq.c | 9 +
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
in
On Wed, May 02, 2018 at 03:38:04PM +1000, Dave Chinner wrote:
> Hi folks,
>
> Version 3 of the FUA for O_DSYNC patchset. This version fixes bugs
> found in the previous version. Functionality is otherwise the same
> as described in the first version:
>
> https://marc.info/?l=linux-xfs&m=152213446
From: Omar Sandoval
struct blk_issue_stat is going away, and bio->bi_issue_stat doesn't even
use the blk-stats interface, so we can provide a separate implementation
private to blk-throtl.
Signed-off-by: Omar Sandoval
---
block/blk-throttle.c | 50 ++-
From: Omar Sandoval
Currently, struct request has four timestamp fields:
- start_time (jiffies), marked at get_request() time and used for
iostats
- issue_stat (ktime nanoseconds, with some bits shared with wbt and
io.low), marked at start_request() time and used for accounting the
time sp
Hi Christian,
On 5/2/2018 5:51 AM, Christian König wrote:
it would be rather nice if you could separate out the functions
that detect whether peer2peer is possible between two devices.
This would essentially be pci_p2pdma_distance() in the existing
patchset. It returns the sum of the distanc
When CACHE_SET_IO_DISABLE is set on cache set flags, bcache allocator
thread routine bch_allocator_thread() may stop the while-loops and
exit. Then it is possible to observe the following kernel oops message,
[ 631.068366] bcache: bch_btree_insert() error -5
[ 631.069115] bcache: cached_dev_deta
It is possible that multiple I/O requests hit a failed cache device or
backing device, therefore it is quite common that CACHE_SET_IO_DISABLE is
set already when a task tries to set the bit from bch_cache_set_error().
Currently the message "CACHE_SET_IO_DISABLE already set" is printed by
pr_warn(
Commit 7e027ca4b534b ("bcache: add stop_when_cache_set_failed option to
backing device") adds stop_when_cache_set_failed option and stops bcache
device if stop_when_cache_set_failed is auto and there is dirty data on
broken cache device. There might exist a small time gap in which the cache
set is rel
Commit c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev") tries
to stop the bcache device by calling bcache_device_stop() when too many I/O
errors have happened on the backing device. But if there is internal I/O
happening on the cache device (writeback scan, garbage collection, etc),
a regular I/O reque
Current code uses bdevname() or bio_devname() to reference the gendisk
disk name when bcache needs to display disk names in kernel messages.
This was safe before the bcache device failure handling patch set was merged,
because when devices failed, there was a deadlock that prevented bcache from
printing error mes
Commit c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev")
counts backing device I/O requests and sets dc->io_disable to true if the
error counter exceeds dc->io_error_limit. But it only counts I/O errors for
regular I/O requests and neglects errors of writeback I/Os when the backing
device is offlin
Hi Jens,
I received bug reports from partners about the bcache cache device failure
handling patch set (which was just merged into 4.17-rc1). Fortunately we
are still in the 4.17 merge window, so I suggest these fixes go into the
4.17 merge window too.
These patches have no peer reviewer so far, Tan
On 2018/5/2 10:46 PM, Coly Li wrote:
> It is possible that multiple I/O requests hit a failed cache device or
> backing device, therefore it is quite common that CACHE_SET_IO_DISABLE is
> set already when a task tries to set the bit from bch_cached_dev_error()
> or bch_cache_set_error(). Currentl
It is possible that multiple I/O requests hit a failed cache device or
backing device, therefore it is quite common that CACHE_SET_IO_DISABLE is
set already when a task tries to set the bit from bch_cached_dev_error()
or bch_cache_set_error(). Currently the message "CACHE_SET_IO_DISABLE
already s
Thanks for your continued effort, Dave.
In the current implementation the first write to the location updates the
metadata and must issue the flush. On Windows, SQL Server can avoid this
behavior: SQL Server can issue DeviceIoControl with SET_FILE_VALID_DATA and
then SetEndOfFile. The SetEnd
On 5/1/18 8:54 PM, Martin K. Petersen wrote:
>
> Jens,
>
>> diff --git a/block/blk-wbt.h b/block/blk-wbt.h
>> index a232c98fbf4d..aec5bc82d580 100644
>> --- a/block/blk-wbt.h
>> +++ b/block/blk-wbt.h
>> @@ -14,12 +14,17 @@ enum wbt_flags {
>> WBT_TRACKED = 1,/* write, tracked
On 5/2/18 6:45 AM, Christoph Hellwig wrote:
> On Mon, Apr 30, 2018 at 09:32:50AM -0600, Jens Axboe wrote:
>> We recently had a pretty severe perf regression with the XFS async
>> discard. This small series add a SYNC issue discard flag, and also
>> limits the chain size for sync discards. Patch 2 a
On Mon, Apr 30, 2018 at 09:32:50AM -0600, Jens Axboe wrote:
> We recently had a pretty severe perf regression with the XFS async
> discard. This small series add a SYNC issue discard flag, and also
> limits the chain size for sync discards. Patch 2 adds support for
> reverting XFS back to doign syn
Hi Logan,
it would be rather nice if you could separate out the functions
that detect whether peer2peer is possible between two devices.
That would allow me to reuse the same logic for GPU peer2peer where I
don't really have ZONE_DEVICE.
Regards,
Christian.
Am 24.04.2018 um 01:30 schrieb
On Wed, May 2, 2018 at 9:33 AM, syzbot
wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: fff75eb2a08c Merge tag 'errseq-v4.17' of
> git://git.kernel.o...
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?id=5301511529693184
> kernel con
> > I have created internal code changes based on the RFC below, and using irq
> > poll the CPU lockup issue is resolved.
> > https://www.spinics.net/lists/linux-scsi/msg116668.html
>
> Could we use the 1:1 mapping and not apply the out-of-tree irq poll in the
> following test? So that we can stay on the same page
On Wed, May 02, 2018 at 03:32:53PM +0530, Kashyap Desai wrote:
> > -Original Message-
> > From: Ming Lei [mailto:ming@redhat.com]
> > Sent: Wednesday, May 2, 2018 3:17 PM
> > To: Kashyap Desai
> > Cc: linux-s...@vger.kernel.org; linux-block@vger.kernel.org
> > Subject: Re: Performance d
> -Original Message-
> From: Ming Lei [mailto:ming@redhat.com]
> Sent: Wednesday, May 2, 2018 3:17 PM
> To: Kashyap Desai
> Cc: linux-s...@vger.kernel.org; linux-block@vger.kernel.org
> Subject: Re: Performance drop due to "blk-mq-sched: improve sequential
> I/O performance"
>
> On Wed,
On Wed, May 02, 2018 at 01:13:34PM +0530, Kashyap Desai wrote:
> Hi Ming,
>
> I was running some performance tests on the latest 4.17-rc and found a
> performance drop (approximately 15%) due to the patch set below.
> https://marc.info/?l=linux-block&m=150802309522847&w=2
>
> I observed drop on late
Hi Ming,
I was running some performance tests on the latest 4.17-rc and found a
performance drop (approximately 15%) due to the patch set below.
https://marc.info/?l=linux-block&m=150802309522847&w=2
I observed the drop on the latest 4.16.6 stable and 4.17-rc kernels as well.
Taking a bisect approach, figure o
Hello,
syzbot found the following crash on:
HEAD commit: fff75eb2a08c Merge tag 'errseq-v4.17' of
git://git.kernel.o...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?id=5301511529693184
kernel config:
https://syzkaller.appspot.com/x/.config?id=64935577
On Wed, May 02, 2018 at 01:12:57PM +0800, jianchao.wang wrote:
> Hi Ming
>
> On 05/02/2018 12:54 PM, Ming Lei wrote:
> >> We need to return BLK_EH_RESET_TIMER in nvme_timeout then:
> >> 1. defer the completion. we can't unmap the io request before closing the
> >> controller totally, so not BLK_EH_