On Fri, Jan 12, 2018 at 04:21:33PM +0100, Paolo Valente wrote:
>
>
> > On 12 Jan 2018, at 09:29, Paolo Valente
> > wrote:
> >
> >
> >
> >> On 12 Jan 2018, at 05:18, Ming Lei
> >> wrote:
> >>
> >> On Thu, Jan 11, 2018 at 08:40:54AM -0700, Jens Axboe wrot
Dear,
Sorry to trouble you.
I'm confused by the following code in bio.c/bio_split:
-
struct bio *bio_split(struct bio *bio, int sectors,
...
if (bio_flagged(bio, BIO_TRACE_COMPLETION))
bio_set_flag(bio, BIO_TRACE_COMPLETION);
..
Hi Jens,
I prepared this pull request in the hope that it may help you review and
stage these changes for 4.16.
I went over Ming's changes again to refine the headers and code comments
for clarity, to help ease review and inclusion.
I've done extensive testing of the changes in this pull request
On 1/14/18 10:43 AM, Mike Snitzer wrote:
> On Sun, Jan 14 2018 at 12:38pm -0500,
> Jens Axboe wrote:
>
>> On 1/13/18 10:49 AM, Ming Lei wrote:
>>> In case of no IO scheduler, RQF_MQ_INFLIGHT is set in blk_mq_rq_ctx_init(),
>>> but 7c3fb70f0341 clears it mistakenly, so fix it.
>>
>> Oops yeah, tha
Hi Linus,
Just a single fix for nvme over fabrics that should go into 4.15.
Please pull!
git://git.kernel.dk/linux-block.git for-linus
Ewan D. Milne (1):
nvme-fabrics: initialize default host->id in nvmf_host_default()
J
On Sun, Jan 14 2018 at 12:38pm -0500,
Jens Axboe wrote:
> On 1/13/18 10:49 AM, Ming Lei wrote:
> > In case of no IO scheduler, RQF_MQ_INFLIGHT is set in blk_mq_rq_ctx_init(),
> > but 7c3fb70f0341 clears it mistakenly, so fix it.
>
> Oops yeah, that's my bad. However, I think the below fix is cle
On 1/13/18 10:49 AM, Ming Lei wrote:
> In case of no IO scheduler, RQF_MQ_INFLIGHT is set in blk_mq_rq_ctx_init(),
> but 7c3fb70f0341 clears it mistakenly, so fix it.
Oops yeah, that's my bad. However, I think the below fix is cleaner
and avoids a conditional.
diff --git a/block/blk-mq.c b/block
On Sun, Jan 14, 2018 at 11:12 PM, jianchao.wang
wrote:
>
>
> On 01/13/2018 05:19 AM, Bart Van Assche wrote:
>> Sorry but I only retrieved the blk-mq debugfs several minutes after the hang
>> started so I'm not sure the state information is relevant. Anyway, I have
>> attached
>> it to this e-mail
On 01/13/2018 05:19 AM, Bart Van Assche wrote:
> Sorry but I only retrieved the blk-mq debugfs several minutes after the hang
> started so I'm not sure the state information is relevant. Anyway, I have
> attached
> it to this e-mail. The most remarkable part is the following:
>
> ./9ddf
Currently bcache does not handle backing device failure. If the backing
device is offline and disconnected from the system, its bcache device can
still be accessed. If the bcache device is in writeback mode, I/O requests
can even succeed if they hit the cache device. That is to say, when and
how b
If a bcache device is configured to writeback mode, current code does not
handle write I/O errors on backing devices properly.
In writeback mode, a write request is written to the cache device and
later flushed to the backing device. If I/O fails when writing from
cache device to the backing device
In order to catch I/O errors of the backing device, a separate bi_end_io
callback is required. Then a per backing device counter can record the
number of I/O errors and retire the backing device if the counter reaches
a per backing device I/O error limit.
This patch adds backing_request_endio() to bcache bac
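The counter-and-limit idea described above can be sketched in user space
as follows. This is only an illustration, not the patch itself: the names
backing_dev_sketch, backing_dev_sketch_init() and backing_io_done() are
hypothetical, and a plain atomic counter is assumed in place of whatever
the bcache code actually uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for per backing device state. */
struct backing_dev_sketch {
	atomic_uint io_errors;    /* I/O errors seen so far */
	unsigned int error_limit; /* retire once io_errors reaches this */
	atomic_bool retired;      /* set once the limit is reached */
};

static void backing_dev_sketch_init(struct backing_dev_sketch *dev,
				    unsigned int limit)
{
	atomic_init(&dev->io_errors, 0);
	dev->error_limit = limit;
	atomic_init(&dev->retired, false);
}

/*
 * Hypothetical completion hook: status != 0 means the I/O failed.
 * Counts the error and marks the device retired when the limit is hit.
 * Returns true if the device is retired after this completion.
 */
static bool backing_io_done(struct backing_dev_sketch *dev, int status)
{
	if (status != 0) {
		unsigned int errors = atomic_fetch_add(&dev->io_errors, 1) + 1;

		if (errors >= dev->error_limit)
			atomic_store(&dev->retired, true);
	}
	return atomic_load(&dev->retired);
}
```

Successful completions never touch the counter in this sketch, so a device
is retired only after error_limit failed requests in total.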
Struct cache uses io_errors for two purposes,
- Error decay: when cache set error_decay is set, io_errors is used to
introduce a short delay when an I/O error happens.
- I/O errors counter: in order to generate big enough value for error
decay, I/O errors counter value is stored by left sh
From: Tang Junhui
When we run I/O on a detached device and run iostat to show the I/O status,
it normally shows output like the following (some fields omitted):
Device: ... avgrq-sz avgqu-sz await r_await w_await svctm %util
sdd     ...    15.89     0.53  1.82    0.20    2.23  1.81 52.30
bcache0..
When there are too many I/O errors on cache device, current bcache code
will retire the whole cache set, and detach all bcache devices. But the
detached bcache devices are not stopped, which is problematic when bcache
is in writeback mode.
If the retired cache set has dirty data of backing devices
When too many I/Os fail on the cache device, bch_cache_set_error() is called
in the error handling code path to retire the whole problematic cache set.
If new I/O requests continue to arrive and take refcount dc->count, the
cache set won't be retired immediately, which is a problem.
Furthermore, there are
struct delayed_work writeback_rate_update in struct cache_dev is a delayed
worker that calls update_writeback_rate() periodically (the interval is
defined by dc->writeback_rate_update_seconds).
When a metadata I/O error happens on the cache device, bcache error handling
routine bch_cache_set_error(
In patch "bcache: fix cached_dev->count usage for bch_cache_set_error()",
cached_dev_get() is called when creating dc->writeback_thread, and
cached_dev_put() is called when exiting dc->writeback_thread. This
modification works well unless people detach the bcache device manually by
'echo 1 > /s
When bcache metadata I/O fails, bcache will call bch_cache_set_error()
to retire the whole cache set. The expected behavior to retire a cache
set is to unregister the cache set, and unregister all backing devices
attached to this cache set, then remove sysfs entries of the cache set
and all attached
Kernel thread routine bch_writeback_thread() has the following code block,
447 down_write(&dc->writeback_lock);
448~450 if (check conditions) {
451 up_write(&dc->writeback_lock);
452 set_current_state(TASK_INTERRUPTIBLE);
453
454 if (kthr
dc->writeback_rate_update_seconds can be set via sysfs, and its value can
be anywhere in [1, ULONG_MAX]. It does not make sense to set such a large
value; 60 seconds is long enough, considering that the default of 5 seconds
has worked well for a long time.
Because dc->writeback_rate_update is a special delayed w
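The range restriction argued for above amounts to clamping the requested
value. A minimal sketch, assuming the accepted range is [1, 60] seconds;
the macro and function names here are hypothetical and are not taken from
the patch:

```c
/* Hypothetical bounds for the update interval, in seconds. */
#define SKETCH_UPDATE_SECS_MIN 1UL
#define SKETCH_UPDATE_SECS_MAX 60UL

/*
 * Clamp a requested interval into [MIN, MAX] so that an oversized
 * value written by the user cannot push the next update far into
 * the future.
 */
static unsigned long clamp_update_seconds(unsigned long requested)
{
	if (requested < SKETCH_UPDATE_SECS_MIN)
		return SKETCH_UPDATE_SECS_MIN;
	if (requested > SKETCH_UPDATE_SECS_MAX)
		return SKETCH_UPDATE_SECS_MAX;
	return requested;
}
```

With these assumed bounds, the default of 5 seconds passes through
unchanged, while ULONG_MAX is reduced to 60.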
Kernel thread routine bch_allocator_thread() references macro
allocator_wait() to wait for a condition or quit to do_exit()
when kthread_should_stop() is true. Here is the code block,
284 while (1) { \
285 set_current_state(
Hi maintainers and folks,
This patch set tries to improve bcache device failure handling, including
both cache device and backing device failures.
The basic idea for handling a failed cache device is:
- Unregister cache set
- Detach all backing devices which are attached to this cache set
- Stop all the det
Reviewed-by: Sagi Grimberg
Reviewed-by: Sagi Grimberg