On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> Currently ->verify_fn does not work at all because at the moment it is
> called, bio->bi_iter.bi_size == 0, so we do not iterate integrity bvecs at all.
>
> In order to perform verification we need to know the original data vector;
> with the new bvec rewind API
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> Some ->bi_end_io handlers (for example: pi_verify or decrypt handlers)
> need to know the original data vector, but after a bio traverses the I/O stack
> it may be advanced, split and relocated many times, so it is hard to guess the
> original iterator. Let's add '
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> Currently if someone tries to advance a bvec beyond its size we simply
> dump a WARN_ONCE and continue to iterate beyond the bvec array boundaries.
> This simply means that we end up dereferencing/corrupting a random memory
> region.
>
> A sane reaction would be
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> Signed-off-by: Dmitry Monakhov
> ---
> block/t10-pi.c | 9 +++--
> drivers/scsi/lpfc/lpfc_scsi.c| 5 +++--
> drivers/scsi/qla2xxx/qla_isr.c | 8
> drivers/target/target_core_sbc.c | 2 +-
> include/linux/t10-pi.
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> Currently all integrity prep hooks are open-coded, and if prepare fails
> we ignore its code and fail the bio with EIO. Let's return the real error to the
> upper layer, so the caller may react accordingly.
>
> In fact no one wants to use bio_integrity_prep()
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> bio_integrity_trim inherited its interface from bio_trim and accepts an
> offset and size, but this API is error prone because the data offset
> must always be in sync with the bio's data offset. That is why we have an
> integrity update hook in bio_advance()
>
> S
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> SCSI drivers do care about bip_seed so we must update it accordingly.
>
> Signed-off-by: Dmitry Monakhov
> ---
> block/bio-integrity.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> Reviewed-by: Christoph Hellwig
> Signed-off-by: Dmitry Monakhov
> ---
> block/bio.c | 4
> 1 file changed, 4 insertions(+)
>
> diff --git a/block/bio.c b/block/bio.c
> index e75878f..fa84323 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @
On 04/04/2017 08:56 PM, Dmitry Monakhov wrote:
> If bio has no data, such as ones from blkdev_issue_flush(),
> then we have nothing to protect.
>
> This patch prevents a BUG_ON like the following:
>
> kfree_debugcheck: out of range ptr ac1fa1d106742a5ah
> kernel BUG at mm/slab.c:2773!
> invalid opcode: 000
On Tue, Apr 04, 2017 at 03:56:34PM +, Bart Van Assche wrote:
> > This looks like generic block layer code, why is it in SCSI?
>
> Hello Christoph,
>
> That's an excellent question. I assume that you are fine with moving
> this code to the block layer?
Yes. In fact I wonder if we need the bl
On Wed, Apr 5, 2017 at 12:27 PM, NeilBrown wrote:
> On Tue, Apr 04 2017, Christoph Hellwig wrote:
>
>> Looks fine,
>>
>> Reviewed-by: Christoph Hellwig
>>
>> But if you actually care about performance in any way I'd suggest
>> to use the loop device in direct I/O mode..
>
> The losetup on my test
On Wed, Apr 5, 2017 at 12:33 PM, NeilBrown wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when the loop_handle_cmd() writes
> to the backing file. This double-throttling can trigger
When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when the loop_handle_cmd() writes
to the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
On Tue, Apr 04 2017, Ming Lei wrote:
> On Mon, Apr 3, 2017 at 9:18 AM, NeilBrown wrote:
>>
>> When a filesystem is mounted from a loop device, writes are
>> throttled by balance_dirty_pages() twice: once when writing
>> to the filesystem and once when the loop_handle_cmd() writes
>> to the backin
On Tue, Apr 04 2017, Christoph Hellwig wrote:
> Looks fine,
>
> Reviewed-by: Christoph Hellwig
>
> But if you actually care about performance in any way I'd suggest
> to use the loop device in direct I/O mode..
The losetup on my test VM is too old to support that :-(
I guess it might be time to
On Tue, Apr 04 2017, Michael Wang wrote:
> On 04/04/2017 11:37 AM, NeilBrown wrote:
>> On Tue, Apr 04 2017, Michael Wang wrote:
> [snip]
If sync_request_write() is using a bio that has already been used, it
should call bio_reset() and fill in the details again.
However I don't
On 04/04/2017 09:25 AM, adam.manzana...@wdc.com wrote:
> From: Adam Manzanares
>
> In 4.10 I introduced a patch that associates the ioc priority with
> each request in the block layer. This work was done in the single queue
> block layer code. This patch unifies ioc priority to request mapping ac
If bio has no data, such as ones from blkdev_issue_flush(),
then we have nothing to protect.
This patch prevents a BUG_ON like the following:
kfree_debugcheck: out of range ptr ac1fa1d106742a5ah
kernel BUG at mm/slab.c:2773!
invalid opcode: [#1] SMP
Modules linked in: bcache
CPU: 0 PID: 4428 Comm: xfs
SCSI drivers do care about bip_seed so we must update it accordingly.
Signed-off-by: Dmitry Monakhov
---
block/bio-integrity.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index b5009a8..82a6ffb 100644
--- a/block/bio-integrity.c
+++ b/block/b
Reviewed-by: Christoph Hellwig
Signed-off-by: Dmitry Monakhov
---
block/bio.c | 4
1 file changed, 4 insertions(+)
diff --git a/block/bio.c b/block/bio.c
index e75878f..fa84323 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1907,6 +1907,10 @@ void bio_trim(struct bio *bio, int offset, int
bio_integrity_trim inherited its interface from bio_trim and accepts an
offset and size, but this API is error prone because the data offset
must always be in sync with the bio's data offset. That is why we have an
integrity update hook in bio_advance().
So the only meaningful values are: offset == 0, sectors == bio_s
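The simplification argued above can be illustrated with a toy model (all names and the one-interval-per-sector assumption are mine, not the real block-layer API): since the only sane call is offset == 0 with the bio's own sector count, the trim helper could drop both arguments and just truncate the integrity vector to match the already-trimmed data size.

```c
#include <assert.h>

/* Toy stand-ins, NOT the kernel structures. */
struct toy_bio {
    unsigned int data_sectors;        /* sectors left after bio_trim()  */
    unsigned int integrity_intervals; /* protection intervals attached  */
};

/* Hypothetical argument-free trim: truncate integrity metadata to the
 * bio's current data size, assuming one protection interval per sector. */
static void toy_integrity_trim(struct toy_bio *bio)
{
    bio->integrity_intervals = bio->data_sectors;
}
```

The point of the sketch is that the caller can no longer pass an offset that is out of sync with the bio's own position.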
Currently if someone tries to advance a bvec beyond its size we simply
dump a WARN_ONCE and continue to iterate beyond the bvec array boundaries.
This simply means that we end up dereferencing/corrupting a random memory
region.
A sane reaction would be to propagate the error back to the calling context.
But bvec_iter_a
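The bounds-check idea can be sketched in plain C (a minimal model with made-up names, not the real `bvec_iter_advance()` signature): instead of warning and walking past the end of the array, the advance refuses and reports failure so the caller can propagate an error.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for struct bvec_iter: only the size field matters here. */
struct toy_iter {
    unsigned int bi_size;   /* bytes remaining in the vector */
};

/* Bounds-checked advance: reject an advance that would run past the
 * end of the vector instead of corrupting memory beyond it. */
static bool toy_iter_advance(struct toy_iter *it, unsigned int bytes)
{
    if (bytes > it->bi_size)
        return false;       /* caller turns this into -EINVAL or similar */
    it->bi_size -= bytes;
    return true;
}
```

On failure the iterator is left untouched, which keeps the calling context free to report the error without unwinding partial state.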
Some ->bi_end_io handlers (for example: pi_verify or decrypt handlers)
need to know the original data vector, but after a bio traverses the I/O stack
it may be advanced, split and relocated many times, so it is hard to guess the
original iterator. Let's add a 'bi_done' counter which accounts the number
of bytes the iterator
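The counter idea reduces to a small invariant, sketched here with illustrative names (not the kernel's actual fields or helpers): every advance adds to `bi_done`, so an end_io handler can rewind by `bi_done` bytes to recover the original position no matter how often the bio was advanced on the way down.

```c
#include <assert.h>

/* Toy iterator carrying the proposed done-counter. */
struct toy_iter {
    unsigned int bi_size;   /* bytes still to process      */
    unsigned int bi_done;   /* bytes already processed     */
};

static void toy_advance(struct toy_iter *it, unsigned int bytes)
{
    it->bi_size -= bytes;
    it->bi_done += bytes;
}

/* Rewind is the exact inverse, so rewinding by bi_done restores the
 * iterator to its original state. */
static void toy_rewind(struct toy_iter *it, unsigned int bytes)
{
    it->bi_size += bytes;
    it->bi_done -= bytes;
}
```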
Currently all integrity prep hooks are open-coded, and if prepare fails
we ignore its code and fail the bio with EIO. Let's return the real error to the
upper layer, so the caller may react accordingly.
In fact no one wants to use bio_integrity_prep() w/o bio_integrity_enabled,
so it is reasonable to fold i
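The calling convention being argued for can be sketched as follows (all names are hypothetical; the real code operates on bio and integrity payload structures): the prep helper folds the enabled check in and returns its real error, and the submit path passes that error up instead of flattening it to -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical prep hook: returns 0 when integrity is disabled or prep
 * succeeds, otherwise the real error (e.g. -ENOMEM from allocation). */
static int toy_integrity_prep(bool enabled, int simulated_err)
{
    if (!enabled)
        return 0;               /* nothing to prepare */
    return simulated_err;
}

/* Caller propagates the real error so upper layers can, for instance,
 * retry on -ENOMEM rather than failing the I/O outright. */
static int toy_submit(bool enabled, int simulated_err)
{
    int err = toy_integrity_prep(enabled, simulated_err);
    if (err)
        return err;
    return 0;
}
```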
Signed-off-by: Dmitry Monakhov
---
block/t10-pi.c | 9 +++--
drivers/scsi/lpfc/lpfc_scsi.c| 5 +++--
drivers/scsi/qla2xxx/qla_isr.c | 8
drivers/target/target_core_sbc.c | 2 +-
include/linux/t10-pi.h | 2 ++
5 files changed, 13 insertions(+), 13 del
Currently ->verify_fn does not work at all because at the moment it is called,
bio->bi_iter.bi_size == 0, so we do not iterate integrity bvecs at all.
In order to perform verification we need to know the original data vector;
with the new bvec rewind API this is trivial.
testcase:
https://github.com/dmonakhov
This patch set fixes various problems spotted during T10/DIF integrity machinery
testing.
TOC:
## Fix various bugs in T10/DIF/DIX infrastructure
0001-bio-integrity-Do-not-allocate-integrity-context-for
0002-bio-integrity-bio_trim-should-truncate-integrity-vec
0003-bio-integrity-bio_integrity_advanc
On 04/04/2017 03:41 AM, Christoph Hellwig wrote:
> On Tue, Apr 04, 2017 at 09:58:53AM +0200, Jan Kara wrote:
>> FS_NOWAIT looks a bit too generic given these are filesystem feature flags.
>> Can we call it FS_NOWAIT_IO?
>
> It's way too generic as it's a feature of the particular file_operations
Hello,
On Thursday, 23 March 2017, 01:36:52 BRT, Jan Kara wrote:
> this is a series with the remaining patches (on top of 4.11-rc2) to fix
> several different races and issues I've found when testing device shutdown
> and reuse. The first patch fixes possible (theoretical) problems when
> openi
On 04/04/2017 05:59 PM, Ming Lei wrote:
On Tue, Apr 4, 2017 at 8:07 PM, Hannes Reinecke wrote:
Hi all,
as discussed recently most existing HBAs have a host-wide tagset which
does not map easily onto the per-queue tagset model of block mq.
This patchset implements a flag BLK_MQ_F_GLOBAL_TAGS fo
> On Apr 3, 2017, at 5:42 PM, Omar Sandoval wrote:
>
> From: Omar Sandoval
>
> Hi, Jens,
>
> This series has some fixes and enhancements for blk-mq:
>
> - Patch 1 is a cleanup in preparation for the rest of the series
> - Patch 2 is a fix necessary for patch 4 when scheduling is enabled,
>
On 04/04/2017 05:32 PM, Omar Sandoval wrote:
On Tue, Apr 04, 2017 at 02:07:43PM +0200, Hannes Reinecke wrote:
Hi all,
as discussed recently most existing HBAs have a host-wide tagset which
does not map easily onto the per-queue tagset model of block mq.
This patchset implements a flag BLK_MQ_F_
From: Omar Sandoval
Currently, this callback is called right after put_request() and has no
distinguishable purpose. Instead, let's call it before put_request() as
soon as I/O has completed on the request, before we account it in
blk-stat. With this, Kyber can enable stats when it sees a latency
From: Omar Sandoval
The Kyber I/O scheduler is an I/O scheduler for fast devices designed to
scale to multiple queues. Users configure only two knobs, the target
read and synchronous write latencies, and the scheduler tunes itself to
achieve that latency goal.
The implementation is based on "tok
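A toy feedback loop in the spirit of the description above (this is my illustration of the self-tuning idea, not Kyber's actual heuristic): compare an observed latency against the configured target and shrink or grow the permitted queue depth accordingly.

```c
#include <assert.h>

/* Illustrative depth tuner: halve the depth when over the latency
 * target, double it when comfortably under, otherwise hold steady.
 * Bounds (1..256) are arbitrary for the sketch. */
static unsigned int tune_depth(unsigned int depth,
                               unsigned int observed_us,
                               unsigned int target_us)
{
    if (observed_us > target_us && depth > 1)
        return depth / 2;       /* over target: throttle harder   */
    if (observed_us < target_us / 2 && depth < 256)
        return depth * 2;       /* well under target: relax       */
    return depth;
}
```

The design point is that the user only picks the target; the depth converges on its own as observed latencies change.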
From: Omar Sandoval
This is v2 of Kyber, an I/O scheduler for multiqueue devices combining
several techniques: the scalable bitmap library, the new blk-stats API,
and queue depth throttling similar to blk-wbt. v1 was here [1].
v2 adds a tunable target synchronous write latency. The heuristics ar
From: Omar Sandoval
This is required for schedulers that define their own put_request().
Signed-off-by: Omar Sandoval
---
block/blk-mq.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3dbfed6d0e5b..e75fa7a1a653 100644
--- a/block/blk-mq.c
+++ b/block/
From: Omar Sandoval
This operation supports the use case of limiting the number of bits that
can be allocated for a given operation. Rather than setting aside some
bits at the end of the bitmap, we can set aside bits in each word of the
bitmap. This means we can keep the allocation hints spread o
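The per-word idea can be sketched like this (names and the 8-bit word size are illustrative, not the real sbitmap API): each word only hands out its first `shallow_depth` bits, so the reserved capacity stays spread across the whole map instead of being fenced off at the end.

```c
#include <assert.h>

#define WORD_BITS 8     /* tiny words to keep the sketch readable */

/* Shallow allocation: claim the first free bit, but never look past
 * shallow_depth bits within any single word. Returns the bit number
 * (the "tag"), or -1 if the shallow limit is exhausted everywhere. */
static int toy_get_shallow(unsigned char *words, int nr_words,
                           unsigned int shallow_depth)
{
    for (int w = 0; w < nr_words; w++) {
        for (unsigned int b = 0; b < shallow_depth && b < WORD_BITS; b++) {
            if (!(words[w] & (1u << b))) {
                words[w] |= 1u << b;
                return w * WORD_BITS + b;
            }
        }
    }
    return -1;
}
```

Note how with depth 1 the second allocation lands in the second word rather than in bit 1 of the first, which is what keeps allocation hints distributed.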
From: Omar Sandoval
Wire up the sbitmap_get_shallow() operation to the tag code so that a
caller can limit the number of tags available to it.
Signed-off-by: Omar Sandoval
---
block/blk-mq-tag.c | 5 -
block/blk-mq.h | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/
On 04/04/2017 09:00 AM, Ming Lei wrote:
> Just curious, why are multiple hardware queues used for this case? Are there
> some real cases in SCSI?
Hello Ming,
Yes, there is a real need for this. Background information is available
in the following e-mail thread: Arun Easi, "scsi-mq - tag# and
can_queue,
On Tue, Apr 4, 2017 at 8:07 PM, Hannes Reinecke wrote:
> Hi all,
>
> as discussed recently most existing HBAs have a host-wide tagset which
> does not map easily onto the per-queue tagset model of block mq.
> This patchset implements a flag BLK_MQ_F_GLOBAL_TAGS for block-mq, which
> enables the us
On 04/03/2017 11:42 PM, Christoph Hellwig wrote:
>> +static void scsi_restart_hctx(struct request_queue *q,
>> + struct blk_mq_hw_ctx *hctx)
>> +{
>> +struct blk_mq_tags *tags = hctx->tags;
>> +struct blk_mq_tag_set *set = q->tag_set;
>> +int i;
>> +
>> +rcu
On Tue, Apr 4, 2017 at 11:19 PM, Dmitry Monakhov wrote:
> Ming Lei writes:
>
>> On Mon, Apr 3, 2017 at 3:23 PM, Dmitry Monakhov wrote:
>>> Currently if someone tries to advance a bvec beyond its size we simply
>>> dump a WARN_ONCE and continue to iterate beyond the bvec array boundaries.
>>> This simply
On 04/03/2017 11:40 PM, Christoph Hellwig wrote:
> On Mon, Apr 03, 2017 at 04:22:25PM -0700, Bart Van Assche wrote:
>> A later patch in this series will namely use RCU to iterate over
>> this list.
>
> It also adds a couple lockdep_assert_held calls, which might be worth
> mentioning.
>
> Otherwi
On Tue, 2017-04-04 at 08:32 -0700, Omar Sandoval wrote:
> blk-mq already supports a shared tagset, and scsi-mq already uses that.
> When we initialize a request queue, we add it to a tagset with
> blk_mq_add_queue_set(), where we automatically mark the tagset as shared
> if there is more than one q
On Tue, Apr 04, 2017 at 02:07:43PM +0200, Hannes Reinecke wrote:
> Hi all,
>
> as discussed recently most existing HBAs have a host-wide tagset which
> does not map easily onto the per-queue tagset model of block mq.
> This patchset implements a flag BLK_MQ_F_GLOBAL_TAGS for block-mq, which
> enab
On Tue, 2017-04-04 at 08:25 -0700, adam.manzana...@wdc.com wrote:
>
Reviewed-by: Bart Van Assche
On Tue, 2017-04-04 at 12:42 +0200, Paolo Valente wrote:
> > On 31 Mar 2017, at 17:31, Bart Van Assche wrote:
> >
> > On Fri, 2017-03-31 at 14:47 +0200, Paolo Valente wrote:
> > > + delta_ktime = ktime_get();
> > > + delta_ktime = ktime_sub(delta_ktime, b
From: Adam Manzanares
In 4.10 I introduced a patch that associates the ioc priority with
each request in the block layer. This work was done in the single queue
block layer code. This patch unifies ioc priority to request mapping across
the single/multi queue block layers.
I have tested this pat
Ming Lei writes:
> On Mon, Apr 3, 2017 at 3:23 PM, Dmitry Monakhov wrote:
>> Currently if someone tries to advance a bvec beyond its size we simply
>> dump a WARN_ONCE and continue to iterate beyond the bvec array boundaries.
>> This simply means that we end up dereferencing/corrupting a random memory
>> r
On Mon, Apr 3, 2017 at 3:23 PM, Dmitry Monakhov wrote:
> Currently if someone tries to advance a bvec beyond its size we simply
> dump a WARN_ONCE and continue to iterate beyond the bvec array boundaries.
> This simply means that we end up dereferencing/corrupting a random memory
> region.
>
> A sane reaction
On Mon, Apr 3, 2017 at 9:18 AM, NeilBrown wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when the loop_handle_cmd() writes
> to the backing file. This double-throttling can trigger
>
On Tue, Apr 04, 2017 at 10:46:54AM +0300, Max Gurtovoy wrote:
>> +if (set->nr_hw_queues > dev->num_comp_vectors)
>> +goto fallback;
>> +
>> +for (queue = 0; queue < set->nr_hw_queues; queue++) {
>> +mask = ib_get_vector_affinity(dev, first_vec + queue);
>> +
On 04/04/2017 02:24 PM, Michael Wang wrote:
> On 04/04/2017 12:23 PM, Michael Wang wrote:
> [snip]
>>> add something like
>>> if (wbio->bi_next)
>>> printk("bi_next!= NULL i=%d read_disk=%d bi_end_io=%pf\n",
>>> i, r1_bio->read_disk, wbio->bi_end_io);
>>>
>>> that might help narr
On Sun, 2 Apr 2017 14:34:18 -0600, Jonathan Corbet wrote:
> On Thu, 30 Mar 2017 17:11:27 -0300
> Mauro Carvalho Chehab wrote:
>
> > This series converts just two documents, adding them to the
> > core-api.rst book. It addresses the errors/warnings that popup
> > after the conversion.
> >
> >
Writeback throttling does not play well with CFQ since that also tries
to throttle async writes. As a result async writeback can get starved in
presence of readers. As an example take a benchmark simulating
postgreSQL database running over a standard rotating SATA drive. There
are 16 processes doin
On 04/04/2017 12:23 PM, Michael Wang wrote:
[snip]
>> add something like
>> if (wbio->bi_next)
>> printk("bi_next!= NULL i=%d read_disk=%d bi_end_io=%pf\n",
>> i, r1_bio->read_disk, wbio->bi_end_io);
>>
>> that might help narrow down what is happening.
>
> Just triggered again in
Christoph Hellwig writes:
> This is a pretty big increase in the bio_integrity_payload size,
> but I guess we can't get around it..
Yes, everybody hates this solution, me too, but I started with another
approach and it appeared to be very ugly.
My idea was that we have two types of iterator i
Add a host template flag 'host_tagset' to enable the use of a
global tagmap for block-mq.
Signed-off-by: Hannes Reinecke
---
drivers/scsi/scsi_lib.c | 2 ++
include/scsi/scsi_host.h | 5 +
2 files changed, 7 insertions(+)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index
Most legacy HBAs have a tagset per HBA, not per queue. To map
these devices onto block-mq this patch implements a new tagset
flag BLK_MQ_F_GLOBAL_TAGS, which will cause the tag allocator
to use just one tagset for all hardware queues.
Signed-off-by: Hannes Reinecke
---
block/blk-mq-tag.c | 1
Hi all,
as discussed recently most existing HBAs have a host-wide tagset which
does not map easily onto the per-queue tagset model of block mq.
This patchset implements a flag BLK_MQ_F_GLOBAL_TAGS for block-mq, which
enables the use of a shared tagset for all hardware queues.
The second patch adds
On Mon 03-04-17 11:18:51, NeilBrown wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when the loop_handle_cmd() writes
> to the backing file. This double-throttling can trigger
> posit
> On 31 Mar 2017, at 17:31, Bart Van Assche wrote:
>
> On Fri, 2017-03-31 at 14:47 +0200, Paolo Valente wrote:
>> -static bool bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue
>> *bfqq,
>> -bool compensate)
>> +static bool bfq_bfq
On 04/04/2017 11:37 AM, NeilBrown wrote:
> On Tue, Apr 04 2017, Michael Wang wrote:
[snip]
>>>
>>> If sync_request_write() is using a bio that has already been used, it
>>> should call bio_reset() and fill in the details again.
>>> However I don't see how that would happen.
>>> Can you give specif
On Tue, Apr 04 2017, Michael Wang wrote:
> Hi, Neil
>
> On 04/03/2017 11:25 PM, NeilBrown wrote:
>> On Mon, Apr 03 2017, Michael Wang wrote:
>>
>>> blk_attempt_plug_merge() tries to merge a bio into a request and chains them
>>> by 'bi_next', but after the bio is done inside the request, we forget to
>>>
On Tue, Apr 04, 2017 at 09:58:53AM +0200, Jan Kara wrote:
> FS_NOWAIT looks a bit too generic given these are filesystem feature flags.
> Can we call it FS_NOWAIT_IO?
It's way too generic as it's a feature of the particular file_operations
instance. But once we switch to using RWF_* we can just th
Hi, Neil
On 04/03/2017 11:25 PM, NeilBrown wrote:
> On Mon, Apr 03 2017, Michael Wang wrote:
>
>> blk_attempt_plug_merge() tries to merge a bio into a request and chains them
>> by 'bi_next', but after the bio is done inside the request, we forget to
>> reset 'bi_next'.
>>
>> This leads to a BUG while
On Mon 03-04-17 13:53:05, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues
>
> Return EAGAIN if any of the following checks fail for direct I/O:
> + i_rwsem is lockable
> + Writing beyond end of file (will trigger allocation)
> + Blocks are not allocated at the write location
Patches seem t
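The three checks listed in the quoted changelog can be sketched as one gate function (a toy model with invented names; the real code lives in each filesystem's direct I/O path): a nowait direct write bails out with -EAGAIN whenever completing it could block.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative per-file state, NOT the real inode/filesystem hooks. */
struct toy_file {
    bool rwsem_free;        /* i_rwsem can be taken without sleeping   */
    long long size;         /* current end of file                     */
    bool blocks_allocated;  /* blocks exist at the write location      */
};

/* Return 0 if a nowait direct write at [pos, pos+len) can proceed
 * without blocking, -EAGAIN otherwise. */
static int toy_nowait_dio_check(const struct toy_file *f,
                                long long pos, long long len)
{
    if (!f->rwsem_free)
        return -EAGAIN;     /* taking the lock would block            */
    if (pos + len > f->size)
        return -EAGAIN;     /* writing past EOF triggers allocation   */
    if (!f->blocks_allocated)
        return -EAGAIN;     /* allocation needed at the write location */
    return 0;
}
```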
Any feedback is welcome.
Hi Sagi,
the patchset looks good and of course we can add support for more
drivers in the future.
Have you run some performance testing with the nvmf initiator?
Sagi Grimberg (6):
mlx5: convert to generic pci_alloc_irq_vectors
mlx5: move affinity hints assig
diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
new file mode 100644
index ..d402f7c93528
--- /dev/null
+++ b/block/blk-mq-rdma.c
@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2017 Sagi Grimberg.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * unde
Looks fine,
Reviewed-by: Christoph Hellwig
But if you actually care about performance in any way I'd suggest
to use the loop device in direct I/O mode..
> - if ((src->ref_tag == 0x) ||
> - (src->app_tag == 0x)) {
> + if ((src->ref_tag == T10_REF_ESCAPE) ||
> + (src->app_tag == T10_APP_ESCAPE)) {
Please remove the inne
On Mon, Apr 03, 2017 at 11:23:30AM +0400, Dmitry Monakhov wrote:
> Currently all integrity prep hooks are open-coded, and if prepare fails
> we ignore its code and fail the bio with EIO. Let's return the real error to the
> upper layer, so the caller may react accordingly. For example retry in
> case of ENO
On Mon, Apr 03, 2017 at 11:23:29AM +0400, Dmitry Monakhov wrote:
> bio_integrity_trim inherited its interface from bio_trim and accepts an
> offset and size, but this API is error prone because the data offset
> must always be in sync with the bio's data offset. That is why we have an
> integrity update hook in bi
Looks good,
Reviewed-by: Christoph Hellwig
This is a pretty big increase in the bio_integrity_payload size,
but I guess we can't get around it..
Reviewed-by: Christoph Hellwig
Looks good,
Reviewed-by: Christoph Hellwig