Re: [PATCH 1/4] blk-mq: introduce BLK_MQ_F_SCHED_USE_HW_TAG

2017-05-10 Thread Ming Lei
Hi Jens, On Thu, May 04, 2017 at 08:06:15AM -0600, Jens Axboe wrote: ... > > No we do not. 256 is a LOT. I realize most of the devices expose 64K * > num_hw_queues of depth. Expecting to utilize all that is insane. > Internally, these devices have nowhere near that amount of parallelism. > Hence

Re: [linux-next][bock] [bisected c20cfc27a] WARNING: CPU: 22 PID: 0 at block/blk-core.c:2655 .blk_update_request+0x4f8/0x500

2017-05-10 Thread Christoph Hellwig
On Tue, May 09, 2017 at 08:48:21PM +0530, Abdul Haleem wrote: > A bisection for the above suspects resulted a bad commit; > > c20cfc27a47307e811346f85959cf3cc07ae42f9 is the first bad commit > commit c20cfc27a47307e811346f85959cf3cc07ae42f9 > Author: Christoph Hellwig > Date: Wed Apr 5 19:21:07

[PATCH 0/5] mmc: core: modernize ioctl() requests

2017-05-10 Thread Linus Walleij
This is a series that starts to untangle the MMC "big host lock", i.e. what is taken by issuing mmc_claim_host() which usually happens through mmc_get_card(). The host lock is standing in the way of a bunch of modernizations, because the block layer interface takes this lock when a new request ar

[PATCH 1/5] mmc: core: Delete bounce buffer Kconfig option

2017-05-10 Thread Linus Walleij
This option is activated by all multiplatform configs and what not so we almost always have it turned on, and the memory it saves is negligible, even more so moving forward. The actual bounce buffer gets allocated only when used; the only thing the ifdefs are saving is a little bit of code. I

[PATCH 5/5] mmc: block: move multi-ioctl() to use block layer

2017-05-10 Thread Linus Walleij
This switches also the multiple-command ioctl() call to issue all ioctl()s through the block layer instead of going directly to the device. We extend the passed argument with an argument count and loop over all passed commands in the ioctl() issue function called from the block layer. By doing th

[PATCH 3/5] mmc: block: Tag is_rpmb as bool

2017-05-10 Thread Linus Walleij
The variable is_rpmb is clearly a bool and even assigned true and false, yet declared as an int. Signed-off-by: Linus Walleij --- drivers/mmc/core/block.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index be782b8d4a0d..3

[PATCH 4/5] mmc: block: move single ioctl() commands to block requests

2017-05-10 Thread Linus Walleij
This wraps single ioctl() commands into block requests using the custom block layer request types REQ_OP_DRV_IN and REQ_OP_DRV_OUT. By doing this we are loosening the grip on the big host lock, since two calls to mmc_get_card()/mmc_put_card() are removed. We are storing the ioctl() in/out argumen

[PATCH 2/5] mmc: core: Allocate per-request data using the block layer core

2017-05-10 Thread Linus Walleij
The mmc_queue_req is a per-request state container the MMC core uses to carry bounce buffers, pointers to asynchronous requests and so on. Currently allocated as a static array of objects, then as a request comes in, a mmc_queue_req is assigned to it, and used during the lifetime of the request. T

Re: [linux-next][bock] [bisected c20cfc27a] WARNING: CPU: 22 PID: 0 at block/blk-core.c:2655 .blk_update_request+0x4f8/0x500

2017-05-10 Thread Abdul Haleem
On Wed, 2017-05-10 at 09:56 +0200, Christoph Hellwig wrote: > On Tue, May 09, 2017 at 08:48:21PM +0530, Abdul Haleem wrote: > > A bisection for the above suspects resulted a bad commit; > > > > c20cfc27a47307e811346f85959cf3cc07ae42f9 is the first bad commit > > commit c20cfc27a47307e811346f85959c

Re: [PATCH v4 01/27] fs: remove unneeded forward definition of mm_struct from fs.h

2017-05-10 Thread Jan Kara
On Tue 09-05-17 11:49:04, Jeff Layton wrote: > Signed-off-by: Jeff Layton Looks good. You can add: Reviewed-by: Jan Kara Honza > --- > include/linux/fs.h | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/include/linux/f

Re: [PATCH v4 05/27] btrfs: btrfs_wait_tree_block_writeback can be void return

2017-05-10 Thread Jan Kara
On Tue 09-05-17 11:49:08, Jeff Layton wrote: > Nothing checks its return value. > > Signed-off-by: Jeff Layton Looks good to me. You can add: Reviewed-by: Jan Kara Honza > --- > fs/btrfs/disk-io.c | 6 +++--- > fs/btrfs/disk-io.

Re: [PATCH v4 11/27] fuse: set mapping error in writepage_locked when it fails

2017-05-10 Thread Jan Kara
On Tue 09-05-17 11:49:14, Jeff Layton wrote: > This ensures that we see errors on fsync when writeback fails. > > Signed-off-by: Jeff Layton > Reviewed-by: Christoph Hellwig Looks good to me. You can add: Reviewed-by: Jan Kara H

Re: [PATCH v4 12/27] cifs: set mapping error when page writeback fails in writepage or launder_pages

2017-05-10 Thread Jan Kara
On Tue 09-05-17 11:49:15, Jeff Layton wrote: > Signed-off-by: Jeff Layton > Reviewed-by: Christoph Hellwig Looks good to me. You can add: Reviewed-by: Jan Kara Honza > --- > fs/cifs/file.c | 12 +++- > 1 file changed, 7

Re: [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it

2017-05-10 Thread Jeff Layton
On Wed, 2017-05-10 at 08:03 +1000, NeilBrown wrote: > On Tue, May 09 2017, Jeff Layton wrote: > > > An errseq_t is a way of recording errors in one place, and allowing any > > number of "subscribers" to tell whether an error has been set again > > since a previous time. > > > > It's implemented a

Re: [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it

2017-05-10 Thread Jan Kara
On Tue 09-05-17 11:49:16, Jeff Layton wrote: > An errseq_t is a way of recording errors in one place, and allowing any > number of "subscribers" to tell whether an error has been set again > since a previous time. > > It's implemented as an unsigned 32-bit value that is managed with atomic > opera
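
The errseq_t idea described in this thread (one shared 32-bit value, any number of "subscribers" that each remember what they last saw) can be illustrated with a small userspace sketch. This is not the code from the patch: the bit layout (12 bits of errno plus a counter), the `sketch_` names and the use of C11 atomics are all assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: low 12 bits hold the errno, the rest is a counter. */
#define SKETCH_ERR_BITS 12u
#define SKETCH_ERR_MASK ((1u << SKETCH_ERR_BITS) - 1u)

typedef _Atomic uint32_t sketch_errseq_t;

/* Record a negative errno and bump the counter in one atomic update. */
static void sketch_errseq_set(sketch_errseq_t *eseq, int err)
{
    uint32_t old = atomic_load(eseq);
    uint32_t next;

    do {
        uint32_t count = (old >> SKETCH_ERR_BITS) + 1u;
        next = (count << SKETCH_ERR_BITS) |
               ((uint32_t)(-err) & SKETCH_ERR_MASK);
    } while (!atomic_compare_exchange_weak(eseq, &old, next));
}

/* A subscriber takes a snapshot to compare against later. */
static uint32_t sketch_errseq_sample(sketch_errseq_t *eseq)
{
    return atomic_load(eseq);
}

/*
 * Return 0 if nothing was recorded since *since was taken, otherwise the
 * latest negative errno; in that case *since is moved forward so the same
 * subscriber does not see the same error twice.
 */
static int sketch_errseq_check_and_advance(sketch_errseq_t *eseq,
                                           uint32_t *since)
{
    uint32_t cur = atomic_load(eseq);

    if (cur == *since)
        return 0;
    *since = cur;
    return -(int)(cur & SKETCH_ERR_MASK);
}
```

Because each subscriber keeps its own snapshot, one subscriber consuming an error does not hide it from another, which is the property a single test-and-clear flag cannot provide.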

Re: [PATCH v4 14/27] fs: new infrastructure for writeback error handling and reporting

2017-05-10 Thread Jan Kara
On Tue 09-05-17 11:49:17, Jeff Layton wrote: > Most filesystems currently use mapping_set_error and > filemap_check_errors for setting and reporting/clearing writeback errors > at the mapping level. filemap_check_errors is indirectly called from > most of the filemap_fdatawait_* functions and from

Re: [PATCH v4 14/27] fs: new infrastructure for writeback error handling and reporting

2017-05-10 Thread Jeff Layton
On Wed, 2017-05-10 at 13:48 +0200, Jan Kara wrote: > On Tue 09-05-17 11:49:17, Jeff Layton wrote: > > Most filesystems currently use mapping_set_error and > > filemap_check_errors for setting and reporting/clearing writeback errors > > at the mapping level. filemap_check_errors is indirectly called

Re: [PATCH v4 06/27] fs: check for writeback errors after syncing out buffers in generic_file_fsync

2017-05-10 Thread Matthew Wilcox
On Tue, May 09, 2017 at 11:49:09AM -0400, Jeff Layton wrote: > ext2 currently does a test+clear of the AS_EIO flag, which is > problematic for some coming changes. > > What we really need to do instead is call filemap_check_errors > in __generic_file_fsync after syncing out the buffers. That >

[PATCH] scsi: sanity check for timeout in sg_io()

2017-05-10 Thread Hannes Reinecke
sg_io() is using msecs_to_jiffies() to convert a passed in timeout value (in milliseconds) to a jiffies value. However, if the value is too large msecs_to_jiffies() will return MAX_JIFFY_OFFSET, which will be truncated to -2 and cause the timeout to be set to 1.3 _years_. Which is probably too long
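
The arithmetic behind the "-2" and "1.3 years" figures can be reproduced in userspace. The sketch below is a simplified model, not kernel code: it assumes a 64-bit `long` (modelled as `int64_t`), HZ=100, and that the converted value ends up in a 32-bit field; all `sketch_` names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_HZ 100u                                  /* assumed tick rate */
#define SKETCH_MAX_JIFFY_OFFSET ((INT64_MAX >> 1) - 1)  /* 64-bit long assumed */

/* Simplified model of the conversion: huge inputs mean "infinite". */
static int64_t sketch_msecs_to_jiffies(uint32_t msecs)
{
    if (msecs > (uint32_t)INT32_MAX)
        return SKETCH_MAX_JIFFY_OFFSET;
    /* Round up to whole ticks. */
    return ((int64_t)msecs + (1000 / SKETCH_HZ) - 1) / (1000 / SKETCH_HZ);
}

/* Storing the result in a 32-bit field keeps only the low 32 bits. */
static uint32_t sketch_store_timeout(int64_t jiffies)
{
    return (uint32_t)jiffies;
}

/* How long the truncated timeout would actually last. */
static double sketch_jiffies_to_years(uint32_t jiffies)
{
    return (double)jiffies / SKETCH_HZ / 86400.0 / 365.0;
}
```

Under these assumptions the stored value is 0xFFFFFFFE (that is, -2 when read as a signed 32-bit integer), which at HZ=100 is roughly 1.36 years, consistent with the figure quoted in the patch description.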

Re: [PATCH] scsi: sanity check for timeout in sg_io()

2017-05-10 Thread James Bottomley
On Wed, 2017-05-10 at 15:24 +0200, Hannes Reinecke wrote: > sg_io() is using msecs_to_jiffies() to convert a passed in timeout > value (in milliseconds) to a jiffies value. However, if the value > is too large msecs_to_jiffies() will return MAX_JIFFY_OFFSET, which > will be truncated to -2 and caus

Re: [PATCH v4 14/27] fs: new infrastructure for writeback error handling and reporting

2017-05-10 Thread Jan Kara
On Wed 10-05-17 08:19:50, Jeff Layton wrote: > On Wed, 2017-05-10 at 13:48 +0200, Jan Kara wrote: > > On Tue 09-05-17 11:49:17, Jeff Layton wrote: > > > diff --git a/fs/file_table.c b/fs/file_table.c > > > index 954d510b765a..d6138b6411ff 100644 > > > --- a/fs/file_table.c > > > +++ b/fs/file_table

[PATCH] blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op

2017-05-10 Thread wenxiong
From: Wen Xiong When formatting NVMe to 512B/4K + T10 DIF/DIX, dd with split op returns "Input/output error". It looks like the block layer splits the bio after calling bio_integrity_prep(bio). This patch fixes the issue. Below is how we debug this issue: (1)format nvme to 4K block # size with type 2 DIF (2)

Re: [PATCH] blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op

2017-05-10 Thread Jens Axboe
On 05/10/2017 07:54 AM, wenxi...@linux.vnet.ibm.com wrote: > From: Wen Xiong > > When formatting NVMe to 512B/4K + T10 DIf/DIX, dd with split op returns > "Input/output error". Looks block layer split the bio after calling > bio_integrity_prep(bio). This patch fixes the issue. > > Below is how w

Re: [PATCH 25/27] block: remove the discard_zeroes_data flag

2017-05-10 Thread h...@lst.de
On Mon, May 08, 2017 at 11:46:14PM -0700, Nicholas A. Bellinger wrote: > That said, simply propagating up q->limits.max_write_zeroes_sectors as > dev_attrib->unmap_zeroes_data following existing code still looks like > the right thing to do. It is not. Martin has decoupled write same/zeroes suppo

Re: [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it

2017-05-10 Thread Matthew Wilcox
On Tue, May 09, 2017 at 11:49:16AM -0400, Jeff Layton wrote: > +++ b/lib/errseq.c > @@ -0,0 +1,199 @@ > +#include > +#include > +#include > +#include > + > +/* > + * An errseq_t is a way of recording errors in one place, and allowing any > + * number of "subscribers" to tell whether it has chan

Re: [PATCH] blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op

2017-05-10 Thread Martin K. Petersen
wenxi...@linux.vnet.ibm.com, > When formatting NVMe to 512B/4K + T10 DIf/DIX, dd with split op > returns "Input/output error". Looks block layer split the bio after > calling bio_integrity_prep(bio). This patch fixes the issue. Looks good. Acked-by: Martin K. Petersen -- Martin K. Petersen

Re: [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it

2017-05-10 Thread Jeff Layton
On Wed, 2017-05-10 at 07:18 -0700, Matthew Wilcox wrote: > On Tue, May 09, 2017 at 11:49:16AM -0400, Jeff Layton wrote: > > +++ b/lib/errseq.c > > @@ -0,0 +1,199 @@ > > +#include > > +#include > > +#include > > +#include > > + > > +/* > > + * An errseq_t is a way of recording errors in one plac

[PATCH 5/9] bio-integrity: fold bio_integrity_enabled to bio_integrity_prep

2017-05-10 Thread Dmitry Monakhov
Currently all integrity prep hooks are open-coded, and if prepare fails we ignore its code and fail the bio with EIO. Let's return the real error to the upper layer, so the caller may later react accordingly. In fact no one wants to use bio_integrity_prep() w/o bio_integrity_enabled, so it is reasonable to fold i

[PATCH 0/9] block: T10/DIF Fixes and cleanups v4

2017-05-10 Thread Dmitry Monakhov
TOC: 1-bio-integrity-Do-not-allocate-integrity-context-for-fsync 2-bio-integrity-bio_trim-should-truncate-integrity-vector 3-bio-integrity-bio_integrity_advance-must-update-interator 4-bio-integrity-fix-interface-for-bio_integrity_trim 5-bio-integrity-fold-bio_integrity_enabled-to-bio_interator 6-T

[PATCH 9/9] bio-integrity: Restore original iterator on verify stage

2017-05-10 Thread Dmitry Monakhov
Currently ->verify_fn does not work at all because at the moment it is called bio->bi_iter.bi_size == 0, so we do not iterate integrity bvecs at all. In order to perform verification we need to know the original data vector; with the new bvec rewind API this is trivial. testcase: https://github.com/dmonakhov

[PATCH 8/9] bio: add bvec_iter rewind API

2017-05-10 Thread Dmitry Monakhov
Some ->bi_end_io handlers (for example: pi_verify or decrypt handlers) need to know the original data vector, but after a bio traverses the io-stack it may be advanced, split and relocated many times, so it is hard to guess the original iterator. Let's add a 'bi_done' counter which accounts number of bytes iterator
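
A minimal userspace sketch of the rewind idea follows: an iterator over a list of segment lengths keeps a running count of bytes consumed, so it can later be wound back to where it started. The struct, field and function names here are hypothetical stand-ins (only the notion of a "done" counter comes from the patch description), and the bounds checks returning -1 are an assumption about how a guarded version might behave.

```c
/* Hypothetical iterator over an array of segment lengths. */
struct sketch_iter {
    unsigned size;      /* bytes still to be consumed */
    unsigned idx;       /* current segment index */
    unsigned seg_done;  /* bytes consumed inside the current segment */
    unsigned done;      /* total bytes consumed so far */
};

/* Move forward; refuse to walk past the end instead of running off it. */
static int sketch_iter_advance(const unsigned *seg_len,
                               struct sketch_iter *it, unsigned bytes)
{
    if (bytes > it->size)
        return -1;

    it->size -= bytes;
    it->done += bytes;

    bytes += it->seg_done;  /* offset measured from the segment start */
    while (bytes && bytes >= seg_len[it->idx]) {
        bytes -= seg_len[it->idx];
        it->idx++;
    }
    it->seg_done = bytes;
    return 0;
}

/* Move backward by at most the number of bytes already consumed. */
static int sketch_iter_rewind(const unsigned *seg_len,
                              struct sketch_iter *it, unsigned bytes)
{
    if (bytes > it->done)
        return -1;

    it->size += bytes;
    it->done -= bytes;

    while (bytes > it->seg_done) {
        bytes -= it->seg_done;
        it->idx--;
        it->seg_done = seg_len[it->idx];
    }
    it->seg_done -= bytes;
    return 0;
}
```

Rewinding by the full `done` count restores the iterator to its initial state regardless of how many separate advances happened in between, which is what a completion handler would need in order to walk the original data again.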

[PATCH 1/9] bio-integrity: Do not allocate integrity context for bio w/o data

2017-05-10 Thread Dmitry Monakhov
If a bio has no data, such as the ones from blkdev_issue_flush(), then we have nothing to protect. This patch prevents a BUG like the following: kfree_debugcheck: out of range ptr ac1fa1d106742a5ah kernel BUG at mm/slab.c:2773! invalid opcode: [#1] SMP Modules linked in: bcache CPU: 0 PID: 4428 Comm: xfs

[PATCH 7/9] Guard bvec iteration logic

2017-05-10 Thread Dmitry Monakhov
Currently if someone tries to advance a bvec beyond its size we simply dump WARN_ONCE and continue to iterate beyond the bvec array boundaries. This simply means that we end up dereferencing/corrupting a random memory region. A sane reaction would be to propagate the error back to the calling context. But bvec_iter_a

[PATCH 2/9] bio-integrity: bio_trim should truncate integrity vector accordingly

2017-05-10 Thread Dmitry Monakhov
Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Reviewed-by: Martin K. Petersen Signed-off-by: Dmitry Monakhov --- block/bio.c | 4 1 file changed, 4 insertions(+) diff --git a/block/bio.c b/block/bio.c index 888e780..2f01f1b 100644 --- a/block/bio.c +++ b/block/bio.c @@ -188

[PATCH 3/9] bio-integrity: bio_integrity_advance must update integrity seed

2017-05-10 Thread Dmitry Monakhov
SCSI drivers do care about bip_seed so we must update it accordingly. Reviewed-by: Hannes Reinecke Reviewed-by: Christoph Hellwig Signed-off-by: Dmitry Monakhov --- block/bio-integrity.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/bio-integrity.c b/block/bio-integrity.c index b50

[PATCH 6/9] T10: Move opencoded contants to common header

2017-05-10 Thread Dmitry Monakhov
Signed-off-by: Dmitry Monakhov --- block/t10-pi.c | 9 +++-- drivers/scsi/lpfc/lpfc_scsi.c| 5 +++-- drivers/scsi/qla2xxx/qla_isr.c | 8 drivers/target/target_core_sbc.c | 2 +- include/linux/t10-pi.h | 2 ++ 5 files changed, 13 insertions(+), 13 del

[PATCH 4/9] bio-integrity: fix interface for bio_integrity_trim

2017-05-10 Thread Dmitry Monakhov
bio_integrity_trim inherits its interface from bio_trim and accepts offset and size, but this API is error prone because the data offset must always be in sync with the bio's data offset. That is why we have an integrity update hook in bio_advance(). So the only meaningful values are: offset == 0, sectors == bio_s

Re: [PATCH 0/9] block: T10/DIF Fixes and cleanups v3

2017-05-10 Thread Dmitry Monakhov
Christoph Hellwig writes: > Hi Dmitry, > > can you resend this series? Sorry for a very long delay, I'm in the middle of honeymoon and this is not a good time for a work :) > I really think we should get this into 4.12 at least. Please see updated version in the LKML list.

[PATCH] elevator: eliminate unused result build warning

2017-05-10 Thread Firo Yang
Gcc complains about ignoring return value of ‘strstrip’; Fix it by just using the strstrip() as the function parameter. Signed-off-by: Firo Yang --- block/elevator.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/block/elevator.c b/block/elevator.c index fda6be9..dd0ed19 1
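
The warning pattern described here can be reproduced outside the kernel. The sketch below uses a hypothetical userspace `sketch_strstrip()` marked with the GCC/Clang `warn_unused_result` attribute; the attribute and the helper are illustrative assumptions, and the actual elevator.c change is not shown in this excerpt.

```c
#include <ctype.h>
#include <string.h>

/*
 * Userspace stand-in: trims trailing whitespace in place and returns a
 * pointer past any leading whitespace. Callers must use the return value,
 * otherwise leading whitespace is still part of the string they hold.
 */
__attribute__((warn_unused_result))
static char *sketch_strstrip(char *s)
{
    size_t len = strlen(s);

    while (len && isspace((unsigned char)s[len - 1]))
        s[--len] = '\0';
    while (*s && isspace((unsigned char)*s))
        s++;
    return s;
}

/* Hypothetical consumer that receives the stripped name. */
static size_t sketch_name_len(const char *name)
{
    return strlen(name);
}

/*
 * Calling sketch_strstrip(buf) as a bare statement and then passing buf on
 * would draw the warning; passing the result straight through, as below,
 * does not.
 */
static size_t sketch_lookup(char *buf)
{
    return sketch_name_len(sketch_strstrip(buf));
}
```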

Re: [PATCH] elevator: eliminate unused result build warning

2017-05-10 Thread Jens Axboe
On 05/10/2017 09:29 AM, Firo Yang wrote: > Gcc complains about ignoring return value of ‘strstrip’; Fix it by > just using the strstrip() as the function parameter. This has already been fixed up. -- Jens Axboe

[PATCH v3 0/4] blk-mq: support to use hw tag for scheduling

2017-05-10 Thread Ming Lei
Hi, This patchset introduces the BLK_MQ_F_SCHED_USE_HW_TAG flag and allows the hardware tag to be used directly for I/O scheduling if the queue's depth is big enough. In this way, we can avoid allocating extra tags and a request pool for I/O scheduling, and the schedule tag allocation/release can be saved in I/

[PATCH v3 1/4] blk-mq: introduce BLK_MQ_F_SCHED_USE_HW_TAG

2017-05-10 Thread Ming Lei
When blk-mq I/O scheduler is used, we need two tags for submitting one request. One is called scheduler tag for allocating request and scheduling I/O, another one is called driver tag, which is used for dispatching IO to hardware/driver. This way introduces one extra per-queue allocation for both t

[PATCH v3 2/4] blk-mq: introduce blk_mq_get_queue_depth()

2017-05-10 Thread Ming Lei
The hardware queue depth can be resized via blk_mq_update_nr_requests(), so introduce this helper for retrieving queue's depth easily. Reviewed-by: Omar Sandoval Signed-off-by: Ming Lei --- block/blk-mq.c | 15 +++ block/blk-mq.h | 1 + 2 files changed, 16 insertions(+) diff --git

[PATCH v3 3/4] blk-mq: use hw tag for scheduling if hw tag space is big enough

2017-05-10 Thread Ming Lei
When tag space of one device is big enough, we use hw tag directly for I/O scheduling. Now the decision is made if hw queue depth is not less than q->nr_requests and the tag set isn't shared. Signed-off-by: Ming Lei --- block/blk-mq-sched.c | 80 +--

[PATCH v3 4/4] blk-mq: allow to use hw tag for shared tags

2017-05-10 Thread Ming Lei
In case of shared tags, hctx_may_queue() limits the maximum number of requests allocated to one hw queue to .queue_depth / active_queues. So we try to allow the hw tag to be used in this case if .queue_depth/shared_queues is not less than q->nr_requests. This can cover some scsi devices too, such a
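
The decision rule stated in patches 3/4 and 4/4 of this series can be written as a single predicate. This is only a sketch of the rule as worded in the descriptions; the struct and its field names are hypothetical and do not correspond to the real blk-mq data structures.

```c
#include <stdbool.h>

/* Hypothetical summary of the values the rule depends on. */
struct sketch_queue {
    unsigned hw_queue_depth;  /* depth of one hardware queue */
    unsigned nr_requests;     /* scheduler's request budget */
    unsigned shared_queues;   /* queues sharing the tag set; 1 if unshared */
};

/*
 * Use the hardware tag for scheduling only if this queue's share of the
 * tag space is at least as large as nr_requests.
 */
static bool sketch_can_use_hw_tag(const struct sketch_queue *q)
{
    unsigned share = q->shared_queues ? q->shared_queues : 1;

    return q->hw_queue_depth / share >= q->nr_requests;
}
```

With an unshared tag set this reduces to the patch 3/4 condition (depth not less than nr_requests); with sharing, the per-queue share is compared instead, as described for patch 4/4.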

Re: [PATCH v4 23/27] gfs2: clean up some filemap_* calls

2017-05-10 Thread Bob Peterson
- Original Message - | In some places, it's trying to reset the mapping error after calling | filemap_fdatawait. That's no longer required. Also, turn several | filemap_fdatawrite+filemap_fdatawait calls into filemap_write_and_wait. | That will at least return writeback errors that occur du

Re: [linux-next][bock] [bisected c20cfc27a] WARNING: CPU: 22 PID: 0 at block/blk-core.c:2655 .blk_update_request+0x4f8/0x500

2017-05-10 Thread Christoph Hellwig
Hi Abdul, can you test the patch below? I'll try to create a way to inject short WRITE SAME commands using qemu next, but I thought I'd give you a chance to try it as well. --- diff --git a/block/blk-core.c b/block/blk-core.c index c580b0138a7f..c7068520794b 100644 --- a/block/blk-core.c +++ b/b

Re: [PATCH 3/9] bio-integrity: bio_integrity_advance must update integrity seed

2017-05-10 Thread Martin K. Petersen
Dmitry, > SCSI drivers do care about bip_seed so we must update it accordingly. Reviewed-by: Martin K. Petersen -- Martin K. Petersen Oracle Linux Engineering

Re: [PATCH 5/9] bio-integrity: fold bio_integrity_enabled to bio_integrity_prep

2017-05-10 Thread Martin K. Petersen
Dmitry, > Currently all integrity prep hooks are open-coded, and if prepare > fails we ignore it's code and fail bio with EIO. Let's return real > error to upper layer, so later caller may react accordingly. > > In fact no one want to use bio_integrity_prep() w/o > bio_integrity_enabled, so it is

Re: [PATCH 6/9] T10: Move opencoded contants to common header

2017-05-10 Thread Martin K. Petersen
Dmitry, Reviewed-by: Martin K. Petersen -- Martin K. Petersen Oracle Linux Engineering

Re: [PATCH 7/9] Guard bvec iteration logic

2017-05-10 Thread Martin K. Petersen
Dmitry, > Currently if some one try to advance bvec beyond it's size we simply > dump WARN_ONCE and continue to iterate beyond bvec array boundaries. > This simply means that we endup dereferencing/corrupting random memory > region. > > Sane reaction would be to propagate error back to calling co

Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.

2017-05-10 Thread NeilBrown
On Tue, May 02 2017, NeilBrown wrote: > This is a revision of my series of patches working > towards removing the bioset work queues. Hi Jens, could I get some feed-back about your thoughts on this series? Will you apply it? When? Do I need to resend anything? Would you like a git-pull reque

Re: [PATCH 25/27] block: remove the discard_zeroes_data flag

2017-05-10 Thread Nicholas A. Bellinger
On Wed, 2017-05-10 at 16:06 +0200, h...@lst.de wrote: > On Mon, May 08, 2017 at 11:46:14PM -0700, Nicholas A. Bellinger wrote: > > That said, simply propagating up q->limits.max_write_zeroes_sectors as > > dev_attrib->unmap_zeroes_data following existing code still looks like > > the right thing to

Re: [PATCH 25/27] block: remove the discard_zeroes_data flag

2017-05-10 Thread h...@lst.de
On Wed, May 10, 2017 at 09:50:35PM -0700, Nicholas A. Bellinger wrote: > 1) Expose a block_device or request_queue bit to signal 'real LBPRZ' > support up to IBLOCK, in order to maintain SCSI target feature > compatibility. No way. If you want to zero use REQ_OP_WRITE_ZEROES..

Re: [PATCH 25/27] block: remove the discard_zeroes_data flag

2017-05-10 Thread Nicholas A. Bellinger
On Thu, 2017-05-11 at 08:26 +0200, h...@lst.de wrote: > On Wed, May 10, 2017 at 09:50:35PM -0700, Nicholas A. Bellinger wrote: > > 1) Expose a block_device or request_queue bit to signal 'real LBPRZ' > > support up to IBLOCK, in order to maintain SCSI target feature > > compatibility. > > No way.