[PATCH v2] block: add documentation for io_timeout

2018-11-29 Thread Weiping Zhang
add documentation for /sys/block/<disk>/queue/io_timeout Signed-off-by: Weiping Zhang --- Documentation/ABI/testing/sysfs-block | 9 + Documentation/block/queue-sysfs.txt | 6 ++ 2 files changed, 15 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/te

[PATCH 1/2] ataflop: fix error handling in atari_floppy_init()

2018-11-29 Thread Dan Carpenter
Smatch complains that there is an off by one if the allocation fails in: DMABuffer = atari_stram_alloc(BUFFER_SIZE+512, "ataflop"); In that situation, "i" would point to one element beyond the end of the unit[] array. There is a second bug because the error handling calls blk_mq_free_

[PATCH 2/2] blk-mq: Add a NULL check in blk_mq_free_map_and_requests()

2018-11-29 Thread Dan Carpenter
I recently found some code which called blk_mq_free_map_and_requests() with a NULL set->tags pointer. I fixed the caller, but it seems like a good idea to add a NULL check here as well. Now we can call: blk_mq_free_tag_set(set); blk_mq_free_tag_set(set); twice in a row and it's
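The change itself is small enough to sketch. A hedged illustration of the idea, with the signature and field layout treated as assumptions (the real function lives in block/blk-mq.c):

static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
					 unsigned int hctx_idx)
{
	/* Bail out if the tags were never allocated or were already freed,
	 * which is what makes a repeated blk_mq_free_tag_set() harmless. */
	if (!set->tags || !set->tags[hctx_idx])
		return;

	/* ... existing freeing of the tag map and requests ... */
}

Making the teardown path idempotent this way spares callers from tracking whether the set was already torn down.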

Re: [PATCH v4 5/5] lightnvm: pblk: Support for packed metadata

2018-11-29 Thread Javier Gonzalez
> On 29 Nov 2018, at 08.16, Igor Konopko wrote: > > In the current pblk implementation, l2p mapping for not closed lines > is always stored only in OOB metadata and recovered from it. > > Such a solution does not provide data integrity when drives do > not have such an OOB metadata space. > > The

Re: [PATCH v4 0/5] lightnvm: Flexible metadata

2018-11-29 Thread Javier Gonzalez
> On 29 Nov 2018, at 08.16, Igor Konopko wrote: > > This series of patches extends the way pblk can > store L2P sector metadata. After this set of changes > any size of NVMe metadata is supported in pblk. > Also there is support for the case without NVMe metadata. > > Changes v3 --> v4: > -re

Re: [PATCH 17/20] aio: support for IO polling

2018-11-29 Thread Benny Halevy
On Wed, 2018-11-28 at 11:50 -0700, Jens Axboe wrote: > On 11/28/18 2:33 AM, Benny Halevy wrote: > > > I don't see how we can get there with it being larger than already, > > > that would be a big bug if we fill more events than userspace asked > > > for. > > > > > > > Currently we indeed can't, b

Re: [PATCH v4 3/5] lightnvm: Flexible DMA pool entry size

2018-11-29 Thread Matias Bjørling
On 11/29/2018 08:16 AM, Igor Konopko wrote: Currently the whole of lightnvm and pblk uses a single DMA pool, whose entry size is always equal to PAGE_SIZE. The PPA list always needs 8B*64, so there is only 56B*64 of space for OOB meta. Since NVMe OOB meta can be bigger, such as 128B, this solution is not rob

Re: [PATCH] block: update documentation

2018-11-29 Thread Bryan Gurney
On Tue, Nov 27, 2018 at 8:25 PM Damien Le Moal wrote: > > Add the description of the zoned, nr_zones and chunk_sectors sysfs queue > attributes to Documentation/block/queue-sysfs.txt. The descriptions of > the zoned and chunk_sectors attributes are mostly copied from > ABI/testing/sysfs-block. While

Re: [PATCH 1/2] ataflop: fix error handling in atari_floppy_init()

2018-11-29 Thread Jens Axboe
On 11/29/18 3:55 AM, Dan Carpenter wrote: > Smatch complains that there is an off by one if the allocation fails in: > > DMABuffer = atari_stram_alloc(BUFFER_SIZE+512, "ataflop"); > > In that situation, "i" would point to one element beyond the end of > the unit[] array. > > There is a

Re: [PATCH 2/2] blk-mq: Add a NULL check in blk_mq_free_map_and_requests()

2018-11-29 Thread Jens Axboe
On 11/29/18 3:56 AM, Dan Carpenter wrote: > I recently found some code which called blk_mq_free_map_and_requests() > with a NULL set->tags pointer. I fixed the caller, but it seems like a > good idea to add a NULL check here as well. Now we can call: > > blk_mq_free_tag_set(set); > b

Re: [PATCH 1/7] block: improve logic around when to sort a plug list

2018-11-29 Thread Christoph Hellwig
Looks good, Reviewed-by: Christoph Hellwig

Re: [PATCH 2/7] blk-mq: add mq_ops->commit_rqs()

2018-11-29 Thread Christoph Hellwig
On Wed, Nov 28, 2018 at 06:35:33AM -0700, Jens Axboe wrote: > blk-mq passes information to the hardware about any given request being > the last that we will issue in this sequence. The point is that hardware > can defer costly doorbell type writes to the last request. But if we run > into errors i

Re: [PATCH 5/7] ataflop: implement mq_ops->commit_rqs() hook

2018-11-29 Thread Christoph Hellwig
On Wed, Nov 28, 2018 at 06:35:36AM -0700, Jens Axboe wrote: > We need this for blk-mq to kick things into gear, if we told it that > we had more IO coming, but then failed to deliver on that promise. > > Reviewed-by: Omar Sandoval > Signed-off-by: Jens Axboe Looks good, Reviewed-by: Christoph

Re: [PATCH 3/7] nvme: implement mq_ops->commit_rqs() hook

2018-11-29 Thread Christoph Hellwig
> +static inline int nvme_next_ring_index(struct nvme_queue *nvmeq, u16 index) > +{ > + if (++index == nvmeq->q_depth) > + return 0; > + > + return index; > +} This is unused now. Also what about this little cleanup on top? diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/

Re: [PATCH 6/7] blk-mq: use bd->last == true for list inserts

2018-11-29 Thread Christoph Hellwig
On Wed, Nov 28, 2018 at 06:35:37AM -0700, Jens Axboe wrote: > If we are issuing a list of requests, we know if we're at the last one. > If we fail issuing, ensure that we call ->commits_rqs() to flush any > potential previous requests. > > Reviewed-by: Omar Sandoval > Signed-off-by: Jens Axboe

Re: [PATCH 7/7] blk-mq: use plug for devices that implement ->commits_rqs()

2018-11-29 Thread Christoph Hellwig
> + /* > + * Use plugging if we have a ->commit_rqs() hook as well, > + * as we know the driver uses bd->last in a smart > + * fashion. > + */ Nitpick: this could flow on just two lines: /* * Use plugg
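The truncated suggestion presumably amounts to letting the text fill the lines, i.e. something like:

	/*
	 * Use plugging if we have a ->commit_rqs() hook as well, as we know
	 * the driver uses bd->last in a smart fashion.
	 */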

Re: [PATCH 7/7] blk-mq: use plug for devices that implement ->commits_rqs()

2018-11-29 Thread Christoph Hellwig
On Thu, Nov 29, 2018 at 07:49:59AM -0800, Christoph Hellwig wrote: > > + /* > > +* Use plugging if we have a ->commit_rqs() hook as well, > > +* as we know the driver uses bd->last in a smart > > +* fashion. > > +*/ > > Nitpick: this could f

Re: [PATCH v3] block: add io timeout to sysfs

2018-11-29 Thread Christoph Hellwig
I think we need a check for the presence of a timeout method and only show this attribute if the driver actually supports block level timeouts.

Re: [PATCH v2] block: add documentation for io_timeout

2018-11-29 Thread Bart Van Assche
On Thu, 2018-11-29 at 18:22 +0800, Weiping Zhang wrote: > add documentation for /sys/block/<disk>/queue/io_timeout Patch descriptions should consist of full sentences. That means that these should start with a capital letter and end with a period. > + > +What:/sys/block/<disk>/queue/io_timeou

Re: [PATCH 2/7] blk-mq: add mq_ops->commit_rqs()

2018-11-29 Thread Jens Axboe
On 11/29/18 8:45 AM, Christoph Hellwig wrote: > On Wed, Nov 28, 2018 at 06:35:33AM -0700, Jens Axboe wrote: >> blk-mq passes information to the hardware about any given request being >> the last that we will issue in this sequence. The point is that hardware >> can defer costly doorbell type writes

Re: [PATCH v3 1/2] arm64/neon: add workaround for ambiguous C99 stdint.h types

2018-11-29 Thread Dave Martin
On Tue, Nov 27, 2018 at 06:08:57PM +0800, Jackie Liu wrote: > In a way similar to ARM commit 09096f6a0ee2 ("ARM: 7822/1: add workaround > for ambiguous C99 stdint.h types"), this patch redefines the macros that > are used in stdint.h so its definitions of uint64_t and int64_t are > compatible with

Re: [PATCH v3 2/2] arm64: crypto: add NEON accelerated XOR implementation

2018-11-29 Thread Dave Martin
On Tue, Nov 27, 2018 at 06:08:58PM +0800, Jackie Liu wrote: > This is a NEON acceleration method that can improve > performance by approximately 20%. I got the following > data from the centos 7.5 on Huawei's HISI1616 chip: > > [ 93.837726] xor: measuring software checksum speed > [ 93.874039] 8

Re: [PATCH 3/7] nvme: implement mq_ops->commit_rqs() hook

2018-11-29 Thread Jens Axboe
On 11/29/18 8:47 AM, Christoph Hellwig wrote: >> +static inline int nvme_next_ring_index(struct nvme_queue *nvmeq, u16 index) >> +{ >> +if (++index == nvmeq->q_depth) >> +return 0; >> + >> +return index; >> +} > > This is unused now. Huh, wonder how I missed that. GCC must not

Re: [PATCH 3/7] nvme: implement mq_ops->commit_rqs() hook

2018-11-29 Thread Christoph Hellwig
On Thu, Nov 29, 2018 at 10:02:25AM -0700, Jens Axboe wrote: > On 11/29/18 8:47 AM, Christoph Hellwig wrote: > >> +static inline int nvme_next_ring_index(struct nvme_queue *nvmeq, u16 > >> index) > >> +{ > >> + if (++index == nvmeq->q_depth) > >> + return 0; > >> + > >> + return index; >

Re: [PATCH 7/7] blk-mq: use plug for devices that implement ->commits_rqs()

2018-11-29 Thread Jens Axboe
On 11/29/18 8:49 AM, Christoph Hellwig wrote: >> +/* >> + * Use plugging if we have a ->commit_rqs() hook as well, >> + * as we know the driver uses bd->last in a smart >> + * fashion. >> + */ > > Nipick: this could flow on just two lines

Re: [PATCH 3/7] nvme: implement mq_ops->commit_rqs() hook

2018-11-29 Thread Jens Axboe
On 11/29/18 10:04 AM, Christoph Hellwig wrote: > On Thu, Nov 29, 2018 at 10:02:25AM -0700, Jens Axboe wrote: >> On 11/29/18 8:47 AM, Christoph Hellwig wrote: +static inline int nvme_next_ring_index(struct nvme_queue *nvmeq, u16 index) +{ + if (++index == nvmeq->q_depth) +

Re: [PATCH 3/7] nvme: implement mq_ops->commit_rqs() hook

2018-11-29 Thread Keith Busch
On Thu, Nov 29, 2018 at 10:06:20AM -0700, Jens Axboe wrote: > On 11/29/18 10:04 AM, Christoph Hellwig wrote: > > gcc never warns about unused static inline functions. Which makes a lot > > of sense at least for headers.. > > Not so much for non-headers :-) You can #include .c files too! :)

Re: [PATCH 3/7] nvme: implement mq_ops->commit_rqs() hook

2018-11-29 Thread Jens Axboe
On 11/29/18 10:38 AM, Keith Busch wrote: > On Thu, Nov 29, 2018 at 10:06:20AM -0700, Jens Axboe wrote: >> On 11/29/18 10:04 AM, Christoph Hellwig wrote: >>> gcc never warns about unused static inline functions. Which makes a lot >>> of sense at least for headers.. >> >> Not so much for non-headers

Re: [PATCH v3 2/2] arm64: crypto: add NEON accelerated XOR implementation

2018-11-29 Thread Ard Biesheuvel
On Thu, 29 Nov 2018 at 18:00, Dave Martin wrote: > > On Tue, Nov 27, 2018 at 06:08:58PM +0800, Jackie Liu wrote: > > This is a NEON acceleration method that can improve > > performance by approximately 20%. I got the following > > data from the centos 7.5 on Huawei's HISI1616 chip: > > > > [ 93.83

Re: [PATCH v3 2/2] arm64: crypto: add NEON accelerated XOR implementation

2018-11-29 Thread Dave Martin
On Thu, Nov 29, 2018 at 07:09:10PM +0100, Ard Biesheuvel wrote: > On Thu, 29 Nov 2018 at 18:00, Dave Martin wrote: > > > > On Tue, Nov 27, 2018 at 06:08:58PM +0800, Jackie Liu wrote: [...] > > > +static struct xor_block_template xor_block_arm64 = { > > > + .name = "arm64_neon", > > > +

[PATCH 05/13] nvme-pci: consolidate code for polling non-dedicated queues

2018-11-29 Thread Christoph Hellwig
We have three places that can poll for I/O completions on a normal interrupt-enabled queue. All of them are in slow path code, so consolidate them to a single helper that uses spin_lock_irqsave and removes the fast path cqe_pending check. Signed-off-by: Christoph Hellwig --- drivers/nvme/host/p
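A rough sketch of the shape such a helper takes; the name and the nvme_process_cq() signature follow the rest of the series but should be read as assumptions, not the exact patch:

static void nvme_poll_irqdisable(struct nvme_queue *nvmeq)
{
	unsigned long flags;
	u16 start, end;

	/* slow path only, so taking the CQ lock with IRQs disabled is fine */
	spin_lock_irqsave(&nvmeq->cq_lock, flags);
	nvme_process_cq(nvmeq, &start, &end, -1);
	spin_unlock_irqrestore(&nvmeq->cq_lock, flags);

	nvme_complete_cqes(nvmeq, start, end);
}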

block and nvme polling improvements V2

2018-11-29 Thread Christoph Hellwig
Hi all, this series optimizes a few bits in the block layer and nvme code related to polling. It starts by moving the queue types recently introduced entirely into the block layer instead of requiring an indirect call for them. It then switches nvme and the block layer to only allow polling with

[PATCH 03/13] nvme-pci: cleanup SQ allocation a bit

2018-11-29 Thread Christoph Hellwig
Use a bit flag to mark if the SQ was allocated from the CMB, and clean up the surrounding code a bit. Signed-off-by: Christoph Hellwig --- drivers/nvme/host/pci.c | 33 +++-- 1 file changed, 15 insertions(+), 18 deletions(-) diff --git a/drivers/nvme/host/pci.c b/dri

[PATCH 02/13] nvme-pci: use atomic bitops to mark a queue enabled

2018-11-29 Thread Christoph Hellwig
This gets rid of all the messing with cq_vector and the ->polled field by using an atomic bitop to mark the queue enabled or not. Signed-off-by: Christoph Hellwig --- drivers/nvme/host/pci.c | 43 ++--- 1 file changed, 15 insertions(+), 28 deletions(-) diff -
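The gist is replacing the cq_vector/->polled bookkeeping with one flags word and the standard atomic bitops. A hedged sketch of the three touch points (the bit name and flags field are illustrative assumptions):

#define NVMEQ_ENABLED	0	/* illustrative bit index */

	/* queue setup: mark the queue usable with a single atomic op */
	set_bit(NVMEQ_ENABLED, &nvmeq->flags);

	/* submission path: refuse work on a queue that is not enabled */
	if (!test_bit(NVMEQ_ENABLED, &nvmeq->flags))
		return BLK_STS_IOERR;

	/* teardown: one clear_bit() replaces the old state juggling */
	clear_bit(NVMEQ_ENABLED, &nvmeq->flags);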

[PATCH 01/13] block: move queues types to the block layer

2018-11-29 Thread Christoph Hellwig
Having another indirect call in the fast path doesn't really help in our post-spectre world. Also having too many queue types is just going to create confusion, so I'd rather manage them centrally. Note that the queue type naming and ordering changes a bit - the first index now is the default queue

[PATCH 10/13] nvme-mpath: remove I/O polling support

2018-11-29 Thread Christoph Hellwig
The ->poll_fn has been stale for a while, as a lot of places check for mq ops. But there is no real point in it anyway, as we don't even use the multipath code for subsystems without multiple ports, which is usually what we do high performance I/O to. If it really becomes an issue we should rewor

[PATCH 09/13] nvme-rdma: remove I/O polling support

2018-11-29 Thread Christoph Hellwig
The code was always a bit of a hack that digs far too much into RDMA core internals. Let's kick it out and reimplement proper dedicated poll queues as needed. Signed-off-by: Christoph Hellwig --- drivers/nvme/host/rdma.c | 24 1 file changed, 24 deletions(-) diff --git

[PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues

2018-11-29 Thread Christoph Hellwig
This is the last place outside of nvme_irq that handles CQEs from interrupt context, and thus is in the way of removing the cq_lock for normal queues, and avoiding lockdep warnings on the poll queues, for which we already take it without IRQ disabling. Signed-off-by: Christoph Hellwig --- driver

[PATCH 13/13] block: enable polling by default if a poll map is initialized

2018-11-29 Thread Christoph Hellwig
If the user did set up polling in the driver we should not require another knob in the block layer to enable it. Signed-off-by: Christoph Hellwig --- block/blk-mq.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block/blk-mq.c b/block/blk-mq.c index 9c90c5038d07..a550a00ac00c 100644 --- a/
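In other words, once the driver registers a poll queue map, the block layer flips QUEUE_FLAG_POLL on by itself instead of waiting for a sysfs write. Roughly, at queue init time (a sketch against the queue-type enum from patch 1, not the exact hunk):

	if (set->nr_maps > HCTX_TYPE_POLL &&
	    set->map[HCTX_TYPE_POLL].nr_queues)
		blk_queue_flag_set(QUEUE_FLAG_POLL, q);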

[PATCH 11/13] block: remove ->poll_fn

2018-11-29 Thread Christoph Hellwig
This was intended to support users like nvme multipath, but is just getting in the way and adding another indirect call. Signed-off-by: Christoph Hellwig --- block/blk-core.c | 23 --- block/blk-mq.c | 24 +++- include/linux/blkdev.h | 2 --

[PATCH 04/13] nvme-pci: only allow polling with separate poll queues

2018-11-29 Thread Christoph Hellwig
This will allow us to simplify both the regular NVMe interrupt handler and the upcoming aio poll code. In addition to that the separate queues are generally a good idea for performance reasons. Signed-off-by: Christoph Hellwig --- drivers/nvme/host/pci.c | 18 +- 1 file changed,

[PATCH 12/13] block: only allow polling if a poll queue_map exists

2018-11-29 Thread Christoph Hellwig
This avoids having to have different mq_ops for different setups with or without poll queues. Signed-off-by: Christoph Hellwig --- block/blk-sysfs.c | 2 +- drivers/nvme/host/pci.c | 29 + 2 files changed, 10 insertions(+), 21 deletions(-) diff --git a/block/b

[PATCH 06/13] nvme-pci: refactor nvme_disable_io_queues

2018-11-29 Thread Christoph Hellwig
Pass the opcode for the delete SQ/CQ command as an argument instead of the somewhat confusing pass loop. Signed-off-by: Christoph Hellwig --- drivers/nvme/host/pci.c | 41 - 1 file changed, 20 insertions(+), 21 deletions(-) diff --git a/drivers/nvme/host/
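With the opcode as an argument, the caller spells out the two phases explicitly instead of driving them from a loop counter. A hedged sketch of the resulting call pattern (the return-value convention is an assumption):

	/* delete all SQs first; only if that worked, delete the CQs */
	if (nvme_disable_io_queues(dev, nvme_admin_delete_sq))
		nvme_disable_io_queues(dev, nvme_admin_delete_cq);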

[PATCH 08/13] nvme-pci: remove the CQ lock for interrupt driven queues

2018-11-29 Thread Christoph Hellwig
Now that we can't poll regular, interrupt driven I/O queues there is almost nothing that can race with an interrupt. The only possible other contexts polling a CQ are the error handler and queue shutdown, and both are so far off in the slow path that we can simply use the big hammer of disabling i

[PATCH] sbitmap: don't loop for find_next_zero_bit() for !round_robin

2018-11-29 Thread Jens Axboe
If we aren't forced to do round robin tag allocation, just use the allocation hint to find the index for the tag word, don't use it for the offset inside the word. This avoids a potential extra round trip in the bit looping. Signed-off-by: Jens Axboe --- diff --git a/lib/sbitmap.c b/lib/sbitmap

Re: [PATCH] sbitmap: don't loop for find_next_zero_bit() for !round_robin

2018-11-29 Thread Omar Sandoval
On Thu, Nov 29, 2018 at 12:34:12PM -0700, Jens Axboe wrote: > If we aren't forced to do round robin tag allocation, just use the > allocation hint to find the index for the tag word, don't use it for the > offset inside the word. Maybe also add "We're already fetching that cache line, so we might

Re: [PATCH] sbitmap: don't loop for find_next_zero_bit() for !round_robin

2018-11-29 Thread Jens Axboe
On 11/29/18 12:42 PM, Omar Sandoval wrote: > On Thu, Nov 29, 2018 at 12:34:12PM -0700, Jens Axboe wrote: >> If we aren't forced to do round robin tag allocation, just use the >> allocation hint to find the index for the tag word, don't use it for the >> offset inside the word. > > Maybe also add "

[PATCH v2] sbitmap: don't loop for find_next_zero_bit() for !round_robin

2018-11-29 Thread Jens Axboe
If we aren't forced to do round robin tag allocation, just use the allocation hint to find the index for the tag word, don't use it for the offset inside the word. This avoids a potential extra round trip in the bit looping, and since we're fetching this cacheline, we may as well check the whole wo
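The hint now only selects which word to scan, and the scan starts at bit 0, so the cache line that was fetched anyway gets checked in full. A sketch of the allocation-side logic; the macros are the ones from lib/sbitmap.c, but the helper itself is a hypothetical illustration:

static int __sbitmap_get_word_sketch(struct sbitmap *sb,
				     unsigned int alloc_hint, bool round_robin)
{
	unsigned int index = SB_NR_TO_INDEX(sb, alloc_hint);
	/* !round_robin: scan the whole word from bit 0 rather than resuming
	 * mid-word at the hint and wrapping around later */
	unsigned int off = round_robin ? SB_NR_TO_BIT(sb, alloc_hint) : 0;

	return find_next_zero_bit(&sb->map[index].word,
				  sb->map[index].depth, off);
}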

Re: [PATCH 01/13] block: move queues types to the block layer

2018-11-29 Thread Jens Axboe
On 11/29/18 12:12 PM, Christoph Hellwig wrote: > Having another indirect call in the fast path doesn't really help > in our post-spectre world. Also having too many queue types is just > going to create confusion, so I'd rather manage them centrally. > > Note that the queue type naming and ordering

[PATCH] sbitmap: amortize cost of clearing bits

2018-11-29 Thread Jens Axboe
sbitmap maintains a set of words that we use to set and clear bits, with each bit representing a tag for blk-mq. Even though we spread the bits out and maintain a hint cache, one particular bit allocated will end up being cleared in the exact same spot. This introduces batched clearing of bits. In
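The batching gives each word a shadow ->cleared mask: completions set bits there with a cheap atomic, and only when a word looks exhausted are the cleared bits folded back into ->word under the word's lock. A hedged sketch of the completion side (names mirror the eventual sbitmap API, details are assumptions):

static inline void sbitmap_deferred_clear_bit(struct sbitmap *sb,
					      unsigned int bitnr)
{
	unsigned long *addr = &sb->map[SB_NR_TO_INDEX(sb, bitnr)].cleared;

	/* record the free; the allocator folds ->cleared into ->word later */
	set_bit(SB_NR_TO_BIT(sb, bitnr), addr);
}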

Re: [PATCH 01/13] block: move queues types to the block layer

2018-11-29 Thread Keith Busch
On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote: > +enum hctx_type { > + HCTX_TYPE_DEFAULT, /* all I/O not otherwise accounted for */ > + HCTX_TYPE_READ, /* just for READ I/O */ > + HCTX_TYPE_POLL, /* polled I/O of any kind */ > + > + HCTX_MAX_

Re: [PATCH 02/13] nvme-pci: use atomic bitops to mark a queue enabled

2018-11-29 Thread Keith Busch
On Thu, Nov 29, 2018 at 08:12:59PM +0100, Christoph Hellwig wrote: > This gets rid of all the messing with cq_vector and the ->polled field > by using an atomic bitop to mark the queue enabled or not. > > Signed-off-by: Christoph Hellwig Looks good. Reviewed-by: Keith Busch

Re: [PATCH 03/13] nvme-pci: cleanup SQ allocation a bit

2018-11-29 Thread Keith Busch
On Thu, Nov 29, 2018 at 08:13:00PM +0100, Christoph Hellwig wrote: > Use a bit flag to mark if the SQ was allocated from the CMB, and clean > up the surrounding code a bit. > > Signed-off-by: Christoph Hellwig Looks good. Reviewed-by: Keith Busch

Re: [PATCH 01/13] block: move queues types to the block layer

2018-11-29 Thread Jens Axboe
On 11/29/18 1:19 PM, Keith Busch wrote: > On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote: >> +enum hctx_type { >> +HCTX_TYPE_DEFAULT, /* all I/O not otherwise accounted for */ >> +HCTX_TYPE_READ, /* just for READ I/O */ >> +HCTX_TYPE_POLL, /* poll

Re: [PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues

2018-11-29 Thread Keith Busch
On Thu, Nov 29, 2018 at 08:13:04PM +0100, Christoph Hellwig wrote: > This is the last place outside of nvme_irq that handles CQEs from > interrupt context, and thus is in the way of removing the cq_lock for > normal queues, and avoiding lockdep warnings on the poll queues, for > which we already ta

Re: [PATCH 06/13] nvme-pci: refactor nvme_disable_io_queues

2018-11-29 Thread Keith Busch
On Thu, Nov 29, 2018 at 08:13:03PM +0100, Christoph Hellwig wrote: > Pass the opcode for the delete SQ/CQ command as an argument instead of > the somewhat confusing pass loop. > > Signed-off-by: Christoph Hellwig Looks good. Reviewed-by: Keith Busch

Re: [PATCH v2] sbitmap: don't loop for find_next_zero_bit() for !round_robin

2018-11-29 Thread Omar Sandoval
On Thu, Nov 29, 2018 at 12:47:49PM -0700, Jens Axboe wrote: > If we aren't forced to do round robin tag allocation, just use the > allocation hint to find the index for the tag word, don't use it for the > offset inside the word. This avoids a potential extra round trip in the > bit looping, and si

Re: [PATCH 08/13] nvme-pci: remove the CQ lock for interrupt driven queues

2018-11-29 Thread Keith Busch
On Thu, Nov 29, 2018 at 08:13:05PM +0100, Christoph Hellwig wrote: > @@ -1050,12 +1051,16 @@ static irqreturn_t nvme_irq(int irq, void *data) > irqreturn_t ret = IRQ_NONE; > u16 start, end; > > - spin_lock(&nvmeq->cq_lock); > + /* > + * The rmb/wmb pair ensures we see all

Re: [PATCH] sbitmap: amortize cost of clearing bits

2018-11-29 Thread Omar Sandoval
On Thu, Nov 29, 2018 at 01:00:25PM -0700, Jens Axboe wrote: > sbitmap maintains a set of words that we use to set and clear bits, with > each bit representing a tag for blk-mq. Even though we spread the bits > out and maintain a hint cache, one particular bit allocated will end up > being cleared i

Re: [PATCH] sbitmap: amortize cost of clearing bits

2018-11-29 Thread Jens Axboe
On 11/29/18 2:53 PM, Omar Sandoval wrote: > On Thu, Nov 29, 2018 at 01:00:25PM -0700, Jens Axboe wrote: >> sbitmap maintains a set of words that we use to set and clear bits, with >> each bit representing a tag for blk-mq. Even though we spread the bits >> out and maintain a hint cache, one particu

[PATCH] block: avoid extra bio reference for async O_DIRECT

2018-11-29 Thread Jens Axboe
The bio referencing has a trick that doesn't do any actual atomic inc/dec on the reference count until we have to elevate it to > 1. For the async IO O_DIRECT case, we can't use the simple DIO variants, so we use __blkdev_direct_IO(). It always grabs an extra reference to the bio after allocation, wh
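For context, the trick being relied on is that a freshly allocated bio is not reference-counted at all until someone calls bio_get(), which sets BIO_REFFED and bumps __bi_cnt, roughly as in include/linux/bio.h at the time:

static inline void bio_get(struct bio *bio)
{
	bio->bi_flags |= (1 << BIO_REFFED);
	smp_mb__before_atomic();
	atomic_inc(&bio->__bi_cnt);
}

Skipping the extra bio_get() in the common single-bio async case keeps bio_put() on the atomic-free path.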

Re: [PATCH] block: avoid extra bio reference for async O_DIRECT

2018-11-29 Thread Jens Axboe
On 11/29/18 3:55 PM, Jens Axboe wrote: > The bio referencing has a trick that doesn't do any actual atomic > inc/dec on the reference count until we have to elevate it to > 1. For the > async IO O_DIRECT case, we can't use the simple DIO variants, so we use > __blkdev_direct_IO(). It always grabs an

[PATCH 1/3] sbitmap: ensure that sbitmap maps are properly aligned

2018-11-29 Thread Jens Axboe
We try to be careful with alignment for cache purposes, but all of that is worthless if we don't actually align the maps themselves. Signed-off-by: Jens Axboe --- include/linux/sbitmap.h | 11 --- lib/sbitmap.c | 7 +-- 2 files changed, 13 insertions(+), 5 deletions(-) di

[PATCH 3/3] sbitmap: optimize wakeup check

2018-11-29 Thread Jens Axboe
Even if we have no waiters on any of the sbitmap_queue wait states, we still have to loop over every entry to check. We do this for every IO, so the cost adds up. Shift a bit of the cost to the slow path, when we actually have waiters. Wrap prepare_to_wait_exclusive() and finish_wait(), so we can maint

[no subject]

2018-11-29 Thread Jens Axboe
Three patches here: 1) Ensure that we align ->map properly 2) v2 of the sbitmap clear cost amortization. Updated to do a wakeup check AFTER we're done swapping free/cleared masks. Kept the separate alignment for ->word, as it is faster in testing. 3) Cost reduction of having to do wait qu

[PATCH 2/3] sbitmap: amortize cost of clearing bits

2018-11-29 Thread Jens Axboe
sbitmap maintains a set of words that we use to set and clear bits, with each bit representing a tag for blk-mq. Even though we spread the bits out and maintain a hint cache, one particular bit allocated will end up being cleared in the exact same spot. This introduces batched clearing of bits. In

Re: [PATCH v3 2/2] arm64: crypto: add NEON accelerated XOR implementation

2018-11-29 Thread JackieLiu
> On 30 Nov 2018, at 02:20, Dave Martin wrote: > > On Thu, Nov 29, 2018 at 07:09:10PM +0100, Ard Biesheuvel wrote: >> On Thu, 29 Nov 2018 at 18:00, Dave Martin wrote: >>> >>> On Tue, Nov 27, 2018 at 06:08:58PM +0800, Jackie Liu wrote: > > [...] > +static struct xor_block_template xor_block_arm

[PATCHSET v3] sbitmap optimizations

2018-11-29 Thread Jens Axboe
The v2 posting got screwed up somehow, sending a v3 just to make sure things are sane. Changes: - Dropped the alignment patch, it should not be needed unless we have debugging enabled of some sort. - Fumbled the optimized wakeup, it's important we match prep + finish with that interface. --

[PATCH 1/2] sbitmap: amortize cost of clearing bits

2018-11-29 Thread Jens Axboe
sbitmap maintains a set of words that we use to set and clear bits, with each bit representing a tag for blk-mq. Even though we spread the bits out and maintain a hint cache, one particular bit allocated will end up being cleared in the exact same spot. This introduces batched clearing of bits. In

[PATCH 2/2] sbitmap: optimize wakeup check

2018-11-29 Thread Jens Axboe
Even if we have no waiters on any of the sbitmap_queue wait states, we still have to loop over every entry to check. We do this for every IO, so the cost adds up. Shift a bit of the cost to the slow path, when we actually have waiters. Wrap prepare_to_wait_exclusive() and finish_wait(), so we can maint
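The wrappers let the queue keep a waiter count that the hot path can test before walking the wait states at all. A sketch of the prepare side, close to the eventual lib/sbitmap.c helper but with field names treated as assumptions:

void sbitmap_prepare_to_wait(struct sbitmap_queue *sbq,
			     struct sbq_wait_state *ws,
			     struct sbq_wait *sbq_wait, int state)
{
	/* count this waiter once so wakeup can bail early when ws_active == 0 */
	if (!sbq_wait->sbq) {
		atomic_inc(&sbq->ws_active);
		sbq_wait->sbq = sbq;
	}
	prepare_to_wait_exclusive(&ws->wait, &sbq_wait->wait, state);
}

finish_wait() gets a matching wrapper that decrements ws_active, which is why prep and finish have to be paired through this interface.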

Re:

2018-11-29 Thread Jens Axboe
On 11/29/18 6:12 PM, Jens Axboe wrote: > Three patches here: > > 1) Ensure that we align ->map properly > > 2) v2 of the sbitmap clear cost amortization. Updated to do a wakeup >check AFTER we're done swapping free/cleared masks. Kept the >separate alignment for ->word, as it is faster i

Re: [PATCH] block: update documentation

2018-11-29 Thread Damien Le Moal
On 2018/11/29 23:54, Bryan Gurney wrote: >> +chunk_sectors (RO) >> +-- >> +This has different meaning depending on the type of the block device. >> +For a RAID device (dm-raid), chunk_sectors indicates the size in 512B >> sectors >> +of the RAID volume stripe segment. For a zoned b

[PATCH v2] block: update documentation

2018-11-29 Thread Damien Le Moal
Add the description of the zoned, nr_zones and chunk_sectors sysfs queue attributes to Documentation/block/queue-sysfs.txt. The descriptions of the zoned and chunk_sectors attributes are mostly copied from ABI/testing/sysfs-block (with a typo fix). While at it, also fix a typo in the description of

Re: [PATCH 01/13] block: move queues types to the block layer

2018-11-29 Thread Christoph Hellwig
On Thu, Nov 29, 2018 at 07:50:09PM +, Jens Axboe wrote: > > in our post-spectre world. Also having too many queue types is just > > going to create confusion, so I'd rather manage them centrally. > > > > Note that the queue type naming and ordering changes a bit - the > > first index now is th

Re: [PATCHv4 0/3] scsi timeout handling updates

2018-11-29 Thread Ming Lei
On Wed, Nov 28, 2018 at 09:39:44PM -0500, Martin K. Petersen wrote: > > Ming, > > > On Wed, Nov 28, 2018 at 11:08:48AM +0100, Christoph Hellwig wrote: > >> On Wed, Nov 28, 2018 at 06:07:01PM +0800, Ming Lei wrote: > >> > > Is this the nvme target on top of null_blk? > >> > > >> > Yes. > >> > >>

[PATCH] lightnvm: simplify geometry enumeration

2018-11-29 Thread Matias Bjørling
Currently the geometry of an OCSSD is enumerated using a two-step approach: first, nvm_register is called and the OCSSD identify command is issued; second, the geometry sos and csecs values are read either from the OCSSD identify if it is a 1.2 drive, or from the NVMe namespace data structure if i

Re: [PATCH 04/13] blkcg: introduce common blkg association logic

2018-11-29 Thread Tejun Heo
On Mon, Nov 26, 2018 at 04:19:37PM -0500, Dennis Zhou wrote: > There are 3 ways blkg association can happen: association with the > current css, with the page css (swap), or from the wbc css (writeback). > > This patch handles how association is done for the first case where we > are associating b

Re: [PATCH 05/13] blkcg: associate blkg when associating a device

2018-11-29 Thread Tejun Heo
On Mon, Nov 26, 2018 at 04:19:38PM -0500, Dennis Zhou wrote: > diff --git a/include/linux/bio.h b/include/linux/bio.h > index 62715a5a4f32..8bc9d9b29fd3 100644 > --- a/include/linux/bio.h > +++ b/include/linux/bio.h > @@ -486,6 +486,12 @@ extern unsigned int bvec_nr_vecs(unsigned short idx); > ext

Re: [PATCH 11/13] blkcg: remove bio_disassociate_task()

2018-11-29 Thread Tejun Heo
On Mon, Nov 26, 2018 at 04:19:44PM -0500, Dennis Zhou wrote: > Now that a bio only holds a blkg reference, cleanup is simply > putting back that reference. Remove bio_disassociate_task() as it just > calls bio_disassociate_blkg() and call the latter directly. > > Signed-off-by: Dennis Zhou A

Re: [PATCHv4 0/3] scsi timeout handling updates

2018-11-29 Thread Christoph Hellwig
> diff --git a/block/blk-mq.c b/block/blk-mq.c > index a82830f39933..d0ef540711c7 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -647,7 +647,7 @@ EXPORT_SYMBOL(blk_mq_complete_request); > > int blk_mq_request_started(struct request *rq) > { > - return blk_mq_rq_state(rq) != MQ_RQ

Re: [PATCHv4 0/3] scsi timeout handling updates

2018-11-29 Thread Keith Busch
On Thu, Nov 29, 2018 at 06:11:59PM +0100, Christoph Hellwig wrote: > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index a82830f39933..d0ef540711c7 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -647,7 +647,7 @@ EXPORT_SYMBOL(blk_mq_complete_request); > > > > int blk_mq_req

Re: DIF/DIX issue related to config CONFIG_SCSI_MQ_DEFAULT

2018-11-29 Thread Ming Lei
On Wed, Nov 28, 2018 at 11:37:23AM +0800, chenxiang (M) wrote: > Hi Lei Ming, > > 在 2018/11/27 21:08, Ming Lei 写道: > > On Tue, Nov 27, 2018 at 05:55:45PM +0800, chenxiang (M) wrote: > > > Hi all, > > > > > > There is an issue which may be related to CONFIG_SCSI_MQ_DEFAULT: before we > > > develope

Re: [PATCH 04/13] blkcg: introduce common blkg association logic

2018-11-29 Thread Dennis Zhou
On Thu, Nov 29, 2018 at 07:49:17AM -0800, Tejun Heo wrote: > On Mon, Nov 26, 2018 at 04:19:37PM -0500, Dennis Zhou wrote: > > There are 3 ways blkg association can happen: association with the > > current css, with the page css (swap), or from the wbc css (writeback). > > > > This patch handles ho

Re: [PATCH 05/13] blkcg: associate blkg when associating a device

2018-11-29 Thread Dennis Zhou
On Thu, Nov 29, 2018 at 07:53:33AM -0800, Tejun Heo wrote: > On Mon, Nov 26, 2018 at 04:19:38PM -0500, Dennis Zhou wrote: > > diff --git a/include/linux/bio.h b/include/linux/bio.h > > index 62715a5a4f32..8bc9d9b29fd3 100644 > > --- a/include/linux/bio.h > > +++ b/include/linux/bio.h > > @@ -486,6

Re: [PATCH 2/3] block: switch to per-cpu in-flight counters

2018-11-29 Thread Mike Snitzer
On Tue, Nov 27 2018 at 7:42pm -0500, Mikulas Patocka wrote: > Now that part_round_stats is gone, we can switch to per-cpu in-flight > counters. > > We use the local-atomic type local_t, so that if part_inc_in_flight or > part_dec_in_flight is reentrantly called from an interrupt, the value will
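The shape of the change: each CPU gets its own local_t counters, bumped without any global atomics, and the read side sums across CPUs. A hedged sketch of the idea only; the struct and function names are illustrative, not the patch's:

#include <asm/local.h>
#include <linux/percpu.h>

struct part_inflight {
	local_t counters[2];	/* [READ], [WRITE] */
};

static void part_inc_in_flight(struct part_inflight __percpu *inflight, int rw)
{
	/* local_inc is safe against interrupt reentrancy on this CPU */
	local_inc(&this_cpu_ptr(inflight)->counters[rw]);
}

static unsigned long part_in_flight(struct part_inflight __percpu *inflight)
{
	unsigned long sum = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		sum += local_read(&per_cpu_ptr(inflight, cpu)->counters[0]) +
		       local_read(&per_cpu_ptr(inflight, cpu)->counters[1]);

	return sum;
}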

Re: [PATCH 2/3] block: switch to per-cpu in-flight counters

2018-11-29 Thread Mikulas Patocka
On Thu, 29 Nov 2018, Mike Snitzer wrote: > On Tue, Nov 27 2018 at 7:42pm -0500, > Mikulas Patocka wrote: > > > Now that part_round_stats is gone, we can switch to per-cpu in-flight > > counters. > > > > We use the local-atomic type local_t, so that if part_inc_in_flight or > > part_dec_in_f

Re: [PATCH 2/3] block: switch to per-cpu in-flight counters

2018-11-29 Thread Jens Axboe
On 11/29/18 3:05 PM, Mikulas Patocka wrote: > > > On Thu, 29 Nov 2018, Mike Snitzer wrote: > >> On Tue, Nov 27 2018 at 7:42pm -0500, >> Mikulas Patocka wrote: >> >>> Now that part_round_stats is gone, we can switch to per-cpu in-flight >>> counters. >>> >>> We use the local-atomic type local_t

Re: [PATCH 2/3] block: switch to per-cpu in-flight counters

2018-11-29 Thread Mike Snitzer
On Thu, Nov 29 2018 at 5:05pm -0500, Mikulas Patocka wrote: > > > On Thu, 29 Nov 2018, Mike Snitzer wrote: > > > On Tue, Nov 27 2018 at 7:42pm -0500, > > Mikulas Patocka wrote: > > > > > Now that part_round_stats is gone, we can switch to per-cpu in-flight > > > counters. > > > > > > We u

Re: [PATCH 3/3] block: return just one value from part_in_flight

2018-11-29 Thread Mike Snitzer
On Tue, Nov 27 2018 at 7:42pm -0500, Mikulas Patocka wrote: > The previous patches deleted all the code that needed the second value > returned from part_in_flight - now the kernel only uses the first value. > > Consequently, part_in_flight (and blk_mq_in_flight) may be changed so that > it onl

Re: DIF/DIX issue related to config CONFIG_SCSI_MQ_DEFAULT

2018-11-29 Thread Ming Lei
On Wed, Nov 28, 2018 at 10:50:11AM +0800, chenxiang (M) wrote: > Hi Lei Ming, > > 在 2018/11/27 21:08, Ming Lei 写道: > > On Tue, Nov 27, 2018 at 05:55:45PM +0800, chenxiang (M) wrote: > > > Hi all, > > > > > > There is an issue which may be related to CONFIG_SCSI_MQ_DEFAULT: before we > > > develop

Re: [PATCH v4 11/13] nvmet-tcp: add NVMe over TCP target driver

2018-11-29 Thread Sagi Grimberg
+static inline void nvmet_tcp_put_cmd(struct nvmet_tcp_cmd *cmd) +{ + if (unlikely(cmd == &cmd->queue->connect)) + return; If you don't return the connect cmd to the list, please don't add it to it in the first place (during alloc_cmd). And if you use it once, we might think of a clean

Re: [PATCH v4 00/13] TCP transport binding for NVMe over Fabrics

2018-11-29 Thread Sagi Grimberg
What is the plan ahead here? I think the nvme code looks pretty reasonable now (I'll do another pass at nitpicking), but we need the networking stuff sorted out with at least ACKs, or a merge through the networking tree and then a shared branch we can pull in. I would think that having Dave

Re: [PATCH v4 00/13] TCP transport binding for NVMe over Fabrics

2018-11-29 Thread David Miller
From: Sagi Grimberg Date: Thu, 29 Nov 2018 17:24:09 -0800 > >> What is the plan ahead here? I think the nvme code looks pretty >> reasonable now (I'll do another pass at nitpicking), but we need the >> networking stuff sorted out with at least ACKs, or a merge through >> the networking tree and

Re: [PATCH] lightnvm: simplify geometry enumeration

2018-11-29 Thread Javier Gonzalez
> On 29 Nov 2018, at 15.27, Matias Bjørling wrote: > > Currently the geometry of an OCSSD is enumerated using a two step > approach: > > First, nvm_register is called, the OCSSD identify command is issued, > and second the geometry sos and csecs values are read either from the > OCSSD identify i