Thanks,
applied to nvme-4.20 with slight tweaks to the changelog.
In a way similar to ARM commit 09096f6a0ee2 ("ARM: 7822/1: add workaround
for ambiguous C99 stdint.h types"), this patch redefines the macros that
are used in stdint.h so its definitions of uint64_t and int64_t are
compatible with those of the kernel.
This patch comes from:
This is a NEON acceleration method that can improve
performance by approximately 20%. I got the following
data from CentOS 7.5 on Huawei's HISI1616 chip:
[ 93.837726] xor: measuring software checksum speed
[ 93.874039] 8regs : 7123.200 MB/sec
[ 93.914038] 32regs : 7180.300 MB/sec
[
On Mon, Nov 26, 2018 at 04:33:10PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 12:00:17PM +0800, Ming Lei wrote:
> > This test covers the following two issues:
> >
> > 1) discard sector needs to be aligned with logical block size
> >
> > 2) make sure 'sector_t' instead of 'unsigned int'
On 11/21/2018 6:23 PM, Christoph Hellwig wrote:
Hi all,
this series optimizes a few bits in the block layer and nvme code
related to polling.
It starts by moving the queue types recently introduced entirely into
the block layer instead of requiring an indirect call for them.
It then switches
On Thu, Nov 15, 2018 at 12:00:17PM +0800, Ming Lei wrote:
> This test covers the following two issues:
>
> 1) discard sector needs to be aligned with logical block size
>
> 2) make sure 'sector_t' instead of 'unsigned int' is used when comparing
> with discard sector size
>
> Signed-off-by: Ming
On Mon, Nov 26, 2018 at 02:31:29PM -0500, Theodore Y. Ts'o wrote:
> On Mon, Nov 26, 2018 at 10:37:23AM -0800, Omar Sandoval wrote:
> >
> > Hm, what if we output it as KERN_INFO?
> >
> > diff --git a/check b/check
> > index f6c3537..9b4765f 100755
> > --- a/check
> > +++ b/check
> > @@ -314,7
On Mon, Nov 26, 2018 at 10:37:23AM -0800, Omar Sandoval wrote:
>
> Hm, what if we output it as KERN_INFO?
>
> diff --git a/check b/check
> index f6c3537..9b4765f 100755
> --- a/check
> +++ b/check
> @@ -314,7 +314,7 @@ _call_test() {
>
> if [[ -w /dev/kmsg ]]; then
> local
On Mon, Nov 26, 2018 at 01:32:11PM -0500, Theodore Y. Ts'o wrote:
> On Mon, Nov 26, 2018 at 09:50:57AM -0800, Omar Sandoval wrote:
> >
> > Hey, Ted, sorry, I meant to ask you about this in person at LPC but
> > forgot to. Forgive my ignorance about syslog, but does syslog not pick
> > up the line
On Mon, Nov 26, 2018 at 09:50:57AM -0800, Omar Sandoval wrote:
>
> Hey, Ted, sorry, I meant to ask you about this in person at LPC but
> forgot to. Forgive my ignorance about syslog, but does syslog not pick
> up the line we write to dmesg?
Unfortunately, it does not.
root@xfstests-ltm:~# echo
On Thu, Nov 22, 2018 at 08:02:21PM -0500, Theodore Y. Ts'o wrote:
> Ping?
>
> - Ted
>
> On Mon, Oct 29, 2018 at 12:15:57PM -0400, Theodore Ts'o wrote:
> > Signed-off-by: Theodore Ts'o
> > ---
> > check | 3 +++
> > 1
On Mon, Nov 26, 2018 at 09:35:49AM -0700, Jens Axboe wrote:
> This isn't exactly the same as the previous count, as it includes
> requests for all devices. But that really doesn't matter: if we have
> more than the threshold (16) queued up, flush it. It's not worth it
> to have an expensive list
Add polled variants of PREAD/PREADV and PWRITE/PWRITEV. These act
like their non-polled counterparts, except we expect to poll for
completion of them. The polling happens at io_getevent() time, and
works just like non-polled IO.
To set up an io_context for polled IO, the application must call
Some use cases repeatedly get and put references to the same file, but
the only exposed interface is doing these one at a time. As each of
these entail an atomic inc or dec on a shared structure, that cost can
add up.
Add fget_many(), which works just like fget(), except it takes an
argument
On the submission side, add file reference batching to the
aio_submit_state. We get as many references as the number of iocbs we
are submitting, and drop unused ones if we end up switching files. The
assumption here is that we're usually only dealing with one fd, and if
there are multiple,
We have to add each submitted polled request to the io_context
poll_submitted list, which means we have to grab the poll_lock. We
already use the block plug to batch submissions if we're doing a batch
of IO submissions; extend that to cover the poll requests internally as
well.
Signed-off-by:
For io_submit(), we have to first copy each pointer to an iocb, then
copy the iocb. The latter is 64 bytes in size, and that's a lot of
copying for a single IO.
Add support for setting IOCTX_FLAG_USERIOCB through the new io_setup2()
system call, which allows the iocbs to reside in userspace. If
Replace the percpu_ref_put() + kmem_cache_free() with a call to
iocb_put() instead.
Signed-off-by: Jens Axboe
---
fs/aio.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 533cb7b1112f..e8457f9486e3 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1878,10
Signed-off-by: Jens Axboe
---
fs/aio.c | 14 ++
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index ba5758c854e8..12859ea1cb64 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1057,6 +1057,15 @@ static inline void iocb_put(struct aio_kiocb *iocb)
In preparation for handing in iocbs in a different fashion as well. Also
make it clear that the iocb being passed in isn't modified, by marking
it const throughout.
Signed-off-by: Jens Axboe
---
fs/aio.c | 68 +++-
1 file changed, 38
This is just like io_setup(), except add a flags argument to let the
caller control/define some of the io_context behavior. Outside of that,
we pass in an iocb array for future use.
Signed-off-by: Jens Axboe
---
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
fs/aio.c
We know this is a read/write request, but in preparation for
having different kinds of those, ensure that we call the assigned
handler instead of assuming it's aio_complete_rq().
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/aio.c | 2 +-
1 file changed, 1 insertion(+), 1
From: Christoph Hellwig
Store the request queue the last bio was submitted to in the iocb
private data in addition to the cookie so that we find the right block
device. Also refactor the common direct I/O bio submission code into a
nice little helper.
Signed-off-by: Christoph Hellwig
It's 192 bytes, fairly substantial. Most items don't need to be cleared,
especially not upfront. Clear the ones we do need to clear, and leave
the other ones for setup when the iocb is prepared and submitted.
Signed-off-by: Jens Axboe
---
fs/aio.c | 19 ---
1 file changed, 12
Plugging is meant to optimize submission of a string of IOs; if we don't
have more than 2 being submitted, don't bother setting up a plug.
Signed-off-by: Jens Axboe
---
fs/aio.c | 18 ++
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index
We can't wait for polled events to complete, as they may require active
polling from whoever submitted it. If that is the same task that is
submitting new IO, we could deadlock waiting for IO to complete that
this task is supposed to be completing itself.
Signed-off-by: Jens Axboe
---
This is in preparation for certain types of IO not needing a ring
reservation.
Signed-off-by: Jens Axboe
---
fs/aio.c | 30 +-
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index cf0de61743e8..eaceb40e6cf5 100644
--- a/fs/aio.c
From: Christoph Hellwig
No one is going to poll for aio (yet), so we must clear the HIPRI
flag, as we would otherwise send it down the poll queues, where no
one will be polling for completions.
Signed-off-by: Christoph Hellwig
IOCB_HIPRI, not RWF_HIPRI.
Signed-off-by: Jens Axboe
---
From: Christoph Hellwig
Just call blk_poll on the iocb cookie, we can derive the block device
from the inode trivially.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 10 ++
1 file changed, 10 insertions(+)
diff --git a/fs/block_dev.c
From: Christoph Hellwig
This new method is used to explicitly poll for I/O completion for an
iocb. It must be called for any iocb submitted asynchronously (that
is with a non-null ki_complete) which has the IOCB_HIPRI flag set.
The method is assisted by a new ki_cookie field in struct iocb to
For the grand introduction to this feature, see my original posting
here:
https://lore.kernel.org/linux-block/20181117235317.7366-1-ax...@kernel.dk/
The patchset continues to evolve, and has grown some optimizations that
benefit non-polled aio as well. One such example is the user mapped
iocbs,
If the ioprio capability check fails, we return without putting
the file pointer.
Fixes: d9a08a9e616b ("fs: Add aio iopriority support")
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/aio.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/aio.c b/fs/aio.c
index
This isn't exactly the same as the previous count, as it includes
requests for all devices. But that really doesn't matter: if we have
more than the threshold (16) queued up, flush it. It's not worth it
to have an expensive list loop for this.
Signed-off-by: Jens Axboe
---
block/blk-core.c
blk-mq passes information to the hardware about any given request being
the last that we will issue in this sequence. The point is that hardware
can defer costly doorbell type writes to the last request. But if we run
into errors issuing a sequence of requests, we may never send the request
with
If we have that hook, we know the driver handles bd->last == true in
a smart fashion. If it does, even for multiple hardware queues, it's
a good idea to flush batches of requests to the device, if we have
batches of requests from the submitter.
Signed-off-by: Jens Axboe
---
block/blk-mq.c | 26
If we are issuing a list of requests, we know if we're at the last one.
If we fail issuing, ensure that we call ->commits_rqs() to flush any
potential previous requests.
Signed-off-by: Jens Axboe
---
block/blk-core.c | 2 +-
block/blk-mq.c | 32
Series improving plugging for fast devices, but some fixes in here too.
1-2 are improvements around plugging accounting. Changes the behavior
a bit, but works fine for me.
3-6 add a ->commit_rqs() hook and implement it in drivers that use (or
will use) bd->last to optimize IO submission. If a
We need this for blk-mq to kick things into gear, if we told it that
we had more IO coming, but then failed to deliver on that promise.
Signed-off-by: Jens Axboe
---
drivers/block/ataflop.c | 8
1 file changed, 8 insertions(+)
diff --git a/drivers/block/ataflop.c
We need this for blk-mq to kick things into gear, if we told it that
we had more IO coming, but then failed to deliver on that promise.
Signed-off-by: Jens Axboe
---
drivers/block/virtio_blk.c | 15 +++
1 file changed, 15 insertions(+)
diff --git a/drivers/block/virtio_blk.c
Do it for the nr_hw_queues == 1 case, but only do it for the multi queue
case if we have requests for multiple devices in the plug.
Signed-off-by: Jens Axboe
---
block/blk-core.c | 1 +
block/blk-mq.c | 7 +--
include/linux/blkdev.h | 1 +
3 files changed, 7 insertions(+), 2
Split the command submission and the SQ doorbell ring, and add the
doorbell ring as our ->commit_rqs() hook. This allows a list of
requests to be issued, with nvme only writing the SQ update when
it's necessary. This is more efficient if we have lists of requests
to issue, particularly on
We don't need to use READ_ONCE() to read the map depth and word
fields. This reduces overhead of __sbitmap_queue_get() dramatically
on high IOPS devices, taking it from ~3% to a tenth of that.
Signed-off-by: Jens Axboe
---
lib/sbitmap.c | 8
1 file changed, 4 insertions(+), 4
On Fri, Nov 23, 2018 at 07:58:10AM -0800, Igor Konopko wrote:
> This patch fixes kernel OOPS for surprise removal
> scenario for PCIe connected NVMe drives.
>
> After latest changes, when PCIe device is not present,
> nvme_dev_remove_admin() calls blk_cleanup_queue() on
> admin queue, which frees
On 11/26/18 1:18 AM, Christoph Hellwig wrote:
>> -int blk_poll(struct request_queue *q, blk_qc_t cookie)
>> +int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
>
> The parameter will need some documentation.
Done
--
Jens Axboe
On 11/26/18 1:16 AM, Christoph Hellwig wrote:
>> -bool blk_poll(struct request_queue *q, blk_qc_t cookie)
>> +int blk_poll(struct request_queue *q, blk_qc_t cookie)
>
> Can you add a comment explaining the return value?
>
>> {
>> if (!q->poll_fn || !blk_qc_t_valid(cookie))
>>
> On 23 Nov 2018, at 16.45, Igor Konopko wrote:
>
> In current pblk implementation, l2p mapping for not closed lines
> is always stored only in OOB metadata and recovered from it.
>
> Such a solution does not provide data integrity when the drive does
> not have such an OOB metadata space.
>
> The
> On 23 Nov 2018, at 16.45, Igor Konopko wrote:
>
> Currently the whole of lightnvm and pblk uses a single DMA pool,
> whose entry size is always equal to PAGE_SIZE.
> PPA list always needs 8b*64, so there is only 56b*64
> space for OOB meta. Since NVMe OOB meta can be bigger,
> such as 128b, this
Thanks for your suggestions. I will fix and resend it.
BR, Jackie
> On 26 Nov 2018, at 19:30, Ard Biesheuvel wrote:
>
> Hi Jackie,
>
> On Mon, 26 Nov 2018 at 09:56, Jackie Liu wrote:
>>
>> This is a NEON acceleration method that can improve
>> performance by approximately 20%. I got the following
>>
Hi Jackie,
On Mon, 26 Nov 2018 at 09:56, Jackie Liu wrote:
>
> This is a NEON acceleration method that can improve
> performance by approximately 20%. I got the following
> data from CentOS 7.5 on Huawei's HISI1616 chip:
>
> [ 93.837726] xor: measuring software checksum speed
> [ 93.874039]
> On 23 Nov 2018, at 16.45, Igor Konopko wrote:
>
> Currently pblk assumes that the size of OOB metadata on the drive is always
> equal to the size of the pblk_sec_meta struct. This commit adds helpers which
> will allow handling different sizes of OOB metadata on the drive in the future.
> Still, after this patch
This is a NEON acceleration method that can improve
performance by approximately 20%. I got the following
data from CentOS 7.5 on Huawei's HISI1616 chip:
[ 93.837726] xor: measuring software checksum speed
[ 93.874039] 8regs : 7123.200 MB/sec
[ 93.914038] 32regs : 7180.300 MB/sec
[
On Fri, Nov 23, 2018 at 11:34:11AM -0700, Jens Axboe wrote:
> It's pointless to do so, we are by definition on the CPU we want/need
> to be, as that's the one waiting for a completion event.
>
> Signed-off-by: Jens Axboe
Looks good,
Reviewed-by: Christoph Hellwig
> -int blk_poll(struct request_queue *q, blk_qc_t cookie)
> +int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
The parameter will need some documentation.
On Fri, Nov 23, 2018 at 11:34:08AM -0700, Jens Axboe wrote:
> It doesn't set HIPRI on the bio, so polling for it is pretty silly.
>
> Signed-off-by: Jens Axboe
Well, I know who forgot to mark it hipri :)
But for now this looks good, we need to revisit fabrics polling entirely
once everything
On Mon, Nov 26, 2018 at 3:20 AM Ming Lei wrote:
>
> We will support multi-page bvec soon, and have to deal with
> single-page vs multi-page bvec. This patch follows Christoph's
> suggestion to rename all the following helpers:
>
> for_each_bvec
> bvec_iter_bvec
>
> -bool blk_poll(struct request_queue *q, blk_qc_t cookie)
> +int blk_poll(struct request_queue *q, blk_qc_t cookie)
Can you add a comment explaining the return value?
> {
> if (!q->poll_fn || !blk_qc_t_valid(cookie))
> return false;
And false certainly isn't an integer