On Fri, 2019-02-01 at 08:23 -0700, Jens Axboe wrote:
> Here's v11 of the io_uring project. Main fixes in this release is a
> rework of how we grab the ctx->uring_lock, never using trylock for it in
> a user visible way. Outside of that, fixes around locking for the polled
> list when we hit -EAGAIN
On Fri, 1 Feb 2019, Hannes Reinecke wrote:
> Thing is, if we have _managed_ CPU hotplug (ie if the hardware provides some
> means of quiescing the CPU before hotplug) then the whole thing is trivial;
> disable SQ and wait for all outstanding commands to complete.
> Then trivially all requests are c
From: Jonas Rabenstein
Allow modification of the shadow mbr. If the shadow mbr is not marked as
done, this data will be presented read only as the device content. Only
after marking the shadow mbr as done and unlocking a locking range is
the actual content accessible.
Co-authored-by: David Kozub
From: Jonas Rabenstein
Check whether the shadow mbr fits in the provided space on the
target. Even though a proper firmware should handle this case and return
an error, we may prevent problems or even damage with crappy firmwares.
Signed-off-by: Jonas Rabenstein
Reviewed-by: Scott Bauer
---
block
From: Jonas Rabenstein
Enable users to mark the shadow mbr as done without completely
deactivating the shadow mbr feature. This may be useful on reboots,
when the power to the disk is not disconnected in between and the shadow
mbr stores the required boot files. Of course, this also saves the
(fe
From: Jonas Rabenstein
All add_token_* functions have a common set of conditions that have to
be checked. Use a common function for those checks in order to avoid
different behaviour as well as code duplication.
Co-authored-by: David Kozub
Signed-off-by: Jonas Rabenstein
Signed-off-by: David K
Every step ends by calling cmd_finalize (via finalize_and_send)
yet every step adds the token OPAL_ENDLIST on its own. Moving
this into cmd_finalize decreases code duplication.
Co-authored-by: Jonas Rabenstein
Signed-off-by: David Kozub
Signed-off-by: Jonas Rabenstein
Reviewed-by: Scott Bauer
From: Jonas Rabenstein
Even though the values of OPAL_UID_LENGTH and OPAL_METHOD_LENGTH are the
same, it is weird to use OPAL_UID_LENGTH for the definition of the methods.
Signed-off-by: Jonas Rabenstein
Reviewed-by: Scott Bauer
---
block/sed-opal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion
From: Jonas Rabenstein
Split the header generation from the (normal) memcpy part when a
bytestring is copied into the command buffer. This allows in-place
generation of the bytestring content. For example, copy_from_user may be
used without an intermediate buffer.
Signed-off-by: Jonas Rabenstein
Every step starts with resetting the cmd buffer as well as the comid and
constructs the appropriate OPAL_CALL command. Consequently, those
actions may be combined into one generic function. One should take care
that the opening and closing tokens for the argument list are already
emitted by cmd_star
From: Jonas Rabenstein
Add function address (and if available its symbol) to the message if a
step function fails.
Signed-off-by: Jonas Rabenstein
Reviewed-by: Scott Bauer
---
block/sed-opal.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/sed-opal.c b/block/sed
Originally, each of the opal functions that call next included
opal_discovery0 in the array of steps. This is superfluous and
can be done always inside next.
Signed-off-by: David Kozub
Reviewed-by: Scott Bauer
---
block/sed-opal.c | 89 +++-
1 file chan
From: Jonas Rabenstein
Instead of having multiple places defining the same argument list to get
a specific column of a sed-opal table, provide a generic version and
call it from those functions.
Signed-off-by: Jonas Rabenstein
Reviewed-by: Scott Bauer
---
block/opal_proto.h | 2 +
block/sed
The steps argument is only read by the next function, so it can
be passed directly as an argument rather than via opal_dev.
Normally, the steps array is on the stack, so the pointer stops
being valid when the function that set opal_dev.steps returns.
If opal_dev.steps was not set to NULL before
response_get_token was already in place, but its functionality was
duplicated within response_get_{u64,bytestring} with the same error
handling. Unify the handling by reusing response_get_token within the
other functions.
Co-authored-by: Jonas Rabenstein
Signed-off-by: David Kozub
Signed-o
response_get_{string,u64} include error handling for the argument resp
being NULL, but response_get_token does not handle this.
Make all three of response_get_{string,u64,token} handle NULL resp in
the same way.
Co-authored-by: Jonas Rabenstein
Signed-off-by: David Kozub
Signed-off-by: Jonas Rabenst
This should make no change in functionality.
The formatting changes were triggered by checkpatch.pl.
Signed-off-by: David Kozub
Reviewed-by: Scott Bauer
---
block/sed-opal.c | 19 +++
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/block/sed-opal.c b/block/sed-opa
This patch series extends SED OPAL support: it adds IOCTL for setting the shadow
MBR done flag which can be useful for unlocking an OPAL disk on boot and it adds
IOCTL for writing to the shadow MBR. Also included are some minor fixes and
improvements.
This series is based on the original work done
* Jens Axboe:
> +/*
> + * Filled with the offset for mmap(2)
> + */
> +struct io_sqring_offsets {
> + __u32 head;
> + __u32 tail;
> + __u32 ring_mask;
> + __u32 ring_entries;
> + __u32 flags;
> + __u32 dropped;
> + __u32 array;
> + __u32 resv[3];
> +};
> +
> +struct
On Thu, Jan 31, 2019 at 3:31 PM Bart Van Assche wrote:
>
> On Thu, 2019-01-31 at 14:13 -0800, Evan Green wrote:
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index cf5538942834..a1ba555e3b92 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -458,8 +458,
On Fri, Feb 01, 2019 at 06:23:27PM +0100, Jann Horn wrote:
> > Oh, yuck. Uuuh... can we make "struct files_struct" doubly-refcounted,
> > like "struct mm_struct"? One reference type to keep the contents
> > intact (the reference type you normally use, and the type used by
> > uring when the thread
Updated reply, see below.
On 2018-09-03 4:34 a.m., Dror Levin wrote:
On Sun, Sep 2, 2018 at 8:55 PM Linus Torvalds
wrote:
On Sun, Sep 2, 2018 at 4:44 AM Richard Weinberger
wrote:
CC'ing relevant people. Otherwise your mail might get lost.
Indeed.
Sorry for that.
On Sun, Sep 2, 2018 a
On Fri, Feb 1, 2019 at 6:04 PM Jann Horn wrote:
>
> On Fri, Feb 1, 2019 at 5:57 PM Matt Mullins wrote:
> > On Tue, 2019-01-29 at 00:59 +0100, Jann Horn wrote:
> > > On Tue, Jan 29, 2019 at 12:47 AM Jens Axboe wrote:
> > > > On 1/28/19 3:32 PM, Jann Horn wrote:
> > > > > On Mon, Jan 28, 2019 at 1
On Fri, Feb 1, 2019 at 5:57 PM Matt Mullins wrote:
> On Tue, 2019-01-29 at 00:59 +0100, Jann Horn wrote:
> > On Tue, Jan 29, 2019 at 12:47 AM Jens Axboe wrote:
> > > On 1/28/19 3:32 PM, Jann Horn wrote:
> > > > On Mon, Jan 28, 2019 at 10:35 PM Jens Axboe wrote:
> > > > > The submission queue (SQ
On Fri, 2019-02-01 at 08:24 -0700, Jens Axboe wrote:
> +int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
> +{
> + struct request_queue *q = READ_ONCE(kiocb->private);
> +
> + if (!q)
> + return 0;
> + return blk_poll(q, READ_ONCE(kiocb->ki_cookie), spin);
> +}
> +EXPORT_
On Tue, 2019-01-29 at 00:59 +0100, Jann Horn wrote:
> On Tue, Jan 29, 2019 at 12:47 AM Jens Axboe wrote:
> > On 1/28/19 3:32 PM, Jann Horn wrote:
> > > On Mon, Jan 28, 2019 at 10:35 PM Jens Axboe wrote:
> > > > The submission queue (SQ) and completion queue (CQ) rings are shared
> > > > between t
On Fri, Feb 01, 2019 at 05:03:40PM +0100, Heinz Mauelshagen wrote:
> On 2/1/19 3:09 PM, John Dorminy wrote:
> > I didn't know such a thing existed... does it work on any block
> > device? Where do I read more about this?
>
>
> Use sg_write_same(8) from package sg3_utils.
>
> For instance 'sg_wri
On 2/1/19 3:09 PM, John Dorminy wrote:
I didn't know such a thing existed... does it work on any block
device? Where do I read more about this?
Use sg_write_same(8) from package sg3_utils.
For instance 'sg_write_same --in=foobarfile --lba=0 --num=2
--xferlen=512 /dev/sdwhatever'
will r
On 1/31/19 6:48 PM, John Garry wrote:
On 30/01/2019 12:43, Thomas Gleixner wrote:
On Wed, 30 Jan 2019, John Garry wrote:
On 29/01/2019 17:20, Keith Busch wrote:
On Tue, Jan 29, 2019 at 05:12:40PM +, John Garry wrote:
On 29/01/2019 15:44, Keith Busch wrote:
Hm, we used to freeze the queu
On 1/24/19 3:25 AM, Jianchao Wang wrote:
> Hi Jens
>
> These two patches are small optimization for accessing the queue mapping
> in hot path. It saves the queue mapping results into blk_mq_ctx directly,
> then we needn't do the complicated bounce on queue_hw_ctx[] map[] and
> mq_map[].
Doing som
Some use cases repeatedly get and put references to the same file, but
the only exposed interface does these one at a time. As each of
these entail an atomic inc or dec on a shared structure, that cost can
add up.
Add fget_many(), which works just like fget(), except it takes an
argument fo
This is basically a direct port of bfe4037e722e, which implements a
one-shot poll command through aio. Description below is based on that
commit as well. However, instead of adding a POLL command and relying
on io_cancel(2) to remove it, we mimic the epoll(2) interface of
having a command to add a
From: Christoph Hellwig
Just call blk_poll on the iocb cookie, we can derive the block device
from the inode trivially.
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 10 ++
1 file changed, 10 insertions(+)
diff --git
Right now we punt any buffered request that ends up triggering an
-EAGAIN to an async workqueue. This works fine in terms of providing
async execution of them, but it can also create quite a lot of work
queue items. For sequentially buffered IO, it's advantageous to
serialize the issue of them. For
Add support for a polled io_uring context. When a read or write is
submitted to a polled context, the application must poll for completions
on the CQ ring through io_uring_enter(2). Polled IO may not generate
IRQ completions, hence they need to be actively found by the application
itself.
To use p
Here's v11 of the io_uring project. Main fixes in this release is a
rework of how we grab the ctx->uring_lock, never using trylock for it in
a user visible way. Outside of that, fixes around locking for the polled
list when we hit -EAGAIN conditions on IO submit. This fixes list
corruption issues w
For the upcoming async polled IO, we can't sleep allocating requests.
If we do, then we introduce a deadlock where the submitter already
has async polled IO in-flight, but can't wait for them to complete
since polled requests must be actively found and reaped.
Utilize the helper in the blockdev DIRE
The submission queue (SQ) and completion queue (CQ) rings are shared
between the application and the kernel. This eliminates the need to
copy data back and forth to submit and complete IO.
IO submissions use the io_uring_sqe data structure, and completions
are generated in the form of io_uring_cqe
From: Christoph Hellwig
Store the request queue the last bio was submitted to in the iocb
private data in addition to the cookie so that we find the right block
device. Also refactor the common direct I/O bio submission code into a
nice little helper.
Signed-off-by: Christoph Hellwig
Modified
From: Christoph Hellwig
This new method is used to explicitly poll for I/O completion for an
iocb. It must be called for any iocb submitted asynchronously (that
is with a non-null ki_complete) which has the IOCB_HIPRI flag set.
The method is assisted by a new ki_cookie field in struct iocb to
From: Christoph Hellwig
Add a new fsync opcode, which either syncs a range if one is passed,
or the whole file if the offset and length fields are both cleared
to zero. A flag is provided to use fdatasync semantics, that is only
force out metadata which is required to retrieve the file data, but
We normally have to fget/fput for each IO we do on a file. Even with
the batching we do, the cost of the atomic inc/dec of the file usage
count adds up.
This adds IORING_REGISTER_FILES, and IORING_UNREGISTER_FILES opcodes
for the io_uring_register(2) system call. The arguments passed in must
be an
For an ITER_BVEC, we can just iterate the iov and add the pages
to the bio directly. This requires that the caller doesn't release
the pages on IO completion, we add a BIO_NO_PAGE_REF flag for that.
The current two callers of bio_iov_iter_get_pages() are updated to
check if they need to release p
If we have fixed user buffers, we can map them into the kernel when we
setup the io_context. That avoids the need to do get_user_pages() for
each and every IO.
To utilize this feature, the application must call io_uring_register()
after having setup an io_uring context, passing in
IORING_REGISTER_
This enables an application to do IO without ever entering the kernel.
By using the SQ ring to fill in new sqes and watching for completions
on the CQ ring, we can submit and reap IOs without doing a single system
call. The kernel side thread will poll for new submissions, and in case
of HIPRI/pol
Add a separate io_submit_state structure, to cache some of the things
we need for IO submission.
One such example is file reference batching. We get as
many references as the number of sqes we are submitting, and drop
unused ones if we end up switching files. The assumption here i
We'll use this for the POLL implementation. Regular requests will
NOT be using references, so initialize it to 0. Any real use of
the io_kiocb ref will initialize it to at least 2.
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/io_uring.c | 8 ++--
1 file changed, 6 inserti
Similarly to how we use the state->ios_left to know how many references
to get to a file, we can use it to allocate the io_kiocb's we need in
bulk.
Signed-off-by: Jens Axboe
---
fs/io_uring.c | 45 ++---
1 file changed, 38 insertions(+), 7 deletions(-)
di
Add a hint on whether a read was served out of the page cache, or if it
hit media. This is useful for buffered async IO, O_DIRECT reads would
never have this set (for obvious reasons).
If the read hit page cache, cqe->flags will have IOCQE_FLAG_CACHEHIT
set.
Signed-off-by: Jens Axboe
---
fs/io_ur
Hannes, all,
On Mon, 2019-01-28 at 14:54 +0100, Martin Wilck wrote:
> On Sat, 2019-01-26 at 11:09 +0100, Hannes Reinecke wrote:
> > On 1/18/19 10:32 PM, Martin Wilck wrote:
> > > Currently, an empty disk->events field tells the block layer not
> > > to
> > > forward
> > > media change events to us
On 2/1/19 12:55 AM, Christoph Hellwig wrote:
> The only real user of the T10 OSD protocol, the pNFS object layout
> driver never went to the point of having shipping products, and we
> removed it 1.5 years ago. Exofs is just a simple example without
> real life users.
>
> The code has been mostly
I didn't know such a thing existed... does it work on any block
device? Where do I read more about this?
On Fri, Feb 1, 2019 at 2:35 AM Christoph Hellwig wrote:
>
> On Thu, Jan 31, 2019 at 02:41:52PM -0500, John Dorminy wrote:
> > > On Wed, Jan 30, 2019 at 09:08:50AM -0500, John Dorminy wrote:
>
On Thu, Jan 31, 2019 at 7:53 AM syzbot
wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:02495e76ded5 Add linux-next specific files for 20190130
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=172ed528c0
> kernel config: ht
On 01/31/19 22:14 PM, Matias Bjørling wrote:
> On 1/30/19 2:53 AM, 김찬솔 wrote:
>>
>> Changes:
>> 1. Function pblk_rw_io to get bio* as a reference
>> 2. In pblk_rw_io bio_put call on read case removed
>>
>> A fix to address issue where
>> 1. pblk_make_rq calls pblk_rw_io passes bio* pointe
For some reason patch 5 didn't make it to my inbox, but assuming
nothing has changed this whole series looks good to me now.