On Fri, Feb 08, 2019 at 12:44:14AM +, Derrick, Jonathan wrote:
> On Thu, 2019-02-07 at 23:56 +0100, David Kozub wrote:
> > On Mon, 4 Feb 2019, Christoph Hellwig wrote:
> >
> > > On Fri, Feb 01, 2019 at 09:50:17PM +0100, David Kozub wrote:
> > > > From: Jonas Rabenstein
> > > >
> > > > Enable
On Thu, 2019-02-07 at 23:56 +0100, David Kozub wrote:
> On Mon, 4 Feb 2019, Christoph Hellwig wrote:
>
> > On Fri, Feb 01, 2019 at 09:50:17PM +0100, David Kozub wrote:
> > > From: Jonas Rabenstein
> > >
> > > Enable users to mark the shadow mbr as done without completely
> > > deactivating the s
The pull request you sent on Thu, 7 Feb 2019 11:58:40 -0500:
> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
> tags/for-5.0/dm-fixes-2
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/8b5cdbe595a05b8c767d9fe779bd47e997f934c9
Thank you!
--
D
On Mon, 4 Feb 2019, Christoph Hellwig wrote:
On Fri, Feb 01, 2019 at 09:50:17PM +0100, David Kozub wrote:
From: Jonas Rabenstein
Enable users to mark the shadow mbr as done without completely
deactivating the shadow mbr feature. This may be useful on reboots,
when the power to the disk is not
On 2/7/19 3:38 PM, Jeff Moyer wrote:
> Hi, Jens,
>
> Jens Axboe writes:
>
>> For now, buffers must not be file backed. If file backed buffers are
>> passed in, the registration will fail with -1/EOPNOTSUPP. This
>> restriction may be relaxed in the future.
>
> [...]
>
>> +down_writ
Hi, Jens,
Jens Axboe writes:
> For now, buffers must not be file backed. If file backed buffers are
> passed in, the registration will fail with -1/EOPNOTSUPP. This
> restriction may be relaxed in the future.
[...]
> +	down_write(&current->mm->mmap_sem);
> + pret = get_user_p
On Fri, Jan 25, 2019 at 05:53:47PM +0800, Ming Lei wrote:
> Now allocating interrupt sets can be done via .setup_affinity()
> easily, so remove the support for allocating interrupt sets.
>
> With this change, we don't need the limit of 'minvec == maxvec'
> any more in pci_alloc_irq_vectors_affinit
On Fri, Jan 25, 2019 at 05:53:44PM +0800, Ming Lei wrote:
> This patch introduces callback of .setup_affinity into 'struct
> irq_affinity', so that:
>
> 1) allow drivers to customize the affinity for managed IRQ, for
> example, now NVMe has special requirement for read queues & poll
> queues
>
>
On 2/7/19 3:12 PM, Jeff Moyer wrote:
> Hi, Jens,
>
>> +static int io_poll_add(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>> +{
>
> [...]
>
>> +/* one for removal from waitqueue, one for this function */
>> +refcount_set(&req->refs, 2);
>> +
>> +mask = vfs_poll(poll->file,
Hi, Jens,
> +static int io_poll_add(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> +{
[...]
> + /* one for removal from waitqueue, one for this function */
> + refcount_set(&req->refs, 2);
> +
> + mask = vfs_poll(poll->file, &ipt.pt) & poll->events;
> + if (unlikely(!pol
On Thu, 2019-02-07 at 12:57 -0800, Evan Green wrote:
> Properly plumb out EOPNOTSUPP from loop driver operations, which may
> get returned when for instance a discard operation is attempted but not
> supported by the underlying block device. Before this change, everything
> was reported in the log
On Fri, Jan 25, 2019 at 05:53:43PM +0800, Ming Lei wrote:
> 'node_to_cpumask' is just one temparay variable for
> irq_build_affinity_masks(),
> so move it into irq_build_affinity_masks().
>
> No functioanl change.
s/temparay/temporary/
s/functioanl/functional/
> Signed-off-by: Ming Lei
Nice p
+ Mailing lists
> On 7 Feb 2019, at 18.48, Javier González wrote:
>
>
>
>> On 7 Feb 2019, at 18.12, Stephen Bates wrote:
>>
>> Hi All
>>
>>> A BPF track will join the annual LSF/MM Summit this year! Please read the
>>> updated description and CFP information below.
>>
>> Well if we are ad
On 2/7/19 1:57 PM, Jeff Moyer wrote:
> Hi, Jens,
>
> Jens Axboe writes:
>
>> +static int io_sqe_buffer_unregister(struct io_ring_ctx *ctx)
>> +{
>> +int i, j;
>> +
>> +if (!ctx->user_bufs)
>> +return -ENXIO;
>> +
>> +for (i = 0; i < ctx->sq_entries; i++) {
>> +
Hi, Jens,
Jens Axboe writes:
> +static int io_sqe_buffer_unregister(struct io_ring_ctx *ctx)
> +{
> + int i, j;
> +
> + if (!ctx->user_bufs)
> + return -ENXIO;
> +
> + for (i = 0; i < ctx->sq_entries; i++) {
> + struct io_mapped_ubuf *imu = &ctx->user_bufs[i];
This series addresses some errors seen when using the loop
device directly backed by a block device. The first change plumbs
out the correct error message, and the second change prevents the
error from occurring in many cases.
The errors look like this:
[ 90.880875] print_req_error: I/O error, d
If the backing device for a loop device is a block device,
then mirror the discard properties of the underlying block
device into the loop device. While in there, differentiate
between REQ_OP_DISCARD and REQ_OP_WRITE_ZEROES, which are
different for block devices, but which the loop device had
just
Properly plumb out EOPNOTSUPP from loop driver operations, which may
get returned when for instance a discard operation is attempted but not
supported by the underlying block device. Before this change, everything
was reported in the log as an I/O error, which is scary and not
helpful in debugging.
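The plumbing described above amounts to preserving EOPNOTSUPP across the status translation instead of collapsing every failure into a generic I/O error. A minimal userspace model of that idea (the enum and function names here are illustrative, not the kernel's actual blk_status_t machinery):

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-in for blk_status_t; names are not the kernel's. */
enum sts { STS_OK = 0, STS_IOERR, STS_NOTSUPP };

static enum sts errno_to_sts(int err)
{
	switch (err) {
	case 0:
		return STS_OK;
	case EOPNOTSUPP:
		return STS_NOTSUPP;	/* preserved, not reported as I/O error */
	default:
		return STS_IOERR;	/* everything else stays an I/O error */
	}
}
```

With a dedicated status value, upper layers can tell "the device cannot do this" apart from "the device failed", which is the whole point of the patch.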
On 2/7/19 1:15 PM, Keith Busch wrote:
> On Thu, Feb 07, 2019 at 12:55:39PM -0700, Jens Axboe wrote:
>> IO submissions use the io_uring_sqe data structure, and completions
>> are generated in the form of io_uring_sqe data structures.
> ^^^
>
> Completions us
On Thu, Feb 07, 2019 at 12:55:39PM -0700, Jens Axboe wrote:
> IO submissions use the io_uring_sqe data structure, and completions
> are generated in the form of io_uring_sqe data structures.
^^^
Completions use _cqe, right?
On 2/7/19 3:55 AM, Jan Kara wrote:
> Currently, blktrace will not show requests that don't have any data as
> rq->__sector is initialized to -1 which is out of device range and thus
> discarded by act_log_check(). This is most notably the case for cache
> flush requests sent to the device. Fix the
For an ITER_BVEC, we can just iterate the iov and add the pages
to the bio directly. This requires that the caller doesn't release
the pages on IO completion; we add a BIO_NO_PAGE_REF flag for that.
The current two callers of bio_iov_iter_get_pages() are updated to
check if they need to release p
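The ownership rule above can be modeled as a single bio flag consulted at completion time; a sketch under that assumption (the flag name mirrors the patch, the helper is made up here):

```c
#include <assert.h>

#define BIO_NO_PAGE_REF (1u << 0)	/* pages owned by the caller's bvec */

/* Returns nonzero when completion should drop the page references. */
static int should_put_pages(unsigned int bio_flags)
{
	return !(bio_flags & BIO_NO_PAGE_REF);
}
```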
From: Christoph Hellwig
Store the request queue the last bio was submitted to in the iocb
private data in addition to the cookie so that we find the right block
device. Also refactor the common direct I/O bio submission code into a
nice little helper.
Signed-off-by: Christoph Hellwig
Modified
From: Christoph Hellwig
This new method is used to explicitly poll for I/O completion for an
iocb. It must be called for any iocb submitted asynchronously (that
is with a non-null ki_complete) which has the IOCB_HIPRI flag set.
The method is assisted by a new ki_cookie field in struct iocb to
This is basically a direct port of bfe4037e722e, which implements a
one-shot poll command through aio. Description below is based on that
commit as well. However, instead of adding a POLL command and relying
on io_cancel(2) to remove it, we mimic the epoll(2) interface of
having a command to add a
Right now we punt any buffered request that ends up triggering an
-EAGAIN to an async workqueue. This works fine in terms of providing
async execution of them, but it also can create quite a lot of work
queue items. For sequentially buffered IO, it's advantageous to
serialize the issue of them. For
For the upcoming async polled IO, we can't sleep allocating requests.
If we do, then we introduce a deadlock where the submitter already
has async polled IO in-flight, but can't wait for them to complete
since polled requests must be actively found and reaped.
Utilize the helper in the blockdev DIRE
From: Christoph Hellwig
Just call blk_poll on the iocb cookie, we can derive the block device
from the inode trivially.
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 10 ++
1 file changed, 10 insertions(+)
diff --git
Similarly to how we use the state->ios_left to know how many references
to get to a file, we can use it to allocate the io_kiocb's we need in
bulk.
Signed-off-by: Jens Axboe
---
fs/io_uring.c | 45 ++---
1 file changed, 38 insertions(+), 7 deletions(-)
di
Some use cases repeatedly get and put references to the same file, but
the only exposed interface is doing these one at the time. As each of
these entail an atomic inc or dec on a shared structure, that cost can
add up.
Add fget_many(), which works just like fget(), except it takes an
argument fo
If we have fixed user buffers, we can map them into the kernel when we
setup the io_context. That avoids the need to do get_user_pages() for
each and every IO.
To utilize this feature, the application must call io_uring_register()
after having setup an io_uring context, passing in
IORING_REGISTER_
Here's v12 of the io_uring project. This is the Al Viro special, where
Al tries to beat into my head how UNIX fd passing will mess you up. I
think we have all cases handled now. I've added the resulting test case
into the liburing test/ directory.
Outside of that, various little cleanups and fixes
Add hint on whether a read was served out of the page cache, or if it
hit media. This is useful for buffered async IO, O_DIRECT reads would
never have this set (for obvious reasons).
If the read hit page cache, cqe->flags will have IOCQE_FLAG_CACHEHIT
set.
Signed-off-by: Jens Axboe
---
fs/io_ur
Add a separate io_submit_state structure, to cache some of the things
we need for IO submission.
One such example is file reference batching. We get as
many references as the number of sqes we are submitting, and drop
unused ones if we end up switching files. The assumption here i
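A sketch of the batching idea: take references for all remaining sqes up front, then return whatever goes unused when the file switches. The counter type and helper names below are made up for illustration; one shared counter stands in for f_count:

```c
#include <assert.h>

/* Toy model of batched file references; names are not the kernel's. */
struct file_model { int refs; };

static void fget_many_model(struct file_model *f, int n) { f->refs += n; }
static void fput_many_model(struct file_model *f, int n) { f->refs -= n; }

/* Take refs for all remaining sqes at once, use some, return the rest. */
static int submit_batch(struct file_model *f, int ios_left, int used)
{
	fget_many_model(f, ios_left);		/* one op instead of ios_left */
	fput_many_model(f, ios_left - used);	/* drop the unused ones */
	return f->refs;
}
```

The win is that two (possibly atomic) operations replace up to ios_left of them.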
We normally have to fget/fput for each IO we do on a file. Even with
the batching we do, the cost of the atomic inc/dec of the file usage
count adds up.
This adds IORING_REGISTER_FILES, and IORING_UNREGISTER_FILES opcodes
for the io_uring_register(2) system call. The arguments passed in must
be an
We'll use this for the POLL implementation. Regular requests will
NOT be using references, so initialize it to 0. Any real use of
the io_kiocb ref will initialize it to at least 2.
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/io_uring.c | 8 ++--
1 file changed, 6 inserti
This enables an application to do IO, without ever entering the kernel.
By using the SQ ring to fill in new sqes and watching for completions
on the CQ ring, we can submit and reap IOs without doing a single system
call. The kernel side thread will poll for new submissions, and in case
of HIPRI/pol
From: Christoph Hellwig
Add a new fsync opcode, which either syncs a range if one is passed,
or the whole file if the offset and length fields are both cleared
to zero. A flag is provided to use fdatasync semantics, that is only
force out metadata which is required to retrieve the file data, but
Add support for a polled io_uring context. When a read or write is
submitted to a polled context, the application must poll for completions
on the CQ ring through io_uring_enter(2). Polled IO may not generate
IRQ completions, hence they need to be actively found by the application
itself.
To use p
The submission queue (SQ) and completion queue (CQ) rings are shared
between the application and the kernel. This eliminates the need to
copy data back and forth to submit and complete IO.
IO submissions use the io_uring_sqe data structure, and completions
are generated in the form of io_uring_cqe
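The shared rings work with free-running head/tail indices masked into a power-of-two array, which is what lets application and kernel coordinate without copying. A minimal single-threaded model of that indexing (field and function names are illustrative, not the uapi):

```c
#include <assert.h>

#define RING_ENTRIES 8			/* must be a power of two */

struct ring {
	unsigned int head, tail;	/* free-running; wrap only via mask */
	int slots[RING_ENTRIES];
};

static int ring_push(struct ring *r, int v)
{
	if (r->tail - r->head == RING_ENTRIES)
		return -1;		/* full */
	r->slots[r->tail++ & (RING_ENTRIES - 1)] = v;
	return 0;
}

/* Caller must ensure head != tail (ring not empty) before popping. */
static int ring_pop(struct ring *r)
{
	return r->slots[r->head++ & (RING_ENTRIES - 1)];
}
```

In the real interface the producer and consumer sides of each ring live in different address spaces, with memory barriers around the head/tail updates.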
On Thu, Feb 7, 2019 at 5:26 PM Al Viro wrote:
> I'm trying to put together some formal description of what's going on in
> there.
> Another question, BTW: updates of user->unix_inflight would seem to be movable
> into the callers of unix_{not,}inflight(). Any objections against lifting
> it int
On 2/7/19 11:45 AM, Jens Axboe wrote:
> On 2/6/19 9:00 PM, Al Viro wrote:
>> On Wed, Feb 06, 2019 at 06:41:00AM -0700, Jens Axboe wrote:
>>> On 2/5/19 5:56 PM, Al Viro wrote:
On Tue, Feb 05, 2019 at 12:08:25PM -0700, Jens Axboe wrote:
> Proof is in the pudding, here's the main commit intro
On 2/6/19 9:00 PM, Al Viro wrote:
> On Wed, Feb 06, 2019 at 06:41:00AM -0700, Jens Axboe wrote:
>> On 2/5/19 5:56 PM, Al Viro wrote:
>>> On Tue, Feb 05, 2019 at 12:08:25PM -0700, Jens Axboe wrote:
Proof is in the pudding, here's the main commit introducing io_uring
and now wiring it up to
Hi All
> A BPF track will join the annual LSF/MM Summit this year! Please read the
> updated description and CFP information below.
Well if we are adding BPF to LSF/MM I have to submit a request to discuss BPF
for block devices please!
There has been quite a bit of activity around the concept
Hi Linus,
Both of these fixes address issues in changes merged for 5.0-rc4.
The following changes since commit 8834f5600cf3c8db365e18a3d5cac2c2780c81e5:
Linux 5.0-rc5 (2019-02-03 13:48:04 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/device-ma
On 07/02/2019 11:44, Marc Gonzalez wrote:
> + linux-mm
>
> Summarizing the issue for linux-mm readers:
>
> If I read data from a storage device larger than my system's RAM, the system
> freezes
> once dd has read more data than available RAM.
>
> # dd if=/dev/sde of=/dev/null bs=1M & while tru
On Thu, Feb 07, 2019 at 04:30:47PM +, Al Viro wrote:
> Well, yes - once you receive it, you obviously have no references
> sitting in SCM_RIGHTS anymore.
>
> Get rid of recv_fd() there (along with fork(), while we are at it - what's
> it for?) and just do send_fd + these 3 close (or just exit,
On 2/7/19 9:30 AM, Al Viro wrote:
> On Thu, Feb 07, 2019 at 09:14:41AM -0700, Jens Axboe wrote:
>
>> I created a small app to do just that, and ran it and verified that
>> ->release() is called and the io_uring is released as expected. This
>> is run on the current -git branch, which has a socket
On Thu, Feb 07, 2019 at 09:14:41AM -0700, Jens Axboe wrote:
> I created a small app to do just that, and ran it and verified that
> ->release() is called and the io_uring is released as expected. This
> is run on the current -git branch, which has a socket backing for
> the io_uring fd itself, but
On Thu, Feb 07, 2019 at 04:27:13PM +0100, Miklos Szeredi wrote:
> On Thu, Feb 7, 2019 at 4:20 PM Al Viro wrote:
> >
> > On Thu, Feb 07, 2019 at 03:20:06PM +0100, Miklos Szeredi wrote:
> >
> > > > Am I right assuming that this queue-modifying operation is accept(),
> > > > removing
> > > > an embr
On 2/6/19 9:05 PM, Al Viro wrote:
> On Wed, Feb 06, 2019 at 10:56:41AM -0700, Jens Axboe wrote:
>> On 2/5/19 6:01 PM, Al Viro wrote:
>>> On Tue, Feb 05, 2019 at 05:27:29PM -0700, Jens Axboe wrote:
>>>
This should be better, passes some basic testing, too:
http://git.kernel.dk/cgit/li
Hi,
This is an important UPDATE to the previous LSF/MM announcement:
https://lore.kernel.org/linux-block/51b4b263-a0f2-113d-7bdc-f7960b540...@kernel.dk/
A BPF track will join the annual LSF/MM Summit this year! Please read
the updated description and CFP information below.
It will be held from
On Thu, Feb 7, 2019 at 4:20 PM Al Viro wrote:
>
> On Thu, Feb 07, 2019 at 03:20:06PM +0100, Miklos Szeredi wrote:
>
> > > Am I right assuming that this queue-modifying operation is accept(),
> > > removing
> > > an embryo unix_sock from the queue of listener and thus hiding SCM_RIGHTS
> > > in
>
On 2/7/19 5:44 AM, Aleksei Zakharov wrote:
> Hi all,
>
> I've found that changing some block queue settings leads to
> generic_make_request() latency spike.
> It is reproducible in linux-4.10 - linux-4.20.6.
> ~# /usr/share/bcc/tools/funclatency generic_make_request -i 1 -m
> Tracing 1 functions
On Thu, Feb 07, 2019 at 03:20:06PM +0100, Miklos Szeredi wrote:
> > Am I right assuming that this queue-modifying operation is accept(),
> > removing
> > an embryo unix_sock from the queue of listener and thus hiding SCM_RIGHTS in
> > _its_ queue from scan_children()?
>
> Hmm... How about just r
On Thu, Feb 7, 2019 at 2:31 PM Al Viro wrote:
>
> On Thu, Feb 07, 2019 at 10:22:53AM +0100, Miklos Szeredi wrote:
> > On Thu, Feb 07, 2019 at 04:00:59AM +, Al Viro wrote:
> >
> > > So in theory it would be possible to have
> > > * thread A: sendmsg() has SCM_RIGHTS created and populated,
>
On Thu, Feb 07, 2019 at 10:22:53AM +0100, Miklos Szeredi wrote:
> On Thu, Feb 07, 2019 at 04:00:59AM +, Al Viro wrote:
>
> > So in theory it would be possible to have
> > * thread A: sendmsg() has SCM_RIGHTS created and populated,
> > complete with file refcount and ->inflight increments i
In 'commit 752f66a75aba ("bcache: use REQ_PRIO to indicate bio for
metadata")' REQ_META is replaced by REQ_PRIO to indicate metadata bio.
This assumption is not always correct, e.g. XFS uses REQ_META, not
REQ_PRIO, to mark metadata bios. This is why Nix noticed that bcache
does not cache metada
Hi all,
I've found that changing some block queue settings leads to
generic_make_request() latency spike.
It is reproducible in linux-4.10 - linux-4.20.6.
~# /usr/share/bcc/tools/funclatency generic_make_request -i 1 -m
Tracing 1 functions for "generic_make_request"... Hit Ctrl-C to end.
m
From: Joerg Roedel
This function returns the maximum segment size for a single
dma transaction of a virtio device. The possible limit comes
from the SWIOTLB implementation in the Linux kernel, which
has an upper limit of (currently) 256kb of contiguous
memory it can map. Other DMA-API implementati
From: Joerg Roedel
The function returns the maximum size that can be remapped
by the SWIOTLB implementation. This function will be later
exposed to users through the DMA-API.
Reviewed-by: Konrad Rzeszutek Wilk
Reviewed-by: Christoph Hellwig
Signed-off-by: Joerg Roedel
---
include/linux/swiot
From: Joerg Roedel
This function will be used from dma_direct code to determine
the maximum segment size of a dma mapping.
Reviewed-by: Konrad Rzeszutek Wilk
Reviewed-by: Christoph Hellwig
Signed-off-by: Joerg Roedel
---
include/linux/swiotlb.h | 6 ++
kernel/dma/swiotlb.c| 9 +++
From: Joerg Roedel
The function returns the maximum size that can be mapped
using DMA-API functions. The patch also adds the
implementation for direct DMA and a new dma_map_ops pointer
so that other implementations can expose their limit.
Reviewed-by: Konrad Rzeszutek Wilk
Reviewed-by: Christop
From: Joerg Roedel
Segments can't be larger than the maximum DMA mapping size
supported on the platform. Take that into account when
setting the maximum segment size for a block device.
Reviewed-by: Konrad Rzeszutek Wilk
Reviewed-by: Christoph Hellwig
Signed-off-by: Joerg Roedel
---
drivers/
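The change above amounts to clamping the queue's maximum segment size by what the DMA layer can map. As a sketch, with dma_max_mapping_size() modeled as a plain parameter rather than the real DMA-API call:

```c
#include <assert.h>

/* Model: the effective segment limit is the smaller of the block
 * layer's configured limit and the platform's DMA mapping limit. */
static unsigned int effective_max_segment(unsigned int queue_max,
					  unsigned int dma_max)
{
	return queue_max < dma_max ? queue_max : dma_max;
}
```

On a SWIOTLB system with a 256kb bounce-buffer limit, a larger queue limit would get clamped down so no segment exceeds what can actually be mapped.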
Hi,
here is the next version of this patch-set. Previous
versions can be found here:
V1: https://lore.kernel.org/lkml/20190110134433.15672-1-j...@8bytes.org/
V2: https://lore.kernel.org/lkml/20190115132257.6426-1-j...@8bytes.org/
V3: https://lore.kernel.org/lkml/20190123
Currently, blktrace will not show requests that don't have any data as
rq->__sector is initialized to -1 which is out of device range and thus
discarded by act_log_check(). This is most notably the case for cache
flush requests sent to the device. Fix the problem by making
blk_rq_trace_sector() ret
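The fix makes the trace helper report sector 0 for data-less requests instead of the -1 sentinel that act_log_check() rejects as out of range. Modeled in isolation (the sentinel and helper name here stand in for rq->__sector and blk_rq_trace_sector()):

```c
#include <assert.h>

/* -1 is the "no sector" sentinel in this model, as in rq->__sector. */
#define NO_SECTOR ((unsigned long long)-1)

static unsigned long long trace_sector(unsigned long long sector)
{
	/* Data-less requests (e.g. cache flushes) report 0 so the trace
	 * event is not discarded as out of device range. */
	return sector == NO_SECTOR ? 0 : sector;
}
```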
On Wed 06-02-19 12:49:46, Jens Axboe wrote:
> On 2/6/19 5:04 AM, Jan Kara wrote:
> > Currently, blktrace will not show requests that don't have any data as
> > rq->__sector is initialized to -1 which is out of device range and thus
> > discarded by act_log_check(). This is most notably the case for
+ linux-mm
Summarizing the issue for linux-mm readers:
If I read data from a storage device larger than my system's RAM, the system
freezes
once dd has read more data than available RAM.
# dd if=/dev/sde of=/dev/null bs=1M & while true; do echo m >
/proc/sysrq-trigger; echo; echo; sleep 1; don
On Thu, Feb 07, 2019 at 04:00:59AM +, Al Viro wrote:
> So in theory it would be possible to have
> * thread A: sendmsg() has SCM_RIGHTS created and populated,
> complete with file refcount and ->inflight increments implied,
> at which point it gets preempted and loses the timeslice.
>
On Thu, Feb 07, 2019 at 09:46:13AM +0100, Joerg Roedel wrote:
> Hmm, I didn't get any kbuild emails for this series. Can you please
> forward it me so that I can look into it?
Nevermind, just found them in another inbox.
Joerg
Hi Michael,
On Tue, Feb 05, 2019 at 03:52:38PM -0500, Michael S. Tsirkin wrote:
> On Thu, Jan 31, 2019 at 05:33:58PM +0100, Joerg Roedel wrote:
> > Changes to v5 are:
> >
> > - Changed patch 3 to uninline dma_max_mapping_size()
>
> And this lead to problems reported by kbuild :(
Hmm, I didn