On Thu, Oct 18, 2018 at 04:42:07PM +0200, Christoph Hellwig wrote:
> This all seems quite complicated.
>
> I think the interface we'd want is more one that has a little
> cache of a single page in the queue, and a little bitmap which
> sub-page size blocks of it are used.
>
> Something like
On Thu, Oct 18, 2018 at 04:05:51PM +0200, Christoph Hellwig wrote:
> On Thu, Oct 18, 2018 at 07:03:42AM -0700, Matthew Wilcox wrote:
> > Before we go down this road, could we have a discussion about what
> > hardware actually requires this? Storage has this weird assumption tha
On Thu, Oct 18, 2018 at 09:18:12PM +0800, Ming Lei wrote:
> Hi,
>
> Filesystems may allocate an I/O buffer from slab, and use this buffer to
> submit a bio. This may break storage drivers if they have special
> requirements on DMA alignment.
Before we go down this road, could we have a discussion
On Wed, May 09, 2018 at 09:48:04AM +0200, Christoph Hellwig wrote:
> That way file systems don't have to go spotting for non-contiguous pages
> and work around them. It also kicks off I/O earlier, allowing it to
> finish earlier and reduce latency.
Makes sense.
> + /*
> +
On Wed, May 09, 2018 at 09:48:03AM +0200, Christoph Hellwig wrote:
> It counts the number of pages acted on, so name it nr_pages to make that
> obvious.
>
> Signed-off-by: Christoph Hellwig
Yes!
Also, it can't return an error, so how about changing it to unsigned int?
And deleting
On Wed, May 09, 2018 at 09:47:59AM +0200, Christoph Hellwig wrote:
> }
> EXPORT_SYMBOL(generic_write_end);
>
> +
> /*
Spurious?
On Wed, May 09, 2018 at 09:47:58AM +0200, Christoph Hellwig wrote:
> +/**
> + * __bio_try_merge_page - try adding data to an existing bvec
> + * @bio: destination bio
> + * @page: page to add
> + * @len: length of the range to add
> + * @off: offset into @page
> + *
> + * Try adding the data
On Thu, May 03, 2018 at 02:24:58PM -0600, Jens Axboe wrote:
> On 5/3/18 2:15 PM, Adam Manzanares wrote:
> > On 5/3/18 11:33 AM, Matthew Wilcox wrote:
> >> Or we could just make ki_hint a u8 or u16 ... seems unlikely we'll need
> >> 32 bits of ki_hint. (currently defined
On Thu, May 03, 2018 at 11:21:14AM -0700, adam.manzana...@wdc.com wrote:
> If we want to avoid bloating struct kiocb, I suggest we turn the private
> field
> into a union of the private and ki_ioprio field. It seems like the users of
> the private field all use it at a point where we can yank
On Thu, May 03, 2018 at 12:05:14PM -0400, Jeff Layton wrote:
> On Thu, 2018-05-03 at 16:42 +0200, Jan Kara wrote:
> > On Wed 25-04-18 17:07:48, Fabiano Rosas wrote:
> > > I'm looking into an issue where removing a virtio disk via sysfs while
> > > another
> > > process is issuing write() calls
I hate renting unnecessary cars, and the various transportation companies
offer a better deal if multiple people book at once.
I'm scheduled to arrive on Sunday at 3:18pm local time if anyone wants to
share transport. Does anyone have a wiki we can use to coordinate this?
On Thu, Apr 19, 2018 at 04:15:02PM -0400, Jerome Glisse wrote:
> On Thu, Apr 19, 2018 at 12:56:37PM -0700, Matthew Wilcox wrote:
> > > Well scratch that whole idea, I would need to add a new array to task
> > > struct which makes it a lot less appealing. Hence a better solutio
On Thu, Apr 19, 2018 at 03:31:08PM -0400, Jerome Glisse wrote:
> > > Basically I want a callback in __fd_install(), do_dup2(), dup_fd() and
> > > add void * *private_data; to struct fdtable (also a default array to
> > > struct files_struct). The callback would be part of struct
> > >
On Thu, Apr 19, 2018 at 10:38:25AM -0400, Jerome Glisse wrote:
> Oh, can I get one more small slot for fs? I want to ask if there are
> any people against having a callback every time a struct file is added
> to a task_struct, and also having a secondary array so that special
> file like device file
On Wed, Apr 18, 2018 at 12:34:03PM +0200, Christoph Hellwig wrote:
> s/blk/block/ for block patches.
I think this is something we should put in MAINTAINERS. Eventually
some tooling can pull it out, but I don't think this is something
that people can reasonably be expected to know.
diff --git
On Mon, Apr 09, 2018 at 05:39:16PM +0200, Christoph Hellwig wrote:
> blk_get_request is used for pass-through style I/O and thus doesn't need
> GFP_NOIO.
Obviously GFP_KERNEL is a big improvement over GFP_NOIO! But can we take
it all the way to GFP_USER, if this is always done in the ioctl path
On Mon, Apr 09, 2018 at 05:39:15PM +0200, Christoph Hellwig wrote:
> Same numerical value (for now at least), but a much better documentation
> of intent.
> @@ -499,7 +499,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk
> *disk, fmode_t mode,
> break;
> }
>
>
On Mon, Apr 09, 2018 at 01:26:50AM -0700, Christoph Hellwig wrote:
> On Mon, Apr 09, 2018 at 08:53:49AM +0200, Hannes Reinecke wrote:
> > Why don't you fold the 'flags' argument into the 'gfp_flags', and drop
> > the 'flags' argument completely?
> > Looks a bit pointless to me, having two
On Sun, Apr 08, 2018 at 04:40:59PM +, Bart Van Assche wrote:
> __GFP_KSWAPD_RECLAIM wasn't stripped off on purpose for non-atomic
> allocations. That was an oversight.
OK, good.
> Do you perhaps want me to prepare a patch that makes blk_get_request() again
> respect the full gfp mask passed
Please explain:
commit 6a15674d1e90917f1723a814e2e8c949000440f7
Author: Bart Van Assche
Date: Thu Nov 9 10:49:54 2017 -0800
block: Introduce blk_get_request_flags()
A side effect of this patch is that the GFP mask that is passed to
several allocation
On Wed, Mar 28, 2018 at 03:18:52PM +0200, David Sterba wrote:
> On Mon, Mar 26, 2018 at 05:04:21PM -0700, Matthew Wilcox wrote:
> > On Mon, Mar 26, 2018 at 04:16:26PM -0700, Omar Sandoval wrote:
> > > Even after the previous patch to drop lo_ctl_mutex while calling
On Mon, Mar 26, 2018 at 04:16:26PM -0700, Omar Sandoval wrote:
> Even after the previous patch to drop lo_ctl_mutex while calling
> vfs_getattr(), there are other cases where we can end up sleeping for a
> long time while holding lo_ctl_mutex. Let's avoid the uninterruptible
> sleep from the
On Thu, Mar 15, 2018 at 09:56:46AM -0700, Joe Perches wrote:
> I have a patchset that creates a vsprintf extension for
> print_vma_addr and removes all the uses similar to the
> print_symbol() removal.
>
> This now avoids any possible printk interleaving.
>
> Unfortunately, without some #ifdef
On Mon, Jan 22, 2018 at 08:28:54PM -0700, Jens Axboe wrote:
> On 1/22/18 8:18 PM, Goldwyn Rodrigues wrote:
> >> that their application was "already broken". I'd hate for a kernel
> >> upgrade to break them.
> >>
> >> I do wish we could make the change, and maybe we can. But it probably
> >> needs
On Wed, Jan 17, 2018 at 10:49:24AM +0800, Ming Lei wrote:
> Userfaultfd might be another choice:
>
> 1) map the block LBA space into a range of process vm space
That would limit the size of a block device to ~200TB (with my laptop's
CPU). That's probably OK for most users, but I suspect there
I see the improvements that Facebook have been making to the nbd driver,
and I think that's a wonderful thing. Maybe the outcome of this topic
is simply: "Shut up, Matthew, this is good enough".
It's clear that there's an appetite for userspace block devices; not for
swap devices or the root
On Sat, Dec 30, 2017 at 06:00:57PM -0500, Theodore Ts'o wrote:
> On Sat, Dec 30, 2017 at 05:40:28PM -0500, Theodore Ts'o wrote:
> > On Sat, Dec 30, 2017 at 12:44:17PM -0800, Matthew Wilcox wrote:
> > >
> > > I'm not sure I agree with this part. What if
On Sat, Dec 30, 2017 at 10:40:41AM -0500, Theodore Ts'o wrote:
> On Fri, Dec 29, 2017 at 10:16:24PM -0800, Matthew Wilcox wrote:
> > > The problems come from wrong classification. Waiters either classified
> > > well or invalidated properly won't bitrot.
> >
> > I
On Fri, Dec 29, 2017 at 04:28:51PM +0900, Byungchul Park wrote:
> On Thu, Dec 28, 2017 at 10:51:46PM -0500, Theodore Ts'o wrote:
> > On Fri, Dec 29, 2017 at 10:47:36AM +0900, Byungchul Park wrote:
> > >
> > >(1) The best way: To classify all waiters correctly.
> >
> > It's really not all
On Tue, Dec 12, 2017 at 08:03:43AM -0500, Theodore Ts'o wrote:
> On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote:
> > The *problem* is false positives, since locks and waiters in
> > kernel are not classified properly, at the moment, which is just
> > a fact that is not related to
On Tue, Dec 05, 2017 at 03:19:46PM +0900, Byungchul Park wrote:
> On 12/5/2017 2:46 PM, Byungchul Park wrote:
> > On 12/5/2017 2:30 PM, Matthew Wilcox wrote:
> > > On Mon, Dec 04, 2017 at 02:16:19PM +0900, Byungchul Park wrote:
> > > > For now, wait_for_completion() /
On Mon, Dec 04, 2017 at 02:16:19PM +0900, Byungchul Park wrote:
> For now, wait_for_completion() / complete() works with lockdep, add
> lock_page() / unlock_page() and its family to lockdep support.
>
> Changes from v1
> - Move lockdep_map_cross outside of page_ext to make it flexible
> -
On Tue, Oct 03, 2017 at 10:05:11PM +0200, Luis R. Rodriguez wrote:
> On Wed, Oct 04, 2017 at 03:33:01AM +0800, Ming Lei wrote:
> > On Tue, Oct 03, 2017 at 11:53:08AM -0700, Luis R. Rodriguez wrote:
> > > INFO: task kworker/u8:8:1320 blocked for more than 10 seconds.
> > > Tainted: G
From: Jeff Layton [mailto:jlay...@poochiereds.net]
> On Thu, 2017-06-29 at 10:11 -0700, Darrick J. Wong wrote:
> > On Thu, Jun 29, 2017 at 09:19:48AM -0400, jlay...@kernel.org wrote:
> > > +Handling errors during writeback
> > > +
> > > +Most applications that
On Mon, Jun 26, 2017 at 08:09:59PM +0800, Ming Lei wrote:
> bio_for_each_segment_all(bvec, bio, i) {
> - org_vec = bio_orig->bi_io_vec + i + start;
> -
> - if (bvec->bv_page == org_vec->bv_page)
> - continue;
> + orig_vec =
> Reviewed-by: Christoph Hellwig <h...@lst.de>
> Reviewed-by: Jan Kara <j...@suse.cz>
Reviewed-by: Matthew Wilcox <mawil...@microsoft.com>
On Wed, Mar 15, 2017 at 04:51:02PM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues
>
> Find out if the write will trigger a wait due to writeback. If yes,
> return -EAGAIN.
>
> This introduces a new function filemap_range_has_page() which
> returns true if the
On Wed, Mar 15, 2017 at 04:51:02PM -0500, Goldwyn Rodrigues wrote:
> This introduces a new function filemap_range_has_page() which
> returns true if the file's mapping has a page within the range
> mentioned.
I thought you were going to replace this patch with one that starts
writeback for these
On Thu, Mar 02, 2017 at 11:38:45AM +0100, Jan Kara wrote:
> On Wed 01-03-17 07:38:57, Christoph Hellwig wrote:
> > On Tue, Feb 28, 2017 at 07:46:06PM -0800, Matthew Wilcox wrote:
> > > But what's going to kick these pages out of cache? Shouldn't we rather
> > > fi
On Tue, Feb 28, 2017 at 05:36:05PM -0600, Goldwyn Rodrigues wrote:
> Find out if the write will trigger a wait due to writeback. If yes,
> return -EAGAIN.
>
> This introduces a new function filemap_range_has_page() which
> returns true if the file's mapping has a page within the range
>
On Thu, Jan 26, 2017 at 02:57:50PM +0300, Kirill A. Shutemov wrote:
> Most of the work happens on the head page. Only when we need to copy data to
> userspace do we find the relevant subpage.
>
> We are still limited by PAGE_SIZE per iteration. Lifting this limitation
> would require some more work.
Now
On Mon, Feb 13, 2017 at 08:01:17AM -0800, Matthew Wilcox wrote:
> On Mon, Feb 13, 2017 at 06:33:42PM +0300, Kirill A. Shutemov wrote:
> > No. pagecache_get_page() returns subpage. See description of the first
> > patch.
Oh, I re-read patch 1 and it made sense now. I misse
On Mon, Feb 13, 2017 at 06:33:42PM +0300, Kirill A. Shutemov wrote:
> No. pagecache_get_page() returns subpage. See description of the first
> patch.
Your description says:
> We also change interface for page-cache lookup function:
>
> - functions that lookup for pages[1] would return
On Thu, Jan 26, 2017 at 02:57:57PM +0300, Kirill A. Shutemov wrote:
> Slab pages can be compound, but we shouldn't treat them as THP for the
> purpose of the hpage_* helpers; otherwise it would lead to confusing results.
>
> For instance, ext4 uses slab pages for journal pages and we shouldn't
> confuse
On Thu, Jan 26, 2017 at 02:57:58PM +0300, Kirill A. Shutemov wrote:
> We want mmap(NULL) to return PMD-aligned address if the inode can have
> huge pages in page cache.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
Reviewed-by: Matthew Wilcox <mawil...@microsoft.com>
On Thu, Jan 26, 2017 at 02:57:54PM +0300, Kirill A. Shutemov wrote:
> Do not assume length of bio segment is never larger than PAGE_SIZE.
> With huge pages it's HPAGE_PMD_SIZE (2M on x86-64).
I don't think we even need hugepages for BRD to be buggy. I think there are
already places which
rs on x86-64 -- 'arr' is allocated with kmalloc() for
> huge pages.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
Reviewed-by: Matthew Wilcox <mawil...@microsoft.com>
On Thu, Jan 26, 2017 at 02:57:55PM +0300, Kirill A. Shutemov wrote:
> We write back a whole huge page at a time. Let's adjust the iteration accordingly.
>
> Signed-off-by: Kirill A. Shutemov
I think a lot of the complexity in this patch is from pagevec_lookup_tag
giving you
On Thu, Jan 26, 2017 at 02:57:53PM +0300, Kirill A. Shutemov wrote:
> Most page cache allocation happens via readahead (sync or async), so if
> we want to have a significant number of huge pages in the page cache, we need
> to find a way to allocate them from readahead.
>
> Unfortunately, huge pages
On Thu, Jan 26, 2017 at 02:57:52PM +0300, Kirill A. Shutemov wrote:
> @@ -405,9 +405,14 @@ static int __filemap_fdatawait_range(struct
> address_space *mapping,
> if (page->index > end)
> continue;
>
> + page =
> Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
Reviewed-by: Matthew Wilcox <mawil...@microsoft.com>
On Thu, Jan 26, 2017 at 02:57:50PM +0300, Kirill A. Shutemov wrote:
> +++ b/mm/filemap.c
> @@ -1886,6 +1886,7 @@ static ssize_t do_generic_file_read(struct file *filp,
> loff_t *ppos,
> if (unlikely(page == NULL))
> goto no_cached_page;
>
== AOP_TRUNCATED_PAGE);
But ... maybe it's OK to retry the huge page. I mean, not many
filesystems return AOP_TRUNCATED_PAGE, and they only do so rarely.
Anyway, I'm fine with the patch going in as-is. I just wanted to type out
my review notes.
Reviewed-by: Matthew Wilcox <mawil...@microsoft.com>
I think it's correct, but it still looks weird.
Reviewed-by: Matthew Wilcox <mawil...@microsoft.com>
On Thu, Jan 26, 2017 at 02:57:46PM +0300, Kirill A. Shutemov wrote:
> Let's add FileHugePages and FilePmdMapped fields into meminfo and smaps.
> It indicates how many times we allocate and map file THP.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
Re
On Thu, Jan 26, 2017 at 02:57:45PM +0300, Kirill A. Shutemov wrote:
> These flags are in use for filesystems with backing storage: PG_error,
> PG_writeback and PG_readahead.
Oh ;-) Then I amend my comment on patch 1 to be "patch 3 needs to go
ahead of patch 1" ;-)
> Signed-off-by: Kirill A.
On Thu, Jan 26, 2017 at 02:57:43PM +0300, Kirill A. Shutemov wrote:
> +++ b/include/linux/pagemap.h
> @@ -332,6 +332,15 @@ static inline struct page *grab_cache_page_nowait(struct
> address_space *mapping,
> mapping_gfp_mask(mapping));
> }
>
> +static inline struct page
On Thu, Jan 26, 2017 at 02:57:48PM +0300, Kirill A. Shutemov wrote:
> For filesystems that want to be write-notified (have mkwrite), we will
> encounter write-protection faults for huge PMDs in shared mappings.
>
> The easiest way to handle them is to clear the PMD and let it refault as
> writable.
On Thu, Jan 26, 2017 at 02:57:44PM +0300, Kirill A. Shutemov wrote:
> This reverts commit 356e1c23292a4f63cfdf1daf0e0ddada51f32de8.
>
> After conversion of huge tmpfs to multi-order entries, we don't need
> this anymore.
Yay! Reviewed-by: Matthew Wilcox <mawil..
From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk]
> Nice work! A few random comments/questions:
>
> - It does add some complexity, but I think a few comments would make it
> more digestable.
I'm open to adding some comments ... I need some time between writing the code
and writing the
From: Matthew Wilcox
> From: Matthew Wilcox
> > Heh, I was thinking about that too. The radix tree supports "exceptional
> > entries" which have the bottom bit set. On a 64-bit machine, we could use
> 62
> > of the bits in the radix tree root to store the
From: Matthew Wilcox
> From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk]
> > This sounds good. I think there may still be a lot of users that never
> > allocate more than a handful of IDAs, making a 128 byte allocation still
> > somewhat excessive. One thing I co
From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk]
> On Fri, Dec 16 2016, Matthew Wilcox <mawil...@microsoft.com> wrote:
> > Thanks for your work on this; you've really put some effort into
> > proving your work has value. My motivation was purely aesthetic, but
> &
From: Andrew Morton [mailto:a...@linux-foundation.org]
> On Thu, 8 Dec 2016 02:22:55 +0100 Rasmus Villemoes
> wrote:
> > TL;DR: these patches save 250 KB of memory, with more low-hanging
> > fruit ready to pick.
> >
> > While browsing through the lib/idr.c code, I
From: Tejun Heo [mailto:hte...@gmail.com] On Behalf Of Tejun Heo
> Ah, yeah, great to see the silly implementation being replaced the
> radix tree. ida_pre_get() looks suspicious tho. idr_preload()
> immediately being followed by idr_preload_end() probably is broken.
> Maybe what we need is