On Tue, Dec 15, 2020 at 02:27:18PM -0800, Yang Shi wrote:
> On Mon, Dec 14, 2020 at 6:46 PM Dave Chinner wrote:
> >
> > On Mon, Dec 14, 2020 at 02:37:19PM -0800, Yang Shi wrote:
> > > Use per memcg's nr_deferred for memcg aware shrinkers. The shrinker's
> >
Combine that with the proposed "watch_sb()" syscall for reporting
such errors in a generic manner to interested listeners, and we've
got a fairly solid generic path for reporting data loss events to
userspace for an appropriate user-defined action to be taken...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
y process to perform after this...
> And how does it help in dealing with page faults upon poisoned
> dax page?
It doesn't. If the page is poisoned, the same behaviour will occur
as does now. This is simply error reporting infrastructure, not
error handling.
Future work might change how we correct the faults found in the
storage, but I think the user visible behaviour is going to be "kill
apps mapping corrupted data" for a long time yet
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Dec 15, 2020 at 02:53:48PM +0100, Johannes Weiner wrote:
> On Tue, Dec 15, 2020 at 01:09:57PM +1100, Dave Chinner wrote:
> > On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote:
> > > Since memcg_shrinker_map_size can just be changed under holding
>
eturn;
>
> kfree(shrinker->nr_deferred);
> shrinker->nr_deferred = NULL;
e.g. then this function can simply do:
{
	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
		return unregister_memcg_shrinker(shrinker);
	kfree(shrinker->nr_deferred);
	shrinker->nr_deferred = NULL;
}
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
an.c
> +++ b/mm/vmscan.c
> @@ -201,7 +201,7 @@ DECLARE_RWSEM(shrinker_rwsem);
> #define SHRINKER_REGISTERING ((struct shrinker *)~0UL)
>
> static DEFINE_IDR(shrinker_idr);
> -static int shrinker_nr_max;
> +int shrinker_nr_max;
Then we don't need to make yet another variable global...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
specific corner case,
it's likely to significantly change the reclaim balance of slab
caches, especially under GFP_NOFS intensive workloads where we can
only defer the work to kswapd.
Hence I think this is still a problematic approach as it doesn't
address the reason why deferred counts are increasing out of
control in the first place
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
if NUMA_AWARE &&
sc->memcg is true. So:

static long shrink_slab_set_nr_deferred_memcg(...)
{
	int nid = sc->nid;

	deferred = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_deferred,
					     true);
	return atomic_long_add_return(nr, &deferred->nr_deferred[id]);
}

static long shrink_slab_set_nr_deferred(...)
{
	int nid = sc->nid;

	if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
		nid = 0;
	else if (sc->memcg)
		return shrink_slab_set_nr_deferred_memcg(..., nid);
	return atomic_long_add_return(nr, &shrinker->nr_deferred[nid]);
}
And now there's no duplicated code.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
o the correct offset in the allocated range.
Then this patch really only changes the size of the chunk
being allocated, setting up the pointers and copying the relevant
data from the old to the new.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
good idea. This couples the shrinker
infrastructure to internal details of how cgroups are initialised
and managed. Sure, certain operations might be done in certain
shrinker lock contexts, but that doesn't mean we should share global
locks across otherwise independent subsystems
Cheers,
Dave.
set up
that the barriers enforce.
IOWs, these memory barriers belong inside the cgroup code to
guarantee anything that sees an online cgroup will always see the
fully initialised cgroup structures. They do not belong in the
shrinker infrastructure...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Dec 15, 2020 at 01:03:45AM +, Pavel Begunkov wrote:
> On 15/12/2020 00:56, Dave Chinner wrote:
> > On Tue, Dec 15, 2020 at 12:20:23AM +, Pavel Begunkov wrote:
> >> As reported, we must not do pressure stall information accounting for
> >> direct IO,
On Tue, Dec 15, 2020 at 12:00:23PM +1100, Dave Chinner wrote:
> On Tue, Dec 15, 2020 at 12:20:24AM +, Pavel Begunkov wrote:
> > A preparation patch. It adds a simple helper which abstracts out number
> > of segments we're allocating for a bio from iov_iter_npages().
>
>
_vecs_to_alloc(struct iov_iter *iter, int max_segs)
> {
> + /* reuse iter->bvec */
> + if (iov_iter_is_bvec(iter))
> + return 0;
> return iov_iter_npages(iter, max_segs);
Ah, I'm a blind idiot... :/
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ecific patch, so it's not clear what it's
actually needed for...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
/
> + bio_clear_flag(bio, BIO_WORKINGSET);
Why only do this for the old direct IO path? Why isn't this
necessary for the iomap DIO path?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
> > This patch is based on Darrick's work to fix the issue in xfs/141 in the
> > > earlier version. [1]
> > >
> > > 1. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia
> > >
> > > Cc: Darrick J. Wong
> >
ans_context_active
> To check whether current is in fs transaction or not
> - xfs_trans_context_swap
> Transfer the transaction context when rolling a permanent transaction
>
> These two new helpers are introduced in xfs_trans.h.
>
> Cc: Darrick J. Wong
> Cc: Matthew W
On Wed, Dec 02, 2020 at 03:12:20PM +0800, Ruan Shiyang wrote:
> Hi Dave,
>
> On 2020/11/30 上午6:47, Dave Chinner wrote:
> > On Mon, Nov 23, 2020 at 08:41:10AM +0800, Shiyang Ruan wrote:
> > >
> > > The call trace is like this:
> > > memory_fail
F_KSWAPD)) ==
> PF_MEMALLOC))
> goto redirty;
>
> [2]. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia/
>
> Cc: Darrick J. Wong
> Cc: Matthew Wilcox (Oracle)
> Cc: Christoph Hellwig
> Cc: Dave Chinner
> Cc: Michal Hocko
> Cc
ntext is a bug in XFS.
IOWs, we are waiting on a new version of this patchset to be posted:
https://lore.kernel.org/linux-xfs/20201103131754.94949-1-laoar.s...@gmail.com/
so that we can get rid of this from iomap and check the transaction
recursion case directly in the XFS code. Then your problem goes
On Wed, Dec 02, 2020 at 10:04:17PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Dec 03, 2020 at 07:40:45AM +1100, Dave Chinner wrote:
> > On Wed, Dec 02, 2020 at 08:06:01PM +0100, Greg Kroah-Hartman wrote:
> > > On Wed, Dec 02, 2020 at 06:41:43PM +0100, Miklos Szeredi wrote:
>
they get propagated to users.
It also creates a clear demarcation between fixes and cc: stable for
maintainers and developers: only patches with a cc: stable will be
backported immediately to stable. Developers know what patches need
urgent backports and, unlike developers, the automated fixes scan
does not have the subject matter expertise or background to make
that judgement
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
instance then,
by definition, it does not support DAX and the bit should never be
set.
e.g. We don't talk about kernels that support reflink - what matters
to userspace is whether the filesystem instance supports reflink.
Think of the useless mess that xfs_info would be if it reported
kernel capabilities instead of filesystem instance capabilities.
i.e. we don't report that a filesystem supports reflink just because
the kernel supports it - it reports whether the filesystem instance
being queried supports reflink. And that also implies the kernel
supports it, because the kernel has to support it to mount the
filesystem...
So, yeah, I think it really does need to be conditional on the
filesystem instance being queried to be actually useful to users
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
o re-write it to disk to fix the bad data, otherwise we treat
it like a writeback error and report it on the next
write/fsync/close operation done on that file.
This gets rid of the mf_recover_controller altogether and allows
the interface to be used by any sort of block device for any sort
of bottom-up reporting of media/device failures.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Nov 25, 2020 at 06:46:54PM -0500, Sasha Levin wrote:
> On Thu, Nov 26, 2020 at 08:52:47AM +1100, Dave Chinner wrote:
> > We've already had one XFS upstream kernel regression in this -rc
> > cycle propagated to the stable kernels in 5.9.9 because the stable
> > proc
On Wed, Nov 25, 2020 at 10:35:50AM -0500, Sasha Levin wrote:
> From: Dave Chinner
>
> [ Upstream commit 883a790a84401f6f55992887fd7263d808d4d05d ]
>
> Jens has reported a situation where partial direct IOs can be issued
> and completed yet still return -EAGAIN. We don't w
either the
attributes or attributes_mask field because the filesystem is not
DAX capable. And given that we have filesystems with multiple block
devices that can have different DAX capabilities, I think this
statx() attr state (and mask) really has to come from the
filesystem, not VFS...
> Extra question: should we only set this in the attributes mask if
> CONFIG_FS_DAX=y ?
IMO, yes, because it will always be false on CONFIG_FS_DAX=n and so
it may as well not be emitted as a supported bit in the mask.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Nov 11, 2020 at 11:28:48AM +0100, Michal Suchánek wrote:
> On Tue, Nov 10, 2020 at 08:08:23AM +1100, Dave Chinner wrote:
> > On Mon, Nov 09, 2020 at 09:27:05PM +0100, Michal Suchánek wrote:
> > > On Mon, Nov 09, 2020 at 11:24:19AM -0800, Darrick J. Wong wrote:
> >
n a different
filesystem that isn't mounted at install time, so the installer
has no chance of detecting that the application is going to use
DAX enabled storage.
IOWs, the installer cannot make decisions based on DAX state on
behalf of applications because it does not know what environment the
application is going to be configured to run in. DAX can only be
detected reliably by the application at runtime inside its
production execution environment.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
orker threads, duplicating
the current creds will capture this information and won't leave
random landmines where stuff doesn't work as it should because the
worker thread is unaware of the userns that it is supposed to be
doing filesystem operations under...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit
quire() so that people who have no clue what the
hell smp_acquire__after_ctrl_dep() means or does have some hope of
understanding what objects the ordering semantics in the function
actually apply to
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
needs solving is integrating shrinker scanning control state
with memcgs more tightly, not force every memcg aware shrinker to
use list_lru for their subsystem shrinker implementations
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Sep 22, 2020 at 12:46:05PM -0400, Mikulas Patocka wrote:
> Thanks for reviewing NVFS.
Not a review - I've just had a cursory look and not looked any
deeper after I'd noticed various red flags...
> On Tue, 22 Sep 2020, Dave Chinner wrote:
> > IOWs, extent based trees were ch
fications
it knows nothing about are executed atomically?
That, to me, looks like a fundamental, unfixable flaw in this
approach...
I can see how "almost in place" modification can be done by having
two copies side by side and updating one while the other is the
active copy and switching
On Thu, Sep 17, 2020 at 12:47:10AM -0700, Hugh Dickins wrote:
> On Thu, 17 Sep 2020, Dave Chinner wrote:
> > On Wed, Sep 16, 2020 at 07:04:46PM -0700, Hugh Dickins wrote:
> > > On Thu, 17 Sep 2020,
On Thu, Sep 17, 2020 at 05:12:08PM -0700, Yang Shi wrote:
> On Wed, Sep 16, 2020 at 7:37 PM Dave Chinner wrote:
> > On Wed, Sep 16, 2020 at 11:58:21AM -0700, Yang Shi wrote:
> > It clamps the worst case freeing to half the cache, and that is
> > exactly what you are seeing
ning millions of IOPS through the AIO subsystem, then the
cost of doing millions of extra atomic ops every second is going to
be noticable...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
er
thread. There are quite a few custom enterprise apps around that
rely on this POSIX behaviour, especially stuff that has come from
different Unixes that actually provided Posix compliant behaviour.
IOWs, from an upstream POV, POSIX atomic write behaviour doesn't
matter very much. From an enterprise dist
On Wed, Sep 16, 2020 at 07:04:46PM -0700, Hugh Dickins wrote:
> On Thu, 17 Sep 2020, Dave Chinner wrote:
> >
> > So
> >
> > P0 p1
> >
> > hole punch starts
> > takes XFS_MMAPLOCK_EXCL
> > truncate_pagec
bit.com/
Unfortunately, none of the MM developers showed any interest in
these patches, so when I found a different solution to the XFS
problem it got dropped on the ground.
> So why do we have to still keep it around?
Because we need a feedback mechanism to allow us to maintain control
of the size of filesystem caches that grow via GFP_NOFS allocations.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Sep 16, 2020 at 05:58:51PM +0200, Jan Kara wrote:
> On Sat 12-09-20 09:19:11, Amir Goldstein wrote:
> > On Tue, Jun 23, 2020 at 8:21 AM Dave Chinner wrote:
> > >
> > > From: Dave Chinner
> > >
> > > The page fault-around path ->map_pages i
y.
I think it's pretty straight forward to do it in the iomap layer...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
t
> - Add Fixes tag in commit message
>
> fs/inode.c | 4 +++-
> include/linux/fs.h | 3 +--
> 2 files changed, 4 insertions(+), 3 deletions(-)
Looks good.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
statement.
i.e.

	if (!drop &&
	    !(inode->i_state & I_DONTCACHE) &&
	    (sb->s_flags & SB_ACTIVE)) {

Which gives a clear indication that they are all at the same
precedence and separate logic statements...
Otherwise the change looks good.
Probably best to resend with the fixes tag :)
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Aug 28, 2020 at 05:04:14PM +0800, Li, Hao wrote:
> On 2020/8/28 8:35, Dave Chinner wrote:
> > On Thu, Aug 27, 2020 at 05:58:07PM +0800, Li, Hao wrote:
> >> On 2020/8/27 14:37, Dave Chinner wrote:
> >>> On Fri, Aug 21, 2020 at 09:59:53AM +0800,
On Thu, Aug 27, 2020 at 05:58:07PM +0800, Li, Hao wrote:
> On 2020/8/27 14:37, Dave Chinner wrote:
> > On Fri, Aug 21, 2020 at 09:59:53AM +0800, Hao Li wrote:
> >> Currently, DCACHE_REFERENCED prevents the dentry with DCACHE_DONTCACHE
> >> set from being killed, so th
ode->i_state, state | I_WILL_FREE);
> spin_unlock(&inode->i_lock);
What's this supposed to do? We'll only get here with drop set if the
filesystem is mounting or unmounting. In either case, why does
having I_DONTCACHE set require the inode to be written back here
before it is evicted from the cache?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
blkbits);
}
static inline unsigned
iomap_chunks_per_page(struct inode *inode, struct page *page)
{
	return page_size(page) >> inode->i_blkbits;
}
and the latter is actually the same as what i_block_per_page() is
currently implemented as
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Aug 25, 2020 at 01:40:24PM +0100, Matthew Wilcox wrote:
> Any objection to leaving this patch as-is with a u64 length?
No objection here - I just wanted to make sure that signed/unsigned
overflow was not going to be an issue...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Aug 24, 2020 at 09:35:59PM -0600, Andreas Dilger wrote:
> On Aug 24, 2020, at 9:26 PM, Matthew Wilcox wrote:
> >
> > On Tue, Aug 25, 2020 at 10:27:35AM +1000, Dave Chinner wrote:
> >>> do {
> >>> - unsigned offset, bytes;
> >>
On Tue, Aug 25, 2020 at 02:06:05AM +0100, Matthew Wilcox wrote:
> On Tue, Aug 25, 2020 at 10:12:23AM +1000, Dave Chinner wrote:
> > > -static int
> > > -__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> > > - unsigned copied, struct pa
ncy
of stable pages in a situation like this - a mmap() write fault
could stall for many seconds waiting for a huge bio chain to finish
submission and run completion processing even when the IO for the
given page we faulted on was completed before the page fault
occurred...
Hence I think we really do need to cap the length of the bio
chains here so that we start completing and ending page writeback on
large writeback ranges long before the writeback code finishes
submitting the range it was asked to write back.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
;
> + return length;
>
> do {
> - unsigned offset, bytes;
> -
> - offset = offset_in_page(pos);
> - bytes = min_t(loff_t, PAGE_SIZE - offset, count);
> + loff_t bytes;
>
> if (IS_DAX(inode))
>
size_t copied, struct page *page, struct iomap *iomap,
> + struct iomap *srcmap)
... this.
Otherwise the code looks fine.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
e and why it is intentional...
Otherwise the code looks OK.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
uct iomap_page *)page_private(page);
> return NULL;
Just to confirm: this vm bug check is needed because we only
attach the iomap_page to the head page of a compound page?
Assuming that I've understood the above correctly:
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.co
ig
> ---
> fs/iomap/buffered-io.c | 12 ++--
> 1 file changed, 2 insertions(+), 10 deletions(-)
Looks good.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
page_private() handles the refcount now.
>
> Signed-off-by: Matthew Wilcox (Oracle)
> Reviewed-by: Christoph Hellwig
> ---
> fs/iomap/buffered-io.c | 10 +-
> 1 file changed, 1 insertion(+), 9 deletions(-)
The sooner this goes in the better :)
Reviewed-by: Dave
le)
> Reviewed-by: Christoph Hellwig
> ---
> fs/iomap/buffered-io.c | 8
> fs/jfs/jfs_metapage.c | 2 +-
> fs/xfs/xfs_aops.c | 2 +-
> include/linux/pagemap.h | 16
> 4 files changed, 22 insertions(+), 6 deletions(-)
Otherwise looks good.
Reviewed
> the best place. That means we can remove it from iomap_write_actor().
>
> Signed-off-by: Matthew Wilcox (Oracle)
> ---
> fs/iomap/buffered-io.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
looks good.
Reviewed-by: Dave Chinner
Cheers,
Dave.
--
Dave
ile (i.e. the file
itself is not sparse), while the extent size hint will just add 64kB
extents into the file around the write offset. That demonstrates the
other behavioural advantage that extent size hints have is they
avoid needing to extend the file, which is yet another way to
serialise concurrent IO and create IO pipeline stalls...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
and filesystem are doing in real time (e.g. I use PCP for this
and visualise the behaviour in real time via pmchart) gives a lot
of insight into exactly what is changing during transient workload
changes like starting a benchmark...
> I was running fio with --ramp_time=5 which ignores the first 5 seconds
> of data in order to let performance settle, but if I remove that I can
> see the effect more clearly. I can observe it with raw files (in 'off'
> and 'prealloc' modes) and qcow2 files in 'prealloc' mode. With qcow2 and
> preallocation=off the performance is stable during the whole test.
What does "preallocation=off" mean again? Is that using
fallocate(ZERO_RANGE) prior to the data write rather than
preallocating the metadata/entire file? If so, I would expect the
limiting factor is the rate at which IO can be issued because of the
fallocate() triggered pipeline bubbles. That leaves idle device time
so you're not pushing the limits of the hardware and hence none of
the behaviours above will be evident...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Aug 21, 2020 at 10:15:33AM +0530, Ritesh Harjani wrote:
> Hello Dave,
>
> Thanks for reviewing this.
>
> On 8/21/20 4:41 AM, Dave Chinner wrote:
> > On Wed, Aug 19, 2020 at 03:58:41PM +0530, Anju T Sudhakar wrote:
> > > From: Ritesh Harjani
> > >
tting written extents, the performance of (1), (2) and (4) will
trend towards (5) as writes hit already allocated ranges of the file
and the serialisation of extent mapping changes goes away. This
occurs with guest filesystems that perform overwrite in place (such
as XFS) and hence overwrites of existin
On Fri, Aug 21, 2020 at 08:21:45AM +0800, Gao Xiang wrote:
> Hi Dave,
>
> On Fri, Aug 21, 2020 at 09:34:46AM +1000, Dave Chinner wrote:
> > On Thu, Aug 20, 2020 at 12:53:23PM +0800, Gao Xiang wrote:
> > > SWP_FS is used to make swap_{read,write}page() go through
> &
cluster size and alignment, does the swap clustering optimisations
for swapping THP pages work correctly? And, if so, is there any
performance benefit we get from enabling proper THP swap clustering
on swapfiles?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
or that device for us?)
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ry reclaim throttling, not dirty page
throttling. balance_dirty_pages() still works just fine as it does
not look at device congestion. page cleaning rate is accounted in
test_clear_page_writeback(), page dirtying rate is accounted
directly in balance_dirty_pages(). That feedback loop has not been
broken...
And I completely agree with Peter here - the control theory we
applied to the dirty throttling problem is still 100% valid and so
the algorithm still just works all these years later. I've only been
saying that allocation should use the same feedback model for
reclaim throttling since ~2011...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
s at once?
So, essentially, you do a DIO read into a mmap()d range from the
same file, with DIO read ascending and the mmap() range descending,
then once that is done you hole punch the file and do it again?
IOWs, this is a racing page_mkwrite()/DIO read workload, and the
moment the two threads hit the same block of the file with a
DIO read and a page_mkwrite at the same time, it throws a warning.
Well, that's completely expected behaviour. DIO is not serialised
against mmap() access at all, and so if the page_mkwrite occurs
between the writeback and the iomap_apply() call in the dio path,
then it will see the delalloc block that the page_mkwrite allocated.
No sane application would ever do this, it's behaving as expected,
so I don't think there's anything to care about here.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Aug 12, 2020 at 05:10:12PM -0400, Vivek Goyal wrote:
> On Wed, Aug 12, 2020 at 11:23:45AM +1000, Dave Chinner wrote:
> > On Tue, Aug 11, 2020 at 01:55:30PM -0400, Vivek Goyal wrote:
> > > On Tue, Aug 11, 2020 at 08:22:38AM +1000, Dave Chinner wrote:
> > > >
On Tue, Aug 11, 2020 at 01:55:30PM -0400, Vivek Goyal wrote:
> On Tue, Aug 11, 2020 at 08:22:38AM +1000, Dave Chinner wrote:
> > On Fri, Aug 07, 2020 at 03:55:21PM -0400, Vivek Goyal wrote:
> > > We need some kind of locking mechanism here. Normal file systems like
> > >
NOIO as well to restore the previous behavior.
>
> Fixes: 2e85abf053b9 ("mm: allow read-ahead with IOCB_NOWAIT set")
> Reported-by: Dave Chinner
> Signed-off-by: Jens Axboe
>
> ---
>
> This was a known change with the buffered async read change, but we
> didn't
eries, or
> pull a branch that'll go into Linus as well.
Jens, Willy,
Now that this patch has been merged and IOCB_NOWAIT semantics for
buffered reads are broken in Linus' tree, what's the plan to get
this regression fixed before 5.9 releases?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
struct iomap *iomap)
ditto: fuse_upgrade_dax_mapping().
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
___
Virtio-fs mailing list
Virtio-fs@redhat.com
https://www.redhat.com/mailman/listinfo/virtio-fs
ou can drop all locks
The same goes for any other operation that manipulates extents
directly (other fallocate ops, truncate, etc).
/me also wonders if there can be racing AIO+DIO in progress over the
range that is being punched and whether fuse needs to call
inode_dio_wait() before punching holes, running truncates, etc...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com