Re: [v2 PATCH 6/9] mm: vmscan: use per memcg nr_deferred of shrinker

2020-12-15 Thread Dave Chinner
On Tue, Dec 15, 2020 at 02:27:18PM -0800, Yang Shi wrote: > On Mon, Dec 14, 2020 at 6:46 PM Dave Chinner wrote: > > > > On Mon, Dec 14, 2020 at 02:37:19PM -0800, Yang Shi wrote: > > > Use per memcg's nr_deferred for memcg aware shrinkers. The shrinker's > > &g

Re: [RFC PATCH v3 8/9] md: Implement ->corrupted_range()

2020-12-15 Thread Dave Chinner
Combine that with the proposed "watch_sb()" syscall for reporting such errors in a generic manner to interested listeners, and we've got a fairly solid generic path for reporting data loss events to userspace for an appropriate user-defined action to be taken... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink

2020-12-15 Thread Dave Chinner
y process to perform after this... > And how does it help in dealing with page faults upon poisoned > dax page? It doesn't. If the page is poisoned, the same behaviour will occur as does now. This is simply error reporting infrastructure, not error handling. Future work might change how we correct the faults found in the storage, but I think the user visible behaviour is going to be "kill apps mapping corrupted data" for a long time yet Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-15 Thread Dave Chinner
On Tue, Dec 15, 2020 at 02:53:48PM +0100, Johannes Weiner wrote: > On Tue, Dec 15, 2020 at 01:09:57PM +1100, Dave Chinner wrote: > > On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote: > > > Since memcg_shrinker_map_size just can be changed under holding &g

Re: [v2 PATCH 7/9] mm: vmscan: don't need allocate shrinker->nr_deferred for memcg aware shrinkers

2020-12-14 Thread Dave Chinner
eturn; > > kfree(shrinker->nr_deferred); > shrinker->nr_deferred = NULL; e.g. then this function can simply do: { if (shrinker->flags & SHRINKER_MEMCG_AWARE) return unregister_memcg_shrinker(shrinker); kfree(shrinker->nr_deferred); shrinker->nr_deferred = NULL; } Cheers, Dave. -- Dave Chinner da...@fromorbit.com
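The simplification suggested in the excerpt can be sketched outside the kernel. This is a user-space illustration only: the flag value, struct layout, and the empty `unregister_memcg_shrinker()` stub are stand-ins, not the real kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SHRINKER_MEMCG_AWARE 0x1   /* illustrative value, not the kernel's */

struct shrinker {
	unsigned int flags;
	long *nr_deferred;
};

/* stand-in: memcg-aware shrinkers keep nr_deferred in per-memcg state,
 * so the memcg teardown path owns that memory, not this function */
static void unregister_memcg_shrinker(struct shrinker *shrinker)
{
	(void)shrinker;
}

/* the shape suggested above: branch once on the flag, then one free path */
static void free_shrinker_nr_deferred(struct shrinker *shrinker)
{
	if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
		unregister_memcg_shrinker(shrinker);
		return;
	}
	free(shrinker->nr_deferred);
	shrinker->nr_deferred = NULL;
}
```

The point of the refactor is that the memcg-aware case returns early, so the plain case needs no nesting and no duplicated cleanup.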

Re: [v2 PATCH 8/9] mm: memcontrol: reparent nr_deferred when memcg offline

2020-12-14 Thread Dave Chinner
an.c > +++ b/mm/vmscan.c > @@ -201,7 +201,7 @@ DECLARE_RWSEM(shrinker_rwsem); > #define SHRINKER_REGISTERING ((struct shrinker *)~0UL) > > static DEFINE_IDR(shrinker_idr); > -static int shrinker_nr_max; > +int shrinker_nr_max; Then we don't need to make yet another variable global... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [v2 PATCH 9/9] mm: vmscan: shrink deferred objects proportional to priority

2020-12-14 Thread Dave Chinner
specific corner case, it's likely to significantly change the reclaim balance of slab caches, especially under GFP_NOFS intensive workloads where we can only defer the work to kswapd. Hence I think this is still a problematic approach as it doesn't address the reason why deferred counts are increasing out of control in the first place Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [v2 PATCH 6/9] mm: vmscan: use per memcg nr_deferred of shrinker

2020-12-14 Thread Dave Chinner
if NUMA_AWARE && sc->memcg is true. so static long shrink_slab_set_nr_deferred_memcg(...) { int nid = sc->nid; deferred = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_deferred, true); return atomic_long_add_return(nr, &deferred->nr_deferred[id]); } static long shrink_slab_set_nr_deferred(...) { int nid = sc->nid; if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) nid = 0; else if (sc->memcg) return shrink_slab_set_nr_deferred_memcg(, nid); return atomic_long_add_return(nr, &shrinker->nr_deferred[nid]); } And now there's no duplicated code. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
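The deduplication Dave sketches can be exercised in user space. The structs below are simplified stand-ins (the real kernel indirects through `memcg->nodeinfo[nid]->shrinker_deferred` and uses atomic counters); only the control flow, where one helper picks both the node id and the backing store, mirrors the excerpt.

```c
#include <assert.h>
#include <stddef.h>

#define SHRINKER_NUMA_AWARE 0x1   /* illustrative flag value */
#define MAX_NODES 4

/* simplified stand-ins for the kernel structures */
struct memcg { long nr_deferred[MAX_NODES]; };
struct shrinker { unsigned int flags; long nr_deferred[MAX_NODES]; };
struct shrink_control { int nid; struct memcg *memcg; };

static long add_nr_deferred_memcg(struct shrink_control *sc, long nr)
{
	return sc->memcg->nr_deferred[sc->nid] += nr;
}

/* one helper resolves the node id, then dispatches to either the
 * per-memcg store or the shrinker's own array: no duplicated code */
static long add_nr_deferred(struct shrinker *shrinker,
			    struct shrink_control *sc, long nr)
{
	int nid = sc->nid;

	if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
		nid = 0;
	else if (sc->memcg)
		return add_nr_deferred_memcg(sc, nr);
	return shrinker->nr_deferred[nid] += nr;
}
```

A non-NUMA-aware shrinker always lands in slot 0; a NUMA-aware one with a memcg attached takes the per-memcg path.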

Re: [v2 PATCH 5/9] mm: memcontrol: add per memcg shrinker nr_deferred

2020-12-14 Thread Dave Chinner
o the correct offset in the allocated range. Then this patch is really only changes to the size of the chunk being allocated, setting up the pointers and copying the relevant data from the old to new. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-14 Thread Dave Chinner
good idea. This couples the shrinker infrastructure to internal details of how cgroups are initialised and managed. Sure, certain operations might be done in certain shrinker lock contexts, but that doesn't mean we should share global locks across otherwise independent subsystems Cheers, Dave.

Re: [v2 PATCH 3/9] mm: vmscan: guarantee shrinker_slab_memcg() sees valid shrinker_maps for online memcg

2020-12-14 Thread Dave Chinner
set up that the barriers enforce. IOWs, these memory barriers belong inside the cgroup code to guarantee anything that sees an online cgroup will always see the fully initialised cgroup structures. They do not belong in the shrinker infrastructure... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v1 4/6] block/psi: remove PSI annotations from direct IO

2020-12-14 Thread Dave Chinner
On Tue, Dec 15, 2020 at 01:03:45AM +, Pavel Begunkov wrote: > On 15/12/2020 00:56, Dave Chinner wrote: > > On Tue, Dec 15, 2020 at 12:20:23AM +, Pavel Begunkov wrote: > >> As reported, we must not do pressure stall information accounting for > >> direct IO,

Re: [PATCH v1 5/6] bio: add a helper calculating nr segments to alloc

2020-12-14 Thread Dave Chinner
On Tue, Dec 15, 2020 at 12:00:23PM +1100, Dave Chinner wrote: > On Tue, Dec 15, 2020 at 12:20:24AM +, Pavel Begunkov wrote: > > A preparation patch. It adds a simple helper which abstracts out number > > of segments we're allocating for a bio from iov_iter_npages(). > >

Re: [PATCH v1 6/6] block/iomap: don't copy bvec for direct IO

2020-12-14 Thread Dave Chinner
_vecs_to_alloc(struct iov_iter *iter, int max_segs) > { > + /* reuse iter->bvec */ > + if (iov_iter_is_bvec(iter)) > + return 0; > return iov_iter_npages(iter, max_segs); Ah, I'm a blind idiot... :/ Cheers, Dave. -- Dave Chinner da...@fromorbit.com
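The quoted helper's logic can be mirrored with a toy iterator type standing in for the kernel's `struct iov_iter`: a bvec-backed iterator is reused as-is, so zero new bio vecs need allocating; any other iterator is sized by its page count, capped at the segment limit.

```c
#include <assert.h>

/* toy stand-in for the kernel's iov_iter; ITER_BVEC marks an iterator
 * that already carries a bio_vec array */
enum iter_type { ITER_IOVEC, ITER_BVEC };

struct iter_sketch {
	enum iter_type type;
	int npages;   /* what iov_iter_npages(iter, max_segs) would report */
};

/* mirrors the quoted bio_iov_vecs_to_alloc(): reuse iter->bvec when
 * possible, otherwise allocate one vec per page up to max_segs */
static int bio_iov_vecs_to_alloc(const struct iter_sketch *iter, int max_segs)
{
	if (iter->type == ITER_BVEC)
		return 0;
	return iter->npages < max_segs ? iter->npages : max_segs;
}
```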

Re: [PATCH v1 5/6] bio: add a helper calculating nr segments to alloc

2020-12-14 Thread Dave Chinner
ecific patch, so it's not clear what it's actually needed for... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v1 4/6] block/psi: remove PSI annotations from direct IO

2020-12-14 Thread Dave Chinner
/ > + bio_clear_flag(bio, BIO_WORKINGSET); Why only do this for the old direct IO path? Why isn't this necessary for the iomap DIO path? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Linux-cachefs] [PATCH v12 3/4] xfs: refactor the usage around xfs_trans_context_{set, clear}

2020-12-14 Thread Dave Chinner
> > This patch is based on Darrick's work to fix the issue in xfs/141 in the > > > earlier version. [1] > > > > > > 1. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia > > > > > > Cc: Darrick J. Wong > >

Re: [Linux-cachefs] [PATCH v10 4/4] xfs: use current->journal_info to avoid transaction reservation recursion

2020-12-07 Thread Dave Chinner
ans_context_active > To check whether current is in fs transaction or not > - xfs_trans_context_swap > Transfer the transaction context when rolling a permanent transaction > > These two new helpers are introduced in xfs_trans.h. > > Cc: Darrick J. Wong > Cc: Matthew W

Re: [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink

2020-12-06 Thread Dave Chinner
On Wed, Dec 02, 2020 at 03:12:20PM +0800, Ruan Shiyang wrote: > Hi Dave, > > On 2020/11/30 上午6:47, Dave Chinner wrote: > > On Mon, Nov 23, 2020 at 08:41:10AM +0800, Shiyang Ruan wrote: > > > > > > The call trace is like this: > > > memory_fail

Re: [Linux-cachefs] [PATCH v9 2/2] xfs: avoid transaction reservation recursion

2020-12-06 Thread Dave Chinner
F_KSWAPD)) == > PF_MEMALLOC)) > goto redirty; > > [2]. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia/ > > Cc: Darrick J. Wong > Cc: Matthew Wilcox (Oracle) > Cc: Christoph Hellwig > Cc: Dave Chinner > Cc: Michal Hocko > Cc

Re: [Linux-cachefs] Problems doing DIO to netfs cache on XFS from Ceph

2020-12-03 Thread Dave Chinner
ntext is a bug in XFS. IOWs, we are waiting on a new version of this patchset to be posted: https://lore.kernel.org/linux-xfs/20201103131754.94949-1-laoar.s...@gmail.com/ so that we can get rid of this from iomap and check the transaction recursion case directly in the XFS code. Then your problem goes

Re: [PATCH V2] uapi: fix statx attribute value overlap for DAX & MOUNT_ROOT

2020-12-02 Thread Dave Chinner
On Wed, Dec 02, 2020 at 10:04:17PM +0100, Greg Kroah-Hartman wrote: > On Thu, Dec 03, 2020 at 07:40:45AM +1100, Dave Chinner wrote: > > On Wed, Dec 02, 2020 at 08:06:01PM +0100, Greg Kroah-Hartman wrote: > > > On Wed, Dec 02, 2020 at 06:41:43PM +0100, Miklos Szeredi wrote: >

Re: [PATCH V2] uapi: fix statx attribute value overlap for DAX & MOUNT_ROOT

2020-12-02 Thread Dave Chinner
they get propagated to users. It also creates a clear demarcation between fixes and cc: stable for maintainers and developers: only patches with a cc: stable will be backported immediately to stable. Developers know what patches need urgent backports and, unlike developers, the automated fixes scan does not have the subject matter expertise or background to make that judgement Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 2/2] statx: move STATX_ATTR_DAX attribute handling to filesystems

2020-12-01 Thread Dave Chinner
instance then, by definition, it does not support DAX and the bit should never be set. e.g. We don't talk about kernels that support reflink - what matters to userspace is whether the filesystem instance supports reflink. Think of the useless mess that xfs_info would be if it reported kernel capabilities instead of filesystem instance capabilities. i.e. we don't report that a filesystem supports reflink just because the kernel supports it - it reports whether the filesystem instance being queried supports reflink. And that also implies the kernel supports it, because the kernel has to support it to mount the filesystem... So, yeah, I think it really does need to be conditional on the filesystem instance being queried to be actually useful to users Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink

2020-11-29 Thread Dave Chinner
o re-write it to disk to fix the bad data, otherwise we treat it like a writeback error and report it on the next write/fsync/close operation done on that file. This gets rid of the mf_recover_controller altogether and allows the interface to be used by any sort of block device for any sort of bottom-up reporting of media/device failures. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH AUTOSEL 5.9 33/33] xfs: don't allow NOWAIT DIO across extent boundaries

2020-11-25 Thread Dave Chinner
On Wed, Nov 25, 2020 at 06:46:54PM -0500, Sasha Levin wrote: > On Thu, Nov 26, 2020 at 08:52:47AM +1100, Dave Chinner wrote: > > We've already had one XFS upstream kernel regression in this -rc > > cycle propagated to the stable kernels in 5.9.9 because the stable > > proc

Re: [PATCH AUTOSEL 5.9 33/33] xfs: don't allow NOWAIT DIO across extent boundaries

2020-11-25 Thread Dave Chinner
On Wed, Nov 25, 2020 at 10:35:50AM -0500, Sasha Levin wrote: > From: Dave Chinner > > [ Upstream commit 883a790a84401f6f55992887fd7263d808d4d05d ] > > Jens has reported a situation where partial direct IOs can be issued > and completed yet still return -EAGAIN. We don't w

Re: [PATCH] fs/stat: set attributes_mask for STATX_ATTR_DAX

2020-11-23 Thread Dave Chinner
either the attributes or attributes_mask field because the filesystem is not DAX capable. And given that we have filesystems with multiple block devices that can have different DAX capabilities, I think this statx() attr state (and mask) really has to come from the filesystem, not VFS... > Extra question: should we only set this in the attributes mask if > CONFIG_FS_DAX=y ? IMO, yes, because it will always be false on CONFIG_FS_DAX=n and so it may well as not be emitted as a supported bit in the mask. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
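The mask-before-attribute rule Dave describes can be shown with a small statx() caller (Linux 4.11+ kernel and glibc 2.28+ assumed; the fallback `STATX_ATTR_DAX` value is the post-overlap-fix UAPI one). Userspace must check `stx_attributes_mask` before trusting the corresponding bit in `stx_attributes`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>

#ifndef STATX_ATTR_DAX
#define STATX_ATTR_DAX 0x00200000   /* UAPI value after the overlap fix */
#endif

/* 1 = DAX on, 0 = DAX off, -1 = this fs instance doesn't report the
 * bit in attributes_mask, -2 = statx() itself failed */
static int query_dax_state(const char *path)
{
	struct statx stx;

	if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, &stx) != 0)
		return -2;
	/* only trust stx_attributes bits the filesystem declared */
	if (!(stx.stx_attributes_mask & STATX_ATTR_DAX))
		return -1;
	return !!(stx.stx_attributes & STATX_ATTR_DAX);
}
```

On a CONFIG_FS_DAX=n kernel, or a filesystem instance that never supports DAX, the mask bit stays clear and the caller gets -1 rather than a misleading "off".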

Re: [PATCH 1/2] xfs: show the dax option in mount options.

2020-11-11 Thread Dave Chinner
On Wed, Nov 11, 2020 at 11:28:48AM +0100, Michal Suchánek wrote: > On Tue, Nov 10, 2020 at 08:08:23AM +1100, Dave Chinner wrote: > > On Mon, Nov 09, 2020 at 09:27:05PM +0100, Michal Suchánek wrote: > > > On Mon, Nov 09, 2020 at 11:24:19AM -0800, Darrick J. Wong wrote: > >

Re: [PATCH 1/2] xfs: show the dax option in mount options.

2020-11-09 Thread Dave Chinner
n a different filesystem that isn't mounted at install time, so the installer has no chance of detecting that the application is going to use DAX enabled storage. IOWs, the installer cannot make decisions based on DAX state on behalf of applications because it does not know what environment the application is going to be configured to run in. DAX can only be detected reliably by the application at runtime inside its production execution environment. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 00/34] fs: idmapped mounts

2020-10-29 Thread Dave Chinner
orker threads, duplicating the current creds will capture this information and won't leave random landmines where stuff doesn't work as it should because the worker thread is unaware of the userns that it is supposed to be doing filesystem operations under... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] fs/dcache: optimize start_dir_add()

2020-10-26 Thread Dave Chinner
quire() so that people who have no clue what the hell smp_acquire__after_ctrl_dep() means or does have some hope of understanding of what objects the ordering semantics in the function actually apply to Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH 0/2] Remove shrinker's nr_deferred

2020-09-30 Thread Dave Chinner
needs solving is integrating shrinker scanning control state with memcgs more tightly, not force every memcg aware shrinker to use list_lru for their subsystem shrinker implementations Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache)

2020-09-22 Thread Dave Chinner
On Tue, Sep 22, 2020 at 12:46:05PM -0400, Mikulas Patocka wrote: > Thanks for reviewing NVFS. Not a review - I've just had a cursory look and not looked any deeper after I'd noticed various red flags... > On Tue, 22 Sep 2020, Dave Chinner wrote: > > IOWs, extent based trees were ch

Re: NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache)

2020-09-21 Thread Dave Chinner
fications it knows nothing about are executed atomically? That, to me, looks like a fundamental, unfixable flaw in this approach... I can see how "almost in place" modification can be done by having two copies side by side and updating one while the other is the active copy and switching

Re: More filesystem need this fix (xfs: use MMAPLOCK around filemap_map_pages())

2020-09-21 Thread Dave Chinner
On Thu, Sep 17, 2020 at 12:47:10AM -0700, Hugh Dickins wrote: > On Thu, 17 Sep 2020, Dave Chinner wrote: > > On Wed, Sep 16, 2020 at 07:04:46PM -0700, Hugh Dickins wrote: > > > On Thu, 17 Sep 2020,

Re: [RFC PATCH 0/2] Remove shrinker's nr_deferred

2020-09-20 Thread Dave Chinner
On Thu, Sep 17, 2020 at 05:12:08PM -0700, Yang Shi wrote: > On Wed, Sep 16, 2020 at 7:37 PM Dave Chinner wrote: > > On Wed, Sep 16, 2020 at 11:58:21AM -0700, Yang Shi wrote: > > It clamps the worst case freeing to half the cache, and that is > > exactly what you are seeing

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-20 Thread Dave Chinner
ning millions of IOPS through the AIO subsystem, then the cost of doing millions of extra atomic ops every second is going to be noticable... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: the "read" syscall sees partial effects of the "write" syscall

2020-09-20 Thread Dave Chinner
er thread. There are quite a few custom enterprise apps around that rely on this POSIX behaviour, especially stuff that has come from different Unixes that actually provided Posix compliant behaviour. IOWs, from an upstream POV, POSIX atomic write behaviour doesn't matter very much. From an enterprise dist

Re: More filesystem need this fix (xfs: use MMAPLOCK around filemap_map_pages())

2020-09-17 Thread Dave Chinner
On Wed, Sep 16, 2020 at 07:04:46PM -0700, Hugh Dickins wrote: > On Thu, 17 Sep 2020, Dave Chinner wrote: > > > > So > > > > P0 p1 > > > > hole punch starts > > takes XFS_MMAPLOCK_EXCL > > truncate_pagec

Re: [RFC PATCH 0/2] Remove shrinker's nr_deferred

2020-09-16 Thread Dave Chinner
bit.com/ Unfortunately, none of the MM developers showed any interest in these patches, so when I found a different solution to the XFS problem it got dropped on the ground. > So why do we have to still keep it around? Because we need a feedback mechanism to allow us to maintain control of the size of filesystem caches that grow via GFP_NOFS allocations. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: More filesystem need this fix (xfs: use MMAPLOCK around filemap_map_pages())

2020-09-16 Thread Dave Chinner
On Wed, Sep 16, 2020 at 05:58:51PM +0200, Jan Kara wrote: > On Sat 12-09-20 09:19:11, Amir Goldstein wrote: > > On Tue, Jun 23, 2020 at 8:21 AM Dave Chinner wrote: > > > > > > From: Dave Chinner > > > > > > The page fault around path ->map_pages i

Re: Support for I/O to a bitbucket

2020-09-06 Thread Dave Chinner
y. I think it's pretty straight forward to do it in the iomap layer... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2] fs: Handle I_DONTCACHE in iput_final() instead of generic_drop_inode()

2020-09-06 Thread Dave Chinner
t > - Add Fixes tag in commit message > > fs/inode.c | 4 +++- > include/linux/fs.h | 3 +-- > 2 files changed, 4 insertions(+), 3 deletions(-) Looks good. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [PATCH] fs: Handle I_DONTCACHE in iput_final() instead of generic_drop_inode()

2020-09-03 Thread Dave Chinner
statement. i.e. if (!drop && !(inode->i_state & I_DONTCACHE) && (sb->s_flags & SB_ACTIVE)) { Which gives a clear indication that they are all at the same precedence and separate logic statements... Otherwise the change looks good. Probably best to resend with the fixes tag :) Cheers, Dave. -- Dave Chinner da...@fromorbit.com
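The suggested formatting can be spelled out as a compilable predicate. The flag values below are illustrative stand-ins, not the kernel's; only the shape of the condition, three parenthesised terms at the same level, is taken from the excerpt.

```c
#include <assert.h>
#include <stdbool.h>

#define I_DONTCACHE 0x1   /* illustrative values, not the kernel's */
#define SB_ACTIVE   0x2

/* each condition is its own parenthesised term at the same precedence,
 * as suggested above for the iput_final() keep-in-cache test */
static bool keep_inode_cached(bool drop, unsigned int i_state,
			      unsigned long s_flags)
{
	return !drop &&
	       !(i_state & I_DONTCACHE) &&
	       (s_flags & SB_ACTIVE);
}
```

Writing each term with its own parentheses avoids any reader having to recall `&` vs `&&` precedence to see that the three checks are independent.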

Re: [PATCH] fs: Kill DCACHE_DONTCACHE dentry even if DCACHE_REFERENCED is set

2020-08-30 Thread Dave Chinner
On Fri, Aug 28, 2020 at 05:04:14PM +0800, Li, Hao wrote: > On 2020/8/28 8:35, Dave Chinner wrote: > > On Thu, Aug 27, 2020 at 05:58:07PM +0800, Li, Hao wrote: > >> On 2020/8/27 14:37, Dave Chinner wrote: > >>> On Fri, Aug 21, 2020 at 09:59:53AM +0800,

Re: [PATCH] fs: Kill DCACHE_DONTCACHE dentry even if DCACHE_REFERENCED is set

2020-08-27 Thread Dave Chinner
On Thu, Aug 27, 2020 at 05:58:07PM +0800, Li, Hao wrote: > On 2020/8/27 14:37, Dave Chinner wrote: > > On Fri, Aug 21, 2020 at 09:59:53AM +0800, Hao Li wrote: > >> Currently, DCACHE_REFERENCED prevents the dentry with DCACHE_DONTCACHE > >> set from being killed, so th

Re: [PATCH] fs: Kill DCACHE_DONTCACHE dentry even if DCACHE_REFERENCED is set

2020-08-27 Thread Dave Chinner
ode->i_state, state | I_WILL_FREE); > spin_unlock(&inode->i_lock); What's this supposed to do? We'll only get here with drop set if the filesystem is mounting or unmounting. In either case, why does having I_DONTCACHE set require the inode to be written back here before it is evicted from the cache? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 2/9] fs: Introduce i_blocks_per_page

2020-08-25 Thread Dave Chinner
blkbits); } static inline unsigned iomap_chunks_per_page(struct inode *inode, struct page *page) { return page_size(page) >> inode->i_blkbits; } and the latter is actually the same as what i_block_per_page() is currently implemented as Cheers, Dave. -- Dave Chinner da...@fromorbit.com
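As the excerpt notes, the helper reduces to a single shift. A user-space sketch (page size and block-size bits passed in explicitly, since `struct page` and `struct inode` are not available here):

```c
#include <assert.h>
#include <stddef.h>

/* i_blocks_per_page() boils down to this shift: how many fs blocks
 * (each 1 << blkbits bytes) fit in a page of the given size */
static unsigned int blocks_per_page(size_t page_size, unsigned int blkbits)
{
	return (unsigned int)(page_size >> blkbits);
}
```

For a 4 KiB page with 512-byte blocks this gives 8; for a 2 MiB THP with 512-byte blocks it gives 4096, which is why the computation has to key off the actual page size rather than assuming PAGE_SIZE.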

Re: [PATCH 9/9] iomap: Change calling convention for zeroing

2020-08-25 Thread Dave Chinner
On Tue, Aug 25, 2020 at 01:40:24PM +0100, Matthew Wilcox wrote: > Any objection to leaving this patch as-is with a u64 length? No objection here - I just wanted to make sure that signed/unsigned overflow was not going to be an issue... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 9/9] iomap: Change calling convention for zeroing

2020-08-24 Thread Dave Chinner
On Mon, Aug 24, 2020 at 09:35:59PM -0600, Andreas Dilger wrote: > On Aug 24, 2020, at 9:26 PM, Matthew Wilcox wrote: > > > > On Tue, Aug 25, 2020 at 10:27:35AM +1000, Dave Chinner wrote: > >>> do { > >>> - unsigned offset, bytes; > >>

Re: [PATCH 8/9] iomap: Convert iomap_write_end types

2020-08-24 Thread Dave Chinner
On Tue, Aug 25, 2020 at 02:06:05AM +0100, Matthew Wilcox wrote: > On Tue, Aug 25, 2020 at 10:12:23AM +1000, Dave Chinner wrote: > > > -static int > > > -__iomap_write_end(struct inode *inode, loff_t pos, unsigned len, > > > - unsigned copied, struct pa

Re: [PATCH] iomap: Fix the write_count in iomap_add_to_ioend().

2020-08-24 Thread Dave Chinner
ncy of stable pages in a situation like this - a mmap() write fault could stall for many seconds waiting for a huge bio chain to finish submission and run completion processing even when the IO for the given page we faulted on was completed before the page fault occurred... Hence I think we really do need to cap the length of the bio chains here so that we start completing and ending page writeback on large writeback ranges long before the writeback code finishes submitting the range it was asked to write back. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 9/9] iomap: Change calling convention for zeroing

2020-08-24 Thread Dave Chinner
; > + return length; > > do { > - unsigned offset, bytes; > - > - offset = offset_in_page(pos); > - bytes = min_t(loff_t, PAGE_SIZE - offset, count); > + loff_t bytes; > > if (IS_DAX(inode)) >
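The offset/bytes computation in the quoted hunk can be isolated as a standalone chunking function. A 4 KiB page size is assumed here; `pos & (PAGE_SIZE - 1)` is what `offset_in_page(pos)` computes, and the `min_t()` caps the chunk at both the page boundary and the remaining count.

```c
#include <assert.h>

#define PAGE_SIZE_SKETCH 4096ULL   /* assumed 4 KiB pages */

/* how many bytes one iteration of the quoted zeroing loop covers:
 * from pos to the end of its page, capped at the remaining count */
static unsigned long long chunk_bytes(unsigned long long pos,
				      unsigned long long count)
{
	unsigned long long offset = pos & (PAGE_SIZE_SKETCH - 1);
	unsigned long long bytes = PAGE_SIZE_SKETCH - offset;

	return bytes < count ? bytes : count;
}
```

A caller advances `pos += bytes; count -= bytes;` each iteration, so the chunks tile the range exactly with every chunk except possibly the first and last being page-sized.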

Re: [PATCH 8/9] iomap: Convert iomap_write_end types

2020-08-24 Thread Dave Chinner
size_t copied, struct page *page, struct iomap *iomap, > + struct iomap *srcmap) ... this. Otherwise the code looks fine. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 6/9] iomap: Convert read_count to byte count

2020-08-24 Thread Dave Chinner
e and why it is intentional... Otherwise the code looks OK. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 5/9] iomap: Support arbitrarily many blocks per page

2020-08-24 Thread Dave Chinner
uct iomap_page *)page_private(page); > return NULL; Just to confirm: this vm bug check is needed because we only attach the iomap_page to the head page of a compound page? Assuming that I've understood the above correctly: Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [PATCH 4/9] iomap: Use bitmap ops to set uptodate bits

2020-08-24 Thread Dave Chinner
ig > --- > fs/iomap/buffered-io.c | 12 ++-- > 1 file changed, 2 insertions(+), 10 deletions(-) Looks good. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [PATCH 3/9] iomap: Use kzalloc to allocate iomap_page

2020-08-24 Thread Dave Chinner
page_private() handles the refcount now. > > Signed-off-by: Matthew Wilcox (Oracle) > Reviewed-by: Christoph Hellwig > --- > fs/iomap/buffered-io.c | 10 +- > 1 file changed, 1 insertion(+), 9 deletions(-) The sooner this goes in the better :) Reviewed-by: Dave

Re: [PATCH 2/9] fs: Introduce i_blocks_per_page

2020-08-24 Thread Dave Chinner
le) > Reviewed-by: Christoph Hellwig > --- > fs/iomap/buffered-io.c | 8 > fs/jfs/jfs_metapage.c | 2 +- > fs/xfs/xfs_aops.c | 2 +- > include/linux/pagemap.h | 16 > 4 files changed, 22 insertions(+), 6 deletions(-) Otherwise looks good. Reviewed

Re: [PATCH 1/9] iomap: Fix misplaced page flushing

2020-08-24 Thread Dave Chinner
gt; the best place. That means we can remove it from iomap_write_actor(). > > Signed-off-by: Matthew Wilcox (Oracle) > --- > fs/iomap/buffered-io.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) looks good. Reviewed-by: Dave Chinner Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-23 Thread Dave Chinner
ile (i.e. the file itself is not sparse), while the extent size hint will just add 64kB extents into the file around the write offset. That demonstrates the other behavioural advantage that extent size hints have is they avoid needing to extend the file, which is yet another way to serialise concurrent IO and create IO pipeline stalls... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-23 Thread Dave Chinner
and filesystem are doing in real time (e.g. I use PCP for this and visualise the behaviour in real time via pmchart) gives a lot of insight into exactly what is changing during transient workload changes like starting a benchmark... > I was running fio with --ramp_time=5 which ignores the first 5 seconds > of data in order to let performance settle, but if I remove that I can > see the effect more clearly. I can observe it with raw files (in 'off' > and 'prealloc' modes) and qcow2 files in 'prealloc' mode. With qcow2 and > preallocation=off the performance is stable during the whole test. What does "preallocation=off" mean again? Is that using fallocate(ZERO_RANGE) prior to the data write rather than preallocating the metadata/entire file? If so, I would expect the limiting factor is the rate at which IO can be issued because of the fallocate() triggered pipeline bubbles. That leaves idle device time so you're not pushing the limits of the hardware and hence none of the behaviours above will be evident... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
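The fio setup being discussed can be captured in a small job file. This is an illustrative sketch only (the filename, size and iodepth are placeholders, not values from the thread); dropping `ramp_time` and logging per-interval bandwidth makes the allocation transient visible instead of being averaged away:

```ini
; Hypothetical fio job for observing allocation transients on a fresh image.
; ramp_time is deliberately omitted so the first seconds are not discarded.
[global]
ioengine=libaio
direct=1
rw=randwrite
bs=4k
iodepth=32
runtime=60
time_based
write_bw_log=bw        ; per-interval bandwidth log, to see the settling curve
log_avg_msec=1000

[vm-image]
filename=/var/lib/images/test.img   ; placeholder path
size=10g
```

Plotting the resulting `bw` log alongside device utilisation (e.g. from PCP or iostat) shows whether throughput settles because allocation completes or because the device saturates.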

Re: [PATCH] iomap: Fix the write_count in iomap_add_to_ioend().

2020-08-21 Thread Dave Chinner
On Fri, Aug 21, 2020 at 10:15:33AM +0530, Ritesh Harjani wrote: > Hello Dave, > > Thanks for reviewing this. > > On 8/21/20 4:41 AM, Dave Chinner wrote: > > On Wed, Aug 19, 2020 at 03:58:41PM +0530, Anju T Sudhakar wrote: > > > From: Ritesh Harjani > > >

Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-20 Thread Dave Chinner
tting written extents, the performance of (1), (2) and (4) will trend towards (5) as writes hit already allocated ranges of the file and the serialisation of extent mapping changes goes away. This occurs with guest filesystems that perform overwrite in place (such as XFS) and hence overwrites of existin

Re: [PATCH v2] mm, THP, swap: fix allocating cluster for swapfile by mistake

2020-08-20 Thread Dave Chinner
On Fri, Aug 21, 2020 at 08:21:45AM +0800, Gao Xiang wrote: > Hi Dave, > > On Fri, Aug 21, 2020 at 09:34:46AM +1000, Dave Chinner wrote: > > On Thu, Aug 20, 2020 at 12:53:23PM +0800, Gao Xiang wrote: > > > SWP_FS is used to make swap_{read,write}page() go through > &

Re: [PATCH v2] mm, THP, swap: fix allocating cluster for swapfile by mistake

2020-08-20 Thread Dave Chinner
cluster size and alignment, do the swap clustering optimisations for swapping THP pages work correctly? And, if so, is there any performance benefit we get from enabling proper THP swap clustering on swapfiles? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] iomap: Fix the write_count in iomap_add_to_ioend().

2020-08-20 Thread Dave Chinner
or that device for us?) Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH 0/8] memcg: Enable fine-grained per process memory control

2020-08-20 Thread Dave Chinner
ry reclaim throttling, not dirty page throttling. balance_dirty_pages() still works just fine as it does not look at device congestion. Page cleaning rate is accounted in test_clear_page_writeback(), page dirtying rate is accounted directly in balance_dirty_pages(). That feedback loop has not been broken... And I completely agree with Peter here - the control theory we applied to the dirty throttling problem is still 100% valid and so the algorithm still just works all these years later. I've only been saying that allocation should use the same feedback model for reclaim throttling since ~2011... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
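The feedback model being described can be illustrated with a toy proportional throttle. This is purely a sketch, not the kernel's balance_dirty_pages() algorithm (which uses setpoints with combined position and rate control): a writer that is slowed in proportion to how far dirty memory sits above a target converges on the device's cleaning rate without ever consulting device congestion.

```python
# Toy proportional feedback loop in the spirit of balance_dirty_pages():
# the writer's dirtying rate is scaled down by how far the dirty page count
# sits above a setpoint, so the effective rate converges on the cleaning
# (writeback) rate. Illustrative sketch only, not the kernel algorithm.

def simulate(clean_rate, want_rate, setpoint, steps):
    dirty = 0.0
    rate = want_rate
    for _ in range(steps):
        # Proportional throttle: the further above the setpoint, the
        # slower the writer is allowed to dirty new pages.
        error = max(dirty - setpoint, 0.0) / setpoint
        rate = want_rate / (1.0 + error)
        dirty += rate                          # pages dirtied this tick
        dirty = max(dirty - clean_rate, 0.0)   # pages cleaned this tick
    return rate

# Writer wants 1000 pages/tick but the device only cleans 100/tick: the
# loop settles the writer at the cleaning rate.
final_rate = simulate(clean_rate=100.0, want_rate=1000.0,
                      setpoint=2000.0, steps=10000)
print(round(final_rate, 1))
```

The steady state falls out of the feedback alone: dirty memory rises until the throttle cuts the writer back to exactly the cleaning rate, which is the property the thread is pointing at.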

Re: WARN_ON_ONCE(1) in iomap_dio_actor()

2020-08-12 Thread Dave Chinner
s at once? So, essentially, you do a DIO read into a mmap()d range from the same file, with DIO read ascending and the mmap() range descending, then once that is done you hole punch the file and do it again? IOWs, this is a racing page_mkwrite()/DIO read workload, and the moment the two threads hit the same block of the file with a DIO read and a page_mkwrite at the same time, it throws a warning. Well, that's completely expected behaviour. DIO is not serialised against mmap() access at all, and so if the page_mkwrite occurs between the writeback and the iomap_apply() call in the dio path, then it will see the delalloc block that the page_mkwrite allocated. No sane application would ever do this, it's behaviour as expected, so I don't think there's anything to care about here. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
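The lack of serialisation between mapped stores and concurrent reads of the same file can be sketched from userspace. This hypothetical example substitutes plain pread() for the O_DIRECT read (O_DIRECT needs filesystem support and aligned buffers, so it is omitted for portability); nothing orders the two threads' accesses, so a reader can observe any torn mix of old and new bytes:

```python
import mmap
import os
import tempfile
import threading

# Sketch of unsynchronised mmap-store vs. concurrent read on one file.
# One thread flips the whole mapped range between 'A's and 'B's while the
# main thread repeatedly preads the same offset: the kernel does not
# serialise the two, so reads may see either value, or a torn mix.
SIZE = 4096
with tempfile.NamedTemporaryFile() as f:
    f.write(b"A" * SIZE)
    f.flush()
    mm = mmap.mmap(f.fileno(), SIZE)   # MAP_SHARED: stores hit the page cache
    stop = threading.Event()

    def flipper():
        while not stop.is_set():
            mm[:] = b"B" * SIZE
            mm[:] = b"A" * SIZE

    t = threading.Thread(target=flipper)
    t.start()
    seen = set()
    for _ in range(1000):
        seen.update(os.pread(f.fileno(), SIZE, 0))
    stop.set()
    t.join()
    mm.close()

print(sorted(seen))  # byte values observed: only ever ord('A') / ord('B')
```

With a real DIO read in place of pread() the same race exists, which is exactly the window the warning in iomap_dio_actor() trips over.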

Re: [PATCH v2 15/20] fuse, dax: Take ->i_mmap_sem lock during dax page fault

2020-08-12 Thread Dave Chinner
On Wed, Aug 12, 2020 at 05:10:12PM -0400, Vivek Goyal wrote: > On Wed, Aug 12, 2020 at 11:23:45AM +1000, Dave Chinner wrote: > > On Tue, Aug 11, 2020 at 01:55:30PM -0400, Vivek Goyal wrote: > > > On Tue, Aug 11, 2020 at 08:22:38AM +1000, Dave Chinner wrote: > > > >

Re: [Virtio-fs] [PATCH v2 15/20] fuse, dax: Take ->i_mmap_sem lock during dax page fault

2020-08-11 Thread Dave Chinner
On Tue, Aug 11, 2020 at 01:55:30PM -0400, Vivek Goyal wrote: > On Tue, Aug 11, 2020 at 08:22:38AM +1000, Dave Chinner wrote: > > On Fri, Aug 07, 2020 at 03:55:21PM -0400, Vivek Goyal wrote: > > > We need some kind of locking mechanism here. Normal file systems like > > >

Re: [PATCH] fs: RWF_NOWAIT should imply IOCB_NOIO

2020-08-11 Thread Dave Chinner
NOIO as well to restore the previous behavior. > > Fixes: 2e85abf053b9 ("mm: allow read-ahead with IOCB_NOWAIT set") > Reported-by: Dave Chinner > Signed-off-by: Jens Axboe > > --- > > This was a known change with the buffered async read change, but we > didn't

Re: [PATCH 05/15] mm: allow read-ahead with IOCB_NOWAIT set

2020-08-10 Thread Dave Chinner
eries, or > pull a branch that'll go into Linus as well. Jens, Willy, Now that this patch has been merged and IOCB_NOWAIT semantics for buffered reads are broken in Linus' tree, what's the plan to get this regression fixed before 5.9 releases? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Virtio-fs] [PATCH v2 13/20] fuse, dax: Implement dax read/write operations

2020-08-10 Thread Dave Chinner
struct iomap *iomap) ditto: fuse_upgrade_dax_mapping(). Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Virtio-fs] [PATCH v2 15/20] fuse, dax: Take ->i_mmap_sem lock during dax page fault

2020-08-10 Thread Dave Chinner
ou can drop all locks The same goes for any other operation that manipulates extents directly (other fallocate ops, truncate, etc). /me also wonders if there can be racing AIO+DIO in progress over the range that is being punched and whether fuse needs to call inode_dio_wait() before punching holes, running truncates, etc... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
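From userspace, the hole punch in question is fallocate(2) with FALLOC_FL_PUNCH_HOLE, which must be OR'ed with FALLOC_FL_KEEP_SIZE; a punched range subsequently reads back as zeros. A sketch via ctypes, assuming glibc on Linux (the flag values are from linux/falloc.h; filesystems without punch-hole support return EOPNOTSUPP, handled below):

```python
import ctypes
import os
import tempfile

# Punch a hole via fallocate(2). PUNCH_HOLE must be combined with
# KEEP_SIZE (flag values from linux/falloc.h); the punched range then
# reads back as zeros while the rest of the file is untouched.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

libc = ctypes.CDLL("libc.so.6", use_errno=True)   # glibc assumption
libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                           ctypes.c_long, ctypes.c_long]

with tempfile.NamedTemporaryFile() as f:
    os.write(f.fileno(), b"x" * 8192)
    ret = libc.fallocate(f.fileno(),
                         FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         0, 4096)
    if ret == 0:
        hole = os.pread(f.fileno(), 4096, 0)      # punched: zeros
        tail = os.pread(f.fileno(), 4096, 4096)   # untouched: original data
        print(hole == b"\0" * 4096, tail == b"x" * 4096)
    else:
        print("punch hole unsupported here, errno", ctypes.get_errno())
```

The filesystem-internal problem Dave raises is invisible at this level: any DIO still in flight over the punched range must be drained (inode_dio_wait()) before the extent manipulation, or the IO can land on freed blocks.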
