't work either. No device created in /dev (dax or pmem).
I think you need to do some ndctl magic to get the memory to be
namespaced correctly for the correct devices to appear.
https://docs.pmem.io/ndctl-user-guide/managing-namespaces
IIRC, you need to set the type to pmem and the mode to fsdax, devdax or
raw to get the relevant device nodes to be created for the range...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Jan 31, 2024 at 09:58:21AM -0500, Mathieu Desnoyers wrote:
> On 2024-01-30 21:48, Dave Chinner wrote:
> > On Tue, Jan 30, 2024 at 11:52:54AM -0500, Mathieu Desnoyers wrote:
> > > Introduce a generic way to query whether the dcache is virtually aliased
> > >
one-liner should go into
fs_dax_get_by_bdev(), similar to the blk_queue_dax() check at the
start of the function.
I also noticed that device mapper uses fs_dax_get_by_bdev() to
determine if it can support DAX, but this patch set does not address
that case. Hence it really seems to me like fs_dax_get_by_bdev() is
the right place to put this check.
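Roughly, something like this at the top of fs_dax_get_by_bdev() (a
sketch only - cpu_dcache_is_aliased() is just a placeholder name for
whatever the generic aliasing query ends up being called, and the
rest of the function is unchanged):

	if (!blk_queue_dax(bdev->bd_disk->queue))
		return NULL;

	/* DAX mappings can't be kept coherent on an aliased dcache. */
	if (cpu_dcache_is_aliased())
		return NULL;

	/* ... existing dax_device lookup for this bdev ... */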
-Dave.
--
Dave Chinner
da...@fromorbit.com
configurations with
the VFS dentry cache aliasing when we read this code? Something like
cpu_dcache_is_aliased(), perhaps?
-Dave.
--
Dave Chinner
da...@fromorbit.com
h
currently returns NULL if CONFIG_FS_DAX=n and so should be changed
to return NULL if any of these platform configs is enabled.
Then I don't think you need to change a single line of filesystem
code - they'll all just do what they do now if the block device
doesn't support DAX
-Dave.
--
Dave Chinner
da...@fromorbit.com
is
instantiated in cache - if the inode has a flag that says "use DAX"
and dax is supportable by the hardware, then we turn on DAX for
that inode. Otherwise we just use the normal non-dax IO paths.
Again, we don't error out the filesystem if DAX is not supported,
we just don't turn it on. This check is done in
xfs_inode_should_enable_dax() and I think all you need to do is
replace the IS_ENABLED(CONFIG_FS_DAX) with a dax_is_supported()
call...
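i.e. roughly this (a sketch only - dax_is_supported() being the
helper this patch set would provide, with all the existing per-mount
and per-inode checks left exactly as they are):

	static bool
	xfs_inode_should_enable_dax(
		struct xfs_inode	*ip)
	{
		if (!dax_is_supported())	/* was IS_ENABLED(CONFIG_FS_DAX) */
			return false;
		/* ... existing mount option and inode flag checks ... */
	}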
-Dave.
--
Dave Chinner
da...@fromorbit.com
be write() IO dirtying new data or other
transactions running dirtying the journal/metadata. Both
sync_filesystem() and super_drop_pagecache() operate on current
state - they don't prevent future dax mapping instantiation or
dirtying from happening on the device, so they don't prevent this...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Sep 27, 2022 at 09:02:48AM -0700, Darrick J. Wong wrote:
> On Tue, Sep 27, 2022 at 02:53:14PM +0800, Shiyang Ruan wrote:
> >
> >
> > 在 2022/9/20 5:15, Dave Chinner 写道:
> > > On Mon, Sep 19, 2022 at 02:50:03PM +1000, Dave Chinner wrote:
> > > >
down, then everything will fail before removal finally
triggers, and the act of unmounting the filesystem post device
removal will clean up the page cache and all the other caches.
IOWs, I don't understand why the page cache is considered special
here (as opposed to, say, the inode or dentry caches), nor why we
aren't shutting down the filesystem directly after syncing it to
disk to ensure that we don't end up with applications losing data as
a result of racing with the removal
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Sep 20, 2022 at 09:17:07AM +0800, Shiyang Ruan wrote:
> Hi Dave,
>
> 在 2022/9/20 5:15, Dave Chinner 写道:
> > On Mon, Sep 19, 2022 at 02:50:03PM +1000, Dave Chinner wrote:
> > > On Thu, Sep 15, 2022 at 09:26:42AM +, Shiyang Ruan wrote:
> > > > Since
On Mon, Sep 19, 2022 at 02:50:03PM +1000, Dave Chinner wrote:
> On Thu, Sep 15, 2022 at 09:26:42AM +, Shiyang Ruan wrote:
> > Since reflink&fsdax can work together now, the last obstacle has been
> > resolved. It's time to remove restrictions and drop this warning
0x3bed bytes)
6( 6 mod 256): TRUNCATE DOWN from 0x4 to 0x28b68 **
7( 7 mod 256): COLLAPSE 0x14000 thru 0x14fff (0x1000 bytes)
8( 8 mod 256): TRUNCATE UP from 0x27b68 to 0x3a9c4 **
9( 9 mod 256): READ 0x9cb7 thru 0x19799(0xfae3 bytes)
10( 10 mod 256): PUNCH 0x1b3a8 thru 0x1dff8 (0x2c51 bytes)
--
Dave Chinner
da...@fromorbit.com
>
> While we're at it, add the usual "xfs_" prefix to struct failure_info,
> and actually initialize mf_flags.
>
> Signed-off-by: Darrick J. Wong
Looks fine.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
:
pwritev2(RWF_NOWAIT) can return -EOPNOTSUPP on buffered writes.
Documented in the man page.
FICLONERANGE on a filesystem that doesn't support reflink will
return -EOPNOTSUPP. Documented in the man page.
mmap(MAP_SYNC) returns -EOPNOTSUPP if the underlying filesystem
and/or storage doesn't support DAX. Documented in the man page.
I could go on, but I think I've made the point already...
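For example, a userspace caller is expected to notice the error and
fall back - a minimal sketch (illustrative only, assuming glibc's
pwritev2() wrapper):

	#define _GNU_SOURCE
	#include <sys/uio.h>
	#include <errno.h>

	/*
	 * Try a non-blocking write first; fall back to a normal blocking
	 * write if RWF_NOWAIT isn't supported on this IO path or the
	 * write would block.
	 */
	static ssize_t write_nowait_or_fallback(int fd, const struct iovec *iov,
						int iovcnt, off_t off)
	{
		ssize_t ret = pwritev2(fd, iov, iovcnt, off, RWF_NOWAIT);

		if (ret < 0 && (errno == EOPNOTSUPP || errno == EAGAIN))
			ret = pwritev2(fd, iov, iovcnt, off, 0);
		return ret;
	}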
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, May 10, 2022 at 06:55:50PM -0700, Dan Williams wrote:
> [ add Andrew ]
>
>
> On Tue, May 10, 2022 at 6:49 PM Dave Chinner wrote:
> >
> > On Tue, May 10, 2022 at 05:03:52PM -0700, Darrick J. Wong wrote:
> > > On Sun, May 08, 2022 at 10:36:06PM +0800, Shiy
doubt it would be
ready for merge in the next cycle...
> I could just add the entire series to iomap-5.20-merge and base the
> xfs-5.20-merge off of that? But I'm not sure what else might be landing
> in the other subsystems, so I'm open to input.
It'll need to be a stable branch somewhere, but I don't think it
really matters where as long as it's merged into the xfs for-next
tree so it gets filesystem test coverage...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Apr 22, 2022 at 02:27:32PM -0700, Dan Williams wrote:
> On Thu, Apr 21, 2022 at 12:47 AM Dave Chinner wrote:
> >
> > On Wed, Apr 20, 2022 at 10:54:59PM -0700, Christoph Hellwig wrote:
> > > On Thu, Apr 21, 2022 at 02:35:02PM +1000, Dave Chinner wrote:
> >
On Wed, Apr 20, 2022 at 10:54:59PM -0700, Christoph Hellwig wrote:
> On Thu, Apr 21, 2022 at 02:35:02PM +1000, Dave Chinner wrote:
> > Sure, I'm not a maintainer and just the stand-in patch shepherd for
> > a single release. However, being unable to cleanly merge code we
>
On Wed, Apr 20, 2022 at 07:20:07PM -0700, Dan Williams wrote:
> [ add Andrew and Naoya ]
>
> On Wed, Apr 20, 2022 at 6:48 PM Shiyang Ruan wrote:
> >
> > Hi Dave,
> >
> > 在 2022/4/21 9:20, Dave Chinner 写道:
> > > Hi Ruan,
> > >
> > >
so that we can run it through filesystem
level DAX+reflink testing. That will mean we need this in a stable
shared topic branch and tighter co-ordination between the trees.
So before we go any further we need to know if the dax+reflink
enablement patchset is near being ready to merge
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Apr 12, 2022 at 07:06:40PM -0700, Dan Williams wrote:
> On Tue, Apr 12, 2022 at 5:04 PM Dave Chinner wrote:
> > On Mon, Apr 11, 2022 at 12:09:03AM +0800, Shiyang Ruan wrote:
> > > Introduce xfs_notify_failure.c to handle failure related works, such as
> > >
structures this rmapbt walk is dependent on
(e.g. perag structures) have been initialised yet, so there are null
pointer dereferences going to happen here.
Perhaps even worse is that the rmapbt is not guaranteed to be in a
consistent state until after log recovery has completed, so this
walk could get stuck forever in a stale on-disk cycle that
recovery would have corrected
Hence these notifications need to be delayed until after the
filesystem is mounted, all the internal structures have been set up
and log recovery has completed.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Apr 16, 2021 at 10:14:39AM +0530, Bharata B Rao wrote:
> On Wed, Apr 07, 2021 at 08:28:07AM +1000, Dave Chinner wrote:
> > On Mon, Apr 05, 2021 at 11:18:48AM +0530, Bharata B Rao wrote:
> >
> > > As an alternative approach, I have this below hack that does lazy
On Wed, Apr 14, 2021 at 01:16:52AM -0600, Yu Zhao wrote:
> On Tue, Apr 13, 2021 at 10:50 PM Dave Chinner wrote:
> > On Tue, Apr 13, 2021 at 09:40:12PM -0600, Yu Zhao wrote:
> > > On Tue, Apr 13, 2021 at 5:14 PM Dave Chinner wrote:
> > > > Profiles would be intere
On Wed, Apr 14, 2021 at 08:43:36AM -0600, Jens Axboe wrote:
> On 4/13/21 5:14 PM, Dave Chinner wrote:
> > On Tue, Apr 13, 2021 at 10:13:24AM -0600, Jens Axboe wrote:
> >> On 4/13/21 1:51 AM, SeongJae Park wrote:
> >>> From: SeongJae Park
> >>>
> >&
On Tue, Apr 13, 2021 at 09:40:12PM -0600, Yu Zhao wrote:
> On Tue, Apr 13, 2021 at 5:14 PM Dave Chinner wrote:
> > On Tue, Apr 13, 2021 at 10:13:24AM -0600, Jens Axboe wrote:
> > > On 4/13/21 1:51 AM, SeongJae Park wrote:
> > > > From: SeongJ
sounds to me like reclaim
*might* be batching page cache removal better (e.g. fewer, larger
batches) and so spending less time contending on the mapping tree
lock...
IOWs, I suspect this result might actually be a result of less lock
contention due to a change in batch processing characteristics of
the new algorithm rather than it being a "better" algorithm...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Apr 13, 2021 at 01:18:35AM +0200, Thomas Gleixner wrote:
> Dave,
>
> On Tue, Apr 13 2021 at 08:15, Dave Chinner wrote:
> > On Mon, Apr 12, 2021 at 05:20:53PM +0200, Thomas Gleixner wrote:
> >> On Wed, Apr 07 2021 at 07:22, Dave Chinner wrote:
> >> &
On Mon, Apr 12, 2021 at 05:20:53PM +0200, Thomas Gleixner wrote:
> Dave,
>
> On Wed, Apr 07 2021 at 07:22, Dave Chinner wrote:
> > On Tue, Apr 06, 2021 at 02:28:34PM +0100, Matthew Wilcox wrote:
> >> On Tue, Apr 06, 2021 at 10:33:43PM +1000, Dave Chinner wrote:
associated with a memcg very quickly
(via mem_cgroup_lruvec()). This will find pages associated directly
with the memcg, so it gives you a fairly accurate picture of the
page cache usage within the container.
This has none of the issues that arise from "sb != mnt_ns" that
walking superblocks and inode lists have, and it doesn't require you
to play games with mounts, superblocks and inode references
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
SHRINKER_MEMCG_AWARE flag. This could be based on fstype - most
virtual filesystems that expose system information do not really
need full memcg awareness because they are generally only visible to
a single memcg instance...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Apr 06, 2021 at 02:28:34PM +0100, Matthew Wilcox wrote:
> On Tue, Apr 06, 2021 at 10:33:43PM +1000, Dave Chinner wrote:
> > +++ b/fs/inode.c
> > @@ -57,8 +57,7 @@
> >
> > static unsigned int i_hash_mask __read_mostly;
> > static unsigned int i_hash
From: Dave Chinner
Because scalability of the global inode_hash_lock really, really
sucks and prevents me from doing scalability characterisation and
analysis of bcachefs algorithms.
Profiles of a 32-way concurrent create of 51.2m inodes with fsmark
on a couple of different filesystems on a
From: Dave Chinner
In preparation for switching the VFS inode cache over to hlist_bl
lists, we need to be able to fake a list node that looks like it is
hashed, for correct operation of filesystems that don't directly use
the VFS inode cache.
Signed-off-by: Dave Chinner
---
include/
From: Dave Chinner
In preparation for changing the inode hash table implementation.
Signed-off-by: Dave Chinner
---
fs/inode.c | 44 +---
1 file changed, 25 insertions(+), 19 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index a047ab306f9a
Hi folks,
Recently I've been doing some scalability characterisation of
various filesystems, and one of the limiting factors that has
prevented me from exploring filesystem characteristics is the
inode hash table, namely the global inode_hash_lock that protects
it.
This has long been a problem,
On Thu, Mar 18, 2021 at 12:20:35PM -0700, Dan Williams wrote:
> On Wed, Mar 17, 2021 at 9:58 PM Dave Chinner wrote:
> >
> > On Wed, Mar 17, 2021 at 09:08:23PM -0700, Dan Williams wrote:
> > > Jason wondered why the get_user_pages_fast() path takes references on a
to run a device-wide invalidation.
So, yeah, I think this should simply be a single ranged call to the
filesystem like:
->memory_failure(dev, 0, -1ULL)
to tell the filesystem that the entire backing device has gone away,
and leave the filesystem to handle failure entirely at the
filesystem level.
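Purely as a sketch of the shape of that interface (every name here is
a placeholder for illustration, not a final API):

	/*
	 * Hypothetical bottom-up failure notification from the pmem
	 * driver to the filesystem holding the dax device.
	 */
	struct pmem_failure_ops {
		int (*memory_failure)(struct dax_device *dax_dev,
				      u64 offset, u64 len, int flags);
	};

	/* "the whole device is gone" then becomes a single ranged call: */
	/*	ops->memory_failure(dax_dev, 0, -1ULL, 0);		  */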
-Dave.
--
Dave Chinner
da...@fromorbit.com
on would be for a "struct cage" as in Compound pAGE
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
gt; synchronous metadata changes being committed to the cache in one go
> (truncates, fallocates, fsync, xattrs, unlink+link of tmpfile) - and this
> can take quite a long time. The cache needs to be more proactive in
> getting stuff committed as it goes along.
Workqueues giv
converted to written until the
data is written back and the filesystem runs a conversion
transaction.
So, yeah, if you use FIEMAP to determine where data lies in a file
that is being actively modified, you're going to get corrupt data
sooner rather than later. SEEK_HOLE/DATA are coherent with in-memory
user data, so they don't have this problem.
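For example, a minimal userspace sketch of walking the data ranges of
a file with SEEK_DATA/SEEK_HOLE (illustrative only):

	#define _GNU_SOURCE
	#include <unistd.h>
	#include <stdio.h>

	/*
	 * Print every data extent. SEEK_DATA/SEEK_HOLE see dirty page
	 * cache data, so the result is coherent with in-flight buffered
	 * writes, unlike FIEMAP.
	 */
	static void dump_data_ranges(int fd)
	{
		off_t data = 0, hole;

		while ((data = lseek(fd, data, SEEK_DATA)) >= 0) {
			hole = lseek(fd, data, SEEK_HOLE);
			if (hole < 0)
				break;
			printf("data: [%lld, %lld)\n",
			       (long long)data, (long long)hole);
			data = hole;
		}
	}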
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
through the layers, and device disappearance may in fact manifest to
the user as data corruption rather than causing data to be
inaccessible.
Hence "remove" notifications just don't work in the storage stack.
They need to be translated to block ranges going bad (i.e. media
errors), a
On Mon, Mar 01, 2021 at 07:33:28PM -0800, Dan Williams wrote:
> On Mon, Mar 1, 2021 at 6:42 PM Dave Chinner wrote:
> [..]
> > We do not need a DAX specific mechanism to tell us "DAX device
> > gone", we need a generic block device interface that tells us "
On Mon, Mar 01, 2021 at 04:32:36PM -0800, Dan Williams wrote:
> On Mon, Mar 1, 2021 at 2:47 PM Dave Chinner wrote:
> > Now we have the filesytem people providing a mechanism for the pmem
> > devices to tell the filesystems about physical device failures so
> > they can
On Mon, Mar 01, 2021 at 12:55:53PM -0800, Dan Williams wrote:
> On Sun, Feb 28, 2021 at 2:39 PM Dave Chinner wrote:
> >
> > On Sat, Feb 27, 2021 at 03:40:24PM -0800, Dan Williams wrote:
> > > On Sat, Feb 27, 2021 at 2:36 PM Dave Chinner wrote:
> > > > On F
On Sat, Feb 27, 2021 at 03:40:24PM -0800, Dan Williams wrote:
> On Sat, Feb 27, 2021 at 2:36 PM Dave Chinner wrote:
> > On Fri, Feb 26, 2021 at 02:41:34PM -0800, Dan Williams wrote:
> > > On Fri, Feb 26, 2021 at 1:28 PM Dave Chinner wrote:
> > > > On Fri, Feb 26,
On Fri, Feb 26, 2021 at 02:41:34PM -0800, Dan Williams wrote:
> On Fri, Feb 26, 2021 at 1:28 PM Dave Chinner wrote:
> > On Fri, Feb 26, 2021 at 12:59:53PM -0800, Dan Williams wrote:
> > > On Fri, Feb 26, 2021 at 12:51 PM Dave Chinner wrote:
> > > > > My imm
On Fri, Feb 26, 2021 at 12:59:53PM -0800, Dan Williams wrote:
> On Fri, Feb 26, 2021 at 12:51 PM Dave Chinner wrote:
> >
> > On Fri, Feb 26, 2021 at 11:24:53AM -0800, Dan Williams wrote:
> > > On Fri, Feb 26, 2021 at 11:05 AM Darrick J. Wong
> > > wrote:
> &
Then when userspace tries to access the
mapped DAX pages we get a new page fault. In processing the fault, the
filesystem will try to get direct access to the pmem from the block
device. This will get an ENODEV error from the block device because
the backing store (pmem) has been unplugged and is no longer
there...
AFAICT, as long as pmem removal invalidates all the active ptes that
point at the pmem being removed, the filesystem doesn't need to
care about device removal at all, DAX or no DAX...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
n care at this
point about cross-device XCOPY?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Feb 12, 2021 at 03:54:48PM -0800, Darrick J. Wong wrote:
> On Sat, Feb 13, 2021 at 10:27:26AM +1100, Dave Chinner wrote:
> > On Fri, Feb 12, 2021 at 03:07:39PM -0800, Ian Lance Taylor wrote:
> > > On Fri, Feb 12, 2021 at 3:03 PM Dave Chinner wrote:
> > > >
&
On Fri, Feb 12, 2021 at 03:07:39PM -0800, Ian Lance Taylor wrote:
> On Fri, Feb 12, 2021 at 3:03 PM Dave Chinner wrote:
> >
> > On Fri, Feb 12, 2021 at 04:45:41PM +0100, Greg KH wrote:
> > > On Fri, Feb 12, 2021 at 07:33:57AM -0800, Ian Lance Taylor wrote:
> > > &
ly breaking? What changed in
> > the kernel that caused this? Procfs has been around for a _very_ long
> > time :)
>
> That would be because of (v5.3):
>
> 5dae222a5ff0 vfs: allow copy_file_range to copy across devices
>
> The intention of this change (series) was to
It is not
intended as a copy mechanism for copying data from one random file
descriptor to another.
The use of it as a general file copy mechanism in the Go system
library is incorrect and wrong. It is a userspace bug. Userspace
has done the wrong thing, userspace needs to be fixed.
-Dave.
--
Dave Chinner
da...@fromorbit.com
back. It's likely to be too much work for a bound
workqueue, too, especially when you consider that the workqueue
completion code will merge sequential ioends into one ioend, hence
making the IO completion loop counts bigger and latency problems worse
rather than better...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
to list the requested attributes of all
directories and files in the tree...
So, yeah, we do indeed do thousands of these fsxattr based
operations a second, sometimes tens of thousands a second or more,
and sometimes they are issued in bulk in performance critical paths
for container build/deployment operations
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
mechanisms. Of course, with these
special zero length files that contain ephemeral data, userspace can't
actually tell that they contain data using stat(). So
as far as userspace is concerned, copy_file_range() correctly
returned zero bytes copied from a zero byte long file and there's
nothing more to do.
This zero length file behaviour is, fundamentally, a kernel
filesystem implementation bug, not a copy_file_range() bug.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Jan 26, 2021 at 11:50:50AM +0800, Nicolas Boichat wrote:
> On Tue, Jan 26, 2021 at 9:34 AM Dave Chinner wrote:
> >
> > On Mon, Jan 25, 2021 at 03:54:31PM +0800, Nicolas Boichat wrote:
> > > Hi copy_file_range experts,
> > >
> > > We hit this in
't check the file
size and just attempts to read unconditionally from the file. Hence
it happily returns non-existent stale data from busted filesystem
implementations that allow data to be read from beyond EOF...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
and so
provide the same benefit to all the filesystems that use it.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Jan 08, 2021 at 11:56:57AM -0500, Brian Foster wrote:
> On Fri, Jan 08, 2021 at 08:54:44AM +1100, Dave Chinner wrote:
> > e.g. we run the first transaction into the CIL, it steals the space
> > needed for the CIL checkpoint headers for the transaction. Then if
> > the
On Mon, Jan 11, 2021 at 11:38:48AM -0500, Brian Foster wrote:
> On Fri, Jan 08, 2021 at 11:56:57AM -0500, Brian Foster wrote:
> > On Fri, Jan 08, 2021 at 08:54:44AM +1100, Dave Chinner wrote:
> > > On Mon, Jan 04, 2021 at 11:23:53AM -0500, Brian Foster wrote:
> > > >
.com/
and that should also allow the work skipped on each memcg to be
accounted across multiple calls to the shrinkers for the same
memcg. Hence as memory pressure within the memcg goes up, the
repeated calls to direct reclaim within that memcg will result in
all of the freeable items in each cache eventually being freed...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Jan 08, 2021 at 03:59:22PM +0800, Ming Lei wrote:
> On Thu, Jan 07, 2021 at 09:21:11AM +1100, Dave Chinner wrote:
> > On Wed, Jan 06, 2021 at 04:45:48PM +0800, Ming Lei wrote:
> > > On Tue, Jan 05, 2021 at 07:39:38PM +0100, Christoph Hellwig wrote:
> > > > A
On Sun, Jan 03, 2021 at 05:03:33PM +0100, Donald Buczek wrote:
> On 02.01.21 23:44, Dave Chinner wrote:
> > On Sat, Jan 02, 2021 at 08:12:56PM +0100, Donald Buczek wrote:
> > > On 31.12.20 22:59, Dave Chinner wrote:
> > > > On Thu, Dec 31, 2020 at 12:48:5
On Mon, Jan 04, 2021 at 11:23:53AM -0500, Brian Foster wrote:
> On Thu, Dec 31, 2020 at 09:16:11AM +1100, Dave Chinner wrote:
> > On Wed, Dec 30, 2020 at 12:56:27AM +0100, Donald Buczek wrote:
> > > If the value goes below the limit while some threads are
> > > already
everything we need to
determine whether we should do a large or small bio vec allocation
in the iomap writeback path...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Sat, Jan 02, 2021 at 08:12:56PM +0100, Donald Buczek wrote:
> On 31.12.20 22:59, Dave Chinner wrote:
> > On Thu, Dec 31, 2020 at 12:48:56PM +0100, Donald Buczek wrote:
> > > On 30.12.20 23:16, Dave Chinner wrote:
> > One could argue that, but one should al
lifts of the context setting up into
xfs_trans_alloc() back into the patchset before adding the
current->journal functionality patch.
Also, you need to test XFS code with CONFIG_XFS_DEBUG=y so that
asserts are actually built into the code and exercised, because this
ASSERT should have fired on the first rolling transaction that the
kernel executes...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Dec 31, 2020 at 12:48:56PM +0100, Donald Buczek wrote:
> On 30.12.20 23:16, Dave Chinner wrote:
> > On Wed, Dec 30, 2020 at 12:56:27AM +0100, Donald Buczek wrote:
> > > Threads, which committed items to the CIL, wait in the
> > > xc_push_wait waitqueue when use
> wake_up_all(&cil->xc_push_wait);
That just smells wrong to me. It *might* be correct, but this
condition should pair with the sleep condition, as space used by a
CIL context should never actually decrease
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
is
> related to that, because the md block devices itself are
> responsive (`xxd /dev/md0` )
My bet is that the OOT driver/hardware had dropped a log IO on the
floor - XFS is waiting for the CIL push to complete, and I'm betting
that is stuck waiting for iclog IO completion while writing the CIL
to the journal. The sysrq output will tell us if this is the case,
so that's the first place to look.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
inspection. But I'm
> not a VFS expert so I'm not quite sure.
Uh, if you have a shrinker racing to register and unregister, you've
got a major bug in your object initialisation/teardown code, i.e.
calling register/unregister at the same time for the same shrinker
is a bug, pure and simple.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
way.
So, AFAICT, the dax_lock() stuff is only necessary when the
filesystem can't be used to resolve the owner of the physical page
that went bad.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Dec 15, 2020 at 02:27:18PM -0800, Yang Shi wrote:
> On Mon, Dec 14, 2020 at 6:46 PM Dave Chinner wrote:
> >
> > On Mon, Dec 14, 2020 at 02:37:19PM -0800, Yang Shi wrote:
> > > Use per memcg's nr_deferred for memcg aware shrinkers. The shrinker's
>
Combine that with the proposed "watch_sb()" syscall for reporting
such errors in a generic manner to interested listeners, and we've
got a fairly solid generic path for reporting data loss events to
userspace for an appropriate user-defined action to be taken...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
you still have a user data
recovery process to perform after this...
> And how does it help in dealing with page faults upon poisoned
> dax page?
It doesn't. If the page is poisoned, the same behaviour will occur
as does now. This is simply error reporting infrastructure, not
error handling.
Future work might change how we correct the faults found in the
storage, but I think the user visible behaviour is going to be "kill
apps mapping corrupted data" for a long time yet
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Dec 15, 2020 at 02:53:48PM +0100, Johannes Weiner wrote:
> On Tue, Dec 15, 2020 at 01:09:57PM +1100, Dave Chinner wrote:
> > On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote:
> > > Since memcg_shrinker_map_size just can be changd under holding
&g
return;
>
> kfree(shrinker->nr_deferred);
> shrinker->nr_deferred = NULL;
e.g. then this function can simply do:
{
	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
		return unregister_memcg_shrinker(shrinker);

	kfree(shrinker->nr_deferred);
	shrinker->nr_deferred = NULL;
}
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
acd..693a41e89969 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -201,7 +201,7 @@ DECLARE_RWSEM(shrinker_rwsem);
> #define SHRINKER_REGISTERING ((struct shrinker *)~0UL)
>
> static DEFINE_IDR(shrinker_idr);
> -static int shrinker_nr_max;
> +int shrinker_nr_max;
Then we don't need to make yet another variable global...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ile it may help your specific corner case,
it's likely to significantly change the reclaim balance of slab
caches, especially under GFP_NOFS intensive workloads where we can
only defer the work to kswapd.
Hence I think this is still a problematic approach as it doesn't
address the reason why deferred counts are increasing out of
control in the first place
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
r will do that for static functions automatically if it makes
sense.
Ok, so you only do the memcg nr_deferred thing if NUMA_AWARE &&
sc->memcg is true, so:
static long shrink_slab_set_nr_deferred_memcg(...)
{
	int nid = sc->nid;

	deferred = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_deferred,
					     true);
	return atomic_long_add_return(nr, &deferred->nr_deferred[id]);
}

static long shrink_slab_set_nr_deferred(...)
{
	int nid = sc->nid;

	if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
		nid = 0;
	else if (sc->memcg)
		return shrink_slab_set_nr_deferred_memcg(, nid);

	return atomic_long_add_return(nr, &shrinker->nr_deferred[nid]);
}
And now there's no duplicated code.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
nd
nr_deferred pointers to the correct offset in the allocated range.
Then this patch really only changes the size of the chunk
being allocated, setting up the pointers and copying the relevant
data from the old allocation to the new one.
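A sketch of that single-allocation layout (structure and names here
are illustrative only, not the actual mm/vmscan.c code):

	struct shrinker_info_sketch {
		struct rcu_head	rcu;
		atomic_long_t	*nr_deferred;	/* points into data[] */
		unsigned long	*map;		/* points into data[] */
		unsigned long	data[];		/* map bits, then deferred counts */
	};

	static struct shrinker_info_sketch *alloc_info(int map_size, int defer_size)
	{
		struct shrinker_info_sketch *info;

		/* map_size and defer_size are both in bytes */
		info = kvzalloc(sizeof(*info) + map_size + defer_size, GFP_KERNEL);
		if (!info)
			return NULL;
		info->map = info->data;
		info->nr_deferred = (void *)((char *)info->data + map_size);
		return info;
	}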
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
is a good idea. This couples the shrinker
infrastructure to internal details of how cgroups are initialised
and managed. Sure, certain operations might be done in certain
shrinker lock contexts, but that doesn't mean we should share global
locks across otherwise independent subsystems
Chee
up
that the barriers enforce.
IOWs, these memory barriers belong inside the cgroup code to
guarantee anything that sees an online cgroup will always see the
fully initialised cgroup structures. They do not belong in the
shrinker infrastructure...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Dec 15, 2020 at 01:03:45AM +, Pavel Begunkov wrote:
> On 15/12/2020 00:56, Dave Chinner wrote:
> > On Tue, Dec 15, 2020 at 12:20:23AM +, Pavel Begunkov wrote:
> >> As reported, we must not do pressure stall information accounting for
> >> direct IO, beca
On Tue, Dec 15, 2020 at 12:00:23PM +1100, Dave Chinner wrote:
> On Tue, Dec 15, 2020 at 12:20:24AM +, Pavel Begunkov wrote:
> > A preparation patch. It adds a simple helper which abstracts out number
> > of segments we're allocating for a bio from iov_iter_npages().
>
io_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs)
> {
> + /* reuse iter->bvec */
> + if (iov_iter_is_bvec(iter))
> + return 0;
> return iov_iter_npages(iter, max_segs);
Ah, I'm a blind idiot... :/
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
de this specific patch, so it's not clear what it's
actually needed for...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
for paging IO */
> + bio_clear_flag(bio, BIO_WORKINGSET);
Why only do this for the old direct IO path? Why isn't this
necessary for the iomap DIO path?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Dec 02, 2020 at 03:12:20PM +0800, Ruan Shiyang wrote:
> Hi Dave,
>
> On 2020/11/30 上午6:47, Dave Chinner wrote:
> > On Mon, Nov 23, 2020 at 08:41:10AM +0800, Shiyang Ruan wrote:
> > >
> > > The call trace is like this:
> > > memory_fail
On Wed, Dec 02, 2020 at 10:04:17PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Dec 03, 2020 at 07:40:45AM +1100, Dave Chinner wrote:
> > On Wed, Dec 02, 2020 at 08:06:01PM +0100, Greg Kroah-Hartman wrote:
> > > On Wed, Dec 02, 2020 at 06:41:43PM +0100, Miklos Szeredi wrote:
>
correct regressions in fixes before they get propagated to users.
It also creates a clear demarcation between fixes and cc: stable for
maintainers and developers: only patches with a cc: stable will be
backported immediately to stable. Developers know what patches need
urgent backports and, unlike developers, the automated fixes scan
does not have the subject matter expertise or background to make
that judgement
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
r that filesystem instance then,
by definition, it does not support DAX and the bit should never be
set.
e.g. We don't talk about kernels that support reflink - what matters
to userspace is whether the filesystem instance supports reflink.
Think of the useless mess that xfs_info would be if it reported
kernel capabilities instead of filesystem instance capabilities.
i.e. we don't report that a filesystem supports reflink just because
the kernel supports it - it reports whether the filesystem instance
being queried supports reflink. And that also implies the kernel
supports it, because the kernel has to support it to mount the
filesystem...
So, yeah, I think it really does need to be conditional on the
filesystem instance being queried to be actually useful to users
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
is cached then we can
try to re-write it to disk to fix the bad data, otherwise we treat
it like a writeback error and report it on the next
write/fsync/close operation done on that file.
This gets rid of the mf_recover_controller altogether and allows
the interface to be used by any sort of block device for any sort
of bottom-up reporting of media/device failures.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Nov 25, 2020 at 06:46:54PM -0500, Sasha Levin wrote:
> On Thu, Nov 26, 2020 at 08:52:47AM +1100, Dave Chinner wrote:
> > We've already had one XFS upstream kernel regression in this -rc
> > cycle propagated to the stable kernels in 5.9.9 because the stable
> > pr
On Wed, Nov 25, 2020 at 10:35:50AM -0500, Sasha Levin wrote:
> From: Dave Chinner
>
> [ Upstream commit 883a790a84401f6f55992887fd7263d808d4d05d ]
>
> Jens has reported a situation where partial direct IOs can be issued
> and completed yet still return -EAGAIN. We don't
STATX_ATTR_DAX in statx for either the
attributes or attributes_mask field because the filesystem is not
DAX capable. And given that we have filesystems with multiple block
devices that can have different DAX capabilities, I think this
statx() attr state (and mask) really has to come from the
filesystem, not VFS...
> Extra question: should we only set this in the attributes mask if
> CONFIG_FS_DAX=y ?
IMO, yes, because it will always be false on CONFIG_FS_DAX=n and so
it may as well not be emitted as a supported bit in the mask.
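A minimal userspace sketch of checking it (illustrative only; the
fallback define assumes the current uapi value of STATX_ATTR_DAX):

	#define _GNU_SOURCE
	#include <sys/stat.h>
	#include <fcntl.h>
	#include <stdio.h>

	#ifndef STATX_ATTR_DAX
	#define STATX_ATTR_DAX	0x00200000	/* from linux/stat.h */
	#endif

	/* Distinguish "not using DAX" from "DAX state not reported at all". */
	static void report_dax(const char *path)
	{
		struct statx stx;

		if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, &stx) < 0)
			return;
		if (!(stx.stx_attributes_mask & STATX_ATTR_DAX))
			printf("%s: DAX state not reported\n", path);
		else if (stx.stx_attributes & STATX_ATTR_DAX)
			printf("%s: using DAX\n", path);
		else
			printf("%s: not using DAX\n", path);
	}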
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Nov 11, 2020 at 11:28:48AM +0100, Michal Suchánek wrote:
> On Tue, Nov 10, 2020 at 08:08:23AM +1100, Dave Chinner wrote:
> > On Mon, Nov 09, 2020 at 09:27:05PM +0100, Michal Suchánek wrote:
> > > On Mon, Nov 09, 2020 at 11:24:19AM -0800, Darrick J. Wong wrote:
> >
storing its data on a different
filesystem that isn't mounted at install time, so the installer
has no chance of detecting that the application is going to use
DAX enabled storage.
IOWs, the installer cannot make decisions based on DAX state on
behalf of applications because it does not know what environment the
application is going to be configured to run in. DAX can only be
detected reliably by the application at runtime inside its
production execution environment.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com