Re: Question about the "EXPERIMENTAL" tag for dax in XFS

2021-03-02 Thread Dave Chinner
ough the layers, and device disappearance may in fact manifest to the user as data corruption rather than causing data to be inaccessible. Hence "remove" notifications just don't work in the storage stack. They need to be translated to block ranges going bad (i.e. media errors), a

Re: Question about the "EXPERIMENTAL" tag for dax in XFS

2021-03-02 Thread Dave Chinner
On Mon, Mar 01, 2021 at 07:33:28PM -0800, Dan Williams wrote: > On Mon, Mar 1, 2021 at 6:42 PM Dave Chinner wrote: > [..] > > We do not need a DAX specific mechanism to tell us "DAX device > > gone", we need a generic block device interface that tells us "

Re: Question about the "EXPERIMENTAL" tag for dax in XFS

2021-03-02 Thread Dave Chinner
On Mon, Mar 01, 2021 at 04:32:36PM -0800, Dan Williams wrote: > On Mon, Mar 1, 2021 at 2:47 PM Dave Chinner wrote: > > Now we have the filesytem people providing a mechanism for the pmem > > devices to tell the filesystems about physical device failures so > > they can

Re: Question about the "EXPERIMENTAL" tag for dax in XFS

2021-03-01 Thread Dave Chinner
On Mon, Mar 01, 2021 at 12:55:53PM -0800, Dan Williams wrote: > On Sun, Feb 28, 2021 at 2:39 PM Dave Chinner wrote: > > > > On Sat, Feb 27, 2021 at 03:40:24PM -0800, Dan Williams wrote: > > > On Sat, Feb 27, 2021 at 2:36 PM Dave Chinner wrote: > > > > On F

Re: Question about the "EXPERIMENTAL" tag for dax in XFS

2021-02-28 Thread Dave Chinner
On Sat, Feb 27, 2021 at 03:40:24PM -0800, Dan Williams wrote: > On Sat, Feb 27, 2021 at 2:36 PM Dave Chinner wrote: > > On Fri, Feb 26, 2021 at 02:41:34PM -0800, Dan Williams wrote: > > > On Fri, Feb 26, 2021 at 1:28 PM Dave Chinner wrote: > > > > On Fri, Feb 26,

Re: Question about the "EXPERIMENTAL" tag for dax in XFS

2021-02-27 Thread Dave Chinner
On Fri, Feb 26, 2021 at 02:41:34PM -0800, Dan Williams wrote: > On Fri, Feb 26, 2021 at 1:28 PM Dave Chinner wrote: > > On Fri, Feb 26, 2021 at 12:59:53PM -0800, Dan Williams wrote: > > > On Fri, Feb 26, 2021 at 12:51 PM Dave Chinner wrote: > > > > > My imm

Re: Question about the "EXPERIMENTAL" tag for dax in XFS

2021-02-26 Thread Dave Chinner
On Fri, Feb 26, 2021 at 12:59:53PM -0800, Dan Williams wrote: > On Fri, Feb 26, 2021 at 12:51 PM Dave Chinner wrote: > > > > On Fri, Feb 26, 2021 at 11:24:53AM -0800, Dan Williams wrote: > > > On Fri, Feb 26, 2021 at 11:05 AM Darrick J. Wong > > > wrote: > &

Re: Question about the "EXPERIMENTAL" tag for dax in XFS

2021-02-26 Thread Dave Chinner
hen when userspace tries to access the mapped DAX pages we get a new page fault. In processing the fault, the filesystem will try to get direct access to the pmem from the block device. This will get an ENODEV error from the block device because because the backing store (pmem) has been unplugged and is no longer there... AFAICT, as long as pmem removal invalidates all the active ptes that point at the pmem being removed, the filesystem doesn't need to care about device removal at all, DAX or no DAX... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: page->index limitation on 32bit system?

2021-02-21 Thread Dave Chinner
fix it in mainline that I know of. > As I said, some vendors have tried to fix it in their NAS products, > but I don't know where to find that patch any more. It's not suportable from a disaster recovery perspective. I recently saw a 14TB filesystem with billions of hardlinks in it require 240GB of RAM to run xfs_repair. We just can't support large filesystems on 32 bit systems, and it has nothing to do with simple stuff like page cache index sizes... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: page->index limitation on 32bit system?

2021-02-21 Thread Dave Chinner
fset for such systems to 16TB so sparse files can't be larger than what the kernel supports. See xfs_sb_validate_fsb_count() call and the file offset checks against MAX_LFS_FILESIZE in xfs_fs_fill_super()... FWIW, XFS has been doing this for roughly 20 years now - >16TB on 32 bit machines w

Re: Unexpected reflink/subvol snapshot behaviour

2021-02-01 Thread Dave Chinner
On Mon, Feb 01, 2021 at 06:14:21PM -0800, Darrick J. Wong wrote: > On Fri, Jan 22, 2021 at 09:20:51AM +1100, Dave Chinner wrote: > > Hi btrfs-gurus, > > > > I'm running a simple reflink/snapshot/COW scalability test at the > > moment. It is just a loop that d

Re: Unexpected reflink/subvol snapshot behaviour

2021-02-01 Thread Dave Chinner
On Fri, Jan 29, 2021 at 06:25:50PM -0500, Zygo Blaxell wrote: > On Mon, Jan 25, 2021 at 09:36:55AM +1100, Dave Chinner wrote: > > On Sat, Jan 23, 2021 at 04:42:33PM +0800, Qu Wenruo wrote: > > > > > > > > > On 2021/1/22 上午6:20, Dave Chinner wrote: > >

Re: Unexpected reflink/subvol snapshot behaviour

2021-01-24 Thread Dave Chinner
On Sat, Jan 23, 2021 at 04:42:33PM +0800, Qu Wenruo wrote: > > > On 2021/1/22 上午6:20, Dave Chinner wrote: > > Hi btrfs-gurus, > > > > I'm running a simple reflink/snapshot/COW scalability test at the > > moment. It is just a loop that does "fio overwri

Re: Unexpected reflink/subvol snapshot behaviour

2021-01-24 Thread Dave Chinner
On Sat, Jan 23, 2021 at 07:19:03PM -0500, Zygo Blaxell wrote: > On Fri, Jan 22, 2021 at 09:20:51AM +1100, Dave Chinner wrote: > > Hi btrfs-gurus, > > > > I'm running a simple reflink/snapshot/COW scalability test at the > > moment. It is just a loop that d

Unexpected reflink/subvol snapshot behaviour

2021-01-21 Thread Dave Chinner
workload, I suspect the issues I note above are btrfs issues, not expected behaviour. I'm not sure what the expected scalability of btrfs file clones and snapshots are though, so I'm interested to hear if these results are expected or not. Cheers, Dave. -- Dave Chinner da...@fromorbit.com JOBS=4 IODEPTH=4 IOCOUNT=$((1 / $JOBS)) FILESIZE=4g cat >$fio_config <

Re: [PATCH 1/2] iomap: Separate out generic_write_sync() from iomap_dio_complete()

2020-12-15 Thread Dave Chinner
AP_DIO_NEED_SYNC)) > - ret = generic_write_sync(iocb, ret); > + ret = generic_write_sync(dio->iocb, ret); > > kfree(dio); > > return ret; > } > -EXPORT_SYMBOL_GPL(iomap_dio_complete); > + NACK. If you don't want iomap_dio_comple

Re: [RFC PATCH v2 0/5] fs: interface for directly reading/writing compressed data

2019-10-20 Thread Dave Chinner
em. It is based on my previous series which > added a Btrfs-specific ioctl [1], but it is now an extension to > preadv2()/pwritev2() as suggested by Dave Chinner [2]. I've included a > man page patch describing the API in detail. Test cases and examples > programs are available [3]. &g

Re: [RFC PATCH 2/3] fs: add RWF_ENCODED for writing compressed data

2019-09-25 Thread Dave Chinner
On Wed, Sep 25, 2019 at 08:07:12AM -0400, Colin Walters wrote: > > > On Wed, Sep 25, 2019, at 3:11 AM, Dave Chinner wrote: > > > > We're talking about user data read/write access here, not some > > special security capability. Access to the data has already bee

Re: [RFC PATCH 2/3] fs: add RWF_ENCODED for writing compressed data

2019-09-25 Thread Dave Chinner
checked, so why should the format that the data is supplied to the kernel in suddenly require new privilege checks? i.e. writing encoded data to a file requires exactly the same access permissions as writing cleartext data to the file. The only extra information here is whether the _filesystem_ supports encoded data, and that doesn't change regardless of what the open file gets passed to. Hence the capability is either there or it isn't, it doesn't transform not matter what privilege boundary the file is passed across. Similarly, we have permission to access the data or we don't through the struct file, it doesn't transform either. Hence I don't see why CAP_SYS_ADMIN or any special permissions are needed for an application with access permissions to file data to use these RWF_ENCODED IO interfaces. I am inclined to think the permission check here is wrong and should be dropped, and then all these issues go away. Yes, the app that is going to use this needs root perms because it accesses all data in the fs (it's a backup app!), but that doesn't mean you can only use RWF_ENCODED if you have root perms. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 15/15] xfs: Use the new iomap infrastructure for CoW

2019-09-06 Thread Dave Chinner
; > This now at least survives xfstests -g quick on a 4k xfs file system > for. Here is my current tree: > > http://git.infradead.org/users/hch/xfs.git/shortlog/refs/heads/xfs-cow-iomap That looks somewhat reasonable. The XFS mapping function is turning into spagetti and getting really hard to follow again, though. Perhaps we should consider splitting the shared/COW path out of it... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 2/2] btrfs: add ioctl for directly writing compressed data

2019-09-06 Thread Dave Chinner
On Fri, Sep 06, 2019 at 11:19:49AM -0700, Omar Sandoval wrote: > On Thu, Sep 05, 2019 at 12:10:12PM +1000, Dave Chinner wrote: > > On Wed, Sep 04, 2019 at 12:13:26PM -0700, Omar Sandoval wrote: > > > From: Omar Sandoval > > > > > > This adds an API for wri

Re: [PATCH 2/2] btrfs: add ioctl for directly writing compressed data

2019-09-06 Thread Dave Chinner
On Thu, Sep 05, 2019 at 02:16:37PM +0200, Johannes Thumshirn wrote: > On 05/09/2019 04:10, Dave Chinner wrote: > > On Wed, Sep 04, 2019 at 12:13:26PM -0700, Omar Sandoval wrote: > >> From: Omar Sandoval > >> > >> This adds an API for writing compressed data

Re: [PATCH 2/2] btrfs: add ioctl for directly writing compressed data

2019-09-04 Thread Dave Chinner
hat skips the compression/decompression code and sets a few extra flags in the iocb that is passed down to the direct IO code. We don't need a whole new IO path just to skip a data transformation step in the direct IO path Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 10/13] iomap: use a function pointer for dio submits

2019-08-08 Thread Dave Chinner
now if there's hardware encryption below or software encryption on top becomes problematic... So really, from a filesystem and iomap perspective, What Eric says is the right - it's the only order that makes sense... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 10/13] iomap: use a function pointer for dio submits

2019-08-05 Thread Dave Chinner
On Mon, Aug 05, 2019 at 04:08:43PM +, Goldwyn Rodrigues wrote: > On Mon, 2019-08-05 at 09:43 +1000, Dave Chinner wrote: > > On Fri, Aug 02, 2019 at 05:00:45PM -0500, Goldwyn Rodrigues wrote: > > > From: Goldwyn Rodrigues > > > > > > This helps filesyste

Re: [PATCH 05/13] btrfs: Add CoW in iomap based writes

2019-08-04 Thread Dave Chinner
iomap->type = IOMAP_DELALLOC; > + } > + > iomap->addr = IOMAP_NULL_ADDR; > iomap->type = IOMAP_DELALLOC; The iomap->type is overwritten here and so IOMAP_COW will never be seen by the iomap infrastructure... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 04/13] btrfs: Add a simple buffered iomap write

2019-08-04 Thread Dave Chinner
en; > + if (iocb->ki_pos > i_size_read(inode)) > + i_size_write(inode, iocb->ki_pos); > + return written; Looks like it fails to handle O_[D]SYNC writes. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 01/13] iomap: Use a IOMAP_COW/srcmap for a read-modify-write I/O

2019-08-04 Thread Dave Chinner
((name)[IOMAP_SOURCE_MAP]) And now we only have to pass a single iomap parameter to each function as "struct iomap **iomap". This makes the code somewhat simpler, and we only ever need to use IOMAP_S(iomap) when IOMAP_B(iomap)->type == IOMAP_COW. The other advantage of this is that if we even need new functionality that requires 2 (or more) iomaps, we don't have to change APIs again Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 02/13] iomap: Read page from srcmap for IOMAP_COW

2019-08-04 Thread Dave Chinner
Darrick on CONFIG_IOMAP_DEBUG here - we need to start locking down invalid behaviour and invalid combinations with asserts that tell developers they've broken something. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 10/13] iomap: use a function pointer for dio submits

2019-08-04 Thread Dave Chinner
louts is completely the wrong approach to be taking. We need to do these things in a generic manner so that all filesystems (and block devices!) that use the iomap infrastructure can take advantage of them, not just one of them. Quite frankly, I don't care if it takes more time and work up

Re: [RFC][PATCH] link.2: AT_ATOMIC_DATA and AT_ATOMIC_METADATA

2019-06-02 Thread Dave Chinner
On Sat, Jun 01, 2019 at 11:01:42AM +0300, Amir Goldstein wrote: > On Sat, Jun 1, 2019 at 2:28 AM Dave Chinner wrote: > > > > On Sat, Jun 01, 2019 at 08:45:49AM +1000, Dave Chinner wrote: > > > Given that we can already use AIO to provide this sort of ordering, > > &

Re: [RFC][PATCH] link.2: AT_ATOMIC_DATA and AT_ATOMIC_METADATA

2019-05-31 Thread Dave Chinner
On Sat, Jun 01, 2019 at 08:45:49AM +1000, Dave Chinner wrote: > Given that we can already use AIO to provide this sort of ordering, > and AIO is vastly faster than synchronous IO, I don't see any point > in adding complex barrier interfaces that can be /easily implemented > in

Re: [RFC][PATCH] link.2: AT_ATOMIC_DATA and AT_ATOMIC_METADATA

2019-05-31 Thread Dave Chinner
ee any point in adding complex barrier interfaces that can be /easily implemented in userspace/ using existing AIO primitives. You should start thinking about expanding libaio with stuff like "link_after_fdatasync()" and suddenly the whole problem of filesystem data vs metadata ordering goes away because the application directly controls all ordering without blocking and doesn't need to care what the filesystem under it does Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 04/18] dax: Introduce IOMAP_DAX_COW to CoW edges during writes

2019-05-30 Thread Dave Chinner
On Thu, May 30, 2019 at 01:16:05PM +0200, Jan Kara wrote: > On Thu 30-05-19 08:14:45, Dave Chinner wrote: > > On Wed, May 29, 2019 at 03:46:29PM +0200, Jan Kara wrote: > > > On Wed 29-05-19 14:46:58, Dave Chinner wrote: > > > > iomap_apply() > >

Re: [PATCH 04/18] dax: Introduce IOMAP_DAX_COW to CoW edges during writes

2019-05-29 Thread Dave Chinner
On Wed, May 29, 2019 at 03:46:29PM +0200, Jan Kara wrote: > On Wed 29-05-19 14:46:58, Dave Chinner wrote: > > iomap_apply() > > > > ->iomap_begin() > > map old data extent that we copy from > > > > alloca

Re: [PATCH 04/18] dax: Introduce IOMAP_DAX_COW to CoW edges during writes

2019-05-28 Thread Dave Chinner
On Tue, May 28, 2019 at 09:07:19PM -0700, Darrick J. Wong wrote: > On Wed, May 29, 2019 at 12:02:40PM +0800, Shiyang Ruan wrote: > > On 5/29/19 10:47 AM, Dave Chinner wrote: > > > On Wed, May 29, 2019 at 10:01:58AM +0800, Shiyang Ruan wrote: > > > > On 5

Re: [PATCH 04/18] dax: Introduce IOMAP_DAX_COW to CoW edges during writes

2019-05-28 Thread Dave Chinner
des the passed arguments but that does not > > seem to be your concern). So what do you exactly want to do? > > Hi Jan, > > Thanks for pointing out, and I'm sorry for my mistake. It's > ->dax_iomap_rw(), not ->dax_iomap_actor(). > > I want to call the callback function at the end of ->dax_iomap_rw(). > > Like this: > dax_iomap_rw(..., callback) { > > ... > while (...) { > iomap_apply(...); > } > > if (callback != null) { > callback(); > } > return ...; > } Why does this need to be in dax_iomap_rw()? We already do post-dax_iomap_rw() "io-end callbacks" directly in xfs_file_dax_write() to update the file size Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 04/18] dax: Introduce IOMAP_DAX_COW to CoW edges during writes

2019-05-22 Thread Dave Chinner
ad you quote was that using two iomaps looked to be the better, more general approach to solving the iomap read-modify-write issue at hand. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2 1/7] fsstress: allow fsync on directories too

2019-04-03 Thread Dave Chinner
On Wed, Apr 03, 2019 at 05:35:20PM +, Filipe Manana wrote: > On Wed, Apr 3, 2019 at 3:18 AM Dave Chinner wrote: > > > > On Mon, Apr 01, 2019 at 01:50:18PM +0100, fdman...@kernel.org wrote: > > > From: Filipe Manana > > > > > > Currently the fs

Re: [PATCH 04/15] dax: Introduce IOMAP_F_COW for copy-on-write

2019-04-02 Thread Dave Chinner
On Tue, Apr 02, 2019 at 08:56:31PM -0500, Goldwyn Rodrigues wrote: > On 10:06 02/04, Dave Chinner wrote: > > On Mon, Apr 01, 2019 at 04:41:02PM -0500, Goldwyn Rodrigues wrote: > > > After Darrick's suggestion, we can even do away with cow_pos, so > > > only the rea

Re: [PATCH v2 1/7] fsstress: allow fsync on directories too

2019-04-02 Thread Dave Chinner
ame(&f); > return; > } > - fd = open_path(&f, O_WRONLY); > + fd = open_file_or_dir(&f, O_WRONLY, &dir); > e = fd < 0 ? errno : 0; > check_cwd(); > if (fd < 0) { This whole hunk - from init_pathname to check_cwd - was what I was suggesting you factor out, not just the open_path() code. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 04/15] dax: Introduce IOMAP_F_COW for copy-on-write

2019-04-01 Thread Dave Chinner
On Mon, Apr 01, 2019 at 04:41:02PM -0500, Goldwyn Rodrigues wrote: > On 15:38 01/04, Dave Chinner wrote: > > On Tue, Mar 26, 2019 at 02:02:50PM -0500, Goldwyn Rodrigues wrote: > > > From: Goldwyn Rodrigues > > > > > > The IOMAP_F_COW is a flag to notify dax

Re: [PATCH 04/15] dax: Introduce IOMAP_F_COW for copy-on-write

2019-03-31 Thread Dave Chinner
#define IOMAP_F_DIRTY0x02/* uncommitted metadata */ > #define IOMAP_F_BUFFER_HEAD 0x04/* file system requires buffer heads */ > +#define IOMAP_F_COW 0x08/* cow before write */ "Copy on write before write"? :) Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 5/7] fsstress: allow afsync on directories too

2019-03-28 Thread Dave Chinner
procid, opno, e); > - free_pathname(&f); > - close(fd); > - return; > + goto out; > } > > e = event.res2; > if (v) > printf("%d/%d: afsync %s %d\n", procid, opno, f.path, e); > +out: > free_pathname(&f); > - close(fd); > + if (dir) > + closedir(dir); > + else > + close(fd); Same here for close. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 2/7] fsstress: add operation for setting xattrs on files and directories

2019-03-28 Thread Dave Chinner
ded attribute frequency. There are some tests that actually use "-f setxattr=n" (and who knows how many custom test scripts using fsstress built from fstests), so I don't think we should be renaming existing operations to something else and then reusing the name for a new type of operation like this I certainly agree with the idea of adding extended attributes to fsstress, just not this way... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/7] fsstress: rename setxattr operation to chproj

2019-03-28 Thread Dave Chinner
d it to testing other bits of the FS_IOC_FS[GS]ETXATTR interface, so it's appropriately named. if youare going to change it, then "fssetxattr" is probably the right thing to change it to... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 3/3] xfs: don't allow most setxattr to immutable files

2019-03-28 Thread Dave Chinner
_IMMUTABLE)) > + return -EPERM; > + > /* diflags2 only valid for v3 inodes. */ > di_flags2 = xfs_flags2diflags2(ip, fa->fsx_xflags); > if (di_flags2 && ip->i_d.di_version < 3) Looks fine - catches both FS_IOC_SETFLAGS and FS_IOC_FSSETXATTR for XFS. Do the other filesystems that implement FS_IOC_FSSETXATTR have the same bug? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 2/3] xfs: reset page mappings after setting immutable

2019-03-28 Thread Dave Chinner
+ goto out_unlock; > + > + *join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL; > + return 0; > + > +out_unlock: > + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL); > + return error; > + > +} Doesn't wait for direct IO to drain. Wouldn't it be better to do this? lock() xfs_flush_unmap_range(ip, 0, XFS_SIZE(ip)); unlock() Otherwise looks ok. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] generic: add test for fsync after shrinking truncate and rename

2019-03-18 Thread Dave Chinner
(Sorry, missed this email and only just noticed it...) On Fri, Mar 08, 2019 at 09:11:19AM -0600, Vijay Chidambaram wrote: > On Thu, Mar 7, 2019 at 10:35 PM Dave Chinner wrote: > > > > On Thu, Mar 07, 2019 at 05:19:51PM -0600, Jayashree Mohan wrote: > > > Hi Amir, >

Re: [PATCH] generic: add test for fsync after shrinking truncate and rename

2019-03-07 Thread Dave Chinner
ps://patchwork.kernel.org/patch/8293181/). I really wouldn't try to infer anything from the bugs in btrfs fsync behaviour or the test cases that expose them. 'Behave like other filesystems" is not a substitute for having solid fundamental algorithms... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] generic: add test for fsync after shrinking truncate and rename

2019-03-07 Thread Dave Chinner
On Thu, Mar 07, 2019 at 09:52:03AM +0200, Amir Goldstein wrote: > On Wed, Mar 6, 2019 at 11:48 PM Dave Chinner wrote: > > > > On Wed, Mar 06, 2019 at 09:51:23AM +0200, Amir Goldstein wrote: > > > On Wed, Mar 6, 2019 at 12:33 AM Dave Chinner wrote: > > > >

Re: [PATCH] generic: add test for fsync after shrinking truncate and rename

2019-03-06 Thread Dave Chinner
On Wed, Mar 06, 2019 at 09:51:23AM +0200, Amir Goldstein wrote: > On Wed, Mar 6, 2019 at 12:33 AM Dave Chinner wrote: > > > So the reason this is working is because 2nd fsync needs to > > > persist ctime of B and not because it needs to persist the > > > truncat

Re: [PATCH] generic: add test for fsync after shrinking truncate and rename

2019-03-05 Thread Dave Chinner
On Tue, Mar 05, 2019 at 07:39:28AM +0200, Amir Goldstein wrote: > On Tue, Mar 5, 2019 at 2:50 AM Dave Chinner wrote: > > > > On Mon, Mar 04, 2019 at 05:04:23PM +0200, Amir Goldstein wrote: > > > On Mon, Mar 4, 2019 at 4:44 PM wrote: > > > > > > > >

Re: [PATCH] generic: add test for fsync after shrinking truncate and rename

2019-03-04 Thread Dave Chinner
On Tue, Mar 05, 2019 at 11:50:20AM +1100, Dave Chinner wrote: > On Mon, Mar 04, 2019 at 05:04:23PM +0200, Amir Goldstein wrote: > > On Mon, Mar 4, 2019 at 4:44 PM wrote: > > > > > > From: Filipe Manana > > > > > > Test that if we truncate a file to re

Re: [PATCH] generic: add test for fsync after shrinking truncate and rename

2019-03-04 Thread Dave Chinner
llows the behaviour to be implementation specific. In this case, file systems with strictly ordered metadata will end up making the rename visible because the rename occurred before the truncate that the fsync() is persisting... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [LSF/MM TOPIC] More async operations for file systems - async discard?

2019-02-21 Thread Dave Chinner
ent if you write > or discard at the smaller granularity. Filesystems discard extents these days, not individual blocks. If you free a 1MB file, they you are likely to get a 1MB discard. Or if you use fstrim, then it's free space extent sizes (on XFS can be hundred of GBs) and small fre

Re: [RFC PATCH 0/6] Allow setting file birth time with utimensat()

2019-02-18 Thread Dave Chinner
t in VFS defined system xattrs so that it is common across all filesystems. It also means that backup applications can preserve them during file copies without really even being aware of their meaning, simply by copying all the xattrs on the file... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [LSF/MM TOPIC] Software RAID Support for NV-DIMM

2019-02-18 Thread Dave Chinner
On Mon, Feb 18, 2019 at 06:15:34PM -0800, Jane Chu wrote: > On 2/15/2019 9:39 PM, Dave Chinner wrote: > > >On Sat, Feb 16, 2019 at 04:31:33PM +1100, Dave Chinner wrote: > >>On Fri, Feb 15, 2019 at 10:57:12AM +0100, Johannes Thumshirn wrote: > >>>(This is a jo

Re: [RFC PATCH 0/6] Allow setting file birth time with utimensat()

2019-02-18 Thread Dave Chinner
nd expose it through statx() (as authored time, not birth time), but store it a system xattr rather than an internal filesystem metadata field that requires was never intended to be user modifiable. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [LSF/MM TOPIC] More async operations for file systems - async discard?

2019-02-17 Thread Dave Chinner
On Sun, Feb 17, 2019 at 06:42:59PM -0500, Ric Wheeler wrote: > On 2/17/19 4:09 PM, Dave Chinner wrote: > >On Sun, Feb 17, 2019 at 03:36:10PM -0500, Ric Wheeler wrote: > >>One proposal for btrfs was that we should look at getting discard > >>out of the synchronous pa

Re: [LSF/MM TOPIC] More async operations for file systems - async discard?

2019-02-17 Thread Dave Chinner
st of the various discard > commands - how painful is it for modern SSD's? AIUI, it still depends on the SSD implementation, unfortunately. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [LSF/MM TOPIC] Software RAID Support for NV-DIMM

2019-02-16 Thread Dave Chinner
On Sat, Feb 16, 2019 at 09:05:31AM -0800, Dan Williams wrote: > On Fri, Feb 15, 2019 at 9:40 PM Dave Chinner wrote: > > > > On Sat, Feb 16, 2019 at 04:31:33PM +1100, Dave Chinner wrote: > > > On Fri, Feb 15, 2019 at 10:57:12AM +0100, Johannes Thumshirn wrote: > >

Re: [LSF/MM TOPIC] Software RAID Support for NV-DIMM

2019-02-15 Thread Dave Chinner
On Sat, Feb 16, 2019 at 04:31:33PM +1100, Dave Chinner wrote: > On Fri, Feb 15, 2019 at 10:57:12AM +0100, Johannes Thumshirn wrote: > > (This is a joint proposal with Hannes Reinecke) > > > > Servers with NV-DIMM are slowly emerging in data centers but one key feature > &g

Re: [LSF/MM TOPIC] Software RAID Support for NV-DIMM

2019-02-15 Thread Dave Chinner
all the metadata goes to the software raided pmem block devices that aren't DAX capable. Problem already solved, yes? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH 0/6] Allow setting file birth time with utimensat()

2019-02-14 Thread Dave Chinner
On Thu, Feb 14, 2019 at 03:14:29PM -0800, Omar Sandoval wrote: > On Fri, Feb 15, 2019 at 09:06:26AM +1100, Dave Chinner wrote: > > On Thu, Feb 14, 2019 at 02:00:07AM -0800, Omar Sandoval wrote: > > > From: Omar Sandoval > > > > > > Hi, > > > >

Re: [RFC PATCH 0/6] Allow setting file birth time with utimensat()

2019-02-14 Thread Dave Chinner
e create time doesn't really help, because once you've broken into a system, this makes it really easy to cover tracks (e.g. we can't find files that were created and unlinked during the break in window anymore) and lay false trails Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 7/7] btrfs: drop mmap_sem in mkwrite for btrfs

2018-10-22 Thread Dave Chinner
On Mon, Oct 22, 2018 at 01:56:54PM -0400, Josef Bacik wrote: > On Fri, Oct 19, 2018 at 02:48:47PM +1100, Dave Chinner wrote: > > On Thu, Oct 18, 2018 at 04:23:18PM -0400, Josef Bacik wrote: > > > ->page_mkwrite is extremely expensive in btrfs. We have to reserve > >

Re: [PATCH v6 00/28] fs: fixes for serious clone/dedupe problems

2018-10-21 Thread Dave Chinner
On Mon, Oct 22, 2018 at 08:42:29AM +0300, Amir Goldstein wrote: > On Mon, Oct 22, 2018 at 8:09 AM Dave Chinner wrote: > > > > On Mon, Oct 22, 2018 at 05:52:49AM +0100, Al Viro wrote: > > > On Mon, Oct 22, 2018 at 03:37:41PM +1100, Dave Chinner wrote: > > > > &g

Re: [PATCH v6 00/28] fs: fixes for serious clone/dedupe problems

2018-10-21 Thread Dave Chinner
On Mon, Oct 22, 2018 at 05:52:49AM +0100, Al Viro wrote: > On Mon, Oct 22, 2018 at 03:37:41PM +1100, Dave Chinner wrote: > > > Ok, this is a bit of a mess. the patches do not merge cleanly to a > > 4.19-rc1 base kernel because of all the changes to > > include/linux/fs.

Re: [PATCH v6 00/28] fs: fixes for serious clone/dedupe problems

2018-10-21 Thread Dave Chinner
On Mon, Oct 22, 2018 at 01:21:12PM +1100, Dave Chinner wrote: > On Sun, Oct 21, 2018 at 09:15:03AM -0700, Darrick J. Wong wrote: > > Hi all, > > > > Dave, Eric, and I have been chasing a stale data exposure bug in the XFS > > reflink implementation, and tracked it down

Re: [PATCH v6 00/28] fs: fixes for serious clone/dedupe problems

2018-10-21 Thread Dave Chinner
we want to merge this? I can take it through the XFS tree given that there is a bit of XFS changes that needs to be co-ordinated with it, or should it go through some other tree? The other question I have is who reviews ocfs2 changes these days? Do they get reviewed, or just shepherded in via akpm's tree? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 28/28] xfs: remove [cm]time update from reflink calls

2018-10-21 Thread Dave Chinner
On Sun, Oct 21, 2018 at 09:18:17AM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong > > Now that the vfs remap helper dirties the inode [cm]time for us, xfs no > longer needs to do that on its own. > > Signed-off-by: Darrick J. Wong looks good. Reviewed-by: Dave

Re: [PATCH 27/28] xfs: remove xfs_reflink_remap_range

2018-10-21 Thread Dave Chinner
; > Signed-off-by: Darrick J. Wong Sensible enough. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [PATCH 25/28] xfs: support returning partial reflink results

2018-10-21 Thread Dave Chinner
egular write. > > Signed-off-by: Darrick J. Wong Looks ok to me. remap_file_range() still returns the full length, so there's no change of behaviour there. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [PATCH 24/28] xfs: clean up xfs_reflink_remap_blocks call site

2018-10-21 Thread Dave Chinner
On Sun, Oct 21, 2018 at 09:17:50AM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong > > Move the offset <-> blocks unit conversions into > xfs_reflink_remap_blocks to make the call site less ugly. > > Signed-off-by: Darrick J. Wong Looks fine. Reviewed-by: Dave C

Re: [PATCH 7/7] btrfs: drop mmap_sem in mkwrite for btrfs

2018-10-18 Thread Dave Chinner
eneric_file_read_iter(struct kiocb *iocb, struct > iov_iter *iter) > EXPORT_SYMBOL(generic_file_read_iter); > > #ifdef CONFIG_MMU > -static struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma, int > flags) > +struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma, int flags) > { > if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) == > FAULT_FLAG_ALLOW_RETRY) { > struct file *file; > @@ -2377,6 +2377,7 @@ static struct file *maybe_unlock_mmap_for_io(struct > vm_area_struct *vma, int fla > } > return NULL; > } > +EXPORT_SYMBOL_GPL(maybe_unlock_mmap_for_io); These API mods (if this functionality remains in the filesystem code) belong in whatever patch introduced this function. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 4/7] mm: use the cached page for filemap_fault

2018-10-18 Thread Dave Chinner
} > + unlock_page(cached_page); > + put_page(cached_page); > + } > + Can you factor this out so the main code path doesn't get any more complex than it already is? i.e. something like: error = vmf_has_cached_page(vmf, &page); if (error) goto out_retry; if (page) goto have_cached_page; -dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 3/7] mm: drop the mmap_sem in all read fault cases

2018-10-18 Thread Dave Chinner
unlock_mmap_for_io(vmf->vma, vmf->flags); > + > /* >* Umm, take care of errors if the page isn't up-to-date. >* Try to re-read it _once_. We do this synchronously, Same here. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 6/7] mm: allow ->page_mkwrite to do retries

2018-10-18 Thread Dave Chinner
) Mess. #define __FAIL_FLAGS(VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY) if (ret & __FAIL_FLAGS) Should kill the unlikely() at the same time. -Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 5/7] mm: add a flag to indicate we used a cached page

2018-10-18 Thread Dave Chinner
alised by a prior fault attempt, not that "we used a cached page". "cached page" is a horribly overloaded term - I suspect we should not overload it more, especially as the flag get cleared if the cached page is not up to date (i.e. the data on it hasn't been fully initialised). FAULT_FLAG_PAGE_INITIALISED? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 2/7] mm: drop mmap_sem for page cache read IO submission

2018-10-18 Thread Dave Chinner
retry; > + } > + } else > + __lock_page(page); > } > > /* Did it get truncated? */ > @@ -2607,6 +2655,19 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) > /* Things didn't work out. Return zero to tell the mm layer so. */ > shrink_readahead_size_eio(file, ra); > return VM_FAULT_SIGBUS; > + > +out_retry_wait: > + if (page) { > + if (flags & FAULT_FLAG_KILLABLE) and here. Any reason for this discrepancy? -Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 05/25] vfs: avoid problematic remapping requests into partial EOF block

2018-10-14 Thread Dave Chinner
committed, are you going to update it and repost as it clearly had value Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 24/25] xfs: support returning partial reflink results

2018-10-14 Thread Dave Chinner
flink_remap_range at this point? Yeah, that seems like a good idea to me - pulling all the vfs/generic code interactions back up into xfs_file.c would match how the rest of the file operations are layered w.r.t. external and internal XFS code... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 25/25] xfs: remove redundant remap partial EOF block checks

2018-10-11 Thread Dave Chinner
On Wed, Oct 10, 2018 at 09:15:26PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong > > Now that we've moved the partial EOF block checks to the VFS helpers, we > can remove the redundantn functionality from XFS. > > Signed-off-by: Darrick J. Wong looks fine. Re

Re: [PATCH 24/25] xfs: support returning partial reflink results

2018-10-11 Thread Dave Chinner
ped, pos_out + len); > + remapped = min_t(int64_t, len, XFS_FSB_TO_B(mp, remapped)); So remapped is returned as a block count, then immediately converted to a byte count? Can we return it as byte count so that we don't have this weird unit conversion? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 23/25] xfs: fix pagecache truncation prior to reflink

2018-10-11 Thread Dave Chinner
; + truncate_inode_pages_range(&inode_out->i_data, > + round_down(pos_out, PAGE_SIZE), > + round_up(pos_out + *len, PAGE_SIZE) - 1); Looks good. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [PATCH 05/25] vfs: avoid problematic remapping requests into partial EOF block

2018-10-11 Thread Dave Chinner
e entire request. A subsequent > patch will enable us to shorten dedupe requests correctly. Ok, so this patch rejects whole file dedupe requests, and then a later patch adds support back in for it? Doesn't that leave a bisect landmine behind? Why separate the functionality like this? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 05/25] vfs: check file ranges before cloning files

2018-10-10 Thread Dave Chinner
ing all the way to the end? */ > isize = i_size_read(inode_in); > - if (isize == 0) > - return 0; This looks like a change of behaviour. Instead of skipping zero legnth source files and returning success, this will now return -EINVAL as other checks fail? That needs to be documented in the commit message if it's intentional and a valid change to make... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2 00/25] fs: fixes for serious clone/dedupe problems

2018-10-09 Thread Dave Chinner
cently sent to fstests > exercises the fixes in this series. Tests are in [2]. Can you rebase this on the for-next branch on the xfs tree which already contains some of the initial fixes in the series and a couple of other reflink/dedupe data corruption fixes? I'm planning on pushi

Re: [PATCH 01/25] xfs: add a per-xfs trace_printk macro

2018-10-09 Thread Dave Chinner
calling this trace point is not committed? If we decide to add this, it needs to be a CONFIG_XFS_DEBUG=y only definition because trace_printk() is only for temporary debugging code and has substantial performance overheads even when these trace points are not being traced. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 02/15] xfs: refactor clonerange preparation into a separate helper

2018-10-05 Thread Dave Chinner
On Fri, Oct 05, 2018 at 10:21:59AM -0700, Darrick J. Wong wrote: > On Fri, Oct 05, 2018 at 07:02:42PM +1000, Dave Chinner wrote: > > On Fri, Oct 05, 2018 at 05:02:28PM +1000, Dave Chinner wrote: > > > On Thu, Oct 04, 2018 at 05:44:47PM -0700, Darrick J. Wong wrote: > > &

Re: [PATCH 02/15] xfs: refactor clonerange preparation into a separate helper

2018-10-05 Thread Dave Chinner
On Fri, Oct 05, 2018 at 05:02:28PM +1000, Dave Chinner wrote: > On Thu, Oct 04, 2018 at 05:44:47PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong > > > > Refactor all the reflink preparation steps into a separate helper that > > we'll use to land all the

Re: [PATCH 02/15] xfs: refactor clonerange preparation into a separate helper

2018-10-05 Thread Dave Chinner
g to do and < 0 for an error and catch it in this code. I note that later patches in the series change the vfs_clone_file_prep_inodes() behaviour so this behaviour is probably masked by those later changes. It's still a nasty bisect landmine, though, so I'll fix it here. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 04/15] xfs: update ctime and remove suid before cloning files

2018-10-04 Thread Dave Chinner
-- > fs/xfs/xfs_reflink.c | 25 + > 1 file changed, 25 insertions(+) Looks good. Reviewed-by: Dave Chinner Because this fixes a security related problem, I'm going to push this with the data corruption fixes. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 03/15] xfs: zero posteof blocks when cloning above eof

2018-10-04 Thread Dave Chinner
id: https://bugzilla.kernel.org/show_bug.cgi?id=201259 > Signed-off-by: Darrick J. Wong > --- > fs/xfs/xfs_reflink.c | 33 + > 1 file changed, 25 insertions(+), 8 deletions(-) Looks good. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [PATCH 02/15] xfs: refactor clonerange preparation into a separate helper

2018-10-04 Thread Dave Chinner
so moves the invalidation of the destination range to the prep function so that it is done before the range is remapped. This ensures that nobody can access the data in range being remapped until the remap is complete. -- Sound OK? Otherwise this looks fine. Reviewed-by: Dave Chinner -Dave. >

Re: [PATCH 00/15] fs: fixes for serious clone/dedupe problems

2018-10-04 Thread Dave Chinner
if (pos_out + len < i_size_read(inode_out)) { + ret = -EINVAL; + goto out_unlock; + } + } It might be better to put these in with the eof-zeroing patch then add all the other changes on top? Let me post them separately, as they may be candidates for 4.19-rc7 along with the eof zeroing. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 8/8] btrfs: drop mmap_sem in mkwrite for btrfs

2018-09-25 Thread Dave Chinner
rea_struct > *vma, > + int flags) > +{ > + return NULL; > +} This doesn't compile either. -Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/8] mm: push vm_fault into the page fault handlers

2018-09-25 Thread Dave Chinner
struct siginfo si; > @@ -493,7 +494,8 @@ static int __kprobes do_page_fault(unsigned long addr, > unsigned int esr, > #endif > } > > - fault = __do_page_fault(mm, addr, mm_flags, vm_flags, tsk); > + vm_fault_init(&vmf, NULL, addr, mm_flags); > + fault = __do_page_fault(mm, vmf, vm_flags, tsk); I'm betting this doesn't compile, either. /me stops looking. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [patch] file dedupe (and maybe clone) data corruption (was Re: [PATCH] generic: test for deduplication between different files)

2018-09-20 Thread Dave Chinner
On Wed, Sep 19, 2018 at 12:12:03AM -0400, Zygo Blaxell wrote: > On Mon, Sep 10, 2018 at 07:06:46PM +1000, Dave Chinner wrote: > > On Thu, Sep 06, 2018 at 11:53:06PM -0400, Zygo Blaxell wrote: > > > On Thu, Sep 06, 2018 at 06:38:09PM +1000, Dave Chinner wrote: > > > >

  1   2   3   4   5   6   >