ough the layers, and device disappearance may in fact manifest to
the user as data corruption rather than causing data to be
inaccessible.
Hence "remove" notifications just don't work in the storage stack.
They need to be translated to block ranges going bad (i.e. media
errors), a
On Mon, Mar 01, 2021 at 07:33:28PM -0800, Dan Williams wrote:
> On Mon, Mar 1, 2021 at 6:42 PM Dave Chinner wrote:
> [..]
> > We do not need a DAX specific mechanism to tell us "DAX device
> > gone", we need a generic block device interface that tells us "
On Mon, Mar 01, 2021 at 04:32:36PM -0800, Dan Williams wrote:
> On Mon, Mar 1, 2021 at 2:47 PM Dave Chinner wrote:
> > Now we have the filesytem people providing a mechanism for the pmem
> > devices to tell the filesystems about physical device failures so
> > they can
On Mon, Mar 01, 2021 at 12:55:53PM -0800, Dan Williams wrote:
> On Sun, Feb 28, 2021 at 2:39 PM Dave Chinner wrote:
> >
> > On Sat, Feb 27, 2021 at 03:40:24PM -0800, Dan Williams wrote:
> > > On Sat, Feb 27, 2021 at 2:36 PM Dave Chinner wrote:
> > > > On F
On Sat, Feb 27, 2021 at 03:40:24PM -0800, Dan Williams wrote:
> On Sat, Feb 27, 2021 at 2:36 PM Dave Chinner wrote:
> > On Fri, Feb 26, 2021 at 02:41:34PM -0800, Dan Williams wrote:
> > > On Fri, Feb 26, 2021 at 1:28 PM Dave Chinner wrote:
> > > > On Fri, Feb 26,
On Fri, Feb 26, 2021 at 02:41:34PM -0800, Dan Williams wrote:
> On Fri, Feb 26, 2021 at 1:28 PM Dave Chinner wrote:
> > On Fri, Feb 26, 2021 at 12:59:53PM -0800, Dan Williams wrote:
> > > On Fri, Feb 26, 2021 at 12:51 PM Dave Chinner wrote:
> > > > > My imm
On Fri, Feb 26, 2021 at 12:59:53PM -0800, Dan Williams wrote:
> On Fri, Feb 26, 2021 at 12:51 PM Dave Chinner wrote:
> >
> > On Fri, Feb 26, 2021 at 11:24:53AM -0800, Dan Williams wrote:
> > > On Fri, Feb 26, 2021 at 11:05 AM Darrick J. Wong
> > > wrote:
> &
hen when userspace tries to access the
mapped DAX pages we get a new page fault. In processing the fault, the
filesystem will try to get direct access to the pmem from the block
device. This will get an ENODEV error from the block device because
because the backing store (pmem) has been unplugged and is no longer
there...
AFAICT, as long as pmem removal invalidates all the active ptes that
point at the pmem being removed, the filesystem doesn't need to
care about device removal at all, DAX or no DAX...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
fix it in mainline that I know of.
> As I said, some vendors have tried to fix it in their NAS products,
> but I don't know where to find that patch any more.
It's not suportable from a disaster recovery perspective. I recently
saw a 14TB filesystem with billions of hardlinks in it require 240GB
of RAM to run xfs_repair. We just can't support large filesystems
on 32 bit systems, and it has nothing to do with simple stuff like
page cache index sizes...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
fset for such systems to 16TB so sparse files can't be larger
than what the kernel supports. See xfs_sb_validate_fsb_count() call
and the file offset checks against MAX_LFS_FILESIZE in
xfs_fs_fill_super()...
FWIW, XFS has been doing this for roughly 20 years now - >16TB on 32
bit machines w
On Mon, Feb 01, 2021 at 06:14:21PM -0800, Darrick J. Wong wrote:
> On Fri, Jan 22, 2021 at 09:20:51AM +1100, Dave Chinner wrote:
> > Hi btrfs-gurus,
> >
> > I'm running a simple reflink/snapshot/COW scalability test at the
> > moment. It is just a loop that d
On Fri, Jan 29, 2021 at 06:25:50PM -0500, Zygo Blaxell wrote:
> On Mon, Jan 25, 2021 at 09:36:55AM +1100, Dave Chinner wrote:
> > On Sat, Jan 23, 2021 at 04:42:33PM +0800, Qu Wenruo wrote:
> > >
> > >
> > > On 2021/1/22 上午6:20, Dave Chinner wrote:
> >
On Sat, Jan 23, 2021 at 04:42:33PM +0800, Qu Wenruo wrote:
>
>
> On 2021/1/22 上午6:20, Dave Chinner wrote:
> > Hi btrfs-gurus,
> >
> > I'm running a simple reflink/snapshot/COW scalability test at the
> > moment. It is just a loop that does "fio overwri
On Sat, Jan 23, 2021 at 07:19:03PM -0500, Zygo Blaxell wrote:
> On Fri, Jan 22, 2021 at 09:20:51AM +1100, Dave Chinner wrote:
> > Hi btrfs-gurus,
> >
> > I'm running a simple reflink/snapshot/COW scalability test at the
> > moment. It is just a loop that d
workload, I suspect the issues I note above are
btrfs issues, not expected behaviour. I'm not sure what the
expected scalability of btrfs file clones and snapshots are though,
so I'm interested to hear if these results are expected or not.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
JOBS=4
IODEPTH=4
IOCOUNT=$((1 / $JOBS))
FILESIZE=4g
cat >$fio_config <
AP_DIO_NEED_SYNC))
> - ret = generic_write_sync(iocb, ret);
> + ret = generic_write_sync(dio->iocb, ret);
>
> kfree(dio);
>
> return ret;
> }
> -EXPORT_SYMBOL_GPL(iomap_dio_complete);
> +
NACK.
If you don't want iomap_dio_comple
em. It is based on my previous series which
> added a Btrfs-specific ioctl [1], but it is now an extension to
> preadv2()/pwritev2() as suggested by Dave Chinner [2]. I've included a
> man page patch describing the API in detail. Test cases and examples
> programs are available [3].
&g
On Wed, Sep 25, 2019 at 08:07:12AM -0400, Colin Walters wrote:
>
>
> On Wed, Sep 25, 2019, at 3:11 AM, Dave Chinner wrote:
> >
> > We're talking about user data read/write access here, not some
> > special security capability. Access to the data has already bee
checked, so why should the format that the data is
supplied to the kernel in suddenly require new privilege checks?
i.e. writing encoded data to a file requires exactly the same
access permissions as writing cleartext data to the file. The only
extra information here is whether the _filesystem_ supports encoded
data, and that doesn't change regardless of what the open file gets
passed to. Hence the capability is either there or it isn't, it
doesn't transform not matter what privilege boundary the file is
passed across. Similarly, we have permission to access the data
or we don't through the struct file, it doesn't transform either.
Hence I don't see why CAP_SYS_ADMIN or any special permissions are
needed for an application with access permissions to file data to
use these RWF_ENCODED IO interfaces. I am inclined to think the
permission check here is wrong and should be dropped, and then all
these issues go away.
Yes, the app that is going to use this needs root perms because it
accesses all data in the fs (it's a backup app!), but that doesn't
mean you can only use RWF_ENCODED if you have root perms.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
;
> This now at least survives xfstests -g quick on a 4k xfs file system
> for. Here is my current tree:
>
> http://git.infradead.org/users/hch/xfs.git/shortlog/refs/heads/xfs-cow-iomap
That looks somewhat reasonable. The XFS mapping function is turning
into spagetti and getting really hard to follow again, though.
Perhaps we should consider splitting the shared/COW path out of
it...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Sep 06, 2019 at 11:19:49AM -0700, Omar Sandoval wrote:
> On Thu, Sep 05, 2019 at 12:10:12PM +1000, Dave Chinner wrote:
> > On Wed, Sep 04, 2019 at 12:13:26PM -0700, Omar Sandoval wrote:
> > > From: Omar Sandoval
> > >
> > > This adds an API for wri
On Thu, Sep 05, 2019 at 02:16:37PM +0200, Johannes Thumshirn wrote:
> On 05/09/2019 04:10, Dave Chinner wrote:
> > On Wed, Sep 04, 2019 at 12:13:26PM -0700, Omar Sandoval wrote:
> >> From: Omar Sandoval
> >>
> >> This adds an API for writing compressed data
hat skips the compression/decompression code
and sets a few extra flags in the iocb that is passed down to the
direct IO code.
We don't need a whole new IO path just to skip a data transformation
step in the direct IO path
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
now if there's hardware encryption
below or software encryption on top becomes problematic...
So really, from a filesystem and iomap perspective, What Eric says
is the right - it's the only order that makes sense...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Aug 05, 2019 at 04:08:43PM +, Goldwyn Rodrigues wrote:
> On Mon, 2019-08-05 at 09:43 +1000, Dave Chinner wrote:
> > On Fri, Aug 02, 2019 at 05:00:45PM -0500, Goldwyn Rodrigues wrote:
> > > From: Goldwyn Rodrigues
> > >
> > > This helps filesyste
iomap->type = IOMAP_DELALLOC;
> + }
> +
> iomap->addr = IOMAP_NULL_ADDR;
> iomap->type = IOMAP_DELALLOC;
The iomap->type is overwritten here and so IOMAP_COW will never be
seen by the iomap infrastructure...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
en;
> + if (iocb->ki_pos > i_size_read(inode))
> + i_size_write(inode, iocb->ki_pos);
> + return written;
Looks like it fails to handle O_[D]SYNC writes.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
((name)[IOMAP_SOURCE_MAP])
And now we only have to pass a single iomap parameter to each
function as "struct iomap **iomap". This makes the code somewhat
simpler, and we only ever need to use IOMAP_S(iomap) when
IOMAP_B(iomap)->type == IOMAP_COW.
The other advantage of this is that if we even need new
functionality that requires 2 (or more) iomaps, we don't have to
change APIs again
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
Darrick on CONFIG_IOMAP_DEBUG here - we need to start
locking down invalid behaviour and invalid combinations with asserts
that tell developers they've broken something.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
louts is
completely the wrong approach to be taking. We need to do these
things in a generic manner so that all filesystems (and block
devices!) that use the iomap infrastructure can take advantage of
them, not just one of them.
Quite frankly, I don't care if it takes more time and work up
On Sat, Jun 01, 2019 at 11:01:42AM +0300, Amir Goldstein wrote:
> On Sat, Jun 1, 2019 at 2:28 AM Dave Chinner wrote:
> >
> > On Sat, Jun 01, 2019 at 08:45:49AM +1000, Dave Chinner wrote:
> > > Given that we can already use AIO to provide this sort of ordering,
> > &
On Sat, Jun 01, 2019 at 08:45:49AM +1000, Dave Chinner wrote:
> Given that we can already use AIO to provide this sort of ordering,
> and AIO is vastly faster than synchronous IO, I don't see any point
> in adding complex barrier interfaces that can be /easily implemented
> in
ee any point
in adding complex barrier interfaces that can be /easily implemented
in userspace/ using existing AIO primitives. You should start
thinking about expanding libaio with stuff like
"link_after_fdatasync()" and suddenly the whole problem of
filesystem data vs metadata ordering goes away because the
application directly controls all ordering without blocking and
doesn't need to care what the filesystem under it does
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, May 30, 2019 at 01:16:05PM +0200, Jan Kara wrote:
> On Thu 30-05-19 08:14:45, Dave Chinner wrote:
> > On Wed, May 29, 2019 at 03:46:29PM +0200, Jan Kara wrote:
> > > On Wed 29-05-19 14:46:58, Dave Chinner wrote:
> > > > iomap_apply()
> >
On Wed, May 29, 2019 at 03:46:29PM +0200, Jan Kara wrote:
> On Wed 29-05-19 14:46:58, Dave Chinner wrote:
> > iomap_apply()
> >
> > ->iomap_begin()
> > map old data extent that we copy from
> >
> > alloca
On Tue, May 28, 2019 at 09:07:19PM -0700, Darrick J. Wong wrote:
> On Wed, May 29, 2019 at 12:02:40PM +0800, Shiyang Ruan wrote:
> > On 5/29/19 10:47 AM, Dave Chinner wrote:
> > > On Wed, May 29, 2019 at 10:01:58AM +0800, Shiyang Ruan wrote:
> > > > On 5
des the passed arguments but that does not
> > seem to be your concern). So what do you exactly want to do?
>
> Hi Jan,
>
> Thanks for pointing out, and I'm sorry for my mistake. It's
> ->dax_iomap_rw(), not ->dax_iomap_actor().
>
> I want to call the callback function at the end of ->dax_iomap_rw().
>
> Like this:
> dax_iomap_rw(..., callback) {
>
> ...
> while (...) {
> iomap_apply(...);
> }
>
> if (callback != null) {
> callback();
> }
> return ...;
> }
Why does this need to be in dax_iomap_rw()?
We already do post-dax_iomap_rw() "io-end callbacks" directly in
xfs_file_dax_write() to update the file size
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ad you quote was
that using two iomaps looked to be the better, more general approach
to solving the iomap read-modify-write issue at hand.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Apr 03, 2019 at 05:35:20PM +, Filipe Manana wrote:
> On Wed, Apr 3, 2019 at 3:18 AM Dave Chinner wrote:
> >
> > On Mon, Apr 01, 2019 at 01:50:18PM +0100, fdman...@kernel.org wrote:
> > > From: Filipe Manana
> > >
> > > Currently the fs
On Tue, Apr 02, 2019 at 08:56:31PM -0500, Goldwyn Rodrigues wrote:
> On 10:06 02/04, Dave Chinner wrote:
> > On Mon, Apr 01, 2019 at 04:41:02PM -0500, Goldwyn Rodrigues wrote:
> > > After Darrick's suggestion, we can even do away with cow_pos, so
> > > only the rea
ame(&f);
> return;
> }
> - fd = open_path(&f, O_WRONLY);
> + fd = open_file_or_dir(&f, O_WRONLY, &dir);
> e = fd < 0 ? errno : 0;
> check_cwd();
> if (fd < 0) {
This whole hunk - from init_pathname to check_cwd - was what I was
suggesting you factor out, not just the open_path() code.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Apr 01, 2019 at 04:41:02PM -0500, Goldwyn Rodrigues wrote:
> On 15:38 01/04, Dave Chinner wrote:
> > On Tue, Mar 26, 2019 at 02:02:50PM -0500, Goldwyn Rodrigues wrote:
> > > From: Goldwyn Rodrigues
> > >
> > > The IOMAP_F_COW is a flag to notify dax
#define IOMAP_F_DIRTY0x02/* uncommitted metadata */
> #define IOMAP_F_BUFFER_HEAD 0x04/* file system requires buffer heads */
> +#define IOMAP_F_COW 0x08/* cow before write */
"Copy on write before write"? :)
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
procid, opno, e);
> - free_pathname(&f);
> - close(fd);
> - return;
> + goto out;
> }
>
> e = event.res2;
> if (v)
> printf("%d/%d: afsync %s %d\n", procid, opno, f.path, e);
> +out:
> free_pathname(&f);
> - close(fd);
> + if (dir)
> + closedir(dir);
> + else
> + close(fd);
Same here for close.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ded attribute frequency. There are
some tests that actually use "-f setxattr=n" (and who knows how many
custom test scripts using fsstress built from fstests), so I don't
think we should be renaming existing operations to something else
and then reusing the name for a new type of operation like this
I certainly agree with the idea of adding extended attributes to
fsstress, just not this way...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
d it to testing other bits of
the FS_IOC_FS[GS]ETXATTR interface, so it's appropriately named. if
youare going to change it, then "fssetxattr" is probably the right
thing to change it to...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
_IMMUTABLE))
> + return -EPERM;
> +
> /* diflags2 only valid for v3 inodes. */
> di_flags2 = xfs_flags2diflags2(ip, fa->fsx_xflags);
> if (di_flags2 && ip->i_d.di_version < 3)
Looks fine - catches both FS_IOC_SETFLAGS and FS_IOC_FSSETXATTR
for XFS.
Do the other filesystems that implement FS_IOC_FSSETXATTR have
the same bug?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
+ goto out_unlock;
> +
> + *join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
> + return 0;
> +
> +out_unlock:
> + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> + return error;
> +
> +}
Doesn't wait for direct IO to drain. Wouldn't it be better to do
this?
lock()
xfs_flush_unmap_range(ip, 0, XFS_SIZE(ip));
unlock()
Otherwise looks ok.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
(Sorry, missed this email and only just noticed it...)
On Fri, Mar 08, 2019 at 09:11:19AM -0600, Vijay Chidambaram wrote:
> On Thu, Mar 7, 2019 at 10:35 PM Dave Chinner wrote:
> >
> > On Thu, Mar 07, 2019 at 05:19:51PM -0600, Jayashree Mohan wrote:
> > > Hi Amir,
>
ps://patchwork.kernel.org/patch/8293181/).
I really wouldn't try to infer anything from the bugs in btrfs fsync
behaviour or the test cases that expose them. 'Behave like other
filesystems" is not a substitute for having solid fundamental
algorithms...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Mar 07, 2019 at 09:52:03AM +0200, Amir Goldstein wrote:
> On Wed, Mar 6, 2019 at 11:48 PM Dave Chinner wrote:
> >
> > On Wed, Mar 06, 2019 at 09:51:23AM +0200, Amir Goldstein wrote:
> > > On Wed, Mar 6, 2019 at 12:33 AM Dave Chinner wrote:
> > > >
On Wed, Mar 06, 2019 at 09:51:23AM +0200, Amir Goldstein wrote:
> On Wed, Mar 6, 2019 at 12:33 AM Dave Chinner wrote:
> > > So the reason this is working is because 2nd fsync needs to
> > > persist ctime of B and not because it needs to persist the
> > > truncat
On Tue, Mar 05, 2019 at 07:39:28AM +0200, Amir Goldstein wrote:
> On Tue, Mar 5, 2019 at 2:50 AM Dave Chinner wrote:
> >
> > On Mon, Mar 04, 2019 at 05:04:23PM +0200, Amir Goldstein wrote:
> > > On Mon, Mar 4, 2019 at 4:44 PM wrote:
> > > >
> > > >
On Tue, Mar 05, 2019 at 11:50:20AM +1100, Dave Chinner wrote:
> On Mon, Mar 04, 2019 at 05:04:23PM +0200, Amir Goldstein wrote:
> > On Mon, Mar 4, 2019 at 4:44 PM wrote:
> > >
> > > From: Filipe Manana
> > >
> > > Test that if we truncate a file to re
llows the behaviour to
be implementation specific.
In this case, file systems with strictly ordered metadata will end
up making the rename visible because the rename occurred before the
truncate that the fsync() is persisting...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ent if you write
> or discard at the smaller granularity.
Filesystems discard extents these days, not individual blocks. If
you free a 1MB file, they you are likely to get a 1MB discard. Or if
you use fstrim, then it's free space extent sizes (on XFS can be
hundred of GBs) and small fre
t in VFS
defined system xattrs so that it is common across all filesystems.
It also means that backup applications can preserve them during file
copies without really even being aware of their meaning, simply by
copying all the xattrs on the file...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Feb 18, 2019 at 06:15:34PM -0800, Jane Chu wrote:
> On 2/15/2019 9:39 PM, Dave Chinner wrote:
>
> >On Sat, Feb 16, 2019 at 04:31:33PM +1100, Dave Chinner wrote:
> >>On Fri, Feb 15, 2019 at 10:57:12AM +0100, Johannes Thumshirn wrote:
> >>>(This is a jo
nd expose it
through statx() (as authored time, not birth time), but store it a
system xattr rather than an internal filesystem metadata field that
requires was never intended to be user modifiable.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Sun, Feb 17, 2019 at 06:42:59PM -0500, Ric Wheeler wrote:
> On 2/17/19 4:09 PM, Dave Chinner wrote:
> >On Sun, Feb 17, 2019 at 03:36:10PM -0500, Ric Wheeler wrote:
> >>One proposal for btrfs was that we should look at getting discard
> >>out of the synchronous pa
st of the various discard
> commands - how painful is it for modern SSD's?
AIUI, it still depends on the SSD implementation, unfortunately.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Sat, Feb 16, 2019 at 09:05:31AM -0800, Dan Williams wrote:
> On Fri, Feb 15, 2019 at 9:40 PM Dave Chinner wrote:
> >
> > On Sat, Feb 16, 2019 at 04:31:33PM +1100, Dave Chinner wrote:
> > > On Fri, Feb 15, 2019 at 10:57:12AM +0100, Johannes Thumshirn wrote:
> >
On Sat, Feb 16, 2019 at 04:31:33PM +1100, Dave Chinner wrote:
> On Fri, Feb 15, 2019 at 10:57:12AM +0100, Johannes Thumshirn wrote:
> > (This is a joint proposal with Hannes Reinecke)
> >
> > Servers with NV-DIMM are slowly emerging in data centers but one key feature
> &g
all the metadata goes to the software
raided pmem block devices that aren't DAX capable.
Problem already solved, yes?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Feb 14, 2019 at 03:14:29PM -0800, Omar Sandoval wrote:
> On Fri, Feb 15, 2019 at 09:06:26AM +1100, Dave Chinner wrote:
> > On Thu, Feb 14, 2019 at 02:00:07AM -0800, Omar Sandoval wrote:
> > > From: Omar Sandoval
> > >
> > > Hi,
> > >
>
e create time doesn't really help,
because once you've broken into a system, this makes it really easy
to cover tracks (e.g. we can't find files that were created and
unlinked during the break in window anymore) and lay false
trails
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Oct 22, 2018 at 01:56:54PM -0400, Josef Bacik wrote:
> On Fri, Oct 19, 2018 at 02:48:47PM +1100, Dave Chinner wrote:
> > On Thu, Oct 18, 2018 at 04:23:18PM -0400, Josef Bacik wrote:
> > > ->page_mkwrite is extremely expensive in btrfs. We have to reserve
> >
On Mon, Oct 22, 2018 at 08:42:29AM +0300, Amir Goldstein wrote:
> On Mon, Oct 22, 2018 at 8:09 AM Dave Chinner wrote:
> >
> > On Mon, Oct 22, 2018 at 05:52:49AM +0100, Al Viro wrote:
> > > On Mon, Oct 22, 2018 at 03:37:41PM +1100, Dave Chinner wrote:
> > >
> &g
On Mon, Oct 22, 2018 at 05:52:49AM +0100, Al Viro wrote:
> On Mon, Oct 22, 2018 at 03:37:41PM +1100, Dave Chinner wrote:
>
> > Ok, this is a bit of a mess. the patches do not merge cleanly to a
> > 4.19-rc1 base kernel because of all the changes to
> > include/linux/fs.
On Mon, Oct 22, 2018 at 01:21:12PM +1100, Dave Chinner wrote:
> On Sun, Oct 21, 2018 at 09:15:03AM -0700, Darrick J. Wong wrote:
> > Hi all,
> >
> > Dave, Eric, and I have been chasing a stale data exposure bug in the XFS
> > reflink implementation, and tracked it down
we want to merge this? I can take it through the
XFS tree given that there is a bit of XFS changes that needs to be
co-ordinated with it, or should it go through some other tree?
The other question I have is who reviews ocfs2 changes these days?
Do they get reviewed, or just shepherded in via akpm's tree?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Sun, Oct 21, 2018 at 09:18:17AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong
>
> Now that the vfs remap helper dirties the inode [cm]time for us, xfs no
> longer needs to do that on its own.
>
> Signed-off-by: Darrick J. Wong
looks good.
Reviewed-by: Dave
;
> Signed-off-by: Darrick J. Wong
Sensible enough.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
egular write.
>
> Signed-off-by: Darrick J. Wong
Looks ok to me. remap_file_range() still returns the full length,
so there's no change of behaviour there.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
On Sun, Oct 21, 2018 at 09:17:50AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong
>
> Move the offset <-> blocks unit conversions into
> xfs_reflink_remap_blocks to make the call site less ugly.
>
> Signed-off-by: Darrick J. Wong
Looks fine.
Reviewed-by: Dave C
eneric_file_read_iter(struct kiocb *iocb, struct
> iov_iter *iter)
> EXPORT_SYMBOL(generic_file_read_iter);
>
> #ifdef CONFIG_MMU
> -static struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma, int
> flags)
> +struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma, int flags)
> {
> if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) ==
> FAULT_FLAG_ALLOW_RETRY) {
> struct file *file;
> @@ -2377,6 +2377,7 @@ static struct file *maybe_unlock_mmap_for_io(struct
> vm_area_struct *vma, int fla
> }
> return NULL;
> }
> +EXPORT_SYMBOL_GPL(maybe_unlock_mmap_for_io);
These API mods (if this functionality remains in the filesystem
code) belong in whatever patch introduced this function.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
}
> + unlock_page(cached_page);
> + put_page(cached_page);
> + }
> +
Can you factor this out so the main code path doesn't get any more
complex than it already is? i.e. something like:
error = vmf_has_cached_page(vmf, &page);
if (error)
goto out_retry;
if (page)
goto have_cached_page;
-dave.
--
Dave Chinner
da...@fromorbit.com
unlock_mmap_for_io(vmf->vma, vmf->flags);
> +
> /*
>* Umm, take care of errors if the page isn't up-to-date.
>* Try to re-read it _once_. We do this synchronously,
Same here.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
)
Mess.
#define __FAIL_FLAGS(VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)
if (ret & __FAIL_FLAGS)
Should kill the unlikely() at the same time.
-Dave.
--
Dave Chinner
da...@fromorbit.com
alised by a prior
fault attempt, not that "we used a cached page". "cached page" is a
horribly overloaded term - I suspect we should not overload it more,
especially as the flag get cleared if the cached page is not up to
date (i.e. the data on it hasn't been fully initialised).
FAULT_FLAG_PAGE_INITIALISED?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
retry;
> + }
> + } else
> + __lock_page(page);
> }
>
> /* Did it get truncated? */
> @@ -2607,6 +2655,19 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> /* Things didn't work out. Return zero to tell the mm layer so. */
> shrink_readahead_size_eio(file, ra);
> return VM_FAULT_SIGBUS;
> +
> +out_retry_wait:
> + if (page) {
> + if (flags & FAULT_FLAG_KILLABLE)
and here.
Any reason for this discrepancy?
-Dave.
--
Dave Chinner
da...@fromorbit.com
committed, are you going
to update it and repost as it clearly had value
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
flink_remap_range at this point?
Yeah, that seems like a good idea to me - pulling all the
vfs/generic code interactions back up into xfs_file.c would match
how the rest of the file operations are layered w.r.t. external and
internal XFS code...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Oct 10, 2018 at 09:15:26PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong
>
> Now that we've moved the partial EOF block checks to the VFS helpers, we
> can remove the redundantn functionality from XFS.
>
> Signed-off-by: Darrick J. Wong
looks fine.
Re
ped, pos_out + len);
> + remapped = min_t(int64_t, len, XFS_FSB_TO_B(mp, remapped));
So remapped is returned as a block count, then immediately converted
to a byte count? Can we return it as byte count so that we don't
have this weird unit conversion?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
; + truncate_inode_pages_range(&inode_out->i_data,
> + round_down(pos_out, PAGE_SIZE),
> + round_up(pos_out + *len, PAGE_SIZE) - 1);
Looks good.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
e entire request. A subsequent
> patch will enable us to shorten dedupe requests correctly.
Ok, so this patch rejects whole file dedupe requests, and then a
later patch adds support back in for it?
Doesn't that leave a bisect landmine behind? Why separate the
functionality like this?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ing all the way to the end? */
> isize = i_size_read(inode_in);
> - if (isize == 0)
> - return 0;
This looks like a change of behaviour. Instead of skipping zero
legnth source files and returning success, this will now return
-EINVAL as other checks fail? That needs to be documented in the
commit message if it's intentional and a valid change to make...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
cently sent to fstests
> exercises the fixes in this series. Tests are in [2].
Can you rebase this on the for-next branch on the xfs tree which
already contains some of the initial fixes in the series and a
couple of other reflink/dedupe data corruption fixes? I'm planning
on pushi
calling this trace point is not committed?
If we decide to add this, it needs to be a CONFIG_XFS_DEBUG=y only
definition because trace_printk() is only for temporary debugging
code and has substantial performance overheads even when these trace
points are not being traced.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Oct 05, 2018 at 10:21:59AM -0700, Darrick J. Wong wrote:
> On Fri, Oct 05, 2018 at 07:02:42PM +1000, Dave Chinner wrote:
> > On Fri, Oct 05, 2018 at 05:02:28PM +1000, Dave Chinner wrote:
> > > On Thu, Oct 04, 2018 at 05:44:47PM -0700, Darrick J. Wong wrote:
> > &
On Fri, Oct 05, 2018 at 05:02:28PM +1000, Dave Chinner wrote:
> On Thu, Oct 04, 2018 at 05:44:47PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong
> >
> > Refactor all the reflink preparation steps into a separate helper that
> > we'll use to land all the
g to do and < 0 for an error and catch it in this code.
I note that later patches in the series change the
vfs_clone_file_prep_inodes() behaviour so this behaviour is probably
masked by those later changes. It's still a nasty bisect landmine,
though, so I'll fix it here.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
--
> fs/xfs/xfs_reflink.c | 25 +
> 1 file changed, 25 insertions(+)
Looks good.
Reviewed-by: Dave Chinner
Because this fixes a security related problem, I'm going to push
this with the data corruption fixes.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
id: https://bugzilla.kernel.org/show_bug.cgi?id=201259
> Signed-off-by: Darrick J. Wong
> ---
> fs/xfs/xfs_reflink.c | 33 +
> 1 file changed, 25 insertions(+), 8 deletions(-)
Looks good.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
so moves the invalidation of the destination range to
the prep function so that it is done before the range is remapped.
This ensures that nobody can access the data in range being remapped
until the remap is complete.
--
Sound OK?
Otherwise this looks fine.
Reviewed-by: Dave Chinner
-Dave.
>
if (pos_out + len < i_size_read(inode_out)) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+ }
It might be better to put these in with the eof-zeroing patch then
add all the other changes on top? Let me post them separately,
as they may be candidates for 4.19-rc7 along with the eof zeroing.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
rea_struct
> *vma,
> + int flags)
> +{
> + return NULL;
> +}
This doesn't compile either.
-Dave.
--
Dave Chinner
da...@fromorbit.com
struct siginfo si;
> @@ -493,7 +494,8 @@ static int __kprobes do_page_fault(unsigned long addr,
> unsigned int esr,
> #endif
> }
>
> - fault = __do_page_fault(mm, addr, mm_flags, vm_flags, tsk);
> + vm_fault_init(&vmf, NULL, addr, mm_flags);
> + fault = __do_page_fault(mm, vmf, vm_flags, tsk);
I'm betting this doesn't compile, either.
/me stops looking.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Sep 19, 2018 at 12:12:03AM -0400, Zygo Blaxell wrote:
> On Mon, Sep 10, 2018 at 07:06:46PM +1000, Dave Chinner wrote:
> > On Thu, Sep 06, 2018 at 11:53:06PM -0400, Zygo Blaxell wrote:
> > > On Thu, Sep 06, 2018 at 06:38:09PM +1000, Dave Chinner wrote:
> > > >
1 - 100 of 563 matches
Mail list logo