Re: [PATCH v2] Btrfs: fix null pointer dereference on compressed write path error

2018-10-12 Thread Liu Bo
On Fri, Oct 12, 2018 at 4:38 PM wrote: > > From: Filipe Manana > > At inode.c:compress_file_range(), under the "free_pages_out" label, we can > end up dereferencing the "pages" pointer when it has a NULL value. This > case happens when "start" has a value of 0 and we fail to allocate memory > for

[PATCH 23/25] xfs: fix pagecache truncation prior to reflink

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Prior to remapping blocks, it is necessary to remove pages from the destination file's page cache. Unfortunately, the truncation is not aggressive enough -- if page size > block size, we'll end up zeroing subpage blocks instead of removing them. So, round the start offset
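
A minimal userspace sketch of the rounding idea in the summary above, not the patch itself. The summary is cut off after "round the start offset", so the exact rounding is not visible here; this assumes the start is rounded down and the end rounded up to a page boundary. PAGE_SZ, page_round_down and page_round_up are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical page size for illustration; kernel code would use PAGE_SIZE. */
#define PAGE_SZ 4096ULL

/* Round an offset down to the start of the page that contains it. */
static uint64_t page_round_down(uint64_t off)
{
	return off & ~(PAGE_SZ - 1);
}

/* Round an offset up to the next page boundary (unchanged if already aligned). */
static uint64_t page_round_up(uint64_t off)
{
	return (off + PAGE_SZ - 1) & ~(PAGE_SZ - 1);
}
```

Under these assumptions, a range starting at byte 5000 would be widened to start at 4096, so a whole page is dropped from the cache rather than part of it being zeroed.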

[PATCH 22/25] ocfs2: support partial clone range and dedupe range

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Change the ocfs2 remap code to allow for returning partial results. Signed-off-by: Darrick J. Wong --- fs/ocfs2/file.c |7 + fs/ocfs2/refcounttree.c | 73 ++- fs/ocfs2/refcounttree.h | 12 3 files ch

[PATCH 24/25] xfs: support returning partial reflink results

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Back when the XFS reflink code only supported clone_file_range, we were only able to return zero or negative error codes to userspace. However, now that copy_file_range (which returns bytes copied) can use XFS' clone_file_range, we have the opportunity to return partial res

[PATCH 25/25] xfs: remove redundant remap partial EOF block checks

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Now that we've moved the partial EOF block checks to the VFS helpers, we can remove the redundant functionality from XFS. Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner --- fs/xfs/xfs_reflink.c | 20 1 file changed, 20 deletions(-) dif

[PATCH 21/25] ocfs2: fix pagecache truncation prior to reflink

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Prior to remapping blocks, it is necessary to remove pages from the destination file's page cache. Unfortunately, the truncation is not aggressive enough -- if page size > block size, we'll end up zeroing subpage blocks instead of removing them. So, round the start offset

[PATCH 16/25] vfs: make remapping to source file eof more explicit

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Create a RFR_TO_SRC_EOF flag to explicitly declare that the caller wants the remap implementation to remap to the end of the source file, once the files are locked. Signed-off-by: Darrick J. Wong Reviewed-by: Amir Goldstein --- fs/ioctl.c |3 ++- fs/nfsd/vfs.

[PATCH 19/25] vfs: implement opportunistic short dedupe

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong For a given dedupe request, the bytes_deduped field in the control structure tells userspace if we managed to deduplicate some, but not all of, the requested regions starting from the file offsets supplied. However, due to sloppy coding, the current dedupe code returns FILE_

[PATCH 18/25] vfs: hide file range comparison function

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong There are no callers of vfs_dedupe_file_range_compare, so we might as well make it a static helper and remove the export. Signed-off-by: Darrick J. Wong Reviewed-by: Amir Goldstein --- fs/read_write.c| 191 ++-- includ

[PATCH 20/25] ocfs2: truncate page cache for clone destination file before remapping

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong When cloning blocks into another file, truncate the page cache before we start remapping blocks so that concurrent reads wait for us to finish. Signed-off-by: Darrick J. Wong --- fs/ocfs2/refcounttree.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-)

[PATCH 11/25] vfs: pass remap flags to generic_remap_file_range_prep

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Plumb the remap flags through the filesystem from the vfs function dispatcher all the way to the prep function to prepare for behavior changes in subsequent patches. Signed-off-by: Darrick J. Wong Reviewed-by: Amir Goldstein --- fs/ocfs2/file.c |2 +- fs/ocfs

[PATCH 17/25] vfs: enable remap callers that can handle short operations

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Plumb in a remap flag that enables the filesystem remap handler to shorten remapping requests for callers that can handle it. Now copy_file_range can report partial success (in case we run up against alignment problems, resource limits, etc.). We also enable CAN_SHORTEN fo

[PATCH 12/25] vfs: pass remap flags to generic_remap_checks

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Pass the same remap flags to generic_remap_checks for consistency. Signed-off-by: Darrick J. Wong Reviewed-by: Amir Goldstein --- fs/read_write.c|2 +- include/linux/fs.h |2 +- mm/filemap.c |4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-)

[PATCH 13/25] vfs: make remap_file_range functions take and return bytes completed

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Change the remap_file_range functions to take a number of bytes to operate upon and return the number of bytes they operated on. This is a requirement for allowing fs implementations to return short clone/dedupe results to the user, which will enable us to obey resource lim

[PATCH 15/25] vfs: plumb RFR_* remap flags through the vfs dedupe functions

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Plumb a remap_flags argument through the vfs_dedupe_file_range_one functions so that dedupe can take advantage of it. Signed-off-by: Darrick J. Wong Reviewed-by: Amir Goldstein --- fs/overlayfs/file.c |3 ++- fs/read_write.c |9 ++--- include/linux/fs.h

[PATCH 14/25] vfs: plumb RFR_* remap flags through the vfs clone functions

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Plumb a remap_flags argument through the {do,vfs}_clone_file_range functions so that clone can take advantage of it. Signed-off-by: Darrick J. Wong Reviewed-by: Amir Goldstein --- fs/ioctl.c |2 +- fs/nfsd/vfs.c |2 +- fs/overlayfs/copy_up.c

[PATCH 09/25] vfs: rename clone_verify_area to remap_verify_area

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Since we use clone_verify_area for both clone and dedupe range checks, rename the function to make it clear that it's for both. Signed-off-by: Darrick J. Wong Reviewed-by: Amir Goldstein --- fs/read_write.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(

[PATCH 08/25] vfs: rename vfs_clone_file_prep to be more descriptive

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong The vfs_clone_file_prep is a generic function to be called by filesystem implementations only. Rename the prefix to generic_ and make it more clear that it applies to remap operations, not just clones. Signed-off-by: Darrick J. Wong Reviewed-by: Amir Goldstein --- fs/oc

[PATCH 06/25] vfs: skip zero-length dedupe requests

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Don't bother calling the filesystem for a zero-length dedupe request; we can return zero and exit. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Reviewed-by: Amir Goldstein --- fs/read_write.c |5 + 1 file changed, 5 insertions(+) diff --git a/
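
The early return described above could look roughly like this in plain C. It is a sketch under stated assumptions: dedupe_fn, dedupe_one, fake_fs_dedupe and fs_calls are hypothetical stand-ins, not the functions touched in fs/read_write.c.

```c
#include <stdint.h>

/* Hypothetical stand-in for the filesystem's dedupe handler. */
typedef int64_t (*dedupe_fn)(uint64_t src_off, uint64_t dst_off, uint64_t len);

/* Counts how often the fake filesystem handler below was invoked. */
static int fs_calls;

/* Hypothetical handler that pretends every requested byte was deduped. */
static int64_t fake_fs_dedupe(uint64_t src_off, uint64_t dst_off, uint64_t len)
{
	(void)src_off;
	(void)dst_off;
	fs_calls++;
	return (int64_t)len;
}

/*
 * Return 0 immediately for a zero-length request; otherwise hand the
 * request to the filesystem callback and return its result.
 */
static int64_t dedupe_one(dedupe_fn fs_dedupe, uint64_t src_off,
			  uint64_t dst_off, uint64_t len)
{
	if (len == 0)
		return 0;
	return fs_dedupe(src_off, dst_off, len);
}
```

The point of the guard is that the filesystem handler is never reached for an empty request, so it does not need its own zero-length special case.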

[PATCH 03/25] vfs: check file ranges before cloning files

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Move the file range checks from vfs_clone_file_prep into a separate generic_remap_checks function so that all the checks are collected in a central location. This forms the basis for adding more checks from generic_write_checks that will make cloning's input checking more c

[PATCH 10/25] vfs: create generic_remap_file_range_touch to update inode metadata

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Create a new VFS helper to handle inode metadata updates when remapping into a file. If the operation can possibly alter the file contents, we must update the ctime and mtime and remove security privileges, just like we do for regular file writes. Wire up ocfs2 to ensure c

[PATCH 05/25] vfs: avoid problematic remapping requests into partial EOF block

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong A deduplication data corruption is exposed in XFS and btrfs. It is caused by extending the block match range to include the partial EOF block, but then allowing unknown data beyond EOF to be considered a "match" to data in the destination file because the comparison is only

[PATCH 07/25] vfs: combine the clone and dedupe into a single remap_file_range

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Combine the clone_file_range and dedupe_file_range operations into a single remap_file_range file operation dispatch since they're fundamentally the same operation. The differences between the two can be made in the prep functions. Signed-off-by: Darrick J. Wong Reviewed-

[PATCH 04/25] vfs: strengthen checking of file range inputs to generic_remap_checks

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong File range remapping, if allowed to run past the destination file's EOF, is an optimization on a regular file write. Regular file writes that extend the file length are subject to various constraints which are not checked by range cloning. This is a correctness problem bec

[PATCH 02/25] vfs: vfs_clone_file_prep_inodes should return EINVAL for a clone from beyond EOF

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong vfs_clone_file_prep_inodes cannot return 0 if it is asked to remap from a zero byte file because that's what btrfs does. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/read_write.c |3 --- 1 file changed, 3 deletions(-) diff --git a/fs/read_wr

[PATCH v4 00/25] fs: fixes for serious clone/dedupe problems

2018-10-12 Thread Darrick J. Wong
Hi all, Dave, Eric, and I have been chasing a stale data exposure bug in the XFS reflink implementation, and tracked it down to reflink forgetting to do some of the file-extending activities that must happen for regular writes. We then started auditing the clone, dedupe, and copyfile code and rea

[PATCH 01/25] xfs: add a per-xfs trace_printk macro

2018-10-12 Thread Darrick J. Wong
From: Darrick J. Wong Add a "xfs_tprintk" macro so that developers can use trace_printk to print out arbitrary debugging information with the XFS device name attached to the trace output. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_error.h |6 ++ 1 file changed, 6 insertions(+) dif

[PATCH v2] Btrfs: fix null pointer dereference on compressed write path error

2018-10-12 Thread fdmanana
From: Filipe Manana At inode.c:compress_file_range(), under the "free_pages_out" label, we can end up dereferencing the "pages" pointer when it has a NULL value. This case happens when "start" has a value of 0 and we fail to allocate memory for the "pages" pointer. When that happens we jump to th

Re: [PATCH v7 2/6] mm: export add_swap_extent()

2018-10-12 Thread Andrew Morton
On Tue, 11 Sep 2018 15:34:45 -0700 Omar Sandoval wrote: > From: Omar Sandoval > > Btrfs will need this for swap file support. > Acked-by: Andrew Morton

Re: [PATCH v7 1/6] mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS

2018-10-12 Thread Andrew Morton
On Tue, 11 Sep 2018 15:34:44 -0700 Omar Sandoval wrote: > From: Omar Sandoval > > The SWP_FILE flag serves two purposes: to make swap_{read,write}page() > go through the filesystem, and to make swapoff() call > ->swap_deactivate(). For Btrfs, we want the latter but not the former, > so split th

Re: [PATCH 42/42] btrfs: don't run delayed_iputs in commit

2018-10-12 Thread Filipe Manana
On Fri, Oct 12, 2018 at 8:35 PM Josef Bacik wrote: > > This could result in a really bad case where we do something like > > evict > evict_refill_and_join > btrfs_commit_transaction > btrfs_run_delayed_iputs > evict > evict_refill_and_join > btrfs_commit_t

[PATCH] Btrfs: fix null pointer dereference on compressed write path error

2018-10-12 Thread fdmanana
From: Filipe Manana At inode.c:compress_file_range(), under the "free_pages_out" label, we can end up dereferencing the "pages" pointer when it has a NULL value. This case happens when "start" has a value of 0 and we fail to allocate memory for the "pages" pointer. When that happens we jump to th

Re: [PATCH 05/25] vfs: avoid problematic remapping requests into partial EOF block

2018-10-12 Thread Filipe Manana
On Thu, Oct 11, 2018 at 5:13 AM Darrick J. Wong wrote: > > From: Darrick J. Wong > > A deduplication data corruption is exposed by fstests generic/505 on > XFS. (and btrfs) Btw, the generic test I wrote was indeed numbered 505, however it was never committed and there's now a generic/505 which

[PATCH 33/42] btrfs: fix insert_reserved error handling

2018-10-12 Thread Josef Bacik
We were not handling the reserved byte accounting properly for data references. Metadata was fine, if it errored out the error paths would free the bytes_reserved count and pin the extent, but it even missed one of the error cases. So instead move this handling up into run_one_delayed_ref so we a

[PATCH 31/42] btrfs: cleanup pending bgs on transaction abort

2018-10-12 Thread Josef Bacik
We may abort the transaction during a commit and not have a chance to run the pending bgs stuff, which will leave block groups on our list and cause us accounting issues and leaked memory. Fix this by running the pending bgs when we cleanup a transaction. Reviewed-by: Omar Sandoval Signed-off-by

[PATCH 41/42] btrfs: reserve extra space during evict()

2018-10-12 Thread Josef Bacik
We could generate a lot of delayed refs in evict but never have any left over space from our block rsv to make up for that fact. So reserve some extra space and give it to the transaction so it can be used to refill the delayed refs rsv every loop through the truncate path. Signed-off-by: Josef B

[PATCH 27/42] btrfs: make btrfs_destroy_delayed_refs use btrfs_delete_ref_head

2018-10-12 Thread Josef Bacik
Instead of open coding this stuff use the helper instead. Reviewed-by: Nikolay Borisov Signed-off-by: Josef Bacik --- fs/btrfs/disk-io.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 121ab180a78a..fe1f229320ef 100644 ---

[PATCH 40/42] btrfs: drop min_size from evict_refill_and_join

2018-10-12 Thread Josef Bacik
We don't need it, rsv->size is set once and never changes throughout its lifetime, so just use that for the reserve size. Reviewed-by: David Sterba Signed-off-by: Josef Bacik --- fs/btrfs/inode.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/inod

[PATCH 39/42] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue

2018-10-12 Thread Josef Bacik
The throttle path doesn't take cleaner_delayed_iput_mutex, which means we could think we're done flushing iputs in the data space reservation path when we could have a throttler doing an iput. There's no real reason to serialize the delayed iput flushing, so instead of taking the cleaner_delayed_i

[PATCH 34/42] btrfs: wait on ordered extents on abort cleanup

2018-10-12 Thread Josef Bacik
If we flip read-only before we initiate writeback on all dirty pages for ordered extents we've created then we'll have ordered extents left over on umount, which results in all sorts of bad things happening. Fix this by making sure we wait on ordered extents if we have to do the aborted transactio

[PATCH 42/42] btrfs: don't run delayed_iputs in commit

2018-10-12 Thread Josef Bacik
This could result in a really bad case where we do something like evict evict_refill_and_join btrfs_commit_transaction btrfs_run_delayed_iputs evict evict_refill_and_join btrfs_commit_transaction ... forever We have plenty of other places where we run del

[PATCH 26/42] btrfs: make btrfs_destroy_delayed_refs use btrfs_delayed_ref_lock

2018-10-12 Thread Josef Bacik
We have this open coded in btrfs_destroy_delayed_refs, use the helper instead. Reviewed-by: Nikolay Borisov Signed-off-by: Josef Bacik --- fs/btrfs/disk-io.c | 11 ++- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 39bd158466c

[PATCH 22/42] btrfs: only run delayed refs if we're committing

2018-10-12 Thread Josef Bacik
I noticed in a giant dbench run that we spent a lot of time on lock contention while running transaction commit. This is because dbench results in a lot of fsync()'s that do a btrfs_transaction_commit(), and they all run the delayed refs first thing, so they all contend with each other. This lead

[PATCH 38/42] btrfs: be more explicit about allowed flush states

2018-10-12 Thread Josef Bacik
For FLUSH_LIMIT flushers we really can only allocate chunks and flush delayed inode items, everything else is problematic. I added a bunch of new states and it led to weirdness in the FLUSH_LIMIT case because I forgot about how it worked. So instead explicitly declare the states that are ok for

[PATCH 35/42] MAINTAINERS: update my email address for btrfs

2018-10-12 Thread Josef Bacik
My work email is completely useless, switch it to my personal address so I get emails on an account I actually pay attention to. Signed-off-by: Josef Bacik --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 32fbc6f732d4..7723dc958e9

[PATCH 36/42] btrfs: wait on caching when putting the bg cache

2018-10-12 Thread Josef Bacik
While testing my backport I noticed there was a panic if I ran generic/416 generic/417 generic/418 all in a row. This just happened to uncover a race where we had outstanding IO after we destroy all of our workqueues, and then we'd go to queue the endio work on those free'd workqueues. This is be

[PATCH 37/42] btrfs: wakeup cleaner thread when adding delayed iput

2018-10-12 Thread Josef Bacik
The cleaner thread usually takes care of delayed iputs, with the exception of the btrfs_end_transaction_throttle path. The cleaner thread only gets woken up every 30 seconds, so instead wake it up to do its work so that we can free up that space as quickly as possible. Reviewed-by: Filipe Manana

[PATCH 30/42] btrfs: just delete pending bgs if we are aborted

2018-10-12 Thread Josef Bacik
We still need to do all of the accounting cleanup for pending block groups if we abort. So set the ret to trans->aborted so if we aborted the cleanup happens and everybody is happy. Reviewed-by: Omar Sandoval Signed-off-by: Josef Bacik --- fs/btrfs/extent-tree.c | 8 +++- 1 file changed, 7

[PATCH 19/42] btrfs: set max_extent_size properly

2018-10-12 Thread Josef Bacik
From: Josef Bacik We can't use entry->bytes if our entry is a bitmap entry, we need to use entry->max_extent_size in that case. Fix up all the logic to make this consistent. Signed-off-by: Josef Bacik --- fs/btrfs/free-space-cache.c | 30 -- 1 file changed, 20 inse
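
The rule stated above can be sketched as a small selector. This is an illustration only: struct free_space_entry is a hypothetical, simplified layout rather than the btrfs structure, and entry_max_extent is an assumed helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified free-space entry; not the btrfs structure. */
struct free_space_entry {
	bool is_bitmap;           /* entry tracks its space with a bitmap */
	uint64_t bytes;           /* total free bytes covered by the entry */
	uint64_t max_extent_size; /* largest contiguous run inside a bitmap */
};

/*
 * For a plain extent entry the whole entry is one contiguous run, so
 * bytes is the answer. For a bitmap entry the free bytes may be
 * scattered, so only max_extent_size describes a contiguous run.
 */
static uint64_t entry_max_extent(const struct free_space_entry *e)
{
	return e->is_bitmap ? e->max_extent_size : e->bytes;
}
```

Routing every caller through one selector like this is one way to keep the logic consistent, which is what the summary asks for.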

[PATCH 25/42] btrfs: pass delayed_refs_root to btrfs_delayed_ref_lock

2018-10-12 Thread Josef Bacik
We don't need the trans except to get the delayed_refs_root, so just pass the delayed_refs_root into btrfs_delayed_ref_lock and call it a day. Reviewed-by: Nikolay Borisov Signed-off-by: Josef Bacik --- fs/btrfs/delayed-ref.c | 5 + fs/btrfs/delayed-ref.h | 2 +- fs/btrfs/extent-tree.c | 2

[PATCH 16/42] btrfs: loop in inode_rsv_refill

2018-10-12 Thread Josef Bacik
With severe fragmentation we can end up with our inode rsv size being huge during writeout, which would cause us to need to make very large metadata reservations. However we may not actually need that much once writeout is complete. So instead try to make our reservation, and if we couldn't make

[PATCH 32/42] btrfs: only free reserved extent if we didn't insert it

2018-10-12 Thread Josef Bacik
When we insert the file extent once the ordered extent completes we free the reserved extent reservation as it'll have been migrated to the bytes_used counter. However if we error out after this step we'll still clear the reserved extent reservation, resulting in a negative accounting of the reser

[PATCH 21/42] btrfs: reset max_extent_size on clear in a bitmap

2018-10-12 Thread Josef Bacik
From: Josef Bacik We need to clear the max_extent_size when we clear bits from a bitmap since it could have been from the range that contains the max_extent_size. Reviewed-by: Liu Bo Signed-off-by: Josef Bacik --- fs/btrfs/free-space-cache.c | 2 ++ 1 file changed, 2 insertions(+) diff --git

[PATCH 28/42] btrfs: handle delayed ref head accounting cleanup in abort

2018-10-12 Thread Josef Bacik
We weren't doing any of the accounting cleanup when we aborted transactions. Fix this by making cleanup_ref_head_accounting global and calling it from the abort code, this fixes the issue where our accounting was all wrong after the fs aborts. Signed-off-by: Josef Bacik --- fs/btrfs/ctree.h

[PATCH 29/42] btrfs: call btrfs_create_pending_block_groups unconditionally

2018-10-12 Thread Josef Bacik
The first thing we do is loop through the list, this if (!list_empty()) btrfs_create_pending_block_groups(); thing is just wasted space. Reviewed-by: Nikolay Borisov Signed-off-by: Josef Bacik --- fs/btrfs/extent-tree.c | 3 +-- fs/btrfs/transaction.c | 6 ++ 2 files changed, 3 in

[PATCH 24/42] btrfs: assert on non-empty delayed iputs

2018-10-12 Thread Josef Bacik
I ran into an issue where there was some reference being held on an inode that I couldn't track. This assert wasn't triggered, but it at least rules out we're doing something stupid. Reviewed-by: Omar Sandoval Reviewed-by: David Sterba Signed-off-by: Josef Bacik --- fs/btrfs/disk-io.c | 1 +

[PATCH 23/42] btrfs: make sure we create all new bgs

2018-10-12 Thread Josef Bacik
Allocating new chunks modifies both the extent and chunk tree, which can trigger new chunk allocations. So instead of doing list_for_each_safe, just do while (!list_empty()) so we make sure we don't exit with other pending bg's still on our list. Reviewed-by: Omar Sandoval Reviewed-by: Liu Bo R
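
A userspace sketch of the drain-until-empty pattern described above, using a tiny array-backed list instead of the kernel list API. All names here (pending_list, pending_add, create_one, create_all_pending) are hypothetical, and create_one only mimics a step that can queue more work.

```c
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical pending list: a small stack of block-group ids. */
struct pending_list {
	int ids[MAX_PENDING];
	size_t count;
};

static int pending_empty(const struct pending_list *l)
{
	return l->count == 0;
}

static void pending_add(struct pending_list *l, int id)
{
	if (l->count < MAX_PENDING)
		l->ids[l->count++] = id;
}

static int pending_pop(struct pending_list *l)
{
	return l->ids[--l->count];
}

/*
 * Hypothetical "create" step: handling id 1 queues a follow-up entry
 * (id 100), mimicking an allocation that triggers another allocation.
 */
static void create_one(struct pending_list *l, int id)
{
	if (id == 1)
		pending_add(l, 100);
}

/* Drain until empty so entries added during processing are handled too. */
static size_t create_all_pending(struct pending_list *l)
{
	size_t handled = 0;

	while (!pending_empty(l)) {
		int id = pending_pop(l);

		create_one(l, id);
		handled++;
	}
	return handled;
}
```

Because the loop condition is re-evaluated after every step, work that is queued while the list is being processed cannot be left behind when the function returns.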

[PATCH 01/42] btrfs: add btrfs_delete_ref_head helper

2018-10-12 Thread Josef Bacik
From: Josef Bacik We do this dance in cleanup_ref_head and check_ref_cleanup, unify it into a helper and cleanup the calling functions. Signed-off-by: Josef Bacik Reviewed-by: Omar Sandoval --- fs/btrfs/delayed-ref.c | 14 ++ fs/btrfs/delayed-ref.h | 3 ++- fs/btrfs/extent-tree.c

[PATCH 00/42][v5] My current patch queue

2018-10-12 Thread Josef Bacik
v3->v4: - added stacktraces to all the changelogs - added the various reviewed-by's. - fixed the loop in inode_rsv_refill to not use goto again; v2->v3: - reworked the truncate/evict throttling, we were still occasionally hitting enospc aborts in production in these paths because we were too agg

[PATCH 05/42] btrfs: only count ref heads run in __btrfs_run_delayed_refs

2018-10-12 Thread Josef Bacik
We pick the number of ref's to run based on the number of ref heads, and only make the decision to stop once we've processed entire ref heads, so only count the ref heads we've run and bail once we've hit the number of ref heads we wanted to process. Signed-off-by: Josef Bacik --- fs/btrfs/exten

[PATCH 14/42] btrfs: reset max_extent_size properly

2018-10-12 Thread Josef Bacik
If we use up our block group before allocating a new one we'll easily get a max_extent_size that's set really really low, which will result in a lot of fragmentation. We need to make sure we're resetting the max_extent_size when we add a new chunk or add new space. Reviewed-by: Filipe Manana Sig

[PATCH 11/42] btrfs: fix truncate throttling

2018-10-12 Thread Josef Bacik
We have a bunch of magic to make sure we're throttling delayed refs when truncating a file. Now that we have a delayed refs rsv and a mechanism for refilling that reserve simply use that instead of all of this magic. Signed-off-by: Josef Bacik --- fs/btrfs/inode.c | 79 -

[PATCH 06/42] btrfs: introduce delayed_refs_rsv

2018-10-12 Thread Josef Bacik
From: Josef Bacik Traditionally we've had voodoo in btrfs to account for the space that delayed refs may take up by having a global_block_rsv. This works most of the time, except when it doesn't. We've had issues reported and seen in production where sometimes the global reserve is exhausted du

[PATCH 03/42] btrfs: cleanup extent_op handling

2018-10-12 Thread Josef Bacik
From: Josef Bacik The cleanup_extent_op function actually would run the extent_op if it needed running, which made the name sort of a misnomer. Change it to run_and_cleanup_extent_op, and move the actual cleanup work to cleanup_extent_op so it can be used by check_ref_cleanup() in order to unify

[PATCH 13/42] btrfs: add ALLOC_CHUNK_FORCE to the flushing code

2018-10-12 Thread Josef Bacik
With my change to no longer take into account the global reserve for metadata allocation chunks we have this side-effect for mixed block group fs'es where we are no longer allocating enough chunks for the data/metadata requirements. To deal with this add a ALLOC_CHUNK_FORCE step to the flushing st

[PATCH 08/42] btrfs: dump block_rsv whe dumping space info

2018-10-12 Thread Josef Bacik
For enospc_debug having the block rsvs is super helpful to see if we've done something wrong. Signed-off-by: Josef Bacik Reviewed-by: Omar Sandoval Reviewed-by: David Sterba --- fs/btrfs/extent-tree.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/fs/btrfs/extent-tree.c b

[PATCH 02/42] btrfs: add cleanup_ref_head_accounting helper

2018-10-12 Thread Josef Bacik
From: Josef Bacik We were missing some quota cleanups in check_ref_cleanup, so break the ref head accounting cleanup into a helper and call that from both check_ref_cleanup and cleanup_ref_head. This will hopefully ensure that we don't screw up accounting in the future for other things that we a

[PATCH 07/42] btrfs: check if free bgs for commit

2018-10-12 Thread Josef Bacik
may_commit_transaction will skip committing the transaction if we don't have enough pinned space or if we're trying to find space for a SYSTEM chunk. However if we have pending free block groups in this transaction we still want to commit as we may be able to allocate a chunk to make our reservati

[PATCH 10/42] btrfs: protect space cache inode alloc with nofs

2018-10-12 Thread Josef Bacik
If we're allocating a new space cache inode it's likely going to be under a transaction handle, so we need to use memalloc_nofs_save() in order to avoid deadlocks, and more importantly lockdep messages that make xfstests fail. Reviewed-by: Omar Sandoval Signed-off-by: Josef Bacik Reviewed-by: Da

[PATCH 04/42] btrfs: only track ref_heads in delayed_ref_updates

2018-10-12 Thread Josef Bacik
From: Josef Bacik We use this number to figure out how many delayed refs to run, but __btrfs_run_delayed_refs really only checks every time we need a new delayed ref head, so we always run at least one ref head completely no matter what the number of items on it. Fix the accounting to only be ad

[PATCH 18/42] btrfs: move the dio_sem higher up the callchain

2018-10-12 Thread Josef Bacik
We're getting a lockdep splat because we take the dio_sem under the log_mutex. What we really need is to protect fsync() from logging an extent map for an extent we never waited on higher up, so just guard the whole thing with dio_sem. == WARNIN

[PATCH 09/42] btrfs: release metadata before running delayed refs

2018-10-12 Thread Josef Bacik
We want to release the unused reservation we have since it refills the delayed refs reserve, which will make everything go smoother when running the delayed refs if we're short on our reservation. Reviewed-by: Omar Sandoval Reviewed-by: Liu Bo Reviewed-by: Nikolay Borisov Signed-off-by: Josef B

[PATCH 12/42] btrfs: don't use global rsv for chunk allocation

2018-10-12 Thread Josef Bacik
We've done this forever because of the voodoo around knowing how much space we have. However we have better ways of doing this now, and on normal file systems we'll easily have a global reserve of 512MiB, and since metadata chunks are usually 1GiB that means we'll allocate metadata chunks more rea

[PATCH 20/42] btrfs: don't use ctl->free_space for max_extent_size

2018-10-12 Thread Josef Bacik
From: Josef Bacik max_extent_size is supposed to be the largest contiguous range for the space info, and ctl->free_space is the total free space in the block group. We need to keep track of these separately and _only_ use the max_free_space if we don't have a max_extent_size, as that means our o

[PATCH 17/42] btrfs: run delayed iputs before committing

2018-10-12 Thread Josef Bacik
Delayed iputs mean we can have final iputs of deleted inodes in the queue, which could potentially generate a lot of pinned space that could be free'd. So before we decide to commit the transaction for ENOSPC reasons, run the delayed iputs so that any potential space is free'd up. If there is and

[PATCH 15/42] btrfs: don't enospc all tickets on flush failure

2018-10-12 Thread Josef Bacik
With the introduction of the per-inode block_rsv it became possible to have really really large reservation requests made because of data fragmentation. Since the ticket stuff assumed that we'd always have relatively small reservation requests it just killed all tickets if we were unable to satisf

Re: [PATCH 33/42] btrfs: fix insert_reserved error handling

2018-10-12 Thread David Sterba
On Thu, Oct 11, 2018 at 03:54:22PM -0400, Josef Bacik wrote: > We were not handling the reserved byte accounting properly for data > references. Metadata was fine, if it errored out the error paths would > free the bytes_reserved count and pin the extent, but it even missed one > of the error case

Re: [PATCH 19/42] btrfs: set max_extent_size properly

2018-10-12 Thread David Sterba
On Thu, Oct 11, 2018 at 03:54:08PM -0400, Josef Bacik wrote: > From: Josef Bacik > > We can't use entry->bytes if our entry is a bitmap entry, we need to use > entry->max_extent_size in that case. Fix up all the logic to make this > consistent. > > Signed-off-by: Josef Bacik > --- > fs/btrfs/

Re: [PATCH 07/42] btrfs: check if free bgs for commit

2018-10-12 Thread David Sterba
On Thu, Oct 11, 2018 at 02:33:55PM -0400, Josef Bacik wrote: > On Thu, Oct 04, 2018 at 01:24:24PM +0200, David Sterba wrote: > > On Fri, Sep 28, 2018 at 07:17:46AM -0400, Josef Bacik wrote: > > > may_commit_transaction will skip committing the transaction if we don't > > > have enough pinned space

Re: [PATCH v2 1/2] btrfs: relocation: Cleanup while() loop using rbtree_postorder_for_each_entry_safe()

2018-10-12 Thread David Sterba
On Fri, Sep 21, 2018 at 03:20:29PM +0800, Qu Wenruo wrote: > And add one line comment explaining what we're doing for each loop. > > Signed-off-by: Qu Wenruo > --- > changelog: > v2: > Use rbtree_postorder_for_each_entry_safe() to replace for() loop. 1-2 reviewed and added to 4.20 queue, thank

Re: [PATCH 05/25] vfs: avoid problematic remapping requests into partial EOF block

2018-10-12 Thread Darrick J. Wong
On Fri, Oct 12, 2018 at 11:16:16AM +1100, Dave Chinner wrote: > On Wed, Oct 10, 2018 at 09:12:54PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong > > > > A deduplication data corruption is exposed by fstests generic/505 on > > XFS. It is caused by extending the block match range to inclu

Re: [PATCH 24/25] xfs: support returning partial reflink results

2018-10-12 Thread Darrick J. Wong
On Fri, Oct 12, 2018 at 12:22:26PM +1100, Dave Chinner wrote: > On Wed, Oct 10, 2018 at 09:15:19PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong > > > > Back when the XFS reflink code only supported clone_file_range, we were > > only able to return zero or negative error codes to usersp

Re: [PATCH v3 4/4] btrfs: Refactor find_free_extent() loops update into find_free_extent_update_loop()

2018-10-12 Thread Qu Wenruo
On 2018/10/12 9:52 PM, Josef Bacik wrote: > On Fri, Oct 12, 2018 at 02:18:19PM +0800, Qu Wenruo wrote: >> We have a complex loop design for find_free_extent(), that has different >> behavior for each loop, some even include new chunk allocation. >> >> Instead of putting such a long code into find

Re: [PATCH v3 3/4] btrfs: Refactor unclustered extent allocation into find_free_extent_unclustered()

2018-10-12 Thread Josef Bacik
On Fri, Oct 12, 2018 at 02:18:18PM +0800, Qu Wenruo wrote: > This patch will extract unclustered extent allocation code into > find_free_extent_unclustered(). > > And this helper function will use return value to indicate what to do > next. > > This should make find_free_extent() a little easier

Re: [PATCH v3 2/4] btrfs: Refactor clustered extent allocation into find_free_extent_clustered()

2018-10-12 Thread Josef Bacik
On Fri, Oct 12, 2018 at 02:18:17PM +0800, Qu Wenruo wrote: > We have two main methods to find free extents inside a block group: > 1) clustered allocation > 2) unclustered allocation > > This patch will extract the clustered allocation into > find_free_extent_clustered() to make it a little easier

Re: [PATCH v3 1/4] btrfs: Introduce find_free_extent_ctl structure for later rework

2018-10-12 Thread Josef Bacik
On Fri, Oct 12, 2018 at 02:18:16PM +0800, Qu Wenruo wrote: > Instead of tons of different local variables in find_free_extent(), > extract them into find_free_extent_ctl structure, and add better > explanation for them. > > Some modification may look redundant, but will later greatly simplify > f

Re: [PATCH v3 4/4] btrfs: Refactor find_free_extent() loops update into find_free_extent_update_loop()

2018-10-12 Thread Josef Bacik
On Fri, Oct 12, 2018 at 02:18:19PM +0800, Qu Wenruo wrote: > We have a complex loop design for find_free_extent(), that has different > behavior for each loop, some even include new chunk allocation. > > Instead of putting such a long code into find_free_extent() and makes it > harder to read, ju

Re: [PATCH] Btrfs: fix deadlock when writing out free space caches

2018-10-12 Thread Josef Bacik
On Fri, Oct 12, 2018 at 10:03:55AM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > > When writing out a block group free space cache we can end up deadlocking > with ourselves on an extent buffer lock resulting in a warning like the > following: > > [245043.379979] WARNING: CPU: 4 PID: 2

Re: [PATCH] Btrfs: fix use-after-free during inode eviction

2018-10-12 Thread Qu Wenruo
On 2018/10/12 8:02 PM, fdman...@kernel.org wrote: > From: Filipe Manana > > At inode.c:evict_inode_truncate_pages(), when we iterate over the inode's > extent states, we access an extent state record's "state" field after we > unlocked the inode's io tree lock. This can lead to a use-after-free

[PATCH] Btrfs: fix use-after-free during inode eviction

2018-10-12 Thread fdmanana
From: Filipe Manana At inode.c:evict_inode_truncate_pages(), when we iterate over the inode's extent states, we access an extent state record's "state" field after we unlocked the inode's io tree lock. This can lead to a use-after-free issue because after we unlock the io tree that extent state r
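The pattern described above (reading a record's field after the lock protecting it has been dropped) can be illustrated outside the kernel. The following is a minimal userspace sketch, not the patch itself: `struct record`, `struct tree`, and `read_state_safely()` are hypothetical names, and a pthread mutex stands in for the io tree lock. It shows one common way to avoid this class of bug, namely copying the field into a local variable while the lock is still held.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for an extent state record. */
struct record {
    unsigned int state;
};

/* Hypothetical stand-in for the tree that owns the record. */
struct tree {
    pthread_mutex_t lock;   /* protects rec and its lifetime */
    struct record *rec;
};

/*
 * Copy the field while the lock is held. After the unlock, another
 * thread may free t->rec, so only the local copy is used from then on.
 */
static unsigned int read_state_safely(struct tree *t)
{
    unsigned int state_copy;

    assert(t != NULL && t->rec != NULL);

    pthread_mutex_lock(&t->lock);
    state_copy = t->rec->state;     /* read under the lock */
    pthread_mutex_unlock(&t->lock);

    /* Do not touch t->rec here; it may already be gone. */
    return state_copy;
}
```

The design point is that the lock guards the record's lifetime as well as its contents, so anything needed after the unlock has to be captured beforehand.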

Re: errors reported by btrfs-check

2018-10-12 Thread Qu Wenruo
On 2018/10/12 6:35 PM, Jürgen Herrmann wrote: > On 12.10.2018 10:19, Qu Wenruo wrote: > > [snip] > >> Please run the following command: >> >> # btrfs ins dump-tree --follow -b 166456229888 >> >> It could be caused by the fact that btrfs-progs --repair doesn't handle >> log tree well. >> >> If

Re: errors reported by btrfs-check

2018-10-12 Thread Jürgen Herrmann
On 12.10.2018 10:19, Qu Wenruo wrote: [snip] Please run the following command: # btrfs ins dump-tree --follow -b 166456229888 It could be caused by the fact that btrfs-progs --repair doesn't handle log tree well. If that's the case, "btrfs rescue zero-log" should help. But anyway, feel fr

Re: errors reported by btrfs-check

2018-10-12 Thread Qu Wenruo
[snip] > > Hi there! > > I ran btrfs check --repair on the filesystem. I don't have this log > anymore, > as it was then sitting on the repaired fs), which is now dead. > after repairing it I could still mount the fs. > > as my btrfs send problem still persists (another thread), I decided to >

Re: errors reported by btrfs-check

2018-10-12 Thread Jürgen Herrmann
On 12.10.2018 01:56, Qu Wenruo wrote: On 2018/10/12 4:30 AM, Jürgen Herrmann wrote: Hi! I just did a btrfs check on my laptop's btrfs filesystem while I was on the usb stick rescue system. the following errors were reported: root@mint:/home/mint# btrfs check /dev/mapper/sda3crypt Checking fi

Re: [PATCH 2/2] btrfs-progs: Deprecate unused super block member log_root_transid

2018-10-12 Thread Qu Wenruo
On 2018/10/12 5:13 PM, Nikolay Borisov wrote: > > > On 12.10.2018 11:46, Qu Wenruo wrote: >> >> >> On 2018/10/12 2:53 PM, Nikolay Borisov wrote: >>> >>> >>> On 12.10.2018 09:42, Qu Wenruo wrote: The only user of it is "btrfs inspect dump-super". Signed-off-by: Qu Wenruo ---

Re: [PATCH 2/2] btrfs-progs: Deprecate unused super block member log_root_transid

2018-10-12 Thread Nikolay Borisov
On 12.10.2018 11:46, Qu Wenruo wrote: > > > On 2018/10/12 2:53 PM, Nikolay Borisov wrote: >> >> >> On 12.10.2018 09:42, Qu Wenruo wrote: >>> The only user of it is "btrfs inspect dump-super". >>> >>> Signed-off-by: Qu Wenruo >>> --- >>> cmds-inspect-dump-super.c | 4 ++-- >>> ctree.h

[PATCH] Btrfs: fix deadlock when writing out free space caches

2018-10-12 Thread fdmanana
From: Filipe Manana When writing out a block group free space cache we can end up deadlocking with ourselves on an extent buffer lock resulting in a warning like the following: [245043.379979] WARNING: CPU: 4 PID: 2608 at fs/btrfs/locking.c:251 btrfs_tree_lock+0x1be/0x1d0 [btrfs] [245043.392792

Re: [PATCH] btrfs-progs: add cli to forget one or all scanned devices

2018-10-12 Thread Nikolay Borisov
On 12.10.2018 07:06, Anand Jain wrote: > This patch adds cli > btrfs device forget [dev] > to remove the given device structure in the kernel if the device > is unmounted. If no argument is given it shall remove all stale > (devices which are not mounted) from the kernel. > > Signed-off-by: An

Re: [PATCH 14/42] btrfs: reset max_extent_size properly

2018-10-12 Thread Filipe Manana
On Thu, Oct 11, 2018 at 8:57 PM Josef Bacik wrote: > > If we use up our block group before allocating a new one we'll easily > get a max_extent_size that's set really really low, which will result in > a lot of fragmentation. We need to make sure we're resetting the > max_extent_size when we add

Re: [PATCH 20/42] btrfs: don't use ctl->free_space for max_extent_size

2018-10-12 Thread Filipe Manana
On Thu, Oct 11, 2018 at 8:57 PM Josef Bacik wrote: > > From: Josef Bacik > > max_extent_size is supposed to be the largest contiguous range for the > space info, and ctl->free_space is the total free space in the block > group. We need to keep track of these separately and _only_ use the > max_f
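The distinction drawn above, between the largest contiguous free range and the total free space, can be shown with a small standalone sketch. It does not use the btrfs free-space structures; `struct free_range`, `total_free()`, and `max_extent()` are hypothetical names chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-range record; not a btrfs data structure. */
struct free_range {
    uint64_t start;
    uint64_t len;
};

/* Total free bytes: the sum of the lengths of all free ranges. */
static uint64_t total_free(const struct free_range *r, size_t n)
{
    uint64_t sum = 0;

    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum += r[i].len;
    return sum;
}

/* Largest single contiguous range: the upper bound for one allocation. */
static uint64_t max_extent(const struct free_range *r, size_t n)
{
    uint64_t max = 0;

    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (r[i].len > max)
            max = r[i].len;
    return max;
}
```

With free ranges of 4096, 16384, and 4096 bytes, the total is 24576 bytes while the largest contiguous range is only 16384 bytes, so using the total as an upper bound for a single extent would overstate what can actually be allocated in one piece.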
