[PATCH] btrfs: fix race when picking most recent mod log operation for an old root

2021-04-20 Thread fdmanana
From: Filipe Manana Commit dbcc7d57bffc0c ("btrfs: fix race when cloning extent buffer during rewind of an old root"), fixed a race when we need to rewind the extent buffer of an old root. It was caused by picking a new mod log operation for the extent buffer while getting a cloned extent buffer

[PATCH] btrfs: fix metadata extent leak after failure to create subvolume

2021-04-20 Thread fdmanana
From: Filipe Manana When creating a subvolume we allocate an extent buffer for its root node after starting a transaction. We set up a root item for the subvolume that points to that extent buffer and then attempt to insert the root item into the root tree - however if that fails, due to -ENOMEM f

[PATCH v2] btrfs: zoned: fix unpaired block group unfreeze during device replace

2021-04-14 Thread fdmanana
From: Filipe Manana When doing a device replace on a zoned filesystem, if we find a block group with ->to_copy == 0, we jump to the label 'done', which will result in later calling btrfs_unfreeze_block_group(), even though at this point we never called btrfs_freeze_block_group(). Since at this p

[PATCH] btrfs: zoned: fix unpaired block group unfreeze during device replace

2021-04-14 Thread fdmanana
From: Filipe Manana When doing a device replace on a zoned filesystem, if we find a block group with ->to_copy == 0, we jump to the label 'done', which will result in later calling btrfs_unfreeze_block_group(), even though at this point we never called btrfs_freeze_block_group(). Since at this p

[PATCH] btrfs: fix race between transaction aborts and fsyncs leading to use-after-free

2021-04-05 Thread fdmanana
From: Filipe Manana There is a race between a task aborting a transaction during a commit, a task doing an fsync and the transaction kthread, which leads to a use-after-free of the log root tree. When this happens, it results in a stack trace like the following: [99678.547335] BTRFS info (devic

[PATCH] btrfs: improve btree readahead for full send operations

2021-03-31 Thread fdmanana
From: Filipe Manana Currently a full send operation uses the standard btree readahead when iterating over the subvolume/snapshot btree, which, despite bringing good performance benefits, could be improved in a few aspects for use cases such as full send operations, which are guaranteed to visit

[PATCH] btrfs: fix exhaustion of the system chunk array due to concurrent allocations

2021-03-31 Thread fdmanana
From: Filipe Manana When we are running out of space for updating the chunk tree, that is, when we are low on available space in the system space info, if we have many tasks concurrently allocating block groups, via fallocate for example, many of them can end up all allocating new system chunks wh

[PATCH] btrfs: update outdated comment at btrfs_replace_file_extents()

2021-03-26 Thread fdmanana
From: Filipe Manana There is a comment at btrfs_replace_file_extents() that mentions that we set the full sync flag on an inode when cloning into a file with a size greater than or equal to 16MiB, through try_release_extent_mapping() when we truncate the page cache after replacing file extents d

[PATCH] btrfs: add test for send/receive with file capabilities set

2021-03-26 Thread fdmanana
From: Filipe Manana Test that if we set a capability on a file but not on the next files we create, send/receive operations only apply the capability to the first file, the one for which we have set a capability. This is motivated by a regression that started to happen with kernel 5.8, caused by

[PATCH] btrfs: make reflinks respect O_SYNC O_DSYNC and S_SYNC flags

2021-03-23 Thread fdmanana
From: Filipe Manana If we reflink to or from a file opened with O_SYNC/O_DSYNC or to/from a file that has the S_SYNC attribute set, we totally ignore that and do not durably persist the reflink changes. Since a reflink can change the data readable from a file (and mtime/ctime, or a file size), it

[PATCH] btrfs/232: fix umount failure due to fsstress still running

2021-03-18 Thread fdmanana
test failure: btrfs/232 1s ... umount: /home/fdmanana/btrfs-tests/scratch_1: target is busy. _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see /home/fdmanana/git/hub/xfstests/results//btrfs/232.full for details) Fix that by adding a trap to the writer() function. Signed-off

[PATCH] btrfs: fix sleep while in non-sleep context during qgroup removal

2021-03-18 Thread fdmanana
From: Filipe Manana While removing a qgroup's sysfs entry we end up taking the kernfs_mutex, through kobject_del(), while holding the fs_info->qgroup_lock spinlock, producing the following trace: [ 821.843637] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281 [

[PATCH v2] btrfs: test delayed subvolume deletion on mount and remount

2021-03-16 Thread fdmanana
From: Filipe Manana Test that subvolume deletion is resumed on RW mounts, that it is not performed on RO mounts and that after remounting a filesystem from RO to RW mode, it is also performed. This triggers a regression introduced in kernel 5.11 which is fixed by a patch that has the following s

[PATCH] btrfs: update outdated comment at btrfs_orphan_cleanup()

2021-03-16 Thread fdmanana
From: Filipe Manana btrfs_orphan_cleanup() has a comment referring to find_dead_roots, but that function has not existed since commit cb517eabba4f10 ("Btrfs: cleanup the similar code of the fs root read"). What we use now to find and load dead roots is btrfs_find_orphan_roots(). So update the comment

[PATCH] btrfs: fix subvolume/snapshot deletion not triggered on mount

2021-03-16 Thread fdmanana
From: Filipe Manana During the mount procedure we are calling btrfs_orphan_cleanup() against the root tree, which will find all orphan items in this tree. When an orphan item corresponds to a deleted subvolume/snapshot (instead of an inode space cache), it must not delete the orphan item, becaus

[PATCH v2] btrfs: add test for cases when a dio write has to fallback to a buffered write

2021-03-16 Thread fdmanana
From: Filipe Manana Test cases where a direct IO write, with O_DSYNC, can not be done and has to fallback to a buffered write. This is motivated by the fact we don't have existing tests for these cases and in fact we had a regression for one case in the 5.10 kernel. This was the case when doing

[PATCH] btrfs: test delayed subvolume deletion on mount and remount

2021-03-16 Thread fdmanana
From: Filipe Manana Test that subvolume deletion is resumed on RW mounts, that it is not performed on RO mounts and that after remounting a filesystem from RO to RW mode, it is performed. This currently passes on btrfs and it is not motivated by any recent regression. This test is being added ju

[PATCH] btrfs: zoned: fix linked list corruption after log root tree allocation failure

2021-03-11 Thread fdmanana
From: Filipe Manana When using a zoned filesystem, while syncing the log, if we fail to allocate the root node for the log root tree, we are not removing the log context we allocated on stack from the list of log contexts of the log root tree. This means after the return from btrfs_sync_log() we

[PATCH 9/9] btrfs: update debug message when checking seq number of a delayed ref

2021-03-11 Thread fdmanana
From: Filipe Manana We used to encode two different numbers in the tree mod log counter used for sequence numbers, one in the upper 32 bits and the other one in the lower 32 bits. However that is no longer the case, we stopped doing that since commit fcebe4562dec83 ("Btrfs: rework qgroup accounti
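The old encoding described in this entry, two numbers sharing one 64-bit counter, can be illustrated with a minimal sketch. The helper names `pack_seq` and `unpack_seq` are hypothetical and are not taken from the btrfs sources; the sketch only shows the general upper/lower 32-bit packing technique, not the kernel's actual code.

```python
MASK32 = 0xFFFFFFFF  # low 32 bits


def pack_seq(upper, lower):
    """Pack two 32-bit values into one 64-bit integer (hypothetical helper)."""
    return ((upper & MASK32) << 32) | (lower & MASK32)


def unpack_seq(seq):
    """Split a 64-bit integer back into its (upper, lower) 32-bit halves."""
    return (seq >> 32) & MASK32, seq & MASK32


packed = pack_seq(3, 7)
upper, lower = unpack_seq(packed)
```

A debug message that prints both halves separately only makes sense while this packing is in use; once the counter holds a single number, printing the whole value is the natural choice.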

[PATCH 7/9] btrfs: remove unnecessary leaf check at btrfs_tree_mod_log_free_eb()

2021-03-11 Thread fdmanana
From: Filipe Manana At btrfs_tree_mod_log_free_eb() we check if we are dealing with a leaf, and if so, return immediately and do nothing. However this check can be removed, because after it we call tree_mod_need_log(), which returns false when given an extent buffer that corresponds to a leaf. S

[PATCH 4/9] btrfs: use booleans where appropriate for the tree mod log functions

2021-03-11 Thread fdmanana
From: Filipe Manana Several functions of the tree modification log use integers as booleans, so change them to use booleans instead, making their use more clear. Signed-off-by: Filipe Manana --- fs/btrfs/ctree.c| 6 +++--- fs/btrfs/tree-mod-log.c | 42 -

[PATCH 3/9] btrfs: move the tree mod log code into its own file

2021-03-11 Thread fdmanana
From: Filipe Manana The tree modification log, which records modifications done to btrees, is quite large and currently spread all over ctree.c, which is a huge file already. To make things better organized, move all that code into its own separate source and header files. Functions and definiti

[PATCH 8/9] btrfs: add and use helper to get lowest sequence number for the tree mod log

2021-03-11 Thread fdmanana
From: Filipe Manana There are two places outside the tree mod log module that extract the lowest sequence number of the tree mod log. These places end up duplicating code and open coding the logic and internal implementation details of the tree mod log. So add a helper to the tree mod log module

[PATCH 6/9] btrfs: use the new bit BTRFS_FS_TREE_MOD_LOG_USERS at btrfs_free_tree_block()

2021-03-11 Thread fdmanana
From: Filipe Manana Instead of exposing implementation details of the tree mod log to check if there are active tree mod log users at btrfs_free_tree_block(), use the new bit BTRFS_FS_TREE_MOD_LOG_USERS for fs_info->flags instead. This way extent-tree.c does not need to know about any of the int

[PATCH 5/9] btrfs: use a bit to track the existence of tree mod log users

2021-03-11 Thread fdmanana
From: Filipe Manana The tree modification log functions are called very frequently, basically they are called every time a btree is modified (a pointer added or removed to a node, a new root for a btree is set, etc). Because of that, to avoid heavy lock contention on the lock that protects the lis

[PATCH 1/9] btrfs: fix race when cloning extent buffer during rewind of an old root

2021-03-11 Thread fdmanana
From: Filipe Manana While resolving backreferences, as part of a logical ino ioctl call or fiemap, we can end up hitting a BUG_ON() when replaying tree mod log operations of a root, triggering a stack trace like the following: [ cut here ] kernel BUG at fs/btrfs/ctree

[PATCH 0/9] btrfs: bug fixes for the tree mod log and small refactorings

2021-03-11 Thread fdmanana
From: Filipe Manana This patchset fixes a couple of bugs, in the first two patches, with the tree mod log code. The remaining patches just move all that code into a separate file, since it's quite large and ctree.c is huge as well, and do some small refactorings and cleanups. One of the bugs in par

[PATCH 2/9] btrfs: always pin deleted leaves when there are active tree mod log users

2021-03-11 Thread fdmanana
From: Filipe Manana When freeing a tree block we may end up adding its extent back to the free space cache/tree, as long as there are no more references for it, it was created in the current transaction and writeback for it never happened. This is generally fine, however when we have tree mod log

[PATCH v2 2/2] btrfs: add btree read ahead for incremental send operations

2021-03-01 Thread fdmanana
From: Filipe Manana Currently we do not do btree read ahead when doing an incremental send, however we know that we will read and process any node or leaf in the send root that has a generation greater than the generation of the parent root. So triggering read ahead for such nodes and leafs is be

[PATCH v2 1/2] btrfs: add btree read ahead for full send operations

2021-03-01 Thread fdmanana
From: Filipe Manana When doing a full send we know that we are going to be reading every node and leaf of the send root, so we benefit from enabling read ahead for the btree. This change enables read ahead for full send operations only, incremental sends will have read ahead enabled in a differe

[PATCH v2 0/2] btrfs: add btree read ahead for send operations

2021-03-01 Thread fdmanana
From: Filipe Manana This patchset adds btree read ahead for full and incremental send operations, which results in some nice speedups. Test and results are mentioned in the change log of each patch. V2: Updated second patch, for incremental sends, to limit readahead to avoid too many reads i

[PATCH] btrfs: fix warning when creating a directory with smack enabled

2021-02-26 Thread fdmanana
From: Filipe Manana When we have smack enabled, during the creation of a directory smack may attempt to add a "smack transmute" xattr on the inode, which results in the following warning and trace: [ 220.732359] [ cut here ] [ 220.732398] WARNING: CPU: 3 PID: 2548 at fs

[PATCH 2/2] btrfs: add btree read ahead for incremental send operations

2021-02-26 Thread fdmanana
From: Filipe Manana Currently we do not do btree read ahead when doing an incremental send, however we know that we will read and process any node or leaf in the send root that has a generation greater than the generation of the parent root. So triggering read ahead for such nodes and leafs is be

[PATCH 1/2] btrfs: add btree read ahead for full send operations

2021-02-26 Thread fdmanana
From: Filipe Manana When doing a full send we know that we are going to be reading every node and leaf of the send root, so we benefit from enabling read ahead for the btree. This change enables read ahead for full send operations only, incremental sends will have read ahead enabled in a differe

[PATCH 0/2] btrfs: add btree read ahead for send operations

2021-02-26 Thread fdmanana
From: Filipe Manana This patchset adds btree read ahead for full and incremental send operations, which results in some nice speedups. Test and results are mentioned in the change log of each patch. Filipe Manana (2): btrfs: add btree read ahead for full send operations btrfs: add btree read

[PATCH 0/3] btrfs: fix a couple races between fsync and other code

2021-02-23 Thread fdmanana
From: Filipe Manana The first patch fixes a race between fsync and memory mapped writes, which can result in corruptions. The second one fixes a different race that in practice should be "impossible" to happen, but in case it's triggered somehow, results in not logging an inode when it has new ex

[PATCH 3/3] btrfs: remove stale comment and logic from btrfs_inode_in_log()

2021-02-23 Thread fdmanana
From: Filipe Manana Currently btrfs_inode_in_log() checks the list of modified extents of the inode, and has a comment mentioning why, as it used to be necessary to make sure if we did something like the following: mmap write range A mmap write range B msync range A (ranged fsync) msync

[PATCH 2/3] btrfs: fix race between marking inode needs to be logged and log syncing

2021-02-23 Thread fdmanana
From: Filipe Manana We have a race between marking that an inode needs to be logged, either at btrfs_set_inode_last_trans() or at btrfs_page_mkwrite(), and btrfs_sync_log(). The following steps describe how the race happens. 1) We are at transaction N; 2) Inode I was previously fsynced

[PATCH 1/3] btrfs: fix race between memory mapped writes and fsync

2021-02-23 Thread fdmanana
From: Filipe Manana When doing an fsync we flush all delalloc, lock the inode (vfs lock), flush any new delalloc that might have been created before taking the lock and then wait either for the ordered extents to complete or just for the writeback to complete (depending on whether the full sync f

[PATCH] btrfs: add test for cloning a hole post eof when using NO_HOLES feature

2021-02-16 Thread fdmanana
From: Filipe Manana Test that when using the NO_HOLES feature, if we truncate down a file, clone a file range covering only a hole into an offset beyond the current file size, and then fsync the file, after a power failure we get the expected file content and we do not get stale data correspondin

[PATCH] btrfs: fix stale data exposure after cloning a hole with NO_HOLES enabled

2021-02-16 Thread fdmanana
From: Filipe Manana When using the NO_HOLES feature, if we clone a file range that spans only a hole into a range that is at or beyond the current i_size of the destination file, we end up not setting the full sync runtime flag on the inode. As a result, if we then fsync the destination file and

[PATCH] btrfs: add test for cases when a dio write has to fallback to a buffered write

2021-02-11 Thread fdmanana
From: Filipe Manana Test cases where a direct IO write, with O_DSYNC, can not be done and has to fallback to a buffered write. This is motivated by a regression that was introduced in kernel 5.10 by commit 0eb79294dbe328 ("btrfs: dio iomap DSYNC workaround") and was fixed in kernel 5.11 by comm

[PATCH 5.10.x] btrfs: fix crash after non-aligned direct IO write with O_DSYNC

2021-02-11 Thread fdmanana
From: Filipe Manana Whenever we attempt to do a non-aligned direct IO write with O_DSYNC, we end up triggering an assertion and crashing. Example reproducer: $ cat test.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV > /dev/null mount $DEV $MNT # Do a direct IO write

[PATCH] btrfs-progs: remove workaround for setting capabilities in the receive command

2021-02-09 Thread fdmanana
From: Filipe Manana We had a few bugs on the kernel side of send/receive where capabilities ended up being lost after receiving a send stream. They all stem from the fact that the kernel used to send all xattrs before issuing the chown command, and the latter clears any existing capabilities in a

[PATCH v2 3/3] btrfs: fix race between swap file activation and snapshot creation

2021-02-05 Thread fdmanana
From: Filipe Manana When creating a snapshot we check if the current number of swap files, in the root, is non-zero, and if it is, we error out and warn that we can not create the snapshot because there are active swap files. However this is racy because when a task started activation of a swap

[PATCH v2 0/3] btrfs: fix a couple swapfile support bugs

2021-02-05 Thread fdmanana
From: Filipe Manana The following patchset fixes 2 bugs with the swapfile support, where we can end up falling back to COW when writing to an active swapfile. The first patch is actually independent and just makes the nocow buffered IO path more efficient by eliminating a repeated check for a rea

[PATCH v2 1/3] btrfs: avoid checking for RO block group twice during nocow writeback

2021-02-05 Thread fdmanana
From: Filipe Manana During the nocow writeback path, we currently iterate the rbtree of block groups twice: once for checking if the target block group is RO with the call to btrfs_extent_readonly(), and once again for getting a nocow reference on the block group with a call to btrfs_inc_nocow_w

[PATCH v2 2/3] btrfs: fix race between writes to swap files and scrub

2021-02-05 Thread fdmanana
From: Filipe Manana When we activate a swap file, at btrfs_swap_activate(), we acquire the exclusive operation lock to prevent the physical location of the swap file extents from being changed by operations such as balance and device replace/resize/remove. We also call there can_nocow_extent() which, am

[PATCH] btrfs: fix extent buffer leak on failure to copy root

2021-02-04 Thread fdmanana
From: Filipe Manana At btrfs_copy_root(), if the call to btrfs_inc_ref() fails we end up returning without unlocking and releasing our reference on the extent buffer named "cow" we previously allocated with btrfs_alloc_tree_block(). So fix that by unlocking the extent buffer and dropping our ref

[PATCH 2/4] btrfs: fix race between writes to swap files and scrub

2021-02-03 Thread fdmanana
From: Filipe Manana When we activate a swap file, at btrfs_swap_activate(), we acquire the exclusive operation lock to prevent the physical location of the swap file extents from being changed by operations such as balance and device replace/resize/remove. We also call there can_nocow_extent() which, am

[PATCH 4/4] btrfs: fix race between swap file activation and snapshot creation

2021-02-03 Thread fdmanana
From: Filipe Manana When creating a snapshot we check if the current number of swap files, in the root, is non-zero, and if it is, we error out and warn that we can not create the snapshot because there are active swap files. However this is racy because when a task started activation of a swap

[PATCH 1/4] btrfs: avoid checking for RO block group twice during nocow writeback

2021-02-03 Thread fdmanana
From: Filipe Manana During the nocow writeback path, we currently iterate the rbtree of block groups twice: once for checking if the target block group is RO with the call to btrfs_extent_readonly(), and once again for getting a nocow reference on the block group with a call to btrfs_inc_nocow_w

[PATCH 3/4] btrfs: remove no longer used function btrfs_extent_readonly()

2021-02-03 Thread fdmanana
From: Filipe Manana After the two previous patches: btrfs: avoid checking for RO block group twice during nocow writeback btrfs: fix race between writes to swap files and scrub it is no longer used, so just remove it. Signed-off-by: Filipe Manana --- fs/btrfs/ctree.h | 1 - fs/btr

[PATCH 0/4] btrfs: fix a couple swapfile support bugs

2021-02-03 Thread fdmanana
From: Filipe Manana The following patchset fixes 2 bugs with the swapfile support, where we can end up falling back to COW when writing to an active swapfile. As a bonus, it makes the NOCOW write path, for both buffered and direct IO, more efficient by avoiding doing repeated work when checkin

[PATCH] btrfs: remove wrong comment for can_nocow_extent()

2021-01-27 Thread fdmanana
From: Filipe Manana The comment for can_nocow_extent() says that the function will flush ordered extents, however that never happens and was never true before the comment was added in commit e4ecaf90bc13 ("btrfs: add comments for btrfs_check_can_nocow() and can_nocow_extent()"). This is true only

[PATCH 6/7] btrfs: remove unnecessary check_parent_dirs_for_sync()

2021-01-27 Thread fdmanana
From: Filipe Manana Whenever we fsync an inode, if it is a directory, a regular file that was created in the current transaction or has last_unlink_trans set to the generation of the current transaction, we check if any of its ancestor inodes (and the inode itself if it is a directory) can not be

[PATCH 7/7] btrfs: make concurrent fsyncs wait less when waiting for a transaction commit

2021-01-27 Thread fdmanana
From: Filipe Manana Often an fsync needs to fallback to a transaction commit for several reasons (to ensure consistency after a power failure, a new block group was allocated or a temporary error such as ENOMEM or ENOSPC happened). In that case the log is marked as needing a full commit and any

[PATCH 5/7] btrfs: skip logging inodes already logged when logging new entries

2021-01-27 Thread fdmanana
From: Filipe Manana When logging new directory entries of a directory, we log the inodes of new dentries and the inodes of dentries pointing to directories that may have been created in past transactions. For the case of directories we log in full mode, which can be particularly expensive for lar

[PATCH 0/7] btrfs: more performance improvements for dbench workloads

2021-01-27 Thread fdmanana
From: Filipe Manana The following patchset brings one more batch of performance improvements with dbench workloads, or anything that mixes file creation, file writes, renames, unlinks, etc with fsync like dbench does. This patchset is mostly based on avoiding logging directory inodes multiple tim

[PATCH 3/7] btrfs: avoid logging new ancestor inodes when logging new inode

2021-01-27 Thread fdmanana
From: Filipe Manana When we fsync a new file, created in the current transaction, we check all its ancestor inodes and always log them if they were created in the current transaction - even if we have already logged them before, which is a waste of time. So avoid logging new ancestor inodes if t

[PATCH 4/7] btrfs: skip logging directories already logged when logging all parents

2021-01-27 Thread fdmanana
From: Filipe Manana Sometimes when we fsync an inode we need to do a full log of all its ancestors (due to unlink, link or rename operations), which can be an expensive operation, especially if the directories are large. However if we find an ancestor directory inode that is already logged in th

[PATCH 1/7] btrfs: remove unnecessary directory inode item update when deleting dir entry

2021-01-27 Thread fdmanana
From: Filipe Manana When we remove a directory entry, as part of an unlink operation, if the directory was logged before we must remove the directory index items from the log. We are also updating the inode item of the directory to update its i_size, but that is not necessary because during log r

[PATCH 2/7] btrfs: stop setting nbytes when filling inode item for logging

2021-01-27 Thread fdmanana
From: Filipe Manana When we fill an inode item for logging we are setting its nbytes field with the value returned by inode_get_bytes() (a VFS API), however we do not need it because it is not used during log replay. In fact, for fast fsyncs, when we call inode_get_bytes() we may even get an outd

[PATCH v3] btrfs: fix log replay failure due to race with space cache rebuild

2021-01-22 Thread fdmanana
From: Filipe Manana After a sudden power failure we may end up with a space cache on disk that is not valid and needs to be rebuilt from scratch. If that happens, during log replay when we attempt to pin an extent buffer from a log tree, at btrfs_pin_extent_for_log_replay(), we do not wait for t

[PATCH] btrfs: fix log replay failure due to race with space cache rebuild

2021-01-22 Thread fdmanana
From: Filipe Manana After a sudden power failure we may end up with a space cache on disk that is not valid and needs to be rebuilt from scratch. If that happens, during log replay when we attempt to pin an extent buffer from a log tree, at btrfs_pin_extent_for_log_replay(), we do not wait for t

[PATCH 1/2] btrfs: fix log replay failure due to race with space cache rebuild

2021-01-22 Thread fdmanana
From: Filipe Manana After a sudden power failure we may end up with a space cache on disk that is not valid and needs to be rebuilt from scratch. If that happens, during log replay when we attempt to pin an extent buffer from a log tree, at btrfs_pin_extent_for_log_replay(), we do not wait for t

[PATCH 2/2] btrfs: fix log replay failure when space cache needs to be rebuilt

2021-01-22 Thread fdmanana
From: Filipe Manana During log replay we first start by walking the log trees and pin the ranges for their extent buffers, through calls to the function btrfs_pin_extent_for_log_replay(). However if the space cache for a block group is invalid and needs to be rebuilt, we can fail the log replay

[PATCH 0/2] btrfs: a couple bug fixes for failures of log replay and mount

2021-01-22 Thread fdmanana
From: Filipe Manana This small patchset fixes two bugs that lead to an -EINVAL failure during log replay, causing the filesystem mount to fail. They are relatively new regressions, one caused by the recent change to make space cache loading asynchronous and the other caused by the refactoring tha

[PATCH] btrfs: send, remove stale code when checking for shared extents

2021-01-11 Thread fdmanana
From: Filipe Manana After commit 040ee6120cb670 ("Btrfs: send, improve clone range") we no longer use the data_offset field of struct backref_ctx, as after that we do all the necessary checks for the data offset of file extent items at clone_range(). Since there are no more users of data_off

[PATCH] btrfs: test incremental send after cloning extents from the same file

2021-01-11 Thread fdmanana
From: Filipe Manana Test that an incremental send operation correctly issues clone operations for a file that had different parts of one of its extents cloned into itself, at different offsets, and a large part of that extent was overwritten, so all the reflinks only point to subranges of the ext

[PATCH] btrfs: send, fix invalid clone operations when cloning from the same file and root

2021-01-11 Thread fdmanana
From: Filipe Manana When an incremental send finds an extent that is shared, it checks which file extent items in the range refer to that extent, and for those it emits clone operations, while for others it emits regular write operations to avoid corruption at the destination (as described and fi

[PATCH] Btrfs: check for the full sync flag while holding the inode lock during fsync

2019-10-16 Thread fdmanana
From: Filipe Manana We were checking for the full fsync flag in the inode before locking the inode, which is racy, since at that time it might not be set but after we acquire the inode lock some other task set it. One case where this can happen is on a system low on memory and some concurren

[PATCH] Btrfs: fix qgroup double free after failure to reserve metadata for delalloc

2019-10-15 Thread fdmanana
From: Filipe Manana If we fail to reserve metadata for delalloc operations we end up releasing the previously reserved qgroup amount twice, once explicitly under the 'out_qgroup' label by calling btrfs_qgroup_free_meta_prealloc() and once again, under label 'out_fail', by calling btrfs_inode_rsv_
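The double release described in this entry can be modelled with a small sketch. The `Reservation` class and both functions are hypothetical stand-ins for illustration, not btrfs code, and the snippet above is truncated before it explains the actual fix. The sketch only shows why two cleanup steps that each release the same amount corrupt the accounting.

```python
class Reservation:
    """Toy accounting object standing in for a qgroup reservation (hypothetical)."""

    def __init__(self):
        self.reserved = 0

    def reserve(self, nbytes):
        self.reserved += nbytes

    def release(self, nbytes):
        self.reserved -= nbytes


def failing_path_buggy(rsv, nbytes):
    """Error path where two cleanup steps both release the same amount."""
    rsv.reserve(nbytes)
    # Simulated failure of the later metadata reservation follows.
    rsv.release(nbytes)  # first cleanup step (explicit release)
    rsv.release(nbytes)  # second cleanup step releases the same amount again


def failing_path_fixed(rsv, nbytes):
    """Error path where the amount is released exactly once."""
    rsv.reserve(nbytes)
    rsv.release(nbytes)


buggy, fixed = Reservation(), Reservation()
failing_path_buggy(buggy, 4096)
failing_path_fixed(fixed, 4096)
```

In the buggy variant the counter ends below zero, which is the invariant violation a double free of reserved space produces.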

[PATCH v2] Btrfs: fix negative subv_writers counter and data space leak after buffered write

2019-10-11 Thread fdmanana
From: Filipe Manana When doing a buffered write it's possible to leave the subv_writers counter of the root, used for synchronization between buffered nocow writers and snapshotting, with a negative value. This happens in an exceptional case like the following: 1) We fail to allocate data space for the write, since th

[PATCH] Btrfs: fix negative subv_writers counter and data space leak after buffered write

2019-10-09 Thread fdmanana
From: Filipe Manana When doing a buffered write it's possible to leave the subv_writers counter of the root, used for synchronization between buffered nocow writers and snapshotting, with a negative value. This happens in an exceptional case like the following: 1) We fail to allocate data space for the write, since th

[PATCH] Btrfs: fix metadata space leak on fixup worker failure to set range as delalloc

2019-10-09 Thread fdmanana
From: Filipe Manana In the fixup worker, if we fail to mark the range as delalloc in the io tree, we must release the previously reserved metadata, as well as update the outstanding extents counter for the inode, otherwise we leak metadata space. In practice we can't return an error from btrfs_se

[PATCH] Btrfs: add missing extents release on file extent cluster relocation error

2019-10-09 Thread fdmanana
From: Filipe Manana If we error out when finding a page at relocate_file_extent_cluster(), we need to release the outstanding extents counter on the relocation inode, set by the previous call to btrfs_delalloc_reserve_metadata(), otherwise the inode's block reserve size can never decrease to zero

[PATCH] Btrfs: fix memory leak due to concurrent append writes with fiemap

2019-09-30 Thread fdmanana
From: Filipe Manana When we have a buffered write that starts at an offset greater than or equal to the file's size happening concurrently with a full ranged fiemap, we can end up leaking an extent state structure. Suppose we have a file with a size of 1Mb, and before the buffered write and fie

[PATCH] Btrfs: fix race setting up and completing qgroup rescan workers

2019-09-24 Thread fdmanana
ately without waiting for the new rescan worker to complete, because fs_info->qgroup_rescan_running is set to false by CPU 2. This race is making test case btrfs/171 (from fstests) to fail often: btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.

[PATCH] btrfs/036: fix sporadic failures when unmounting scratch filesystem

2019-09-24 Thread fdmanana
From: Filipe Manana Often this test can fail on unmount because a 'btrfs subvolume snapshot' command is still running and using the scratch mount point: btrfs/036 168s ... umount: /home/fdmanana/btrfs-tests/scratch_1: target is busy (In some cases useful info about

[PATCH] Btrfs: fix selftests failure due to uninitialized i_mode in test inodes

2019-09-18 Thread fdmanana
From: Filipe Manana Some of the self tests create a test inode, set up some extents and then do calls to btrfs_get_extent() to test that the corresponding extent maps exist and are correct. However btrfs_get_extent(), since the 5.2 merge window, now errors out when it finds a regular or prealloc e

[PATCH] Btrfs: fix missing error return if writeback for extent buffer never started

2019-09-11 Thread fdmanana
From: Filipe Manana If lock_extent_buffer_for_io() fails, it returns a negative value, but its caller btree_write_cache_pages() ignores such error. This means that a call to flush_write_bio(), from lock_extent_buffer_for_io(), might have failed. We should make btree_write_cache_pages() notice suc
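
A minimal sketch of the error-propagation pattern the snippet above describes. It is not the kernel code: `lock_buffer_for_io_stub()` and `write_cache_pages_stub()` are hypothetical stand-ins, and the return convention (negative on error, 0 for "nothing to write", 1 for "start writeback") is an assumption made for illustration.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a locking helper. It simply hands back a
 * simulated result: negative errno on failure, 0 when no writeback is
 * needed, 1 when the caller must start writeback.
 */
static int lock_buffer_for_io_stub(int simulated_result)
{
	return simulated_result;
}

/*
 * Hypothetical caller. The point of the sketch is the "ret < 0" branch:
 * the error is returned to our own caller instead of being ignored.
 * *submitted counts the buffers for which writeback would be started.
 */
static int write_cache_pages_stub(const int *results, int count, int *submitted)
{
	*submitted = 0;

	for (int i = 0; i < count; i++) {
		int ret = lock_buffer_for_io_stub(results[i]);

		if (ret < 0)
			return ret;	/* notice the failure and stop */
		if (ret == 0)
			continue;	/* nothing to write for this buffer */
		(*submitted)++;		/* ret == 1: writeback would start here */
	}
	return 0;
}
```

With this shape, a failure on any buffer surfaces to the caller as a negative value rather than being silently dropped.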

[PATCH] Btrfs: make btrfs_wait_extents() static

2019-09-11 Thread fdmanana
From: Filipe Manana It's not used outside of transaction.c. Signed-off-by: Filipe Manana --- fs/btrfs/transaction.c | 2 +- fs/btrfs/transaction.h | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index e3adb714c04b..84a42e388aa

[PATCH v2] Btrfs: fix unwritten extent buffers and hangs on future writeback attempts

2019-09-11 Thread fdmanana
From: Filipe Manana lock_extent_buffer_for_io() returns 1 to the caller to tell it everything went fine and the caller needs to start writeback for the extent buffer (submit a bio, etc), 0 to tell the caller everything went fine but it does not need to start writeback for the extent buffer, and

[PATCH] Btrfs: fix unwritten extent buffers and hangs on future writeback attempts

2019-09-11 Thread fdmanana
From: Filipe Manana lock_extent_buffer_for_io() returns 1 to the caller to tell it everything went fine and the caller needs to start writeback for the extent buffer (submit a bio, etc), 0 to tell the caller everything went fine but it does not need to start writeback for the extent buffer, and

[PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction

2019-09-10 Thread fdmanana
From: Filipe Manana Sometimes when fsync'ing a file we need to log that other inodes exist and when we need to do that we acquire a reference on the inodes and then drop that reference using iput() after logging them. That generally is not a problem except if we end up doing the final iput() (dr

[PATCH] btrfs/079: fix failure to umount scratch fs due to running filefrag process

2019-09-10 Thread fdmanana
process to complete first. We need to set a trap for the SIGTERM signal on the subshell so that it waits for any filefrag process before exiting. The failure resulted in error messages like the following: btrfs/079 57s ... umount: /home/fdmanana/btrfs-tests/scratch_1: target is busy (In

[PATCH] btrfs/048: fix test failure when fs mounted with v2 space cache option

2019-09-05 Thread fdmanana
From: Filipe Manana In order to check that the filesystem generation does not change after failure to set a property, the test expects a specific generation number of 7 in its golden output. That currently works except when using the v2 space cache mount option (MOUNT_OPTIONS="-o space_cache=v2")

[PATCH] generic/517: make test work on filesystems with block size greater than 4Kb

2019-08-13 Thread fdmanana
From: Filipe Manana The test currently fails on filesystems with a block size greater than 4Kb, as dedupe operations fail with -EINVAL because the file offsets used are not multiples of such block sizes (but they are multiples of 4Kb, 2Kb and 1Kb). So update the test to use offsets that are mult
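
The alignment requirement mentioned above can be sketched as follows. This is only an illustration under stated assumptions: `is_block_aligned()` and `round_up_to_block()` are hypothetical helper names, and the 64 KiB block size used in the note below is an example value, not one taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when offset is a multiple of block_size (block_size must be non-zero). */
static bool is_block_aligned(uint64_t offset, uint64_t block_size)
{
	return block_size != 0 && offset % block_size == 0;
}

/*
 * Round offset up to the next multiple of block_size. An offset that is
 * already aligned is returned unchanged. block_size must be non-zero.
 */
static uint64_t round_up_to_block(uint64_t offset, uint64_t block_size)
{
	return ((offset + block_size - 1) / block_size) * block_size;
}
```

For instance, an offset of 12288 is a multiple of 4096 but not of 65536, so a test that hardcodes it would satisfy a 4Kb block size and fail the check for a 64 KiB one.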

[PATCH] Btrfs: fix use-after-free when using the tree modification log

2019-08-12 Thread fdmanana
From: Filipe Manana At ctree.c:get_old_root(), we are accessing a root's header owner field after we have freed the respective extent buffer. This results in a use-after-free that can lead to crashes, and when CONFIG_DEBUG_PAGEALLOC is set, results in a stack trace like the following: [ 3876.

[PATCH] Btrfs: fix sysfs warning and missing raid sysfs directories

2019-08-07 Thread fdmanana
From: Filipe Manana In the 5.3 merge window, commit 7c7e301406d0a9 ("btrfs: sysfs: Replace default_attrs in ktypes with groups"), we started using the member "defaults_groups" for the kobject type "btrfs_raid_ktype". That leads to a series of warnings when running some test cases of fstests, such

[PATCH] Btrfs: make test_find_first_clear_extent_bit fail on incorrect results

2019-08-05 Thread fdmanana
From: Filipe Manana If any call to find_first_clear_extent_bit() returns an unexpected result, the test should fail and not just print an error message, otherwise regressions are much harder to notice. Fixes: 1eaebb341d2b41 ("btrfs: Don't trim returned range based on input valu

[PATCH] Btrfs: fix memory leaks in the test test_find_first_clear_extent_bit

2019-08-03 Thread fdmanana
From: Filipe Manana The test creates an extent io tree and sets several ranges with the CHUNK_ALLOCATED and CHUNK_TRIMMED bits, resulting in the allocation of several extent state structures. However the test never clears those ranges, resulting in memory leaks of the extent state structures. Th

[PATCH] Btrfs: fix deadlock between fiemap and transaction commits

2019-07-29 Thread fdmanana
From: Filipe Manana The fiemap handler locks a file range that can have unflushed delalloc, and after locking the range, it tries to attach to a running transaction. If the running transaction started its commit, that is, it is in state TRANS_STATE_COMMIT_START, and either the filesystem was moun

[PATCH] Btrfs-progs: mkfs, fix metadata corruption when using mixed mode

2019-07-25 Thread fdmanana
From: Filipe Manana When creating a filesystem with mixed block groups, we are creating two space info objects to track used/reserved/pinned space, one only for data and another one only for metadata. This is making fstests test case generic/416 fail, with btrfs' check reporting over a hundred

[PATCH] Btrfs: fix race leading to fs corruption after transaction abortion

2019-07-25 Thread fdmanana
From: Filipe Manana When one transaction is finishing its commit, it is possible for another transaction to start and enter its initial commit phase as well. If the first ends up getting aborted, we have a small time window where the second transaction commit does not notice that the previous tra

[PATCH] btrfs: test incremental send after deduplication on both snapshots

2019-07-17 Thread fdmanana
From: Filipe Manana Test that an incremental send operation works after deduplicating into the same file in both the parent and send snapshots. This currently fails on btrfs and a kernel patch to fix it was submitted with the subject: Btrfs: fix incremental send failure after deduplication S

[PATCH] Btrfs: fix incremental send failure after deduplication

2019-07-17 Thread fdmanana
From: Filipe Manana When doing an incremental send operation we can fail if we previously did deduplication operations against a file that exists in both snapshots. In that case we will fail the send operation with -EIO and print a message to dmesg/syslog like the following: BTRFS error (devic

[PATCH] btrfs/189: make the test work on systems with a page size greater than 4Kb

2019-07-05 Thread fdmanana
From: Filipe Manana The test currently uses offsets and lengths which are multiples of 4K, but not multiples of 64K (or any other page size between 4Kb and 64Kb). This makes the reflink calls fail with -EINVAL because reflink only operates on ranges that are aligned to the filesystem's block

[PATCH v3 2/2] Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents

2019-07-05 Thread fdmanana
From: Filipe Manana When cloning extents (or deduplicating) we create a transaction with a space reservation that considers we will drop or update a single file extent item of the destination inode (that we modify a single leaf). That is fine for the vast majority of scenarios, however it might h
