Re: [PATCH 3/3] btrfs-progs: do not merge tree block refs have different root_id

2018-04-24 Thread Su Yue
On 04/24/2018 02:50 PM, Qu Wenruo wrote: On 2018年04月24日 14:43, Su Yue wrote: On 04/24/2018 02:17 PM, Qu Wenruo wrote: On 2018年04月24日 13:52, Su Yue wrote: For an extent item which contains many tree block backrefs, like === In

[PATCH V2.1] btrfs-progs: do not merge tree block refs have different root_id

2018-04-24 Thread Su Yue
For an extent item which contains many tree block backrefs, like = In 020-extent-ref-cases/keyed_block_ref.img item 10 key (29470720 METADATA_ITEM 0) itemoff 3450 itemsize 222 refs 23 gen 10 flags TREE_BLOCK

[PATCH V2] btrfs-progs: do not merge tree block refs have different root_id

2018-04-24 Thread Su Yue
For an extent item which contains many tree block backrefs, like = In 020-extent-ref-cases/keyed_block_ref.img item 10 key (29470720 METADATA_ITEM 0) itemoff 3450 itemsize 222 refs 23 gen 10 flags TREE_BLOCK

Re: Inconsistent behavior of fsync in btrfs

2018-04-24 Thread Vijaychidambaram Velayudhan Pillai
Hi Chris, On Tue, Apr 24, 2018 at 10:07 PM, Chris Murphy wrote: > I don't have answer to your question, but I'm curious exactly how you > simulate a crash? For my own really rudimentary testing I've been doing > crazy things like: > > # grub-mkconfig -o /boot/efi && echo

Re: Inconsistent behavior of fsync in btrfs

2018-04-24 Thread Vijaychidambaram Velayudhan Pillai
Hi Chris, We are using software we developed called CrashMonkey [1]. It simulates the state on storage after a crash (taking into account FLUSH and FUA flags). Talk slides on how it works can be found here [2]. It is similar to dm-log-writes if you have used that in the past. [1]

Re: Inconsistent behavior of fsync in btrfs

2018-04-24 Thread Chris Murphy
On Tue, Apr 24, 2018 at 8:35 PM, Jayashree Mohan wrote: > Hi, > > While investigating crash consistency bugs on btrfs, we came across > workloads that demonstrate inconsistent behavior of fsync. > > Consider the following workload where fsync on the directory did not

Inconsistent behavior of fsync in btrfs

2018-04-24 Thread Jayashree Mohan
Hi, While investigating crash consistency bugs on btrfs, we came across workloads that demonstrate inconsistent behavior of fsync. Consider the following workload where fsync on the directory did not persist it. Workload 1: mkdir A Sync rename (A, B) creat B/foo fsync B/foo fsync B ---crash---

[PATCH] Btrfs: set keep_lock when necessary in btrfs_defrag_leaves

2018-04-24 Thread Liu Bo
path->keep_lock is set but @path immediately gets released; this sets ->keep_lock only when it's necessary. Signed-off-by: Liu Bo --- fs/btrfs/tree-defrag.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/btrfs/tree-defrag.c

[PATCH V4] Btrfs: enchanse raid1/10 balance heuristic

2018-04-24 Thread Timofey Titovets
Currently the btrfs raid1/10 balancer balances requests to mirrors based on pid % num of mirrors. Make the logic aware of: - whether one of the underlying devices is non-rotational - the queue length to the underlying devices By default try to use pid % num_mirrors guessing, but: - If one of the mirrors is non-rotational,

Re: [PATCH v2] btrfs: print-tree: Add locking status output for debug build

2018-04-24 Thread Qu Wenruo
On 2018年04月24日 22:44, David Sterba wrote: > On Tue, Apr 24, 2018 at 01:03:13PM +0800, Qu Wenruo wrote: >> It's pretty handy if we can get debug output for locking status of an >> extent buffer, specially for race related debugging. >> >> So add the following output for btrfs_print_tree() and >>

[PATCH 2/4] [RESEND] Btrfs: make should_defrag_range() understood compressed extents

2018-04-24 Thread Timofey Titovets
Both the defrag ioctl and autodefrag call btrfs_defrag_file() for file defragmentation. The kernel default target extent size is 256KiB. The btrfs-progs default is 32MiB. Both are bigger than the maximum size of a compressed extent, 128KiB. That leads to rewriting all compressed data on disk. Fix that by check
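The size comparison behind this report can be sketched in C. This is a simplified illustration, not the kernel's should_defrag_range(): the helper smaller_than_target and the macro names are hypothetical, and only the three sizes are taken from the patch description.

```c
#include <stdbool.h>
#include <stdint.h>

#define KIB(x) ((uint64_t)(x) * 1024)
#define MIB(x) (KIB(x) * 1024)

/* Figures quoted in the patch description. */
#define KERNEL_DEFAULT_TARGET KIB(256)
#define PROGS_DEFAULT_TARGET  MIB(32)
#define MAX_COMPRESSED_EXTENT KIB(128)

/* Simplified, hypothetical size test: an extent shorter than the
 * target extent size is treated as a candidate for defragmentation. */
bool smaller_than_target(uint64_t extent_len, uint64_t target)
{
    return extent_len < target;
}
```

Under this assumption even a maximum-size compressed extent (128KiB) is below both default targets, so a purely size-based test would select every compressed extent for rewriting.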

[PATCH 4/4] [RESEND] Btrfs: reduce size of struct btrfs_inode

2018-04-24 Thread Timofey Titovets
Currently btrfs_inode has a size of 1136 bytes (on x86_64). struct btrfs_inode stores several vars related to compression code; all states use 1 or 2 bits. Let's declare bitfields for the compression related vars, to reduce sizeof btrfs_inode to 1128 bytes. Signed-off-by: Timofey Titovets
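The effect of bitfields on structure size can be shown with a minimal C sketch. The two structs below are stand-ins with illustrative field names, not the real struct btrfs_inode, and the exact saving depends on the ABI and the surrounding members.

```c
#include <stddef.h>

/* Stand-in structs; each state fits in 1 or 2 bits. */
struct flags_plain {
    unsigned force_compress;
    unsigned prop_compress;
    unsigned defrag_compress;
};

struct flags_packed {
    unsigned force_compress : 2;
    unsigned prop_compress : 2;
    unsigned defrag_compress : 2;
};

/* Number of bytes saved by packing the three states into bitfields. */
size_t bytes_saved(void)
{
    return sizeof(struct flags_plain) - sizeof(struct flags_packed);
}
```

The three bitfields share a single unsigned storage unit, so the packed struct is smaller than the plain one on common compilers.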

[PATCH 1/4] [RESEND] Btrfs: btrfs_dedupe_file_range() ioctl, remove 16MiB restriction

2018-04-24 Thread Timofey Titovets
Currently btrfs_dedupe_file_range() is restricted to a 16MiB range to limit locking time and memory requirements for the dedup ioctl(). For a too big input range the code silently sets the range to 16MiB. Let's remove that restriction by iterating over the dedup range. That's backward compatible and will not change
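Iterating over a large range in pieces of at most 16MiB can be sketched as follows. This is a userspace illustration only; for_each_chunk, chunk_fn and count_chunk are hypothetical names and do not come from the patch.

```c
#include <stdint.h>

#define CHUNK_LEN ((uint64_t)16 * 1024 * 1024) /* the 16MiB limit */

/* Hypothetical per-chunk callback; a non-zero return stops the walk. */
typedef int (*chunk_fn)(uint64_t offset, uint64_t len, void *ctx);

/* Call fn once for every piece of [offset, offset + len) that is at
 * most CHUNK_LEN bytes long. Returns the first non-zero fn result. */
int for_each_chunk(uint64_t offset, uint64_t len, chunk_fn fn, void *ctx)
{
    while (len > 0) {
        uint64_t step = len < CHUNK_LEN ? len : CHUNK_LEN;
        int ret = fn(offset, step, ctx);

        if (ret)
            return ret;
        offset += step;
        len -= step;
    }
    return 0;
}

/* Example callback: count how many chunks were visited. */
int count_chunk(uint64_t offset, uint64_t len, void *ctx)
{
    (void)offset;
    (void)len;
    (*(unsigned *)ctx)++;
    return 0;
}
```

With this shape the per-call cost stays bounded by the chunk size while the caller can pass an arbitrarily long range; a 40MiB range, for instance, is visited as two 16MiB pieces and one 8MiB piece.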

[PATCH 0/4] [RESEND] Btrfs: just bunch of patches to ioctl.c

2018-04-24 Thread Timofey Titovets
1st patch: remove the 16MiB restriction from the extent_same ioctl() by iterating over the passed range. I did not see much difference in performance, so it just removes a logic restriction. Patches 2-3 update the defrag ioctl(): - Fix bad behaviour with full rewriting all compressed extents in

[PATCH 3/4] [RESEND] Btrfs: allow btrfs_defrag_file() uncompress files on defragmentation

2018-04-24 Thread Timofey Titovets
Currently the defrag ioctl only supports recompressing files with a specified compression type. Allow setting the compression type to none when calling defrag, and use BTRFS_DEFRAG_RANGE_COMPRESS as a flag that the user requests a change of compression type. Signed-off-by: Timofey Titovets ---

Re: [PATCH] btrfs: push relocation recovery into a helper thread

2018-04-24 Thread Jeff Mahoney
On 4/23/18 5:43 PM, David Sterba wrote: > On Tue, Apr 17, 2018 at 02:45:33PM -0400, Jeff Mahoney wrote: >> On a file system with many snapshots and qgroups enabled, an interrupted >> balance can end up taking a long time to mount due to recovering the >> relocations during mount. It does this in

Re: [PATCH] btrfs: update uuid_mutex and device_list_mutex comments

2018-04-24 Thread David Sterba
On Wed, Apr 18, 2018 at 05:56:31PM +0800, Anand Jain wrote: > @@ -155,29 +155,26 @@ static int __btrfs_map_block(struct btrfs_fs_info > *fs_info, > * > * uuid_mutex (global lock) > * > - * protects the fs_uuids list that tracks all per-fs fs_devices, resulting >

Re: [PATCH v2] btrfs: print-tree: Add locking status output for debug build

2018-04-24 Thread David Sterba
On Tue, Apr 24, 2018 at 01:03:13PM +0800, Qu Wenruo wrote: > It's pretty handy if we can get debug output for locking status of an > extent buffer, specially for race related debugging. > > So add the following output for btrfs_print_tree() and > btrfs_print_leaf(): > - refs > - write_locks (as

Re: [PATCH v2] btrfs: Unexport btrfs_alloc_delalloc_work

2018-04-24 Thread David Sterba
On Tue, Apr 24, 2018 at 05:23:59PM +0300, Nikolay Borisov wrote: > It's used only in inode.c so makes no sense to have it exported. Also > move the definition of btrfs_delalloc_work to inode.c since it's used > only in this file. > > Signed-off-by: Nikolay Borisov Reviewed-by:

[PATCH v2] btrfs: Unexport btrfs_alloc_delalloc_work

2018-04-24 Thread Nikolay Borisov
It's used only in inode.c so makes no sense to have it exported. Also move the definition of btrfs_delalloc_work to inode.c since it's used only in this file. Signed-off-by: Nikolay Borisov --- fs/btrfs/ctree.h | 9 - fs/btrfs/inode.c | 8 2 files changed, 8

Re: Directory entry not persisted on a fsync

2018-04-24 Thread Vijay Chidambaram
Hi all, Any thoughts on this? We completely understand you are all busy and might be traveling, so we only need a simple ack from you: that when we fsync a directory in btrfs, we can expect the contents to get persisted. We understand that is not your highest priority item, and that you will fix

[PATCH v2 4/8] btrfs: Open-code add_delayed_tree_ref

2018-04-24 Thread Nikolay Borisov
Now that the initialization part and the critical section code have been split it's a lot easier to open code add_delayed_tree_ref. Do so in the following manner: 1. The common init code is put immediately after memory-to-be-init is allocated, followed by the ref-specific member initialization.

[PATCH v2 1/8] btrfs: Factor out common delayed refs init code

2018-04-24 Thread Nikolay Borisov
The majority of the init code for struct btrfs_delayed_ref_node is duplicated in add_delayed_data_ref and add_delayed_tree_ref. Factor out the common bits in init_delayed_ref_common. This function is going to be used in future patches to clean that up. No functional changes Signed-off-by: Nikolay

[PATCH v2 7/8] btrfs: Use init_delayed_ref_head in add_delayed_ref_head

2018-04-24 Thread Nikolay Borisov
Use the newly introduced function when initialising the head_ref in add_delayed_ref_head. No functional changes. Signed-off-by: Nikolay Borisov --- fs/btrfs/delayed-ref.c | 63 -- 1 file changed, 4 insertions(+), 59 deletions(-)

[PATCH v2 2/8] btrfs: Use init_delayed_ref_common in add_delayed_tree_ref

2018-04-24 Thread Nikolay Borisov
Use the newly introduced common helper. No functional changes Signed-off-by: Nikolay Borisov --- fs/btrfs/delayed-ref.c | 35 +++ 1 file changed, 11 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c

[PATCH v2 3/8] btrfs: Use init_delayed_ref_common in add_delayed_data_ref

2018-04-24 Thread Nikolay Borisov
Use the newly introduced helper and remove the duplicate code. No functional changes Signed-off-by: Nikolay Borisov --- fs/btrfs/delayed-ref.c | 34 ++ 1 file changed, 10 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/delayed-ref.c

[PATCH v2 5/8] btrfs: Open-code add_delayed_data_ref

2018-04-24 Thread Nikolay Borisov
Now that the initialization part and the critical section code have been split it's a lot easier to open code add_delayed_data_ref. Do so in the following manner: 1. The common init function is put immediately after memory-to-be-init is allocated, followed by the specific data ref initialization.

[PATCH v2 8/8] btrfs: split delayed ref head initialization and addition

2018-04-24 Thread Nikolay Borisov
add_delayed_ref_head really performed 2 independent operations - initialising the ref head and adding it to a list. Now that the init part is in a separate function let's complete the separation between both operations. This results in a lot simpler interface for add_delayed_ref_head since the

[PATCH v2 6/8] btrfs: Introduce init_delayed_ref_head

2018-04-24 Thread Nikolay Borisov
add_delayed_ref_head implements the logic to both initialize a head_ref structure as well as perform the necessary operations to add it to the delayed ref machinery. This has resulted in a very cumbersome interface with loads of parameters and code, which at first glance, looks very unwieldy.

Re: [PATCH v2 4/4] btrfs: Do super block verification before writing it to disk

2018-04-24 Thread David Sterba
On Tue, Apr 24, 2018 at 12:48:09PM +0800, Qu Wenruo wrote: > -static int btrfs_validate_super(struct btrfs_fs_info *fs_info) > +/* > + * Check the validation of btrfs super block. > + * > + * @sb: super block to check > + * @super_mirror:the super block number to check its

Re: [PATCH 0/5] Remove delay_iput parameter when running delalloc work

2018-04-24 Thread David Sterba
On Mon, Apr 23, 2018 at 12:31:17PM +0300, Nikolay Borisov wrote: > > > On 23.04.2018 12:27, Qu Wenruo wrote: > > > > > > On 2018年04月23日 15:54, Nikolay Borisov wrote: > >> While trying to make sense of the lifecycle of delayed iputs it became > >> apparent > >> that the delay_iput parameter of

Re: [PATCH 3/5] btrfs: Remove delay_iput parameter from __start_delalloc_inodes

2018-04-24 Thread David Sterba
On Mon, Apr 23, 2018 at 10:54:15AM +0300, Nikolay Borisov wrote: > It's always set to 0 so remove it > > Signed-off-by: Nikolay Borisov > --- > fs/btrfs/inode.c | 14 +- > 1 file changed, 5 insertions(+), 9 deletions(-) > > diff --git a/fs/btrfs/inode.c

Re: [PATCH 5/5] btrfs: Unexport btrfs_alloc_delalloc_work

2018-04-24 Thread Nikolay Borisov
On 24.04.2018 16:22, David Sterba wrote: > On Mon, Apr 23, 2018 at 10:54:17AM +0300, Nikolay Borisov wrote: >> It's used only in inode.c so makes no sense to have it exported. >> >> Signed-off-by: Nikolay Borisov >> --- >> fs/btrfs/ctree.h | 2 -- >> 1 file changed, 2

Re: [PATCH 2/3] btrfs: pass only eb to num_extent_pages

2018-04-24 Thread Nikolay Borisov
On 24.04.2018 02:03, David Sterba wrote: > Almost all callers pass the start and len as 2 arguments but this is not > necessary, all the information is provided by the eb. By reordering the > calls to num_extent_pages, we don't need the local variables with > start/len. > > Signed-off-by: David

Re: [PATCH 3/3] btrfs: switch types to int when counting eb pages

2018-04-24 Thread Nikolay Borisov
On 24.04.2018 02:03, David Sterba wrote: > The loops iterating eb pages use unsigned long, that's an overkill as > we know that there are at most 16 pages (64k / 4k), and 4 by default > (with nodesize 16k). > > Signed-off-by: David Sterba Reviewed-by: Nikolay Borisov
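The page counts quoted above follow from a single division, sketched here as a userspace C helper. The name eb_pages is hypothetical, and the sketch assumes that nodesize is a multiple of the page size and at least one page long.

```c
/* Hypothetical helper: number of pages backing one extent buffer,
 * assuming nodesize is a non-zero multiple of page_size. */
int eb_pages(unsigned int nodesize, unsigned int page_size)
{
    return (int)(nodesize / page_size);
}
```

For 4k pages, a 64k nodesize gives 16 pages and the default 16k nodesize gives 4, both far below the range of an int.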

Re: [PATCH 5/5] btrfs: Unexport btrfs_alloc_delalloc_work

2018-04-24 Thread David Sterba
On Mon, Apr 23, 2018 at 10:54:17AM +0300, Nikolay Borisov wrote: > It's used only in inode.c so makes no sense to have it exported. > > Signed-off-by: Nikolay Borisov > --- > fs/btrfs/ctree.h | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/fs/btrfs/ctree.h

[PATCH v2 3/3] btrfs: replace waitqueue_actvie with cond_wake_up

2018-04-24 Thread David Sterba
Use the wrappers and reduce the amount of low-level details about the waitqueue management. Signed-off-by: David Sterba --- fs/btrfs/compression.c | 7 +-- fs/btrfs/delayed-inode.c | 9 +++-- fs/btrfs/dev-replace.c | 10 -- fs/btrfs/extent-tree.c | 7

[PATCH v2 0/3] Cleanup waitqueue_active and barriers

2018-04-24 Thread David Sterba
Reduce number of standalone barriers before waitqueue_active calls. Changes v2: * add 2 barriers to btrfs_sync_log and do not assume they're implied, (pointed out by Nikolay) git://github.com/kdave/btrfs-devel.git cleanup/cond-wake David Sterba (3): btrfs: introduce conditional wakeup

[PATCH v2 2/3] btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups

2018-04-24 Thread David Sterba
Currently the code assumes that there's an implied barrier by the sequence of code preceding the wakeup, namely the mutex unlock. As Nikolay pointed out: I think this is wrong (not your code) but the original assumption that the RELEASE semantics provided by mutex_unlock is sufficient. According

[PATCH v2 1/3] btrfs: introduce conditional wakeup helpers

2018-04-24 Thread David Sterba
Add convenience wrappers for the waitqueue management that involves memory barriers to prevent deadlocks. The helpers will let us remove barriers and the necessary comments in several places. Reviewed-by: Nikolay Borisov Signed-off-by: David Sterba ---

Re: [PATCH v2 2/4] btrfs: Add incompat flags check for btrfs_check_super_valid()

2018-04-24 Thread Qu Wenruo
On 2018年04月24日 19:30, David Sterba wrote: > On Tue, Apr 24, 2018 at 07:28:27PM +0800, Qu Wenruo wrote: >>> I've read the discussion under previous version again, IMHO the best way >>> to report what's going on is to use 2 functions for mount and pre-commit >>> time. >> >> OK, next version will

Btrfs progs release 4.16.1

2018-04-24 Thread David Sterba
Hi, btrfs-progs version 4.16.1 has been released. This is a bugfix release. Changes: * remove obsolete tools: btrfs-debug-tree, btrfs-zero-log, btrfs-show-super, btrfs-calc-size * sb-mod: new debugging tool to edit superblock items * mkfs: detect if thin-provisioned device does not

Re: [PATCH v2 2/4] btrfs: Add incompat flags check for btrfs_check_super_valid()

2018-04-24 Thread David Sterba
On Tue, Apr 24, 2018 at 07:28:27PM +0800, Qu Wenruo wrote: > > I've read the discussion under previous version again, IMHO the best way > > to report what's going on is to use 2 functions for mount and pre-commit > > time. > > OK, next version will go that direction. > > Although it may still be

Re: [PATCH v2 2/4] btrfs: Add incompat flags check for btrfs_check_super_valid()

2018-04-24 Thread Qu Wenruo
On 2018年04月24日 18:48, David Sterba wrote: > On Tue, Apr 24, 2018 at 12:48:07PM +0800, Qu Wenruo wrote: >> Although we have already checked incompat flags manually before really >> mounting it, we could still enhance btrfs_check_super_valid() to check >> incompat flags for later write time super

Re: [PATCH] btrfs-progs: treewide: Replace strerror(errno) with %m.

2018-04-24 Thread David Sterba
On Tue, Apr 24, 2018 at 10:52:41AM +0800, Su Yue wrote: > > > On 01/24/2018 03:42 AM, David Sterba wrote: > > On Sun, Jan 07, 2018 at 01:54:21PM -0800, Rosen Penev wrote: > >> As btrfs is specific to Linux, %m can be used instead of strerror(errno) > >> in format strings. This has some size
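The replacement discussed in this thread can be illustrated with a short C sketch. It assumes a C library whose printf family supports the %m conversion (glibc does, as an extension); the helper format_open_error and its message text are hypothetical and not taken from btrfs-progs.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format an error message for `path` into buf, using the error code
 * `err`. With use_m set, rely on the %m conversion; otherwise pass
 * strerror(errno) explicitly. Returns the snprintf() result. */
int format_open_error(char *buf, size_t len, const char *path,
                      int err, int use_m)
{
    errno = err;
    if (use_m)
        return snprintf(buf, len, "cannot open %s: %m", path);
    return snprintf(buf, len, "cannot open %s: %s", path,
                    strerror(errno));
}
```

Both variants produce the same text for the same errno value; the %m form simply drops one argument from each call site. Compilers may warn about %m in pedantic mode because it is not part of ISO C.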

Re: [PATCH] btrfs: add verify chattr support for send/receive test

2018-04-24 Thread Eryu Guan
[adding linux-btrfs list to cc] On Tue, Apr 17, 2018 at 04:44:42PM -0700, Howard McLauchlan wrote: > This test aims to verify correct behaviour with chattr operations and > btrfs send/receive. The intent is to check general correctness as well > as special interactions with troublesome

Re: [PATCH v2 2/4] btrfs: Add incompat flags check for btrfs_check_super_valid()

2018-04-24 Thread David Sterba
On Tue, Apr 24, 2018 at 12:48:07PM +0800, Qu Wenruo wrote: > Although we have already checked incompat flags manually before really > mounting it, we could still enhance btrfs_check_super_valid() to check > incompat flags for later write time super block validation check. But the calls are in

Re: [PATCH 1/3] btrfs: simplify counting number of eb pages

2018-04-24 Thread Qu Wenruo
On 2018年04月24日 18:29, David Sterba wrote: > On Tue, Apr 24, 2018 at 02:22:15PM +0800, Qu Wenruo wrote: >> >> >> On 2018年04月24日 13:59, Nikolay Borisov wrote: >>> >>> >>> On 24.04.2018 02:03, David Sterba wrote: The eb length is nodesize, as initialized in __alloc_extent_buffer.

Re: [PATCH 1/3] btrfs: simplify counting number of eb pages

2018-04-24 Thread David Sterba
On Tue, Apr 24, 2018 at 02:22:15PM +0800, Qu Wenruo wrote: > > > On 2018年04月24日 13:59, Nikolay Borisov wrote: > > > > > > On 24.04.2018 02:03, David Sterba wrote: > >> The eb length is nodesize, as initialized in __alloc_extent_buffer. > >> Regardless of start, we should always get the same

Re: [PATCH 1/3] btrfs-progs: remove comments about delayed ref in backref.c

2018-04-24 Thread Qu Wenruo
On 2018年04月24日 14:48, Su Yue wrote: > > > On 04/24/2018 02:02 PM, Qu Wenruo wrote: >> >> >> On 2018年04月24日 13:52, Su Yue wrote: >>> There is no delayed ref in btrfs-progs, so remove related comments. >>> >> >> Indeed. >> Delayed ref is only used to speed up extent tree modification with the >>

Re: [PATCH 3/3] btrfs-progs: do not merge tree block refs have different root_id

2018-04-24 Thread Qu Wenruo
On 2018年04月24日 14:43, Su Yue wrote: > > > On 04/24/2018 02:17 PM, Qu Wenruo wrote: >> >> >> On 2018年04月24日 13:52, Su Yue wrote: >>> For an extent item which contains many tree block backrefs, like >>> === >>> In

Re: [PATCH 1/3] btrfs-progs: remove comments about delayed ref in backref.c

2018-04-24 Thread Su Yue
On 04/24/2018 02:02 PM, Qu Wenruo wrote: On 2018年04月24日 13:52, Su Yue wrote: There is no delayed ref in btrfs-progs, so remove related comments. Indeed. Delayed ref is only used to speed up extent tree modification with the cost of code complexity. Thanks for your explanation :). For

Re: [PATCH 3/3] btrfs-progs: do not merge tree block refs have different root_id

2018-04-24 Thread Su Yue
On 04/24/2018 02:17 PM, Qu Wenruo wrote: On 2018年04月24日 13:52, Su Yue wrote: For an extent item which contains many tree block backrefs, like = In 020-extent-ref-cases/keyed_block_ref.img item 10 key (29470720 METADATA_ITEM 0)

Re: [PATCH 1/3] btrfs: simplify counting number of eb pages

2018-04-24 Thread Qu Wenruo
On 2018年04月24日 13:59, Nikolay Borisov wrote: > > > On 24.04.2018 02:03, David Sterba wrote: >> The eb length is nodesize, as initialized in __alloc_extent_buffer. >> Regardless of start, we should always get the same number of pages, so >> use that fact. >> >> Signed-off-by: David Sterba

Re: [PATCH 3/3] btrfs-progs: do not merge tree block refs have different root_id

2018-04-24 Thread Qu Wenruo
On 2018年04月24日 13:52, Su Yue wrote: > For an extent item which contains many tree block backrefs, like > = > In 020-extent-ref-cases/keyed_block_ref.img > > item 10 key (29470720 METADATA_ITEM 0) itemoff 3450 itemsize 222 >

Re: [PATCH 2/3] btrfs-progs: remove useless branch in __merge_refs

2018-04-24 Thread Qu Wenruo
On 2018年04月24日 13:52, Su Yue wrote: > After call of ref_for_same_block, ref1->parent must equal > ref2->parent, so the block of exchange is never reached. > > So remove the block of exchange. Reviewed-by: Qu Wenruo The patch looks good, but considering how much difference the

Re: [PATCH 1/3] btrfs-progs: remove comments about delayed ref in backref.c

2018-04-24 Thread Qu Wenruo
On 2018年04月24日 13:52, Su Yue wrote: > There is no delayed ref in btrfs-progs, so remove related comments. > Indeed. Delayed ref is only used to speed up extent tree modification with the cost of code complexity. For btrfs-progs we don't need to worry about it at all. Thanks, Qu >