Re: [PATCH] Btrfs: use nofs context when initializing security xattrs to avoid deadlock

2018-12-11 Thread Nikolay Borisov
On 10.12.18 at 19:53, fdman...@kernel.org wrote: > From: Filipe Manana > > When initializing the security xattrs, we are holding a transaction handle > therefore we need to use a GFP_NOFS context in order to avoid a deadlock > with reclaim in case it's triggered. > > Fixes: 39a27ec1004e8 (
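For context: rather than passing GFP_NOFS at every call site, the kernel offers a scoped API, memalloc_nofs_save()/memalloc_nofs_restore() in linux/sched/mm.h, under which every allocation behaves as if __GFP_FS were cleared. The C below is only a userspace model of that save/restore nesting, not kernel code. The MODEL_* flags and model_* helpers are hypothetical, and the excerpt does not show which mechanism the patch itself uses.

```c
#include <assert.h>

/* Hypothetical flag bits standing in for __GFP_IO, __GFP_FS and
 * PF_MEMALLOC_NOFS; the values are arbitrary. */
#define MODEL_GFP_IO      0x1u
#define MODEL_GFP_FS      0x2u
#define MODEL_GFP_KERNEL  (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_PF_NOFS     0x100u

/* Flags of the single modelled task. */
static unsigned int task_flags;

/* Enter a NOFS scope; return the previous NOFS bit so scopes can nest. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = task_flags & MODEL_PF_NOFS;

	task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave the scope, restoring the NOFS bit exactly as it was on entry. */
static void model_nofs_restore(unsigned int old)
{
	assert((old & ~MODEL_PF_NOFS) == 0);
	task_flags = (task_flags & ~MODEL_PF_NOFS) | old;
}

/* Mask an allocation would effectively use: inside a NOFS scope the FS
 * bit is cleared, so reclaim cannot call back into the filesystem while
 * the caller holds a transaction handle. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}
```

In kernel code the equivalent shape is `unsigned int nofs_flag = memalloc_nofs_save();` before the code that may allocate while the handle is held, and `memalloc_nofs_restore(nofs_flag);` after it. Saving and restoring the previous state, rather than clearing it, is what makes nested scopes safe.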

RE: [PATCH RESEND 0/8] btrfs-progs: sub: Relax the privileges of "subvolume list/show"

2018-12-11 Thread misono.tomoh...@fujitsu.com
> -----Original Message----- > From: Omar Sandoval [mailto:osan...@osandov.com] > Sent: Friday, December 7, 2018 10:02 AM > To: Misono, Tomohiro/味曽野 智礼 > Cc: linux-btrfs@vger.kernel.org > Subject: Re: [PATCH RESEND 0/8] btrfs-progs: sub: Relax the privileges of > "subvolume list/show" > > On Tue,

Re: [PATCH 3/8] btrfs: don't use global rsv for chunk allocation

2018-12-11 Thread Nikolay Borisov
On 3.12.18 at 17:24, Josef Bacik wrote: > The should_alloc_chunk code has math in it to decide if we're getting > short on space and if we should go ahead and pre-emptively allocate a > new chunk. Previously when we did not have the delayed_refs_rsv, we had > to assume that the global block

Re: [PATCH 4/8] btrfs: add ALLOC_CHUNK_FORCE to the flushing code

2018-12-11 Thread Nikolay Borisov
On 3.12.18 at 17:24, Josef Bacik wrote: > With my change to no longer take into account the global reserve for > metadata allocation chunks we have this side-effect for mixed block > group fs'es where we are no longer allocating enough chunks for the > data/metadata requirements. To deal wit

[PATCH v2] Btrfs: send, fix race with transaction commits that create snapshots

2018-12-11 Thread fdmanana
From: Filipe Manana If we create a snapshot of a snapshot currently being used by a send operation, we can end up with send failing unexpectedly (returning -ENOENT error to user space for example). The following diagram shows how this happens. CPU 1

Re: Kernel traces

2018-12-11 Thread Stephen R. van den Berg
Chris Murphy wrote: >I suggest reproducing the problem and issuing sysrq+w and then post >the entire resulting output for a developer to evaluate. I find it's I'll give that a try. >I see this is btrfs-receive workload, so I wouldn't guess it's >subvolume lock contention unless the contention is h

Re: [PATCH 5/8] btrfs: don't enospc all tickets on flush failure

2018-12-11 Thread Nikolay Borisov
On 3.12.18 at 17:24, Josef Bacik wrote: > With the introduction of the per-inode block_rsv it became possible to > have really really large reservation requests made because of data > fragmentation. Since the ticket stuff assumed that we'd always have > relatively small reservation requests

[PATCH v3] btrfs: improve error handling of btrfs_add_link()

2018-12-11 Thread Johannes Thumshirn
err holds the return value of either btrfs_del_root_ref() or btrfs_del_inode_ref() but it hasn't been checked since its introduction with commit fe66a05a0679 (Btrfs: improve error handling for btrfs_insert_dir_item callers) in 2012. To quote David: "If the error handling in the error handling fa
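The underlying rule is that a failure while undoing a partial operation cannot simply be dropped. A hypothetical sketch of one way to handle it (all names are invented here and nothing is taken from the patch): the caller still receives the original error, while a failed cleanup aborts the transaction, because the metadata may now be inconsistent.

```c
#include <assert.h>
#include <errno.h>   /* error codes are negative errno values, kernel style */

/* Hypothetical transaction handle: only records whether it was aborted. */
struct model_trans {
	int abort_error;   /* 0 while the transaction is healthy */
};

/* Keep the first abort reason, like an abort that is sticky. */
static void model_abort(struct model_trans *trans, int error)
{
	if (!trans->abort_error)
		trans->abort_error = error;
}

/*
 * Model of an add-link path: step 1 (adding a back reference) is assumed
 * to have succeeded; step 2 inserts the directory item. If step 2 fails
 * the back reference is removed again, and if that cleanup also fails
 * the transaction is aborted instead of ignoring the second error.
 * insert_err and undo_err simulate the results of step 2 and the cleanup.
 */
static int model_add_link(struct model_trans *trans, int insert_err,
			  int undo_err)
{
	int ret = insert_err;

	if (ret) {
		int err = undo_err;   /* result of the cleanup step */

		if (err)
			model_abort(trans, err);
		return ret;           /* caller still sees the original error */
	}
	return 0;
}
```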

Re: Kernel traces

2018-12-11 Thread Tomasz Chmielewski
(The most recent ones are from v4.19.7): Please, don't run 4.19.x lower than v4.19.8. It will likely eat your filesystem. Reference: https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.19.8-Released With btrfs, the chances are that data and metadata checksumming will detect the in
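To make the version rule above concrete, a small C helper can classify a kernel release string. This is a sketch: it encodes only the range stated in the post (4.19.x below 4.19.8), the function name is hypothetical, and on a running system the string would come from the `release` field that uname(2) fills in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Return true if a release string such as "4.19.7" or "4.19.7-arch1-1"
 * falls in 4.19.0 .. 4.19.7, the range the post warns against.
 * Strings that do not parse as "major.minor[.sub]" are reported as
 * not in the range.
 */
static bool release_in_warned_range(const char *release)
{
	int major, minor, sub = 0;
	int n = sscanf(release, "%d.%d.%d", &major, &minor, &sub);

	if (n < 2)
		return false;
	/* A bare "4.19" has no sublevel and counts as 4.19.0. */
	return major == 4 && minor == 19 && sub < 8;
}
```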

Re: [PATCH] btrfs: extent-tree: cleanup one-shot usage of @blocksize in do_walk_down

2018-12-11 Thread David Sterba
On Mon, Dec 10, 2018 at 03:01:03PM +0800, Qu Wenruo wrote: > @blocksize variable in do_walk_down() is only used once, really no need > to declare it. > > Signed-off-by: Qu Wenruo Added to misc-next, thanks.

Re: [PATCH 4/8] btrfs: add ALLOC_CHUNK_FORCE to the flushing code

2018-12-11 Thread David Sterba
On Tue, Dec 11, 2018 at 12:08:23PM +0200, Nikolay Borisov wrote: > > > On 3.12.18 at 17:24, Josef Bacik wrote: > > With my change to no longer take into account the global reserve for > > metadata allocation chunks we have this side-effect for mixed block > > group fs'es where we are no longer

Re: [PATCH 4/8] btrfs: add ALLOC_CHUNK_FORCE to the flushing code

2018-12-11 Thread Nikolay Borisov
On 11.12.18 at 18:47, David Sterba wrote: > On Tue, Dec 11, 2018 at 12:08:23PM +0200, Nikolay Borisov wrote: >> >> >> On 3.12.18 at 17:24, Josef Bacik wrote: >>> With my change to no longer take into account the global reserve for >>> metadata allocation chunks we have this side-effect for

Re: [PATCH 7/8] btrfs: be more explicit about allowed flush states

2018-12-11 Thread David Sterba
On Mon, Dec 03, 2018 at 10:24:58AM -0500, Josef Bacik wrote: > For FLUSH_LIMIT flushers (think evict, truncate) we can deadlock when > running delalloc because we may be holding a tree lock. We can also > deadlock with delayed refs rsv's that are running via the committing > mechanism. The only s
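One way to make the allowed states explicit is to give limited flushers their own, shorter state table. The sketch below is hypothetical: the enum values, table contents and helper names are not taken from the patch. It leaves out the two states the description names as deadlock risks, delalloc flushing and transaction commit. The excerpt is cut off before it says which states are safe, so the limited table is a conservative guess that also omits running delayed refs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flush states, in the order a full flusher would try them. */
enum flush_state {
	FLUSH_DELAYED_ITEMS,
	FLUSH_DELALLOC,
	RUN_DELAYED_REFS,
	ALLOC_CHUNK,
	COMMIT_TRANS,
};

enum flush_mode {
	FLUSH_MODE_ALL,
	FLUSH_MODE_LIMIT,   /* e.g. evict, truncate: may hold a tree lock */
};

static const enum flush_state all_states[] = {
	FLUSH_DELAYED_ITEMS, FLUSH_DELALLOC, RUN_DELAYED_REFS,
	ALLOC_CHUNK, COMMIT_TRANS,
};

/* No delalloc flushing and no commit: both can deadlock here. */
static const enum flush_state limit_states[] = {
	FLUSH_DELAYED_ITEMS, ALLOC_CHUNK,
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Pick the state table for a mode; *nr receives its length. */
static const enum flush_state *states_for(enum flush_mode mode, size_t *nr)
{
	if (mode == FLUSH_MODE_LIMIT) {
		*nr = ARRAY_LEN(limit_states);
		return limit_states;
	}
	*nr = ARRAY_LEN(all_states);
	return all_states;
}

/* True if a flusher in this mode may ever enter the given state. */
static bool mode_allows(enum flush_mode mode, enum flush_state state)
{
	size_t nr;
	const enum flush_state *states = states_for(mode, &nr);

	for (size_t i = 0; i < nr; i++)
		if (states[i] == state)
			return true;
	return false;
}
```

Driving the flushing loop from a table rather than from per-state checks means the set of allowed states for each mode is visible in one place.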

Re: [PATCH 4/8] btrfs: add ALLOC_CHUNK_FORCE to the flushing code

2018-12-11 Thread David Sterba
On Tue, Dec 11, 2018 at 06:51:34PM +0200, Nikolay Borisov wrote: > > > On 11.12.18 at 18:47, David Sterba wrote: > > On Tue, Dec 11, 2018 at 12:08:23PM +0200, Nikolay Borisov wrote: > >> > >> > >> On 3.12.18 at 17:24, Josef Bacik wrote: > >>> With my change to no longer take into account th

[PATCH] btrfs: raid56: data corruption on a device removal

2018-12-11 Thread Dmitriy Gorokh
I found that a RAID5 or RAID6 filesystem might get corrupted in the following scenario: 1. Create a 4-disk RAID6 filesystem 2. Preallocate 16 10G files 3. Run fio: 'fio --name=testload --directory=./ --size=10G --numjobs=16 --bs=64k --iodepth=64 --rw=randrw --verify=sha256 --time_based --runtim

[PATCH v3 2/7] btrfs: qgroup: Don't trigger backref walk at delayed ref insert time

2018-12-11 Thread Qu Wenruo
[BUG] Since fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans"), the kernel may lock up with quota enabled. There is one backref trace triggered by snapshot dropping along with write operations in the source subvolume. The problem can be reliably reproduced. btrf

[PATCH v3 4/7] btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap()

2018-12-11 Thread Qu Wenruo
Refactor btrfs_qgroup_trace_subtree_swap() into qgroup_trace_subtree_swap(), which only needs two extent buffers and a few bools to control its behavior. This provides the basis for later delayed subtree scan work. Signed-off-by: Qu Wenruo --- fs/btrfs/qgroup.c | 78 +

[PATCH v3 1/7] btrfs: qgroup: Move reserved data account from btrfs_delayed_ref_head to btrfs_qgroup_extent_record

2018-12-11 Thread Qu Wenruo
[BUG] btrfs/139 fails with fairly high probability if the test machine (VM) only has 2G of RAM. The final write succeeds while it should fail with EDQUOT, and the resulting fs has quota usage exceeding the limit by 16K. The simplified reproducer is: (needs a VM with 2G of RAM) mkfs.b

[PATCH v3 3/7] btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots()

2018-12-11 Thread Qu Wenruo
Relocation code drops btrfs_root::reloc_root as soon as merge_reloc_root() finishes. However, later qgroup code will need to access btrfs_root::reloc_root after merge_reloc_root() for the delayed subtree rescan. So alter the timing of resetting btrfs_root::reloc_root, making it happen after trans

[PATCH v3 0/7] btrfs: qgroup: Delay subtree scan to reduce overhead

2018-12-11 Thread Qu Wenruo
This patchset can be fetched from GitHub: https://github.com/adam900710/linux/tree/qgroup_delayed_subtree (based on v4.20-rc1). This patchset addresses the heavy subtree scan load by delaying the scan until we're going to modify the swapped tree block. The overall workflow is: 1) Record the subtr

[PATCH v3 6/7] btrfs: qgroup: Use delayed subtree rescan for balance

2018-12-11 Thread Qu Wenruo
Before this patch, qgroup code traces the whole subtree of file and reloc trees unconditionally. This keeps qgroup numbers consistent, but it can cause a huge number of unnecessary extent traces, which adds a lot of overhead. However, for the subtree swap of balance, since both subtrees contain the same conte

[PATCH v3 7/7] btrfs: qgroup: Cleanup old subtree swap code

2018-12-11 Thread Qu Wenruo
Since it's replaced by the new delayed subtree swap code, remove the original code. The cleanup is small since most of its core functionality is still used by the delayed subtree swap trace. Signed-off-by: Qu Wenruo --- fs/btrfs/qgroup.c | 94 --- fs/btrfs/qgroup.

[PATCH v3 5/7] btrfs: qgroup: Introduce per-root swapped blocks infrastructure

2018-12-11 Thread Qu Wenruo
To allow delayed subtree swap rescan, btrfs needs to record per-root info about which tree blocks get swapped. So this patch introduces a per-root btrfs_qgroup_swapped_blocks structure, which records which tree blocks get swapped. The designed workflow will be: 1) Record the subtree root block get
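The record-now, scan-later idea can be modelled in a few lines. In this hypothetical sketch, a fixed-size table stands in for the real per-root structure and a counter stands in for the subtree trace. A swap only records the subtree root's bytenr and level, and the expensive trace happens the first time that block is about to be modified, so swapped blocks that are never touched again are never scanned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SWAPPED 64

/* Hypothetical record of one swapped subtree root. */
struct swapped_block {
	uint64_t bytenr;   /* logical address of the subtree root */
	int level;         /* tree level of that block */
	bool used;
};

/* Hypothetical per-root container for the records. */
struct root_swapped_blocks {
	struct swapped_block blocks[MAX_SWAPPED];
	unsigned int traced;   /* subtrees actually scanned so far */
};

/* At swap time: only remember the block, do not scan anything yet. */
static bool record_swapped(struct root_swapped_blocks *r, uint64_t bytenr,
			   int level)
{
	for (size_t i = 0; i < MAX_SWAPPED; i++) {
		if (!r->blocks[i].used) {
			r->blocks[i] = (struct swapped_block){ bytenr, level, true };
			return true;
		}
	}
	return false;   /* table full: caller must fall back somehow */
}

/* Before a tree block is modified: scan its subtree only if it was
 * recorded as swapped, then drop the record so it is scanned once. */
static void before_modify(struct root_swapped_blocks *r, uint64_t bytenr)
{
	for (size_t i = 0; i < MAX_SWAPPED; i++) {
		struct swapped_block *b = &r->blocks[i];

		if (b->used && b->bytenr == bytenr) {
			r->traced++;   /* stand-in for tracing the subtree */
			b->used = false;
			return;
		}
	}
}
```

The fixed table only keeps the sketch short. A real per-root structure would presumably need a lookup keyed by bytenr and locking against concurrent modification.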

About enum convert (Old "Re: [PATCH 1/8] btrfs: delayed-ref: Introduce better documented delayed ref structures")

2018-12-11 Thread Qu Wenruo
On 2018/12/10 5:48 PM, Nikolay Borisov wrote: > > [snip] > > IMO it makes sense in this series to have a patch which converts the > action defines to an enum and subsequently modify functions/structs to > actually be of enum type. > I'm completely OK with converting it to an enum, for its conflict f
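Much of the benefit of such a conversion is type checking. Below is a hypothetical sketch (the enumerator names are illustrative, not the actual defines). Once functions take `enum delayed_ref_action`, a switch without a `default` label lets -Wswitch (enabled by -Wall in GCC) warn about any action that is not handled, and integers arriving from elsewhere can be validated at a single boundary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum replacing integer action defines. */
enum delayed_ref_action {
	DELAYED_REF_ADD = 1,
	DELAYED_REF_DROP,
	DELAYED_REF_ADD_EXTENT,
	DELAYED_REF_UPDATE_HEAD,
};

/* No default label on purpose: adding an enumerator without handling it
 * here triggers a -Wswitch warning. */
static const char *action_name(enum delayed_ref_action action)
{
	switch (action) {
	case DELAYED_REF_ADD:
		return "add";
	case DELAYED_REF_DROP:
		return "drop";
	case DELAYED_REF_ADD_EXTENT:
		return "add_extent";
	case DELAYED_REF_UPDATE_HEAD:
		return "update_head";
	}
	return "invalid";
}

/* Boundary check for values that still arrive as plain integers. */
static bool action_from_int(int raw, enum delayed_ref_action *out)
{
	switch (raw) {
	case DELAYED_REF_ADD:
	case DELAYED_REF_DROP:
	case DELAYED_REF_ADD_EXTENT:
	case DELAYED_REF_UPDATE_HEAD:
		*out = (enum delayed_ref_action)raw;
		return true;
	default:
		return false;
	}
}
```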

Re: [PATCH 3/8] btrfs: delayed-ref: Use btrfs_ref to refactor btrfs_add_delayed_tree_ref()

2018-12-11 Thread Qu Wenruo
On 2018/12/10 5:21 PM, Nikolay Borisov wrote: > > > On 6.12.18 at 8:58, Qu Wenruo wrote: >> btrfs_add_delayed_tree_ref() has a longer and longer parameter list, and >> some caller like btrfs_inc_extent_ref() are using @owner as level for >> delayed tree ref. >> >> Instead of making the param

Re: Kernel traces

2018-12-11 Thread Chris Murphy
Also, what scheduler are you using? And do you get different results with a different one (better or worse)? Chris Murphy

Re: Kernel traces

2018-12-11 Thread Stephen R. van den Berg
Chris Murphy wrote: >Also, what scheduler are you using? And do you get different results >with a different one (better or worse)? I'm using CFQ, and I don't think I ever tried a different one. But, btrfs should be compatible with all schedulers. -- Stephen.
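On the scheduler question: the active I/O scheduler of a block device can be read from /sys/block/<dev>/queue/scheduler. The kernel lists the available schedulers there and puts the active one in square brackets (for example `noop deadline [cfq]`), and writing one of the listed names back to the same file as root switches schedulers. The helper below is a hypothetical sketch that extracts the active name from that file's contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (active) scheduler name from the contents of
 * /sys/block/<dev>/queue/scheduler into out. Returns false if there is
 * no bracketed entry or the name does not fit in outlen bytes.
 */
static bool active_scheduler(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open + 1, ']') : NULL;
	size_t len;

	if (!open || !close)
		return false;
	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return false;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return true;
}
```

In practice the line would be read with fopen()/fgets() from the sysfs file of the device under test, once per scheduler tried.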

[PATCH v2 0/8] btrfs: Refactor delayed ref parameter list

2018-12-11 Thread Qu Wenruo
This patchset can be fetched from GitHub: https://github.com/adam900710/linux/tree/refactor_delayed_ref_parameter (based on the previous delayed subtree scan patchset, https://github.com/adam900710/linux/tree/qgroup_delayed_subtree). The current delayed ref interface has several problems: - Long

[PATCH v2 1/8] btrfs: delayed-ref: Introduce better documented delayed ref structures

2018-12-11 Thread Qu Wenruo
The current delayed ref interface has several problems: - Longer and longer parameter lists: bytenr, num_bytes, parent (so far so good); ref_root, owner, offset (I don't feel good now) - Different interpretations of the same parameter: above, @owner for data ref is inode nu
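The overloading of @owner is easiest to see side by side. The sketch below contrasts the kind of decoding a caller needs when one u64 carries either a tree level or an inode number (assumed here to be a comparison against BTRFS_FIRST_FREE_OBJECTID, which is 256) with a self-describing structure in the spirit of btrfs_ref. The struct layout, field names and init helpers are hypothetical, not the layout proposed in the patchset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of BTRFS_FIRST_FREE_OBJECTID in the btrfs on-disk format. */
#define FIRST_FREE_OBJECTID 256ULL

/* Old style: one u64 "owner" is a tree level for metadata refs and an
 * inode number for data refs, so every caller has to decode it. */
static bool legacy_owner_is_tree_level(uint64_t owner)
{
	return owner < FIRST_FREE_OBJECTID;
}

enum ref_kind { REF_KIND_TREE, REF_KIND_DATA };

/* Hypothetical self-describing reference. */
struct ref_desc {
	enum ref_kind kind;
	uint64_t bytenr;
	uint64_t num_bytes;
	uint64_t parent;          /* 0 unless the ref is parent-based */
	union {
		struct {
			uint64_t root;
			int level;        /* no longer smuggled through owner */
		} tree;
		struct {
			uint64_t root;
			uint64_t ino;     /* the data-ref meaning of owner */
			uint64_t offset;
		} data;
	} u;
};

static void init_tree_ref(struct ref_desc *ref, uint64_t bytenr,
			  uint64_t num_bytes, uint64_t parent,
			  uint64_t root, int level)
{
	*ref = (struct ref_desc){ .kind = REF_KIND_TREE, .bytenr = bytenr,
				  .num_bytes = num_bytes, .parent = parent };
	ref->u.tree.root = root;
	ref->u.tree.level = level;
}

static void init_data_ref(struct ref_desc *ref, uint64_t bytenr,
			  uint64_t num_bytes, uint64_t parent,
			  uint64_t root, uint64_t ino, uint64_t offset)
{
	*ref = (struct ref_desc){ .kind = REF_KIND_DATA, .bytenr = bytenr,
				  .num_bytes = num_bytes, .parent = parent };
	ref->u.data.root = root;
	ref->u.data.ino = ino;
	ref->u.data.offset = offset;
}
```

With named init helpers each call site states which kind of reference it builds, so the meaning of every argument is visible without decoding a shared field.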

[PATCH v2 2/8] btrfs: extent-tree: Open-code process_func in __btrfs_mod_ref

2018-12-11 Thread Qu Wenruo
process_func is never used as a function hook anywhere else. Open-code it to make the later delayed ref refactor easier, so we can refactor btrfs_inc_extent_ref() and btrfs_free_extent() in different patches. Signed-off-by: Qu Wenruo Reviewed-by: Nikolay Borisov --- fs/btrfs/extent-tree.c | 30 ++

[PATCH v2 3/8] btrfs: delayed-ref: Use btrfs_ref to refactor btrfs_add_delayed_tree_ref()

2018-12-11 Thread Qu Wenruo
btrfs_add_delayed_tree_ref() has a longer and longer parameter list, and some callers like btrfs_inc_extent_ref() use @owner as the level for delayed tree refs. Instead of making the parameter list longer and longer, use btrfs_ref to refactor it, so each parameter assignment should be self-explain

[PATCH v2 8/8] btrfs: extent-tree: Use btrfs_ref to refactor btrfs_free_extent()

2018-12-11 Thread Qu Wenruo
Similar to btrfs_inc_extent_ref(), just use btrfs_ref to replace the long parameter list and the confusing @owner parameter. Signed-off-by: Qu Wenruo --- fs/btrfs/ctree.h | 5 +--- fs/btrfs/extent-tree.c | 52 +++--- fs/btrfs/file.c | 22

[PATCH v2 6/8] btrfs: extent-tree: Use btrfs_ref to refactor add_pinned_bytes()

2018-12-11 Thread Qu Wenruo
Since add_pinned_bytes() only needs to know if the extent is metadata and if it's a chunk tree extent, btrfs_ref is a perfect match for it, as we no longer need the various owner/level tricks to determine the extent type. Signed-off-by: Qu Wenruo Reviewed-by: Nikolay Borisov --- fs/btrfs/extent-tree.c | 26

[PATCH v2 5/8] btrfs: ref-verify: Use btrfs_ref to refactor btrfs_ref_tree_mod()

2018-12-11 Thread Qu Wenruo
It's a perfect match for btrfs_ref_tree_mod() to use btrfs_ref, as btrfs_ref describes a metadata/data reference update comprehensively. Now there is one fewer function using the confusing owner/level trick. Signed-off-by: Qu Wenruo --- fs/btrfs/extent-tree.c | 27 +++-- fs/btrfs/ref-ve

[PATCH v2 4/8] btrfs: delayed-ref: Use btrfs_ref to refactor btrfs_add_delayed_data_ref()

2018-12-11 Thread Qu Wenruo
Just like btrfs_add_delayed_tree_ref(), use btrfs_ref to refactor btrfs_add_delayed_data_ref(). Signed-off-by: Qu Wenruo --- fs/btrfs/delayed-ref.c | 18 +- fs/btrfs/delayed-ref.h | 7 +++ fs/btrfs/extent-tree.c | 23 ++- 3 files changed, 26 insertions(+)

[PATCH v2 7/8] btrfs: extent-tree: Use btrfs_ref to refactor btrfs_inc_extent_ref()

2018-12-11 Thread Qu Wenruo
Now we don't need to play the dirty game of reusing @owner for tree block level. Signed-off-by: Qu Wenruo --- fs/btrfs/ctree.h | 5 ++-- fs/btrfs/extent-tree.c | 57 -- fs/btrfs/file.c | 17 + fs/btrfs/inode.c | 10 +---

[PATCH 1/3] btrfs: Remove unused arguments from btrfs_get_extent_fiemap

2018-12-11 Thread Nikolay Borisov
This function is a simple wrapper over btrfs_get_extent that returns either: a) a real extent in the passed range, or b) an adjusted extent, based on whether delalloc bytes are found backing a hole. To support these semantics it doesn't need the page/pg_offset/create arguments which are passed to b
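The adjustment in case (b) can be modelled as plain range arithmetic. This is a simplified, hypothetical sketch: the types and names are invented here and no extent maps are involved. The delalloc start is first clamped to the requested offset. If the delalloc range then starts after the hole, only the part of the hole before it is reported; otherwise the delalloc range itself is reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result: what a fiemap-style lookup would report. */
struct fiemap_extent_model {
	uint64_t start;
	uint64_t len;
	bool delalloc;   /* true: pending, not yet allocated data */
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * A hole [hole_start, hole_start + hole_len) was found, plus a delalloc
 * range [da_start, da_end) for a request starting at req_start.
 * da_end <= da_start (after clamping) means no delalloc was found.
 */
static struct fiemap_extent_model
adjust_hole_for_delalloc(uint64_t req_start, uint64_t hole_start,
			 uint64_t hole_len, uint64_t da_start, uint64_t da_end)
{
	struct fiemap_extent_model em = { hole_start, hole_len, false };

	/* Never report delalloc before the offset the caller asked for. */
	if (da_start < req_start)
		da_start = req_start;
	if (da_end <= da_start)
		return em;   /* nothing backs the hole: report it unchanged */

	if (da_start > hole_start) {
		/* Hole comes first: report only the part before delalloc. */
		em.len = min_u64(hole_len, da_start - hole_start);
	} else {
		/* Delalloc starts at or before the hole: report it instead. */
		em.start = da_start;
		em.len = da_end - da_start;
		em.delalloc = true;
	}
	return em;
}
```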

[PATCH 3/3] btrfs: Remove redundant assignment

2018-12-11 Thread Nikolay Borisov
hole_len is only used if the hole falls within the requested range. Make that explicitly clear by only assigning it in the corresponding branch. Signed-off-by: Nikolay Borisov --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.

[PATCH 0/3] Cleanups around btrfs_get_extent_fiemap

2018-12-11 Thread Nikolay Borisov
Following a conversation with Johannes, this is what fell out. It turns out the signature of btrfs_get_extent_fiemap is needlessly complex, so the first patch fixes it by removing the unnecessary arguments. Patch 2 is a bit of a "catch-all", mainly renaming variables, thus helping recognise what the

[PATCH 2/3] btrfs: Refactor btrfs_get_extent_fiemap

2018-12-11 Thread Nikolay Borisov
Make btrfs_get_extent_fiemap a bit more friendly. The first step is to rename the closely related, yet arbitrarily named range_start/found_end/found variables. They define the delalloc range that is found in case a real extent wasn't found. Subsequently remove an unnecessary check for hole_em since it's