Re: [PATCH RFC] btrfs: Don't create SINGLE or DUP chunks for degraded rw mount

2019-02-11 Thread Qu Wenruo
On 2019/2/12 3:55 PM, Remi Gauvin wrote: > On 2019-02-12 2:47 a.m., Qu Wenruo wrote: >> >> >> Consider this use case: >> >> One btrfs with 2 devices, RAID1 for data and metadata. >> >> One day devid 2 got failure, and before replacement arrives, user can >> only use devid 1 alone. (Maybe that's th

Re: [PATCH RFC] btrfs: Don't create SINGLE or DUP chunks for degraded rw mount

2019-02-11 Thread Remi Gauvin
On 2019-02-12 2:47 a.m., Qu Wenruo wrote: > > > Consider this use case: > > One btrfs with 2 devices, RAID1 for data and metadata. > > One day devid 2 got failure, and before replacement arrives, user can > only use devid 1 alone. (Maybe that's the root fs). > > Then new disk arrived, user repl

Re: [PATCH RFC] btrfs: Don't create SINGLE or DUP chunks for degraded rw mount

2019-02-11 Thread Qu Wenruo
On 2019/2/12 3:43 PM, Remi Gauvin wrote: > On 2019-02-12 2:22 a.m., Qu Wenruo wrote: > >>> Does this mean you would rely on scrub/CSUM to repair the missing data >>> if device is restored? >> >> Yes, just as btrfs usually does. >> > > I don't really understand the implications of the problems wi

Re: [PATCH RFC] btrfs: Don't create SINGLE or DUP chunks for degraded rw mount

2019-02-11 Thread Remi Gauvin
On 2019-02-12 2:22 a.m., Qu Wenruo wrote: >> Does this mean you would rely on scrub/CSUM to repair the missing data >> if device is restored? > > Yes, just as btrfs usually does. > I don't really understand the implications of the problems with mounting fs when single/dup data chunk are allocat

Re: [PATCH RFC] btrfs: Don't create SINGLE or DUP chunks for degraded rw mount

2019-02-11 Thread Qu Wenruo
On 2019/2/12 3:20 PM, Remi Gauvin wrote: > On 2019-02-12 2:03 a.m., Qu Wenruo wrote: > >> So we only need to consider missing devices as writable, and calculate >> our chunk allocation profile with missing devices too. >> >> Then every thing should work as expected, without annoying SINGLE/DUP >>

Re: [PATCH RFC] btrfs: Don't create SINGLE or DUP chunks for degraded rw mount

2019-02-11 Thread Remi Gauvin
On 2019-02-12 2:03 a.m., Qu Wenruo wrote: > So we only need to consider missing devices as writable, and calculate > our chunk allocation profile with missing devices too. > > Then every thing should work as expected, without annoying SINGLE/DUP > chunks blocking later degraded mount. > > Does

[PATCH RFC] btrfs: Don't create SINGLE or DUP chunks for degraded rw mount

2019-02-11 Thread Qu Wenruo
[PROBLEM] The following script can easily create unnecessary SINGLE or DUP chunks: #!/bin/bash dev1="/dev/test/scratch1" dev2="/dev/test/scratch2" dev3="/dev/test/scratch3" mnt="/mnt/btrfs" umount $dev1 $dev2 $dev3 $mnt &> /dev/null mkfs.btrfs -f $dev1 $dev2 -d raid1 -m raid1 mo

Re: corruption with multi-device btrfs + single bcache, won't mount

2019-02-11 Thread Qu Wenruo
On 2019/2/12 2:22 PM, Steve Leung wrote: > > > - Original Message - >> From: "Qu Wenruo" >> To: "STEVE LEUNG" , linux-btrfs@vger.kernel.org >> Sent: Sunday, February 10, 2019 6:52:23 AM >> Subject: Re: corruption with multi-device btrfs + single bcache, won't mount > >> - Original

Re: corruption with multi-device btrfs + single bcache, won't mount

2019-02-11 Thread Steve Leung
- Original Message - > From: "Qu Wenruo" > To: "STEVE LEUNG" , linux-btrfs@vger.kernel.org > Sent: Sunday, February 10, 2019 6:52:23 AM > Subject: Re: corruption with multi-device btrfs + single bcache, won't mount > - Original Message - > From: "Qu Wenruo" > On 2019/2/10 下午2:

Re: [PATCH v2 2/2] fstests: btrfs: Introduce stress test for deadlock between snapshot delete and other read-write operations

2019-02-11 Thread Qu Wenruo
On 2019/1/11 1:01 PM, Qu Wenruo wrote: [snip] > +# FS QA Test 179 > +# > +# Test if btrfs will lockup at subvolume deletion when qgroups are enabled. > +# > +# This bug is going to be fixed by a patch for the kernel titled > +# "btrfs: qgroup: Don't trigger backref walk at delayed ref insert time"

Re: [PATCH] btrfs: honor path->skip_locking in backref code

2019-02-11 Thread Qu Wenruo
On 2019/1/17 12:00 AM, Josef Bacik wrote: > qgroups will do the old roots lookup at delayed ref time, which could be > while walking down the extent root while running a delayed ref. This > should be fine, except we specifically lock eb's in the backref walking > code irrespective of path->skip_l

Corrupted filesystem, looking for guidance

2019-02-11 Thread Sébastien Luttringer
Hello, The context is a BTRFS filesystem on top of an md device (raid5 on 6 disks). System is an Arch Linux and the kernel was a vanilla 4.20.2. # btrfs fi us /home Overall: Device size: 27.29TiB Device allocated: 5.01TiB Device unallocated: 22.

Reproducer for "compressed data + hole data corruption bug, 2018 edition" still works on 4.20.7

2019-02-11 Thread Zygo Blaxell
Still reproducible on 4.20.7. The behavior is slightly different on current kernels (4.20.7, 4.14.96) which makes the problem a bit more difficult to detect. # repro-hole-corruption-test i: 91, status: 0, bytes_deduped: 131072 i: 92, status: 0, bytes_deduped: 131072

[PATCH v3 3/3] btrfs: trivial, fix c coding style

2019-02-11 Thread Anand Jain
Maintain the lines extended up to 80 chars where possible, and indent the arguments. Signed-off-by: Anand Jain --- v3: changelog added. fs/btrfs/props.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c index 77a03076b18e..3c15

[PATCH v3 0/3] Misc props.c cleanups

2019-02-11 Thread Anand Jain
v3: Merge patch 2/5 and 3/5 as in v1. Not included 1/5 in v1 as it's already integrated in misc-next. While adding the readmirror property I found a few cleanup things which can be fixed. As these aren't part of the upcoming readmirror property I am sending these separately. Anand Jain (3): btrfs: k

[PATCH v3 2/3] btrfs: drop redundant forward declaration in props.c

2019-02-11 Thread Anand Jain
Drop forward declaration of the functions, prop_compression_validate(), prop_compression_apply() and prop_compression_extract(). By moving prop_handlers[], btrfs_props_init() prop_compression_validate(), prop_compression_apply() and prop_compression_extract() appropriately within the file. No funct

[PATCH v3 1/3] btrfs: kill __btrfs_set_prop()

2019-02-11 Thread Anand Jain
btrfs_set_prop() is a redirect to __btrfs_set_prop() with the transaction handle equal to NULL. And __btrfs_set_prop() in turn directly passes trans to do_setxattr(), which creates a transaction when trans is NULL. Instead rename __btrfs_set_prop() to btrfs_set_prop(), and update the caller with NULL

Re: btrfs as / filesystem in RAID1

2019-02-11 Thread Chris Murphy
On Mon, Feb 11, 2019 at 5:17 AM Austin S. Hemmelgarn wrote: > > Last I knew, it was systemd itself doing the pause, because we provide > no real device for udev to wait on appearing. Well there's more than one thing responsible for the net behavior. The most central thing waiting is the kernel. A

Re: [PATCH] btrfs: Silence a static checker locking warning

2019-02-11 Thread Dan Carpenter
On Mon, Feb 11, 2019 at 05:36:13PM +0100, David Sterba wrote: > > I have re-written the code though to make it cleaner and > > to silence the static checkers. > > Maybe there's something new the static checker needs to learn. Gar. Yes. You're right. I hadn't thought about that read locks could

Re: [PATCH 3/5] btrfs: reorg functions to drop forward declaration

2019-02-11 Thread David Sterba
On Fri, Feb 08, 2019 at 09:10:41AM +0200, Nikolay Borisov wrote: > > > On 8.02.19 at 9:02, Anand Jain wrote: > > In preparation to drop forward declaration of the functions, > > prop_compression_validate(), prop_compression_apply() and > > prop_compression_extract(). Move prop_handlers[], btrf

Re: [PATCH v2 1/5] btrfs: fix comment its device list mutex not volume lock

2019-02-11 Thread David Sterba
On Fri, Feb 08, 2019 at 03:39:37PM +0800, Anand Jain wrote: > We have killed volume mutex (commit: dccdb07bc996 > btrfs: kill btrfs_fs_info::volume_mutex). This trivial one seems to have > escaped. > > Signed-off-by: Anand Jain > --- > v2: Delete the wrong comment instead of fixing it. This pat

[PATCH] btrfs: drop the lock on error in btrfs_dev_replace_cancel()

2019-02-11 Thread Dan Carpenter
We should drop the lock on this error path. This is from static analysis and I don't know if it's possible to hit this error path in real life. Signed-off-by: Dan Carpenter --- fs/btrfs/dev-replace.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-repla

Re: [PATCH] btrfs: Silence a static checker locking warning

2019-02-11 Thread David Sterba
On Mon, Feb 11, 2019 at 05:36:13PM +0100, David Sterba wrote: > On Sat, Feb 09, 2019 at 12:02:55PM +0300, Dan Carpenter wrote: > > Back in the day, before commit 0b246afa62b0 ("btrfs: root->fs_info > > cleanup, add fs_info convenience variables") then we used to take > > different locks. > > Nope,

Re: [PATCH] btrfs: Silence a static checker locking warning

2019-02-11 Thread David Sterba
On Sat, Feb 09, 2019 at 12:02:55PM +0300, Dan Carpenter wrote: > Back in the day, before commit 0b246afa62b0 ("btrfs: root->fs_info > cleanup, add fs_info convenience variables") then we used to take > different locks. Nope, it's the same per-filesystem lock, just the old code got there in two dif

Re: [PATCH v3 1/9] btrfs: delayed-ref: Introduce better documented delayed ref structures

2019-02-11 Thread Qu Wenruo
[snip] >>> Looking at the dev >>> docs and the description for 'offset' field in btrfs_file_extent_item I >>> can sort of deduce that this field will only be different than null if >>> this reference is for an extent which is shared between 2 snapshots. >> >> Don't forget reflink and data CoW. >> >

Re: [PATCH v3 1/9] btrfs: delayed-ref: Introduce better documented delayed ref structures

2019-02-11 Thread Nikolay Borisov
On 11.02.19 at 15:23, Qu Wenruo wrote: > > > On 2019/2/11 8:55 PM, Nikolay Borisov wrote: >> >> >> On 11.02.19 at 7:16, Qu Wenruo wrote: >>> Current delayed ref interface has several problems: >>> - Longer and longer parameter lists >>> bytenr >>> num_bytes >>> parent >>>

Re: [PATCH v3 1/9] btrfs: delayed-ref: Introduce better documented delayed ref structures

2019-02-11 Thread Qu Wenruo
On 2019/2/11 8:55 PM, Nikolay Borisov wrote: > > > On 11.02.19 at 7:16, Qu Wenruo wrote: >> Current delayed ref interface has several problems: >> - Longer and longer parameter lists >> bytenr >> num_bytes >> parent >> -- so far so good >> ref_root >> owner >> offset >>

Re: [PATCH v3 8/9] btrfs: extent-tree: Use btrfs_ref to refactor btrfs_free_extent()

2019-02-11 Thread Nikolay Borisov
On 11.02.19 at 7:16, Qu Wenruo wrote: > Similar to btrfs_inc_extent_ref(), just use btrfs_ref to replace the > long parameter list and the confusing @owner parameter. > > Signed-off-by: Qu Wenruo Reviewed-by: Nikolay Borisov > --- > fs/btrfs/ctree.h | 5 +--- > fs/btrfs/extent-t

Re: [PATCH v3 7/9] btrfs: extent-tree: Use btrfs_ref to refactor btrfs_inc_extent_ref()

2019-02-11 Thread Nikolay Borisov
On 11.02.19 at 7:16, Qu Wenruo wrote: > Now we don't need to play the dirty game of reusing @owner for tree block > level. > > Signed-off-by: Qu Wenruo Reviewed-by: Nikolay Borisov > --- > fs/btrfs/ctree.h | 5 ++-- > fs/btrfs/extent-tree.c | 57 -

Re: [PATCH v3 5/9] btrfs: ref-verify: Use btrfs_ref to refactor btrfs_ref_tree_mod()

2019-02-11 Thread Nikolay Borisov
On 11.02.19 at 7:16, Qu Wenruo wrote: > It's a perfect match for btrfs_ref_tree_mod() to use btrfs_ref, as > btrfs_ref describes a metadata/data reference update comprehensively. > > Now we have one less function use confusing owner/level trick. > > Signed-off-by: Qu Wenruo Reviewed-by: Ni

Re: [PATCH v3 4/9] btrfs: delayed-ref: Use btrfs_ref to refactor btrfs_add_delayed_data_ref()

2019-02-11 Thread Nikolay Borisov
On 11.02.19 at 7:16, Qu Wenruo wrote: > Just like btrfs_add_delayed_tree_ref(), use btrfs_ref to refactor > btrfs_add_delayed_data_ref(). > > Signed-off-by: Qu Wenruo Reviewed-by: Nikolay Borisov > --- > fs/btrfs/delayed-ref.c | 20 ++-- > fs/btrfs/delayed-ref.h | 7 ++

Re: [PATCH v3 3/9] btrfs: delayed-ref: Use btrfs_ref to refactor btrfs_add_delayed_tree_ref()

2019-02-11 Thread Nikolay Borisov
On 11.02.19 at 7:16, Qu Wenruo wrote: > btrfs_add_delayed_tree_ref() has a longer and longer parameter list, and > some caller like btrfs_inc_extent_ref() are using @owner as level for > delayed tree ref. > > Instead of making the parameter list longer and longer, use btrfs_ref to > refactor

Re: [PATCH v3 1/9] btrfs: delayed-ref: Introduce better documented delayed ref structures

2019-02-11 Thread Nikolay Borisov
On 11.02.19 at 7:16, Qu Wenruo wrote: > Current delayed ref interface has several problems: > - Longer and longer parameter lists > bytenr > num_bytes > parent > -- so far so good > ref_root > owner > offset > -- I don't feel good now > > - Different interpret

Re: btrfs as / filesystem in RAID1

2019-02-11 Thread Austin S. Hemmelgarn
On 2019-02-10 13:34, Chris Murphy wrote: On Sat, Feb 9, 2019 at 5:13 AM waxhead wrote: Understood, but that is not quite what I meant - let me rephrase... If BTRFS still can't mount, why would it blindly accept a previously non-existing disk to take part of the pool?! It doesn't do it blindl

Re: btrfs as / filesystem in RAID1

2019-02-11 Thread Anand Jain
On 2/7/19 7:04 PM, Stefan K wrote: Thanks, with degraded as kernel parameter and also in the fstab it works like expected That should be the normal behaviour, IMO in the long term it will be. But before that we have few items to fix around this, such as the serviceability part. -Anan

[PATCH v5 3/3] btrfs: scrub: convert scrub_workers_refcnt to refcount_t

2019-02-11 Thread Anand Jain
Use the refcount_t for fs_info::scrub_workers_refcnt instead of int. Signed-off-by: Anand Jain --- v5: Fix refcount validation warning. Use refcount_set() instead of refcount_inc() when count is 0. v4: born fs/btrfs/ctree.h | 2 +- fs/btrfs/disk-io.c | 2 +- fs/btrfs/scrub.c | 10 +
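The v5 note above (set the count rather than increment it when it is 0) can be sketched in plain userspace C. This is only an illustration under stated assumptions: `workers_sketch`, `workers_get()` and `workers_put()` are hypothetical names, an ordinary `unsigned int` stands in for the kernel's `refcount_t`, and the caller is assumed to hold whatever lock serialises these calls. It is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scrub worker state; not the kernel struct. */
struct workers_sketch {
	unsigned int refcnt;	/* plays the role of a refcount_t */
	bool created;		/* whether the worker pool currently exists */
};

/* Take a reference. Assumes the caller holds the serialising lock. */
static void workers_get(struct workers_sketch *w)
{
	if (w->refcnt == 0) {
		/* First user: create the workers and *set* the count to 1. */
		w->created = true;
		w->refcnt = 1;
	} else {
		/* Later users: increment from a known non-zero value. */
		w->refcnt++;
	}
}

/* Drop a reference; the last user tears the workers down. */
static void workers_put(struct workers_sketch *w)
{
	assert(w->refcnt > 0);
	if (--w->refcnt == 0)
		w->created = false;
}
```

The split matters for `refcount_t` because incrementing from zero is treated as a likely use-after-free and triggers a warning, so the 0 to 1 transition has to be an explicit set.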

[PATCH v2 07/12] btrfs: replace pending/pinned chunks lists with io tree

2019-02-11 Thread Nikolay Borisov
From: Jeff Mahoney The pending chunks list contains chunks that are allocated in the current transaction but haven't been created yet. The pinned chunks list contains chunks that are being released in the current transaction. Both describe chunks that are not reflected on disk as in use but are u

[PATCH v2 02/12] btrfs: combine device update operations during transaction commit

2019-02-11 Thread Nikolay Borisov
From: Jeff Mahoney We currently overload the pending_chunks list to handle updating btrfs_device->commit_bytes used. We don't actually care about the extent mapping or even the device mapping for the chunk - we just need the device, and we can end up processing it multiple times. The fs_devices

[PATCH v2 11/12] btrfs: Implement find_first_clear_extent_bit

2019-02-11 Thread Nikolay Borisov
This function is very similar to find_first_extent_bit except that it locates the first contiguous span of space which does not have bits set. Its intended use is in the freespace trimming code. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent_io.c | 73 +++
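The idea of locating the first contiguous span with no bits set can be sketched over a flat array of flags. This is a simplified illustration, not the function added by the patch: the real code works on an extent io tree, whereas `find_first_clear_span()` and its parameters here are hypothetical names operating on a plain `bool` array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Find the first contiguous run of clear (false) entries at or after
 * `start` in an array of `nbits` flags.
 *
 * Returns true and stores the run's first index and length on success,
 * or false if every entry from `start` onwards is set.
 */
static bool find_first_clear_span(const bool *bits, size_t nbits, size_t start,
				  size_t *span_start, size_t *span_len)
{
	size_t i = start;
	size_t j;

	assert(bits && span_start && span_len);

	/* Skip over entries that are set. */
	while (i < nbits && bits[i])
		i++;
	if (i >= nbits)
		return false;

	/* Extend the span until the next set entry or the end. */
	j = i;
	while (j < nbits && !bits[j])
		j++;

	*span_start = i;
	*span_len = j - i;
	return true;
}
```

A trimming loop could call this repeatedly, advancing `start` past each returned span, to visit every hole in turn.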

[PATCH v2 03/12] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink

2019-02-11 Thread Nikolay Borisov
During device shrink pinned/pending chunks (i.e. those which have been deleted/created respectively, in the current transaction and haven't touched disk) need to be accounted when doing device shrink. Presently this happens after the main relocation loop in btrfs_shrink_device, which could lead to m

[PATCH v2 10/12] btrfs: Optimize unallocated chunks discard

2019-02-11 Thread Nikolay Borisov
Currently unallocated chunks are always trimmed. For example 2 consecutive trims on large storage would trim freespace twice irrespective of whether the space was actually allocated or not between those trims. Optimise this behavior by exploiting the newly introduced alloc_state tree of btrfs_devi

[PATCH v2 12/12] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit

2019-02-11 Thread Nikolay Borisov
Instead of always calling the allocator to search for a free extent, that satisfies the input criteria, switch btrfs_trim_free_extents to using find_first_clear_extent_bit. With this change it's no longer necessary to read the device tree in order to figure out holes in the devices. Now the code a

[PATCH v2 08/12] btrfs: Remove 'trans' argument from find_free_dev_extent(_start)

2019-02-11 Thread Nikolay Borisov
Now that those functions no longer require a transaction handle to inspect pending/pinned chunks the argument can be removed. At the same time also remove any surrounding code which acquired the handle. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent-tree.c | 36 +++-

[PATCH v2 06/12] btrfs: Introduce new bits for device allocation tree

2019-02-11 Thread Nikolay Borisov
Rather than hijacking the existing defines let's just define new bits, with more descriptive names. Instead of using yet more (currently at 18) bits for the new flags, use the fact those flags will be specific to the device allocation tree so define them using existing EXTENT_* flags. Signed-off-b

[PATCH v2 01/12] btrfs: Honour FITRIM range constraints during free space trim

2019-02-11 Thread Nikolay Borisov
Up until now trimming the freespace was done irrespective of what the arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments will be entirely ignored. Fix it by correctly handling those parameters. This requires breaking if the found freespace extent is after the end of the passed
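Honouring a requested range while walking free extents can be sketched as below. This is a minimal illustration under assumptions, not the patch itself: `struct range`, `clip_to_request()` and `bytes_to_trim()` are hypothetical names, the free extents are assumed to be sorted by start offset, and offset/length arithmetic is assumed not to overflow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open byte range [start, start + len). Hypothetical helper type. */
struct range {
	uint64_t start;
	uint64_t len;
};

/* Clip a free extent to the requested range; false if they do not overlap. */
static bool clip_to_request(struct range free_ext, struct range req,
			    struct range *out)
{
	uint64_t free_end = free_ext.start + free_ext.len;
	uint64_t req_end = req.start + req.len;
	uint64_t s = free_ext.start > req.start ? free_ext.start : req.start;
	uint64_t e = free_end < req_end ? free_end : req_end;

	if (s >= e)
		return false;
	out->start = s;
	out->len = e - s;
	return true;
}

/* Sum the bytes that fall inside the request, given extents sorted by start. */
static uint64_t bytes_to_trim(const struct range *free_exts, size_t n,
			      struct range req)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		struct range clipped;

		/* Sorted input: once past the end, nothing later can overlap. */
		if (free_exts[i].start >= req.start + req.len)
			break;
		if (clip_to_request(free_exts[i], req, &clipped))
			total += clipped.len;
	}
	return total;
}
```

Here `req.start` and `req.len` play the role of the offset and length a caller would pass via fstrim's -o and -l options.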

[PATCH v2 09/12] btrfs: Factor out in_range macro

2019-02-11 Thread Nikolay Borisov
This is used in more than one place so let's factor it out in ctree.h. No functional changes. Signed-off-by: Nikolay Borisov --- fs/btrfs/ctree.h | 2 ++ fs/btrfs/extent-tree.c | 1 - fs/btrfs/volumes.c | 1 - 3 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/ctr
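For readers unfamiliar with this kind of helper, a range-membership macro is typically a half-open interval test. The sketch below shows that shape; the macro body is an assumption about what such a helper looks like rather than a quote from the diff, and `offset_in_chunk()` is a hypothetical wrapper added only to exercise it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Half-open range test: true when b lies in [first, first + len).
 * Every argument is parenthesised; note that `first` is evaluated twice,
 * so callers should not pass expressions with side effects.
 */
#define in_range(b, first, len) \
	((b) >= (first) && (b) < (first) + (len))

/* Hypothetical wrapper, for illustration only. */
static bool offset_in_chunk(uint64_t offset, uint64_t chunk_start,
			    uint64_t chunk_len)
{
	/* Guard against u64 wrap-around in chunk_start + chunk_len. */
	assert(chunk_start + chunk_len >= chunk_start);
	return in_range(offset, chunk_start, chunk_len);
}
```

Defining the macro once in a shared header avoids each file carrying its own copy with a slightly different boundary convention.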

[PATCH v2 04/12] btrfs: Rename and export clear_btree_io_tree

2019-02-11 Thread Nikolay Borisov
This function is going to be used to clear out the device extent allocation information. Give it a more generic name and export it. This is in preparation to replacing the pending/pinned chunk lists with an extent tree. No functional changes. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent_io

[PATCH v2 05/12] btrfs: Populate ->orig_block_len during read_one_chunk

2019-02-11 Thread Nikolay Borisov
Chunks read from disk currently don't get their ->orig_block_len member set, in contrast when a new chunk is allocated, the respective extent_map's ->orig_block_len is assigned the size of the stripe of this chunk. Let's apply the same strategy for chunks which are read from disk, not only does thi

[PATCH v2 00/12] FITRIM improvements

2019-02-11 Thread Nikolay Borisov
Here is the second version of the FITRIM patchset. For background information consult the previous [0] post. Changes since v1: * Dropped some cleanup patches as they have been merged in the meantime. * In Patch 2 switched list iteration to list_for_each_entry_safe in btrfs_cleanup_one_tra

BUG ON during btrfs check

2019-02-11 Thread Norbert Scheibner
Hi! I've hit a BUG ON during btrfs check: - server:~# btrfs check --progress --repair /dev/sde enabling repair mode Opening filesystem to check... Checking filesystem on /dev/sde UUID: d5fa971b-6546-424d-87c1-dcd688eacdac [1/7] checking root items

Re: [PATCH v4 0/3] btrfs: scrub: fix scrub_lock

2019-02-11 Thread Anand Jain
On 2/9/19 1:02 AM, David Sterba wrote: On Wed, Jan 30, 2019 at 02:44:59PM +0800, Anand Jain wrote: Fixes the circular locking dependency warning as in patch 1/3, and patch 2/3 adds lockdep_assert_held() to scrub_workers_get(). Patch 3/3 converts scrub_workers_refcnt into refcount_t. Anand Ja