Re: Qgroup accounting issue on kdave/for-next branch

2016-11-28 Thread Qu Wenruo
At 11/29/2016 02:36 PM, Chandan Rajendra wrote: When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I noticed the following call trace. [ 77.335887] ------------[ cut here ]------------ [ 77.336115] WARNING: CPU: 0 PID: 8325 at /root/repos/linux/fs/btrfs/qgroup.c:2443

[PATCH v2 0/3] Qgroup and inode_cache fix, with small cleanups

2016-11-28 Thread Qu Wenruo
This patchset fixes a qgroup bug that occurs with the inode_cache mount option. The bug reminds us that keeping btrfs_qgroup_prepare_account_extents() as a separate function is bad practice, especially since the safest time to call prepare is just before btrfs_qgroup_account_extents(). So the patchset will

[PATCH v2 3/3] btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function

2016-11-28 Thread Qu Wenruo
Quite a lot of qgroup corruption happens due to wrong timing of calling btrfs_qgroup_prepare_account_extents(). Since the safest timing is calling it just before btrfs_qgroup_account_extents(), there is no need to separate these two functions. Merging them will make the code cleaner and less bug-prone.

[PATCH v2 1/3] btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option

2016-11-28 Thread Qu Wenruo
[BUG] The easiest way to reproduce the bug is: -- # mkfs.btrfs -f $dev -n 16K # mount $dev $mnt -o inode_cache # btrfs quota enable $mnt # btrfs quota rescan -w $mnt # btrfs qgroup show $mnt qgroupid rfer excl 0/5 32.00KiB

[PATCH v2 2/3] btrfs: qgroup: Add quick exit for non-fs extents

2016-11-28 Thread Qu Wenruo
For btrfs_qgroup_account_extent(), make it exit quicker for non-fs extents. This will also reduce the noise in the trace_btrfs_qgroup_account_extent event. Signed-off-by: Qu Wenruo --- v2: None --- fs/btrfs/qgroup.c | 41 +++--

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-28 Thread Adam Borowski
On Tue, Nov 29, 2016 at 02:52:47AM +0100, Christoph Anton Mitterer wrote: > On Mon, 2016-11-28 at 16:48 -0500, Zygo Blaxell wrote: > > If a drive's embedded controller RAM fails, you get corruption on the > > majority of reads from a single disk, and most writes will be corrupted > > (even if they

[PATCH 09/10] fstests: btrfs/122: Use _btrfs_check_scratch_qgroup to replace open code

2016-11-28 Thread Qu Wenruo
Signed-off-by: Qu Wenruo --- tests/btrfs/122 | 9 ++--- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/tests/btrfs/122 b/tests/btrfs/122 index 82252ab..899ede5 100755 --- a/tests/btrfs/122 +++ b/tests/btrfs/122 @@ -77,12 +77,7 @@ _run_btrfs_util_prog

[PATCH 01/10] fstests: common: Introduce function to check qgroup correctness

2016-11-28 Thread Qu Wenruo
Old btrfs qgroup test cases use fixed golden output numbers, which limits coverage since they can't handle mount options like compress or inode_cache, and causes false alerts. Introduce _btrfs_check_scratch_qgroup() function to check qgroup correctness using "btrfs check --qgroup-report" function,

[PATCH 10/10] fstests: btrfs/123: Use _btrfs_check_scratch_qgroup to replace open code

2016-11-28 Thread Qu Wenruo
Signed-off-by: Qu Wenruo --- tests/btrfs/123 | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/tests/btrfs/123 b/tests/btrfs/123 index e89d541..36d24c4 100755 --- a/tests/btrfs/123 +++ b/tests/btrfs/123 @@ -76,9 +76,7 @@ _run_btrfs_util_prog quota

[PATCH 06/10] fstests: btrfs/091: Use _btrfs_check_scratch_qgroup other than fixed golden output

2016-11-28 Thread Qu Wenruo
Signed-off-by: Qu Wenruo --- tests/btrfs/091 | 5 ++--- tests/btrfs/091.out | 4 2 files changed, 2 insertions(+), 7 deletions(-) diff --git a/tests/btrfs/091 b/tests/btrfs/091 index e3c43c7..ab4ff4e 100755 --- a/tests/btrfs/091 +++ b/tests/btrfs/091 @@ -94,9

[PATCH 02/10] fstests: btrfs/017: Use new _btrfs_check_scratch_qgroup function

2016-11-28 Thread Qu Wenruo
Now this test case can handle the inode_cache mount option and expose errors for that mount option. Signed-off-by: Qu Wenruo --- tests/btrfs/017 | 4 ++-- tests/btrfs/017.out | 2 -- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/tests/btrfs/017

[PATCH 08/10] fstests: btrfs/104: Use _btrfs_check_scratch_qgroup to replace open codes

2016-11-28 Thread Qu Wenruo
Signed-off-by: Qu Wenruo --- tests/btrfs/104 | 11 ++- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/tests/btrfs/104 b/tests/btrfs/104 index 6afaa02..a5d2070 100755 --- a/tests/btrfs/104 +++ b/tests/btrfs/104 @@ -152,14 +152,7 @@

[PATCH 07/10] fstests: btrfs/099: Add extra verification for qgroup

2016-11-28 Thread Qu Wenruo
Signed-off-by: Qu Wenruo --- tests/btrfs/099 | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tests/btrfs/099 b/tests/btrfs/099 index 70f07b5..9cc9a3d 100755 --- a/tests/btrfs/099 +++ b/tests/btrfs/099 @@ -82,6 +82,9 @@ sync $XFS_IO_PROG -f -c "pwrite -b

[PATCH 03/10] fstests: btrfs/022: Add extra qgroup verification after each work

2016-11-28 Thread Qu Wenruo
The old code uses rescan to verify the numbers. It never hurts to add extra qgroup verification using btrfsck. Signed-off-by: Qu Wenruo --- tests/btrfs/022 | 5 + 1 file changed, 5 insertions(+) diff --git a/tests/btrfs/022 b/tests/btrfs/022 index

[PATCH 04/10] fstests: btrfs/028: Use new wrapped _btrfs_check_scratch_qgroup function

2016-11-28 Thread Qu Wenruo
Signed-off-by: Qu Wenruo --- tests/btrfs/028 | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/btrfs/028 b/tests/btrfs/028 index 1425609..c4e99c6 100755 --- a/tests/btrfs/028 +++ b/tests/btrfs/028 @@ -87,8 +87,8 @@ _run_btrfs_util_prog

[PATCH 05/10] fstests: btrfs/042: Add extra qgroup verification

2016-11-28 Thread Qu Wenruo
Use newly introduced _btrfs_check_scratch_qgroup() to double check if qgroup numbers are correct. Signed-off-by: Qu Wenruo --- tests/btrfs/042 | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tests/btrfs/042 b/tests/btrfs/042 index 498ccc9..003b7af 100755 ---

[PATCH 00/10] Enhance btrfs qgroup test group coverage to support more mount options

2016-11-28 Thread Qu Wenruo
Old test cases in the btrfs qgroup test group use fixed golden output. This limits the coverage, since mount options like compress and inode_cache can easily pollute the output. On the other hand, "btrfs check" has supported checking qgroup correctness for at least the last 3 kernel releases. And that

Re: Qgroup accounting issue on kdave/for-next branch

2016-11-28 Thread Qu Wenruo
Thanks for the detailed reports and reproducer. I'll investigate it soon. Thanks, Qu At 11/29/2016 02:36 PM, Chandan Rajendra wrote: When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I noticed the following call trace. [ 77.335887] ------------[ cut here

Re: [PATCH 1/3] btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option (fwd)

2016-11-28 Thread Qu Wenruo
-- Date: Tue, 29 Nov 2016 14:16:30 +0800 From: kbuild test robot <fengguang...@intel.com> To: kbu...@01.org Cc: Julia Lawall <julia.law...@lip6.fr> Subject: Re: [PATCH 1/3] btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option Hi Qu, [auto build test WARNING on next-201

Re: [PATCH 1/3] btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option (fwd)

2016-11-28 Thread Julia Lawall
: [PATCH 1/3] btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option Hi Qu, [auto build test WARNING on next-20161128] [also build test WARNING on v4.9-rc7] [cannot apply to btrfs/next v4.9-rc7 v4.9-rc6 v4.9-rc5] [if your patch is applied to the wrong git tree, please drop us a

Qgroup accounting issue on kdave/for-next branch

2016-11-28 Thread Chandan Rajendra
When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I noticed the following call trace. [ 77.335887] ------------[ cut here ]------------ [ 77.336115] WARNING: CPU: 0 PID: 8325 at /root/repos/linux/fs/btrfs/qgroup.c:2443 .btrfs_qgroup_free_refroot+0x188/0x220 [

Re: RFC: raid with a variable stripe size

2016-11-28 Thread Qu Wenruo
At 11/29/2016 01:51 PM, Chris Murphy wrote: On Mon, Nov 28, 2016 at 5:48 PM, Qu Wenruo wrote: At 11/19/2016 02:15 AM, Goffredo Baroncelli wrote: Hello, these are only my thoughts; no code here, but I would like to share it hoping that it could be useful. As

Re: RFC: raid with a variable stripe size

2016-11-28 Thread Chris Murphy
On Mon, Nov 28, 2016 at 5:48 PM, Qu Wenruo wrote: > > > At 11/19/2016 02:15 AM, Goffredo Baroncelli wrote: >> >> Hello, >> >> these are only my thoughts; no code here, but I would like to share it >> hoping that it could be useful. >> >> As reported several times by Zygo

Re: RFC: raid with a variable stripe size

2016-11-28 Thread Qu Wenruo
At 11/29/2016 12:55 PM, Zygo Blaxell wrote: On Tue, Nov 29, 2016 at 12:12:03PM +0800, Qu Wenruo wrote: At 11/29/2016 11:53 AM, Zygo Blaxell wrote: On Tue, Nov 29, 2016 at 08:48:19AM +0800, Qu Wenruo wrote: At 11/19/2016 02:15 AM, Goffredo Baroncelli wrote: Hello, these are only my

Re: [Not TLS] Re: mount option nodatacow for VMs on SSD?

2016-11-28 Thread Duncan
Graham Cobb posted on Mon, 28 Nov 2016 09:49:33 + as excerpted: > On 28/11/16 02:56, Duncan wrote: >> It should still be worth turning on autodefrag on an existing somewhat >> fragmented filesystem. It just might take some time to defrag files >> you do modify, and won't touch those you

Re: mount option nodatacow for VMs on SSD?

2016-11-28 Thread Duncan
Niccolò Belli posted on Mon, 28 Nov 2016 12:11:49 +0100 as excerpted: > On lunedì 28 novembre 2016 09:20:15 CET, Kai Krakow wrote: >> You can, however, use chattr to make the subvolume root directory (that >> one where it is mounted) nodatacow (chattr +C) _before_ placing any >> files or

Re: RFC: raid with a variable stripe size

2016-11-28 Thread Zygo Blaxell
On Tue, Nov 29, 2016 at 12:12:03PM +0800, Qu Wenruo wrote: > > > At 11/29/2016 11:53 AM, Zygo Blaxell wrote: > >On Tue, Nov 29, 2016 at 08:48:19AM +0800, Qu Wenruo wrote: > >>At 11/19/2016 02:15 AM, Goffredo Baroncelli wrote: > >>>Hello, > >>> > >>>these are only my thoughts; no code here, but I

[PATCH 1/3] btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option

2016-11-28 Thread Qu Wenruo
[BUG] The easiest way to reproduce the bug is: -- # mkfs.btrfs -f $dev -n 16K # mount $dev $mnt -o inode_cache # btrfs quota enable $mnt # btrfs quota rescan -w $mnt # btrfs qgroup show $mnt qgroupid rfer excl 0/5 32.00KiB

[PATCH 2/3] btrfs: qgroup: Add quick exit for non-fs extents

2016-11-28 Thread Qu Wenruo
For btrfs_qgroup_account_extent(), make it exit quicker for non-fs extents. This will also reduce the noise in the trace_btrfs_qgroup_account_extent event. Signed-off-by: Qu Wenruo --- fs/btrfs/qgroup.c | 41 +++-- 1 file changed,

[PATCH 3/3] btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function

2016-11-28 Thread Qu Wenruo
Quite a lot of qgroup corruption happens due to wrong timing of calling btrfs_qgroup_prepare_account_extents(). Since the safest timing is calling it just before btrfs_qgroup_account_extents(), there is no need to separate these two functions. Merging them will make the code cleaner and less bug-prone.

[PATCH 0/3] Qgroup and inode_cache fix, with small cleanups

2016-11-28 Thread Qu Wenruo
This patchset fixes a qgroup bug that occurs with the inode_cache mount option. The bug reminds us that keeping btrfs_qgroup_prepare_account_extents() as a separate function is bad practice, especially since the safest time to call prepare is just before btrfs_qgroup_account_extents(). So the patchset will

Re: RFC: raid with a variable stripe size

2016-11-28 Thread Qu Wenruo
At 11/29/2016 11:53 AM, Zygo Blaxell wrote: On Tue, Nov 29, 2016 at 08:48:19AM +0800, Qu Wenruo wrote: At 11/19/2016 02:15 AM, Goffredo Baroncelli wrote: Hello, these are only my thoughts; no code here, but I would like to share it hoping that it could be useful. As reported several times

Re: RFC: raid with a variable stripe size

2016-11-28 Thread Zygo Blaxell
On Tue, Nov 29, 2016 at 08:48:19AM +0800, Qu Wenruo wrote: > At 11/19/2016 02:15 AM, Goffredo Baroncelli wrote: > >Hello, > > > >these are only my thoughts; no code here, but I would like to share it > >hoping that it could be useful. > > > >As reported several times by Zygo (and others), one of

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-28 Thread Zygo Blaxell
On Tue, Nov 29, 2016 at 02:52:47AM +0100, Christoph Anton Mitterer wrote: > On Mon, 2016-11-28 at 16:48 -0500, Zygo Blaxell wrote: > > If a drive's > > embedded controller RAM fails, you get corruption on the majority of > > reads from a single disk, and most writes will be corrupted (even if > >

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-28 Thread Christoph Anton Mitterer
On Mon, 2016-11-28 at 16:48 -0500, Zygo Blaxell wrote: > If a drive's > embedded controller RAM fails, you get corruption on the majority of > reads from a single disk, and most writes will be corrupted (even if > they > were not before). Administrating a multi-PiB Tier-2 for the LHC Computing

Re: RFC: raid with a variable stripe size

2016-11-28 Thread Qu Wenruo
At 11/19/2016 02:15 AM, Goffredo Baroncelli wrote: Hello, these are only my thoughts; no code here, but I would like to share it hoping that it could be useful. As reported several times by Zygo (and others), one of the problems of raid5/6 is the write hole. Today BTRFS is not capable to

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-28 Thread Zygo Blaxell
On Mon, Nov 28, 2016 at 07:32:38PM +0100, Goffredo Baroncelli wrote: > On 2016-11-28 04:37, Christoph Anton Mitterer wrote: > > I think for safety it's best to repair as early as possible (and thus > > on read when a damage is detected), as further blocks/devices may fail > > till eventually a

Re: [PATCH 6/8] btrfs: calculate end of bio offset properly

2016-11-28 Thread Omar Sandoval
On Fri, Nov 25, 2016 at 09:07:51AM +0100, Christoph Hellwig wrote: > Use the bvec offset and len members to prepare for multipage bvecs. Reviewed-by: Omar Sandoval > Signed-off-by: Christoph Hellwig > --- > fs/btrfs/compression.c | 10 -- > 1 file changed,

[PATCH] btrfs: fix uninitialized variable access after ASSERT

2016-11-28 Thread Arnd Bergmann
In btrfs, ASSERT() has no effect if CONFIG_BTRFS_ASSERT is disabled, and gcc notices that this can lead to using an uninitialized variable: fs/btrfs/inode.c: In function 'run_delalloc_range': fs/btrfs/inode.c:1190:18: error: 'cur_end' may be used uninitialized in this function
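
The failure mode described in this message can be reproduced outside the kernel. The following is a minimal sketch, not the code from fs/btrfs/inode.c: `DEMO_ASSERT_ENABLED` is a hypothetical stand-in for CONFIG_BTRFS_ASSERT, and `find_range_end()` and `range_end()` are hypothetical helpers. Giving the variable a defined fallback value is only one common remedy and is not necessarily what the patch itself does.

```c
#include <stdlib.h>

/* Hypothetical stand-in for CONFIG_BTRFS_ASSERT: when the switch is
 * not defined, ASSERT() expands to nothing and checks nothing. */
#ifdef DEMO_ASSERT_ENABLED
#define ASSERT(expr) do { if (!(expr)) abort(); } while (0)
#else
#define ASSERT(expr) ((void)0)
#endif

/* Hypothetical helper: writes *end only on success. */
static int find_range_end(unsigned long start, unsigned long len,
                          unsigned long *end)
{
    if (len == 0)
        return -1;              /* failure: *end is left untouched */
    *end = start + len - 1;
    return 0;
}

/* Without the initializer, cur_end would be read uninitialized whenever
 * find_range_end() fails and ASSERT() is compiled out, which is the
 * pattern gcc warns about. */
static unsigned long range_end(unsigned long start, unsigned long len)
{
    unsigned long cur_end = start;  /* defined fallback value */
    int ret = find_range_end(start, len, &cur_end);

    ASSERT(ret == 0);
    (void)ret;                      /* ret is otherwise unused when ASSERT is off */
    return cur_end;
}
```

With the switch undefined, `range_end(10, 0)` falls back to the start offset instead of returning an indeterminate value; with it defined, the same call aborts.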

Re: btrfs: still lockdep splat for 4.9-rc5+ (btrfs_log_inode)

2016-11-28 Thread Liu Bo
On Sat, Nov 26, 2016 at 08:46:38AM -0500, Chris Mason wrote: > On Fri, Nov 25, 2016 at 10:03:25AM +0100, Christian Borntraeger wrote: > > FWIW, I still see the lockdep splat in btrfs in 4.9-rc5+ > > Filipe reworked the code to avoid taking the same lock twice. As far as I > can tell, this just

Re: [PATCH v2 1/2] Btrfs: add more valid checks for superblock

2016-11-28 Thread Liu Bo
On Fri, Nov 25, 2016 at 05:50:19PM +0100, David Sterba wrote: > On Fri, Jun 03, 2016 at 12:05:14PM -0700, Liu Bo wrote: > > @@ -6648,6 +6648,7 @@ int btrfs_read_chunk_tree(struct btrfs_root *root) > > struct btrfs_key found_key; > > int ret; > > int slot; > > + u64 total_dev = 0; > >

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-28 Thread Austin S. Hemmelgarn
On 2016-11-28 14:01, Christoph Anton Mitterer wrote: On Mon, 2016-11-28 at 19:45 +0100, Goffredo Baroncelli wrote: I am understanding that the status of RAID5/6 code is so badly Just some random thought: If the code for raid56 is really as bad as it's often claimed (I haven't read it, to be

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-28 Thread Christoph Anton Mitterer
On Mon, 2016-11-28 at 19:32 +0100, Goffredo Baroncelli wrote: > I am assuming that a corruption is a quite rare event. So > occasionally it could happen that a page is corrupted and the system > corrects it. This shouldn't have an impact on the workloads. Probably, but it still makes sense to

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-28 Thread Christoph Anton Mitterer
On Mon, 2016-11-28 at 19:45 +0100, Goffredo Baroncelli wrote: > I am understanding that the status of RAID5/6 code is so badly Just some random thought: If the code for raid56 is really as bad as it's often claimed (I haven't read it, to be honest) could it perhaps make sense to consider to

[no subject]

2016-11-28 Thread foss
Hello, I have a multi-device btrfs (with problems, more on that later). I looked into btrfs-image and was surprised to find that "for i in 5 6 7 8 ; do sudo btrfs-image -t2 /dev/sda$i - | md5sum;done" returns a different hash for sda7. The other three hashes are the same, as I believe they

[PATCH] btrfs-progs: man mkfs: warn about RAID5/6 being experimental

2016-11-28 Thread Adam Borowski
Signed-off-by: Adam Borowski --- Documentation/mkfs.btrfs.asciidoc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/mkfs.btrfs.asciidoc b/Documentation/mkfs.btrfs.asciidoc index 9b1d45a..c92d730 100644 --- a/Documentation/mkfs.btrfs.asciidoc +++

[PATCH] btrfs-progs: mkfs, balance convert: warn about RAID5/6 in fiery letters

2016-11-28 Thread Adam Borowski
People who frequent neither IRC nor the mailing list tend to believe RAID 5/6 are stable; this leads to data loss. Thus, let's do warn them. At this point, I think fiery letters that won't be missed are warranted. Kernel 4.9 and its -progs will be a part of LTS of multiple distributions, so

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-28 Thread Goffredo Baroncelli
On 2016-11-28 01:40, Qu Wenruo wrote: > > At 11/27/2016 07:16 AM, Goffredo Baroncelli wrote: >> On 2016-11-26 19:54, Zygo Blaxell wrote: >>> On Sat, Nov 26, 2016 at 02:12:56PM +0100, Goffredo Baroncelli wrote: On 2016-11-25 05:31, Zygo Blaxell wrote: >> [...] BTW Btrfs in RAID1

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-28 Thread Goffredo Baroncelli
On 2016-11-28 04:37, Christoph Anton Mitterer wrote: > I think for safety it's best to repair as early as possible (and thus > on read when a damage is detected), as further blocks/devices may fail > till eventually a scrub(with repair) would be run manually. > > However, there may some

Re: [PATCH] btrfs: fix hole read corruption for compressed inline extents

2016-11-28 Thread Zygo Blaxell
On Mon, Nov 28, 2016 at 05:27:10PM +0500, Roman Mamedov wrote: > On Mon, 28 Nov 2016 00:03:12 -0500 > Zygo Blaxell wrote: > > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > > index 8e3a5a2..b1314d6 100644 > > --- a/fs/btrfs/inode.c > > +++ b/fs/btrfs/inode.c

Re: [RFC PATCH 0/2] Btrfs: make a source length of 0 imply EOF for dedupe

2016-11-28 Thread Darrick J. Wong
On Thu, Nov 24, 2016 at 11:20:39PM -0500, Zygo Blaxell wrote: > On Wed, Nov 23, 2016 at 05:26:18PM -0800, Darrick J. Wong wrote: > [...] > > Keep in mind that the number of bytes deduped is returned to userspace > > via file_dedupe_range.info[x].bytes_deduped, so a properly functioning > >
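
For readers unfamiliar with the interface mentioned in this message, the sketch below shows how userspace can read `info[x].bytes_deduped` after a FIDEDUPERANGE ioctl on Linux. The wrapper name `dedupe_one()` and its single-destination shape are assumptions made for illustration. It does not model the zero-length behaviour proposed in the RFC.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/*
 * Hypothetical wrapper: ask the kernel to dedupe `len` bytes of src_fd
 * at src_off against dest_fd at dest_off. Returns 0 and stores the
 * number of bytes deduped on success; returns -1 if the ioctl fails or
 * the kernel does not report the ranges as identical.
 */
static int dedupe_one(int src_fd, uint64_t src_off, uint64_t len,
                      int dest_fd, uint64_t dest_off, uint64_t *deduped)
{
    struct file_dedupe_range *range;
    int ret = -1;

    /* The header is followed by one info slot per destination. */
    range = calloc(1, sizeof(*range) +
                      sizeof(struct file_dedupe_range_info));
    if (!range)
        return -1;

    range->src_offset = src_off;
    range->src_length = len;
    range->dest_count = 1;
    range->info[0].dest_fd = dest_fd;
    range->info[0].dest_offset = dest_off;

    if (ioctl(src_fd, FIDEDUPERANGE, range) == 0 &&
        range->info[0].status == FILE_DEDUPE_RANGE_SAME) {
        /* The kernel reports how much it actually deduped. */
        *deduped = range->info[0].bytes_deduped;
        ret = 0;
    }

    free(range);
    return ret;
}
```

A caller should compare the stored byte count with the length it requested, because the kernel may dedupe fewer bytes than were asked for.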

Re: [PATCH] btrfs: fix hole read corruption for compressed inline extents

2016-11-28 Thread Roman Mamedov
On Mon, 28 Nov 2016 00:03:12 -0500 Zygo Blaxell wrote: > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 8e3a5a2..b1314d6 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct >

Re: mount option nodatacow for VMs on SSD?

2016-11-28 Thread Niccolò Belli
On lunedì 28 novembre 2016 09:20:15 CET, Kai Krakow wrote: You can, however, use chattr to make the subvolume root directory (that one where it is mounted) nodatacow (chattr +C) _before_ placing any files or directories in there. That way, newly created files and directories will inherit the

Re: [Not TLS] Re: mount option nodatacow for VMs on SSD?

2016-11-28 Thread Graham Cobb
On 28/11/16 02:56, Duncan wrote: > It should still be worth turning on autodefrag on an existing somewhat > fragmented filesystem. It just might take some time to defrag files you > do modify, and won't touch those you don't, which in some cases might > make it worth defragging those manually.

Re: mount option nodatacow for VMs on SSD?

2016-11-28 Thread Kai Krakow
Am Mon, 28 Nov 2016 01:38:29 +0100 schrieb Ulli Horlacher : > On Sat 2016-11-26 (11:27), Kai Krakow wrote: > > > > I have vmware and virtualbox VMs on btrfs SSD. > > > As a side note: I don't think you can use "nodatacow" just for one > > subvolume while the