Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Goffredo Baroncelli
On 2017-06-22 04:12, Qu Wenruo wrote: > > And in that case, even if the device of data stripe 2 is missing, btrfs doesn't really > need to use parity to rebuild it, as btrfs knows there is no extent in that > stripe, and the data csum matches for data stripe 1. You are assuming that there is no data in
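
The parity logic in play, sketched as minimal userspace C (one-byte strips for brevity; illustrative only, not btrfs code). The point above is that btrfs can skip this XOR step entirely when its extent tree shows the missing strip holds no data and the csums of the surviving data verify.

    #include <stdio.h>
    #include <stddef.h>

    /* RAID5 rebuild: a missing strip is the XOR of every surviving
     * strip in the stripe, parity included. */
    static unsigned char rebuild_strip(const unsigned char *strips,
                                       size_t n, size_t missing)
    {
        unsigned char v = 0;
        for (size_t i = 0; i < n; i++)
            if (i != missing)
                v ^= strips[i];
        return v;
    }

    int main(void)
    {
        /* two data strips plus parity: P = D0 ^ D1 */
        unsigned char stripe[3] = { 0xAA, 0xBB, 0xAA ^ 0xBB };
        /* lose D1 and rebuild it from D0 and P; prints 0xBB */
        printf("rebuilt D1 = 0x%02X\n", rebuild_strip(stripe, 3, 1));
        return 0;
    }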

Re: How to fix errors that check --mode lowmem finds, but --mode normal doesn't?

2017-06-21 Thread Qu Wenruo
At 06/22/2017 10:53 AM, Marc MERLIN wrote: Ok, first it finished (almost 24H) (...) ERROR: root 3862 EXTENT_DATA[18170706 135168] interrupt ERROR: root 3862 EXTENT_DATA[18170706 1048576] interrupt ERROR: root 3864 EXTENT_DATA[109336 4096] interrupt ERROR: errors found in fs roots found

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Qu Wenruo
At 06/22/2017 10:43 AM, Chris Murphy wrote: On Wed, Jun 21, 2017 at 8:12 PM, Qu Wenruo wrote: Well, in fact, thanks to data csum and btrfs metadata CoW, there is quite a high chance that we won't cause any data damage. But we have examples where data does not

Re: How to fix errors that check --mode lowmem finds, but --mode normal doesn't?

2017-06-21 Thread Marc MERLIN
Ok, first it finished (almost 24H) (...) ERROR: root 3862 EXTENT_DATA[18170706 135168] interrupt ERROR: root 3862 EXTENT_DATA[18170706 1048576] interrupt ERROR: root 3864 EXTENT_DATA[109336 4096] interrupt ERROR: errors found in fs roots found 5544779108352 bytes used, error(s) found total csum

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Chris Murphy
On Wed, Jun 21, 2017 at 8:12 PM, Qu Wenruo wrote: > > Well, in fact, thanks to data csum and btrfs metadata CoW, there is quite a > high chance that we won't cause any data damage. But we have examples where data does not COW, and we see a partial stripe overwrite. And if

Re: [PATCH] btrfs: Remove false alert when fiemap range is smaller than on-disk extent

2017-06-21 Thread Adam Borowski
On Thu, Jun 22, 2017 at 10:01:21AM +0800, Qu Wenruo wrote: > Commit 4751832da990 ("btrfs: fiemap: Cache and merge fiemap extent before > submit it to user") introduced a warning to catch an unemitted cached > fiemap extent. > > However, such a warning doesn't take the following case into consideration:

Re: How to fix errors that check --mode lowmem finds, but --mode normal doesn't?

2017-06-21 Thread Qu Wenruo
At 06/21/2017 11:13 PM, Marc MERLIN wrote: On Tue, Jun 20, 2017 at 08:43:52PM -0700, Marc MERLIN wrote: On Tue, Jun 20, 2017 at 09:31:42PM -0600, Chris Murphy wrote: On Tue, Jun 20, 2017 at 5:12 PM, Marc MERLIN wrote: I'm now going to remount this with nospace_cache to

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Qu Wenruo
At 06/22/2017 02:24 AM, Chris Murphy wrote: On Wed, Jun 21, 2017 at 2:45 AM, Qu Wenruo wrote: Unlike the pure stripe method, one fully functional RAID5/6 should be written in full stripe behavior, which is made up of N data stripes and correct P/Q. Given one example to

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Qu Wenruo
At 06/22/2017 01:03 AM, Goffredo Baroncelli wrote: Hi Qu, On 2017-06-21 10:45, Qu Wenruo wrote: At 06/21/2017 06:57 AM, waxhead wrote: I am trying to piece together the actual status of the RAID5/6 bit of BTRFS. The wiki refer to kernel 3.19 which was released in February 2015 so I assume

commands hang 30-60s during scrubs, includes sysrq t

2017-06-21 Thread Chris Murphy
I'm getting command hangs and service start failures during scrubs. top says CPU idle is 58-64%. Running 'perf top' takes more than 1 minute for results to appear. Connecting to a web management service (cockpit) takes longer, maybe 2 minutes or sometimes the login times out. And just doing an ls

[PATCH] btrfs: Remove false alert when fiemap range is smaller than on-disk extent

2017-06-21 Thread Qu Wenruo
Commit 4751832da990 ("btrfs: fiemap: Cache and merge fiemap extent before submit it to user") introduced a warning to catch an unemitted cached fiemap extent. However, such a warning doesn't take the following case into consideration: 0 4K 8K |< fiemap
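
A rough model of the cache-and-merge scheme that commit introduced (hypothetical names; the real code lives in fs/btrfs/extent_io.c): hold one extent back, extend it while successors are logically and physically contiguous with identical flags, and emit it only when a non-mergeable extent arrives or the walk ends.

    /* Illustrative userspace sketch, not the btrfs implementation. */
    struct fiemap_cache {
        unsigned long long offset;   /* logical start */
        unsigned long long phys;     /* physical start */
        unsigned long long len;
        unsigned int flags;
        int cached;
    };

    /* The real code would call fiemap_fill_next_extent() here. */
    static void emit_cached(struct fiemap_cache *c)
    {
        c->cached = 0;
    }

    static void cache_extent(struct fiemap_cache *c,
                             unsigned long long offset,
                             unsigned long long phys,
                             unsigned long long len,
                             unsigned int flags)
    {
        if (c->cached && c->offset + c->len == offset &&
            c->phys + c->len == phys && c->flags == flags) {
            c->len += len;            /* contiguous: merge and wait */
            return;
        }
        if (c->cached)
            emit_cached(c);           /* flush the previous run */
        c->offset = offset;
        c->phys = phys;
        c->len = len;
        c->flags = flags;
        c->cached = 1;
    }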

Re: How to fix errors that check --mode lowmem finds, but --mode normal doesn't?

2017-06-21 Thread Marc MERLIN
On Wed, Jun 21, 2017 at 05:22:15PM -0600, Chris Murphy wrote: > I don't know what it means. Maybe Qu has some idea. He might want a > btrfs-image of this file system to see if it's a bug. There are still > some bugs found with lowmem mode, so these could be bogus messages. > But the file system

Re: [PATCH] btrfs: DEBUG fiemap: Show more info about extent_fiemap

2017-06-21 Thread Qu Wenruo
At 06/21/2017 08:10 PM, Adam Borowski wrote: On Wed, Jun 21, 2017 at 05:28:50PM +0800, Qu Wenruo wrote: Would you please try this patch based on v4.12-rc5 and try to reproduce the kernel warning? It would be better to eliminate the noise by ensuring there is no other fiemap caller on btrfs.

Re: How to fix errors that check --mode lowmem finds, but --mode normal doesn't?

2017-06-21 Thread Chris Murphy
On Wed, Jun 21, 2017 at 9:13 AM, Marc MERLIN wrote: > On Tue, Jun 20, 2017 at 08:43:52PM -0700, Marc MERLIN wrote: >> On Tue, Jun 20, 2017 at 09:31:42PM -0600, Chris Murphy wrote: >> > On Tue, Jun 20, 2017 at 5:12 PM, Marc MERLIN wrote: >> > >> > > I'm now

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Chris Murphy
On Wed, Jun 21, 2017 at 2:12 PM, Goffredo Baroncelli wrote: > > Generally speaking, when you write "two failures" this means two failures at > the same time. But the write hole happens even if these two failures are not > at the same time: > > Event #1: power failure between
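
Worked through with one-byte strips: D1 = 0xAA, D2 = 0xBB, so P = D1 ^ D2 = 0x11. Event #1: D1 is rewritten to 0xCC but power fails before the matching parity write, leaving the stale P = 0x11 on disk. Event #2: the device holding D2 dies. Reconstruction then yields D2 = D1 ^ P = 0xCC ^ 0x11 = 0xDD rather than the 0xBB that was actually stored, and without a data csum over D2 the corruption would pass unnoticed.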

Re: [PATCH 1/2] btrfs: account for pinned bytes and bytes_may_use in should_alloc_chunk

2017-06-21 Thread Jeff Mahoney
On 6/21/17 5:15 PM, Chris Mason wrote: > > > On 06/21/2017 05:08 PM, Jeff Mahoney wrote: >> On 6/21/17 4:31 PM, Chris Mason wrote: >>> On 06/21/2017 04:14 PM, Jeff Mahoney wrote: On 6/14/17 11:44 AM, je...@suse.com wrote: > From: Jeff Mahoney > > In a heavy

Re: [PATCH 1/2] btrfs: account for pinned bytes and bytes_may_use in should_alloc_chunk

2017-06-21 Thread Chris Mason
On 06/21/2017 05:08 PM, Jeff Mahoney wrote: On 6/21/17 4:31 PM, Chris Mason wrote: On 06/21/2017 04:14 PM, Jeff Mahoney wrote: On 6/14/17 11:44 AM, je...@suse.com wrote: From: Jeff Mahoney In a heavy write scenario, we can end up with a large number of pinned bytes. This

Re: [PATCH 1/2] btrfs: account for pinned bytes and bytes_may_use in should_alloc_chunk

2017-06-21 Thread Jeff Mahoney
On 6/21/17 4:31 PM, Chris Mason wrote: > On 06/21/2017 04:14 PM, Jeff Mahoney wrote: >> On 6/14/17 11:44 AM, je...@suse.com wrote: >>> From: Jeff Mahoney >>> >>> In a heavy write scenario, we can end up with a large number of pinned >>> bytes. This can translate into (very)

Re: [PATCH 1/2] btrfs: account for pinned bytes and bytes_may_use in should_alloc_chunk

2017-06-21 Thread Chris Mason
On 06/21/2017 04:14 PM, Jeff Mahoney wrote: On 6/14/17 11:44 AM, je...@suse.com wrote: From: Jeff Mahoney In a heavy write scenario, we can end up with a large number of pinned bytes. This can translate into (very) premature ENOSPC because pinned bytes must be accounted for

Re: [PATCH 1/2] btrfs: account for pinned bytes and bytes_may_use in should_alloc_chunk

2017-06-21 Thread Jeff Mahoney
On 6/14/17 11:44 AM, je...@suse.com wrote: > From: Jeff Mahoney > > In a heavy write scenario, we can end up with a large number of pinned > bytes. This can translate into (very) premature ENOSPC because pinned > bytes must be accounted for when allowing a reservation but aren't
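
The shape of the change, as a hedged sketch (struct and function names modeled on the description above, not copied from fs/btrfs/extent-tree.c): count pinned and may_use bytes as occupied when judging whether the space_info is full enough to deserve a new chunk. The 80% threshold is an assumption for illustration.

    /* Illustrative model: a heavy writer pins many bytes; if the
     * allocator ignores them, it refuses new chunks and callers can
     * see premature ENOSPC. */
    struct space_info_model {
        unsigned long long total_bytes;
        unsigned long long bytes_used;
        unsigned long long bytes_reserved;
        unsigned long long bytes_pinned;
        unsigned long long bytes_may_use;
    };

    static int should_alloc_chunk_model(const struct space_info_model *s)
    {
        unsigned long long committed = s->bytes_used + s->bytes_reserved
                                     + s->bytes_pinned + s->bytes_may_use;
        return committed >= s->total_bytes / 10 * 8;
    }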

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Goffredo Baroncelli
On 2017-06-21 20:24, Chris Murphy wrote: > On Wed, Jun 21, 2017 at 2:45 AM, Qu Wenruo wrote: > >> Unlike the pure stripe method, one fully functional RAID5/6 should be written in >> full stripe behavior, which is made up of N data stripes and correct P/Q. >> >> Given one

Re: [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version

2017-06-21 Thread David Sterba
On Wed, May 17, 2017 at 10:56:22AM +0800, Qu Wenruo wrote: > The remaining qgroup fixes patches, based on the Chris' for-linus-4.12 > branch with commit 9bcaaea7418d09691f1ffab5c49aacafe3eef9d0 as base. > > Can be fetched from github: >

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Chris Murphy
On Wed, Jun 21, 2017 at 12:51 AM, Marat Khalili wrote: > On 21/06/17 06:48, Chris Murphy wrote: >> >> Another possibility is to ensure a new write is written to a new *not* >> full stripe, i.e. dynamic stripe size. So if the modification is a 50K >> file on a 4 disk raid5; instead of

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Chris Murphy
On Wed, Jun 21, 2017 at 2:45 AM, Qu Wenruo wrote: > Unlike the pure stripe method, one fully functional RAID5/6 should be written in > full stripe behavior, which is made up of N data stripes and correct P/Q. > > Given one example to show how the write sequence affects the

Re: [PATCH 7/7] Btrfs: warn if total_bytes_pinned is non-zero on unmount

2017-06-21 Thread David Sterba
On Tue, Jun 06, 2017 at 04:45:32PM -0700, Omar Sandoval wrote: > From: Omar Sandoval > > Catch any future/remaining leaks or underflows of total_bytes_pinned. > > Signed-off-by: Omar Sandoval This patch received some objections. As it's a debugging aid, I'd

Re: [PATCH 2/7] Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT()

2017-06-21 Thread David Sterba
On Tue, Jun 06, 2017 at 04:45:27PM -0700, Omar Sandoval wrote: > From: Omar Sandoval > > Signed-off-by: Omar Sandoval Reviewed-by: David Sterba Added some changelog. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Austin S. Hemmelgarn
On 2017-06-21 13:20, Andrei Borzenkov wrote: 21.06.2017 16:41, Austin S. Hemmelgarn wrote: On 2017-06-21 08:43, Christoph Anton Mitterer wrote: On Wed, 2017-06-21 at 16:45 +0800, Qu Wenruo wrote: Btrfs is always using device ID to build up its device mapping. And for any multi-device

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Andrei Borzenkov
21.06.2017 16:41, Austin S. Hemmelgarn wrote: > On 2017-06-21 08:43, Christoph Anton Mitterer wrote: >> On Wed, 2017-06-21 at 16:45 +0800, Qu Wenruo wrote: >>> Btrfs is always using device ID to build up its device mapping. >>> And for any multi-device implementation (LVM, mdadm) it's never a >>>

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Andrei Borzenkov
21.06.2017 09:51, Marat Khalili wrote: > On 21/06/17 06:48, Chris Murphy wrote: >> Another possibility is to ensure a new write is written to a new *not* >> full stripe, i.e. dynamic stripe size. So if the modification is a 50K >> file on a 4 disk raid5; instead of writing 3 64K data strips + 1 64K

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Goffredo Baroncelli
Hi Qu, On 2017-06-21 10:45, Qu Wenruo wrote: > At 06/21/2017 06:57 AM, waxhead wrote: >> I am trying to piece together the actual status of the RAID5/6 bit of BTRFS. >> The wiki refer to kernel 3.19 which was released in February 2015 so I assume >> that the information there is a tad outdated

Re: btrfs_wait_ordered_roots warning triggered

2017-06-21 Thread Dave Jones
On Wed, Jun 21, 2017 at 11:52:36AM -0400, Chris Mason wrote: > On 06/21/2017 11:16 AM, Dave Jones wrote: > > WARNING: CPU: 2 PID: 7153 at fs/btrfs/ordered-data.c:753 > > btrfs_wait_ordered_roots+0x1a3/0x220 > > CPU: 2 PID: 7153 Comm: kworker/u8:7 Not tainted 4.12.0-rc6-think+ #4 > >

btrfs_wait_ordered_roots warning triggered

2017-06-21 Thread Dave Jones
WARNING: CPU: 2 PID: 7153 at fs/btrfs/ordered-data.c:753 btrfs_wait_ordered_roots+0x1a3/0x220 CPU: 2 PID: 7153 Comm: kworker/u8:7 Not tainted 4.12.0-rc6-think+ #4 Workqueue: events_unbound btrfs_async_reclaim_metadata_space task: 8804f08d5380 task.stack: c9000895c000 RIP:

[PATCH] btrfs: fix validation of XATTR_ITEM dir items

2017-06-21 Thread David Sterba
The XATTR_ITEM is a type of directory item, so we use the common validator helper. We have to adjust the limits because of the potential data_len (i.e. the xattr value), which is otherwise 0 for other directory items. Signed-off-by: David Sterba --- fs/btrfs/dir-item.c | 12
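
The limit adjustment, sketched with made-up field names (the real helper operates on on-disk btrfs_dir_item structures): a directory item must fit its name inside the item, and an XATTR_ITEM must additionally fit its value of data_len bytes, while every other dir item type must have data_len == 0.

    /* Illustrative check, not the btrfs validator itself. */
    struct dir_item_model {
        unsigned short name_len;
        unsigned short data_len;   /* xattr value size; 0 otherwise */
    };

    static int dir_item_fits(const struct dir_item_model *di,
                             unsigned int item_size, int is_xattr)
    {
        if (!is_xattr && di->data_len != 0)
            return 0;              /* only XATTR_ITEMs carry a value */
        return (unsigned int)sizeof(*di) + di->name_len
               + di->data_len <= item_size;
    }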

Re: btrfs_wait_ordered_roots warning triggered

2017-06-21 Thread Chris Mason
On 06/21/2017 11:16 AM, Dave Jones wrote: WARNING: CPU: 2 PID: 7153 at fs/btrfs/ordered-data.c:753 btrfs_wait_ordered_roots+0x1a3/0x220 CPU: 2 PID: 7153 Comm: kworker/u8:7 Not tainted 4.12.0-rc6-think+ #4 Workqueue: events_unbound btrfs_async_reclaim_metadata_space task: 8804f08d5380

Re: RFC: Compression - calculate entropy for data set

2017-06-21 Thread Timofey Titovets
I'm done with the heuristic method, so I'm posting some performance test output: (I store test data in /run/user/$UID/, and the script just runs the program 2 times) ### # The performance test will measure initialization time # And remove it from the run time of the tests # This may be inaccurate in some cases # But this
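
For reference, the core of such a heuristic is Shannon entropy over a byte histogram; a self-contained sketch follows (an in-kernel version would use integer approximations instead of libm, so treat this as a model of the idea only). Low entropy suggests the buffer will compress well; values near 8 bits/byte suggest skipping compression.

    #include <math.h>
    #include <stdio.h>

    /* Shannon entropy in bits per byte: 0.0 for constant data,
     * approaching 8.0 for incompressible, random-looking data. */
    static double entropy_bits_per_byte(const unsigned char *buf, size_t len)
    {
        unsigned long hist[256] = { 0 };
        double h = 0.0;

        for (size_t i = 0; i < len; i++)
            hist[buf[i]]++;
        for (int i = 0; i < 256; i++) {
            if (hist[i]) {
                double p = (double)hist[i] / (double)len;
                h -= p * log2(p);
            }
        }
        return h;
    }

    int main(void)   /* build with: cc entropy.c -lm */
    {
        unsigned char zeros[4096] = { 0 };
        printf("all-zero page: %.3f bits/byte\n",
               entropy_bits_per_byte(zeros, sizeof(zeros)));
        return 0;
    }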

How to fix errors that check --mode lowmem finds, but --mode normal doesn't?

2017-06-21 Thread Marc MERLIN
On Tue, Jun 20, 2017 at 08:43:52PM -0700, Marc MERLIN wrote: > On Tue, Jun 20, 2017 at 09:31:42PM -0600, Chris Murphy wrote: > > On Tue, Jun 20, 2017 at 5:12 PM, Marc MERLIN wrote: > > > > > I'm now going to remount this with nospace_cache to see if your guess > > > about > >

Re: [PATCH] btrfs: use new block error code

2017-06-21 Thread Jens Axboe
On 06/21/2017 07:17 AM, David Sterba wrote: > On Mon, Jun 19, 2017 at 01:55:37PM +0300, Dan Carpenter wrote: >> This function is supposed to return blk_status_t error codes now but >> there was a stray -ENOMEM left behind. >> >> Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t") >>

Re: [PATCH] btrfs: add cond_resched to btrfs_qgroup_trace_leaf_items

2017-06-21 Thread David Sterba
On Tue, Jun 20, 2017 at 08:15:26AM -0400, je...@suse.com wrote: > From: Jeff Mahoney > > On an uncontended system, we can end up hitting soft lockups while > doing replace_path. At the core, and frequently called, is > btrfs_qgroup_trace_leaf_items, so it makes sense to add a

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Austin S. Hemmelgarn
On 2017-06-21 08:43, Christoph Anton Mitterer wrote: On Wed, 2017-06-21 at 16:45 +0800, Qu Wenruo wrote: Btrfs is always using device ID to build up its device mapping. And for any multi-device implementation (LVM, mdadm) it's never a good idea to use device path. Isn't it rather the other

Re: [PATCH] btrfs: use new block error code

2017-06-21 Thread David Sterba
On Mon, Jun 19, 2017 at 01:55:37PM +0300, Dan Carpenter wrote: > This function is supposed to return blk_status_t error codes now but > there was a stray -ENOMEM left behind. > > Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t") > Signed-off-by: Dan Carpenter

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Christoph Anton Mitterer
On Wed, 2017-06-21 at 16:45 +0800, Qu Wenruo wrote: > Btrfs is always using device ID to build up its device mapping. > And for any multi-device implementation (LVM, mdadm) it's never a > good > idea to use device path. Isn't it rather the other way round? Using the ID is bad? Don't you remember

Re: 4.11.3: BTRFS critical (device dm-1): unable to add free space :-17 => btrfs check --repair runs clean

2017-06-21 Thread Duncan
Marc MERLIN posted on Tue, 20 Jun 2017 16:12:03 -0700 as excerpted: > On Tue, Jun 20, 2017 at 08:44:29AM -0700, Marc MERLIN wrote: >> On Tue, Jun 20, 2017 at 03:36:01PM +, Hugo Mills wrote: >> "space cache will be invalidated " => doesn't that mean that my cache was already cleared

[PATCH] btrfs: DEBUG fiemap: Show more info about extent_fiemap

2017-06-21 Thread Qu Wenruo
Hi Adam, Would you please try this patch based on v4.12-rc5 and try to reproduce the kernel warning? It would be better to eliminate the noise by ensuring there is no other fiemap caller on btrfs. Thanks, Qu Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 23

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Qu Wenruo
At 06/21/2017 06:57 AM, waxhead wrote: I am trying to piece together the actual status of the RAID5/6 bit of BTRFS. The wiki refer to kernel 3.19 which was released in February 2015 so I assume that the information there is a tad outdated (the last update on the wiki page was July 2016)

Re: [PATCH v3] btrfs: fiemap: Cache and merge fiemap extent before submit it to user

2017-06-21 Thread Qu Wenruo
At 06/18/2017 09:42 PM, Adam Borowski wrote: On Sun, Jun 18, 2017 at 07:23:00PM +0800, Qu Wenruo wrote: [ 39.726215] BTRFS warning (device sda1): unhandled fiemap cache detected: offset=... phys=35798867968 len=1072 flags=0x2008 [ 151.882586] BTRFS warning (device sda1): unhandled fiemap

[PATCH v2] btrfs/146: Test various btrfs operations rounding behavior

2017-06-21 Thread Nikolay Borisov
When changing the size of disks/filesystems we should always round down to a multiple of sectorsize. Signed-off-by: Nikolay Borisov --- Changes since v1: - Incorporated feedback from Eryu - Changed test number to 146 to avoid clashes tests/btrfs/146 |
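
The rule under test reduces to a single mask for power-of-two sector sizes; a minimal sketch (illustrative helper name):

    /* Round size down to a multiple of sectorsize (a power of two). */
    static unsigned long long round_down_to_sector(unsigned long long size,
                                                   unsigned long long sectorsize)
    {
        return size & ~(sectorsize - 1);
    }
    /* e.g. round_down_to_sector(7000, 4096) == 4096 */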

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Peter Grandi
> [ ... ] This will make some filesystems mostly RAID1, negating > all space savings of RAID5, won't it? [ ... ] RAID5/RAID6/... don't merely save space; more precisely, they trade lower resilience and a smaller, more anisotropic performance envelope for lower redundancy (= space savings). --
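
Concretely, for N equal disks of capacity S: btrfs RAID1 keeps two copies of everything, so usable space is N*S/2 regardless of N, while RAID5 yields (N-1)*S and RAID6 (N-2)*S. On four 1 TB disks that is 2 TB (RAID1) versus 3 TB (RAID5), which is the saving a mostly-RAID1 fallback would give up.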

Re: Exactly what is wrong with RAID5/6

2017-06-21 Thread Marat Khalili
On 21/06/17 06:48, Chris Murphy wrote: Another possibility is to ensure a new write is written to a new *not* full stripe, i.e. dynamic stripe size. So if the modification is a 50K file on a 4 disk raid5; instead of writing 3 64K data strips + 1 64K parity strip (a full stripe write); write out 1
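
In numbers, for the example above: on a 4-disk raid5 with 64K strips, a full stripe is 3 x 64K data + 64K parity. A dynamic, partial-width stripe for the 50K modification would instead be one 64K data strip holding the 50K plus one 64K parity strip, a stripe only two devices wide, so the write never overwrites parity protecting unrelated, already-committed data.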