Re: How to erase a RAID1 (+++)?

2018-08-30 Thread Pierre Couderc
On 08/31/2018 04:29 AM, Duncan wrote: Chris Murphy posted on Thu, 30 Aug 2018 11:08:28 -0600 as excerpted: My purpose is a simple RAID1 main fs, with bootable flag on the 2 disks in order to start in degraded mode Good luck with this. The Btrfs archives are full of various limitations o

Re: How to erase a RAID1 (+++)?

2018-08-30 Thread Pierre Couderc
On 08/30/2018 07:08 PM, Chris Murphy wrote: On Thu, Aug 30, 2018 at 3:13 AM, Pierre Couderc wrote: Trying to install a RAID1 on a debian stretch, I made some mistake and got this, after installing on disk1 and trying to add second disk : root@server:~# fdisk -l Disk /dev/sda: 1.8 TiB, 200

Re: [patch] file dedupe (and maybe clone) data corruption (was Re: [PATCH] generic: test for deduplication between different files)

2018-08-30 Thread Zygo Blaxell
On Thu, Aug 30, 2018 at 04:27:43PM +1000, Dave Chinner wrote: > On Thu, Aug 23, 2018 at 08:58:49AM -0400, Zygo Blaxell wrote: > > On Mon, Aug 20, 2018 at 08:33:49AM -0700, Darrick J. Wong wrote: > > > On Mon, Aug 20, 2018 at 11:09:32AM +1000, Dave Chinner wrote: > > > > - is documenting rej

Re: How to erase a RAID1 (+++)?

2018-08-30 Thread Duncan
Chris Murphy posted on Thu, 30 Aug 2018 11:08:28 -0600 as excerpted: >> My purpose is a simple RAID1 main fs, with bootable flag on the 2 disks >> in order to start in degraded mode > > Good luck with this. The Btrfs archives are full of various limitations > of Btrfs raid1. There is no autom

[PATCH 3/3] btrfs: qgroup: Remove deprecated feature support in btrfs_qgroup_inherit()

2018-08-30 Thread Qu Wenruo
Since btrfs_validate_inherit() will not allow features like copy rfer/excl and limit set, remove this dead code. Signed-off-by: Qu Wenruo --- fs/btrfs/qgroup.c | 57 +-- 1 file changed, 1 insertion(+), 56 deletions(-) diff --git a/fs/btrfs/qgroup.c b

[PATCH 2/3] btrfs: qgroup: Validate btrfs_qgroup_inherit structure before passing it to qgroup

2018-08-30 Thread Qu Wenruo
The btrfs_qgroup_inherit structure doesn't go through much validation. Now do a comprehensive check for it, including: 1) inherit size Should not exceed SZ_4K and its num_qgroups should not exceed its size passed in btrfs_ioctl_vol_args_v2. 2) flags Should not include any unknown
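The checks listed in this snippet can be sketched roughly as follows. This is a minimal user-space illustration, not the code from the patch: `struct sketch_inherit`, `SKETCH_KNOWN_FLAGS`, `sketch_validate_inherit()` and the choice of `-EINVAL` as the error code are all assumptions made for the example, and the real structure has more fields.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SZ_4K       4096u
#define SKETCH_KNOWN_FLAGS 0x1u   /* hypothetical mask of accepted flag bits */

/* Hypothetical, simplified stand-in for the inherit structure. */
struct sketch_inherit {
	uint64_t flags;
	uint64_t num_qgroups;
	uint64_t qgroups[];       /* num_qgroups ids follow the header */
};

/* Returns 0 if the buffer of 'size' bytes looks sane, -EINVAL otherwise. */
static int sketch_validate_inherit(const struct sketch_inherit *in, size_t size)
{
	/* 1) size: must hold the header and must not exceed the fixed 4K limit */
	if (size < sizeof(*in) || size > SKETCH_SZ_4K)
		return -EINVAL;

	/* 2) flags: reject any bit outside the known set */
	if (in->flags & ~(uint64_t)SKETCH_KNOWN_FLAGS)
		return -EINVAL;

	/* num_qgroups must fit in the space left after the header */
	if (in->num_qgroups > (size - sizeof(*in)) / sizeof(uint64_t))
		return -EINVAL;

	return 0;
}
```

The count is compared against the remaining space by division rather than by multiplying `num_qgroups`, so a huge user-supplied count cannot overflow the arithmetic.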

[PATCH 1/3] btrfs: Set qgroup inherit size limit to SZ_4K instead of page size

2018-08-30 Thread Qu Wenruo
Change btrfs_qgroup_inherit maximum size from PAGE_SIZE to SZ_4K to make it consistent across different architectures. Although in theory this could lead to incompatibility, considering how rarely btrfs_qgroup_inherit is used, it's still not too late to change it without impacting a large user b
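To see why a PAGE_SIZE-based limit is architecture dependent, here is a small sketch comparing the two rules. The helper names are hypothetical and the page size is passed in as a plain parameter purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SZ_4K 4096u

/* Old rule (as described in the snippet): the limit follows the page size. */
static bool sketch_size_ok_page(size_t size, size_t page_size)
{
	return size <= page_size;
}

/* New rule: one fixed limit, independent of the architecture. */
static bool sketch_size_ok_fixed(size_t size)
{
	return size <= SKETCH_SZ_4K;
}
```

Under the old rule an 8 KiB request would pass on a machine with 64 KiB pages and fail on one with 4 KiB pages. Under the fixed rule it is rejected everywhere, which is the consistency the patch description is after.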

[PATCH 0/3] btrfs: qgroup: Deprecate unused features for btrfs_qgroup_inherit()

2018-08-30 Thread Qu Wenruo
This patchset can be fetched from github: https://github.com/adam900710/linux/tree/qgroup_inherit_check Which is based on v4.19-rc1 tag. This patchset will first set btrfs_qgroup_inherit structure size limit from PAGE_SIZE to fixed SZ_4K. I understand this normally will cause compatibility problem

Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-08-30 Thread Su Yue
Thanks for the report. On 08/31/2018 12:47 AM, Christoph Anton Mitterer wrote: Hey. I've the following on a btrfs that's basically the system fs for my notebook: When booting from a USB stick with: # uname -a Linux heisenberg 4.17.0-3-amd64 #1 SMP Debian 4.17.17-1 (2018-08-18) x86_64 GNU/Linux

Re: How to erase a RAID1 (+++)?

2018-08-30 Thread Chris Murphy
And also, I'll argue this might have been a btrfs-progs bug as well, depending on what version was used and the command. Both mkfs and dev add should not be able to add type code 0x05. At least libblkid correctly shows that it's 1KiB in size, so really Btrfs should not succeed at adding this device

[PATCH 34/35] btrfs: wait on ordered extents on abort cleanup

2018-08-30 Thread Josef Bacik
If we flip read-only before we initiate writeback on all dirty pages for ordered extents we've created then we'll have ordered extents left over on umount, which results in all sorts of bad things happening. Fix this by making sure we wait on ordered extents if we have to do the aborted transactio

[PATCH 11/35] btrfs: don't use global rsv for chunk allocation

2018-08-30 Thread Josef Bacik
We've done this forever because of the voodoo around knowing how much space we have. However we have better ways of doing this now, and on normal file systems we'll easily have a global reserve of 512MiB, and since metadata chunks are usually 1GiB that means we'll allocate metadata chunks more rea

[PATCH 17/35] btrfs: move the dio_sem higher up the callchain

2018-08-30 Thread Josef Bacik
We're getting a lockdep splat because we take the dio_sem under the log_mutex. What we really need is to protect fsync() from logging an extent map for an extent we never waited on higher up, so just guard the whole thing with dio_sem. Signed-off-by: Josef Bacik --- fs/btrfs/file.c | 12 +++

[PATCH 31/35] btrfs: clear delayed_refs_rsv for dirty bg cleanup

2018-08-30 Thread Josef Bacik
We keep track of dirty bg's as a reservation in the delayed_refs_rsv, so when we abort and we cleanup those dirty bgs we need to drop their reservation so we don't have accounting issues and lots of scary messages on umount. Signed-off-by: Josef Bacik --- fs/btrfs/disk-io.c | 1 + 1 file changed

[PATCH 13/35] btrfs: reset max_extent_size properly

2018-08-30 Thread Josef Bacik
If we use up our block group before allocating a new one we'll easily get a max_extent_size that's set really really low, which will result in a lot of fragmentation. We need to make sure we're resetting the max_extent_size when we add a new chunk or add new space. Signed-off-by: Josef Bacik ---

[PATCH 18/35] btrfs: set max_extent_size properly

2018-08-30 Thread Josef Bacik
From: Josef Bacik We can't use entry->bytes if our entry is a bitmap entry, we need to use entry->max_extent_size in that case. Fix up all the logic to make this consistent. Signed-off-by: Josef Bacik --- fs/btrfs/free-space-cache.c | 29 +++-- 1 file changed, 19 inser
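The rule stated in this snippet can be sketched in a few lines. `struct sketch_free_space` and `sketch_largest_extent()` are hypothetical names for this illustration, not the kernel's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified free-space entry. */
struct sketch_free_space {
	uint64_t bytes;            /* total free bytes covered by this entry */
	uint64_t max_extent_size;  /* largest contiguous run, tracked for bitmaps */
	bool     is_bitmap;        /* true if the entry is a bitmap, not an extent */
};

/*
 * For a plain extent entry the free bytes are contiguous, so 'bytes' is the
 * largest extent. A bitmap's 'bytes' may be scattered over many small runs,
 * so only 'max_extent_size' describes the largest contiguous piece.
 */
static uint64_t sketch_largest_extent(const struct sketch_free_space *e)
{
	return e->is_bitmap ? e->max_extent_size : e->bytes;
}
```

For example, a bitmap entry with 1 MiB free in total but no run longer than 64 KiB should report 64 KiB, otherwise an allocator would expect a contiguous extent that does not exist.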

[PATCH 29/35] btrfs: just delete pending bgs if we are aborted

2018-08-30 Thread Josef Bacik
We still need to do all of the accounting cleanup for pending block groups if we abort. So set the ret to trans->aborted so if we aborted the cleanup happens and everybody is happy. Signed-off-by: Josef Bacik --- fs/btrfs/extent-tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff

[PATCH 30/35] btrfs: cleanup pending bgs on transaction abort

2018-08-30 Thread Josef Bacik
We may abort the transaction during a commit and not have a chance to run the pending bgs stuff, which will leave block groups on our list and cause us accounting issues and leaked memory. Fix this by running the pending bgs when we cleanup a transaction. Signed-off-by: Josef Bacik --- fs/btrfs

[PATCH 12/35] btrfs: add ALLOC_CHUNK_FORCE to the flushing code

2018-08-30 Thread Josef Bacik
With my change to no longer take into account the global reserve for metadata allocation chunks we have this side-effect for mixed block group fs'es where we are no longer allocating enough chunks for the data/metadata requirements. To deal with this add an ALLOC_CHUNK_FORCE step to the flushing st

[PATCH 27/35] btrfs: handle delayed ref head accounting cleanup in abort

2018-08-30 Thread Josef Bacik
We weren't doing any of the accounting cleanup when we aborted transactions. Fix this by making cleanup_ref_head_accounting global and calling it from the abort code, this fixes the issue where our accounting was all wrong after the fs aborts. Signed-off-by: Josef Bacik --- fs/btrfs/ctree.h

[PATCH 32/35] btrfs: only free reserved extent if we didn't insert it

2018-08-30 Thread Josef Bacik
When we insert the file extent once the ordered extent completes we free the reserved extent reservation as it'll have been migrated to the bytes_used counter. However if we error out after this step we'll still clear the reserved extent reservation, resulting in a negative accounting of the reser

[PATCH 28/35] btrfs: call btrfs_create_pending_block_groups unconditionally

2018-08-30 Thread Josef Bacik
The first thing we do is loop through the list, this if (!list_empty()) btrfs_create_pending_block_groups(); thing is just wasted space. Signed-off-by: Josef Bacik --- fs/btrfs/extent-tree.c | 3 +-- fs/btrfs/transaction.c | 6 ++ 2 files changed, 3 insertions(+), 6 deletions(-) d

[PATCH 35/35] MAINTAINERS: update my email address for btrfs

2018-08-30 Thread Josef Bacik
My work email is completely useless, switch it to my personal address so I get emails on an account I actually pay attention to. Signed-off-by: Josef Bacik --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 32fbc6f732d4..7723dc958e9

[PATCH 10/35] btrfs: fix truncate throttling

2018-08-30 Thread Josef Bacik
We have a bunch of magic to make sure we're throttling delayed refs when truncating a file. Now that we have a delayed refs rsv and a mechanism for refilling that reserve simply use that instead of all of this magic. Signed-off-by: Josef Bacik --- fs/btrfs/inode.c | 78 -

[PATCH 08/35] btrfs: release metadata before running delayed refs

2018-08-30 Thread Josef Bacik
We want to release the unused reservation we have since it refills the delayed refs reserve, which will make everything go smoother when running the delayed refs if we're short on our reservation. Signed-off-by: Josef Bacik --- fs/btrfs/transaction.c | 6 +++--- 1 file changed, 3 insertions(+),

[PATCH 33/35] btrfs: fix insert_reserved error handling

2018-08-30 Thread Josef Bacik
We were not handling the reserved byte accounting properly for data references. Metadata was fine, if it errored out the error paths would free the bytes_reserved count and pin the extent, but it even missed one of the error cases. So instead move this handling up into run_one_delayed_ref so we a

[PATCH 24/35] btrfs: pass delayed_refs_root to btrfs_delayed_ref_lock

2018-08-30 Thread Josef Bacik
We don't need the trans except to get the delayed_refs_root, so just pass the delayed_refs_root into btrfs_delayed_ref_lock and call it a day. Signed-off-by: Josef Bacik --- fs/btrfs/delayed-ref.c | 5 + fs/btrfs/delayed-ref.h | 2 +- fs/btrfs/extent-tree.c | 2 +- 3 files changed, 3 inserti

[PATCH 25/35] btrfs: make btrfs_destroy_delayed_refs use btrfs_delayed_ref_lock

2018-08-30 Thread Josef Bacik
We have this open coded in btrfs_destroy_delayed_refs, use the helper instead. Signed-off-by: Josef Bacik --- fs/btrfs/disk-io.c | 11 ++- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 11ea2ea7439e..c72ab2ca7627 100644 --- a/f

[PATCH 26/35] btrfs: make btrfs_destroy_delayed_refs use btrfs_delete_ref_head

2018-08-30 Thread Josef Bacik
Instead of open coding this stuff use the helper instead. Signed-off-by: Josef Bacik --- fs/btrfs/disk-io.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c72ab2ca7627..1d3f5731d616 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/

[PATCH 21/35] btrfs: only run delayed refs if we're committing

2018-08-30 Thread Josef Bacik
I noticed in a giant dbench run that we spent a lot of time on lock contention while running transaction commit. This is because dbench results in a lot of fsync()'s that do a btrfs_transaction_commit(), and they all run the delayed refs first thing, so they all contend with each other. This lead

[PATCH 23/35] btrfs: assert on non-empty delayed iputs

2018-08-30 Thread Josef Bacik
I ran into an issue where there was some reference being held on an inode that I couldn't track. This assert wasn't triggered, but it at least rules out we're doing something stupid. Signed-off-by: Josef Bacik --- fs/btrfs/disk-io.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/d

[PATCH 19/35] btrfs: don't use ctl->free_space for max_extent_size

2018-08-30 Thread Josef Bacik
From: Josef Bacik max_extent_size is supposed to be the largest contiguous range for the space info, and ctl->free_space is the total free space in the block group. We need to keep track of these separately and _only_ use the max_free_space if we don't have a max_extent_size, as that means our o

[PATCH 22/35] btrfs: make sure we create all new bgs

2018-08-30 Thread Josef Bacik
We can actually allocate new chunks while we're creating our bg's, so instead of doing list_for_each_safe, just do while (!list_empty()) so we make sure to catch any new bg's that get added to the list. Signed-off-by: Josef Bacik --- fs/btrfs/extent-tree.c | 7 +-- 1 file changed, 5 insertio
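The difference between the two loop styles can be illustrated with a small stand-alone sketch. It uses a hand-rolled singly linked queue instead of the kernel's `list_head`, and all names (`sketch_bg`, `sketch_list`, `sketch_create_pending()`) are hypothetical. An iterator that caches the next pointer before processing an entry can miss an entry appended to the tail during that processing; draining with "while not empty" cannot.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pending block group; 'spawns_new' models "creating this bg
 * allocates another chunk", which queues one more entry. */
struct sketch_bg {
	int id;
	bool spawns_new;
	struct sketch_bg *next;
};

struct sketch_list {
	struct sketch_bg *head, *tail;
};

static bool sketch_list_empty(const struct sketch_list *l)
{
	return l->head == NULL;
}

static void sketch_list_add_tail(struct sketch_list *l, struct sketch_bg *bg)
{
	bg->next = NULL;
	if (l->tail)
		l->tail->next = bg;
	else
		l->head = bg;
	l->tail = bg;
}

/* Caller must ensure the list is not empty. */
static struct sketch_bg *sketch_list_pop(struct sketch_list *l)
{
	struct sketch_bg *bg = l->head;

	l->head = bg->next;
	if (!l->head)
		l->tail = NULL;
	bg->next = NULL;
	return bg;
}

/* Drain the list; 'extra' is queued once, when a 'spawns_new' entry is
 * processed. Returns how many entries were handled in total. */
static int sketch_create_pending(struct sketch_list *l, struct sketch_bg *extra)
{
	int created = 0;

	while (!sketch_list_empty(l)) {
		struct sketch_bg *bg = sketch_list_pop(l);

		if (bg->spawns_new && extra) {
			sketch_list_add_tail(l, extra); /* new bg shows up mid-loop */
			extra = NULL;
		}
		created++;
	}
	return created;
}
```

Because the loop condition is re-evaluated against the live list after every entry, the entry queued while processing the last original element is still picked up.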

[PATCH 20/35] btrfs: reset max_extent_size on clear in a bitmap

2018-08-30 Thread Josef Bacik
From: Josef Bacik We need to clear the max_extent_size when we clear bits from a bitmap since it could have been from the range that contains the max_extent_size. Signed-off-by: Josef Bacik --- fs/btrfs/free-space-cache.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/free-spac

[PATCH 16/35] btrfs: loop in inode_rsv_refill

2018-08-30 Thread Josef Bacik
With severe fragmentation we can end up with our inode rsv size being huge during writeout, which would cause us to need to make very large metadata reservations. However we may not actually need that much once writeout is complete. So instead try to make our reservation, and if we couldn't make

[PATCH 14/35] btrfs: don't enospc all tickets on flush failure

2018-08-30 Thread Josef Bacik
With the introduction of the per-inode block_rsv it became possible to have really really large reservation requests made because of data fragmentation. Since the ticket stuff assumed that we'd always have relatively small reservation requests it just killed all tickets if we were unable to satisf

[PATCH 15/35] btrfs: run delayed iputs before committing

2018-08-30 Thread Josef Bacik
We want to have a complete picture of any delayed inode updates before we make the decision to commit or not, so make sure we run delayed iputs before making the decision to commit or not. Signed-off-by: Josef Bacik --- fs/btrfs/extent-tree.c | 4 1 file changed, 4 insertions(+) diff --git

[PATCH 02/35] btrfs: add cleanup_ref_head_accounting helper

2018-08-30 Thread Josef Bacik
From: Josef Bacik We were missing some quota cleanups in check_ref_cleanup, so break the ref head accounting cleanup into a helper and call that from both check_ref_cleanup and cleanup_ref_head. This will hopefully ensure that we don't screw up accounting in the future for other things that we a

[PATCH 09/35] btrfs: protect space cache inode alloc with nofs

2018-08-30 Thread Josef Bacik
If we're allocating a new space cache inode it's likely going to be under a transaction handle, so we need to use memalloc_nofs_save() in order to avoid deadlocks, and more importantly lockdep messages that make xfstests fail. Signed-off-by: Josef Bacik --- fs/btrfs/free-space-cache.c | 4

[PATCH 05/35] btrfs: introduce delayed_refs_rsv

2018-08-30 Thread Josef Bacik
From: Josef Bacik Traditionally we've had voodoo in btrfs to account for the space that delayed refs may take up by having a global_block_rsv. This works most of the time, except when it doesn't. We've had issues reported and seen in production where sometimes the global reserve is exhausted du

[PATCH 01/35] btrfs: add btrfs_delete_ref_head helper

2018-08-30 Thread Josef Bacik
From: Josef Bacik We do this dance in cleanup_ref_head and check_ref_cleanup, unify it into a helper and cleanup the calling functions. Signed-off-by: Josef Bacik --- fs/btrfs/delayed-ref.c | 14 ++ fs/btrfs/delayed-ref.h | 3 ++- fs/btrfs/extent-tree.c | 24 --

[PATCH 00/35] My current patch queue

2018-08-30 Thread Josef Bacik
This is the current queue of things that I've been working on. The main thing these patches are doing is separating out the delayed refs reservations from the global reserve into their own block rsv. We have been consistently hitting issues in production where we abort a transaction because we ru

[PATCH 06/35] btrfs: check if free bgs for commit

2018-08-30 Thread Josef Bacik
may_commit_transaction will skip committing the transaction if we don't have enough pinned space or if we're trying to find space for a SYSTEM chunk. However if we have pending free block groups in this transaction we still want to commit as we may be able to allocate a chunk to make our reservati

[PATCH 04/35] btrfs: only track ref_heads in delayed_ref_updates

2018-08-30 Thread Josef Bacik
From: Josef Bacik We use this number to figure out how many delayed refs to run, but __btrfs_run_delayed_refs really only checks every time we need a new delayed ref head, so we always run at least one ref head completely no matter what the number of items on it. So instead track only the ref he

[PATCH 03/35] btrfs: use cleanup_extent_op in check_ref_cleanup

2018-08-30 Thread Josef Bacik
From: Josef Bacik Unify the extent_op handling as well, just add a flag so we don't actually run the extent op from check_ref_cleanup and instead return a value so that we can skip cleaning up the ref head. Signed-off-by: Josef Bacik --- fs/btrfs/extent-tree.c | 17 + 1 file ch

[PATCH 07/35] btrfs: dump block_rsv when dumping space info

2018-08-30 Thread Josef Bacik
For enospc_debug having the block rsvs is super helpful to see if we've done something wrong. Signed-off-by: Josef Bacik --- fs/btrfs/extent-tree.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 80615a579b18..df826f71303

Re: How to erase a RAID1 (+++)?

2018-08-30 Thread Chris Murphy
On Thu, Aug 30, 2018 at 9:21 AM, Alberto Bursi wrote: > > On 8/30/2018 11:13 AM, Pierre Couderc wrote: >> Trying to install a RAID1 on a debian stretch, I made some mistake and >> got this, after installing on disk1 and trying to add second disk : >> >> >> root@server:~# fdisk -l >> Disk /dev/sda:

Re: [RFC PATCH 0/6] btrfs-progs: build distinct binaries for specific btrfs subcommands

2018-08-30 Thread Austin S. Hemmelgarn
On 2018-08-30 13:13, Axel Burri wrote: On 29/08/2018 21.02, Austin S. Hemmelgarn wrote: On 2018-08-29 13:24, Axel Burri wrote: This patch allows to build distinct binaries for specific btrfs subcommands, e.g. "btrfs-subvolume-show" which would be identical to "btrfs subvolume show". Motivatio

Re: [RFC PATCH 0/6] btrfs-progs: build distinct binaries for specific btrfs subcommands

2018-08-30 Thread Axel Burri
On 29/08/2018 21.02, Austin S. Hemmelgarn wrote: > On 2018-08-29 13:24, Axel Burri wrote: >> This patch allows to build distinct binaries for specific btrfs >> subcommands, e.g. "btrfs-subvolume-show" which would be identical to >> "btrfs subvolume show". >> >> >> Motivation: >> >> While btrfs-prog

Re: How to erase a RAID1 (+++)?

2018-08-30 Thread Chris Murphy
On Thu, Aug 30, 2018 at 3:13 AM, Pierre Couderc wrote: > Trying to install a RAID1 on a debian stretch, I made some mistake and got > this, after installing on disk1 and trying to add second disk : > > > root@server:~# fdisk -l > Disk /dev/sda: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors > U

fsck lowmem mode only: ERROR: errors found in fs roots

2018-08-30 Thread Christoph Anton Mitterer
Hey. I've the following on a btrfs that's basically the system fs for my notebook: When booting from a USB stick with: # uname -a Linux heisenberg 4.17.0-3-amd64 #1 SMP Debian 4.17.17-1 (2018-08-18) x86_64 GNU/Linux # btrfs --version btrfs-progs v4.17 ... a lowmem mode fsck gives no error: #

Re: How to erase a RAID1 (+++)?

2018-08-30 Thread Alberto Bursi
On 8/30/2018 11:13 AM, Pierre Couderc wrote: > Trying to install a RAID1 on a debian stretch, I made some mistake and > got this, after installing on disk1 and trying to add second disk : > > > root@server:~# fdisk -l > Disk /dev/sda: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors > Units: sect

Re: How to erase a RAID1 (+++)?

2018-08-30 Thread Kai Stian Olstad
On Thursday, 30 August 2018 12:01:55 CEST Pierre Couderc wrote: > > On 08/30/2018 11:35 AM, Qu Wenruo wrote: > > > > On 2018/8/30 5:13 PM, Pierre Couderc wrote: > >> Trying to install a RAID1 on a debian stretch, I made some mistake and > >> got this, after installing on disk1 and trying to add sec

Re: How to erase a RAID1 (+++)?

2018-08-30 Thread Qu Wenruo
On 2018/8/30 6:01 PM, Pierre Couderc wrote: > > > On 08/30/2018 11:35 AM, Qu Wenruo wrote: >> >> On 2018/8/30 5:13 PM, Pierre Couderc wrote: >>> Trying to install a RAID1 on a debian stretch, I made some mistake and >>> got this, after installing on disk1 and trying to add second disk : >>> >>>

Re: How to erase a RAID1 (+++)?

2018-08-30 Thread Pierre Couderc
On 08/30/2018 11:35 AM, Qu Wenruo wrote: On 2018/8/30 5:13 PM, Pierre Couderc wrote: Trying to install a RAID1 on a debian stretch, I made some mistake and got this, after installing on disk1 and trying to add second disk : root@server:~# fdisk -l Disk /dev/sda: 1.8 TiB, 2000398934016 byte

Re: How to erase a RAID1 (+++)?

2018-08-30 Thread Qu Wenruo
On 2018/8/30 5:13 PM, Pierre Couderc wrote: > Trying to install a RAID1 on a debian stretch, I made some mistake and > got this, after installing on disk1 and trying to add second disk : > > > root@server:~# fdisk -l > Disk /dev/sda: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors > Units: sec

Re: How to erase a RAID1 (+++)?

2018-08-30 Thread Pierre Couderc
Trying to install a RAID1 on a debian stretch, I made some mistake and got this, after installing on disk1 and trying to add second disk  : root@server:~# fdisk -l Disk /dev/sda: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical

[PATCH] btrfs-progs: dump-tree: print invalid argument and strerror

2018-08-30 Thread Su Yue
Before this patch: $ ls nothingness ls: cannot access 'nothingness': No such file or directory $ btrfs inspect-internal dump-tree nothingness ERROR: not a block device or regular file: nothingness The confusing error message makes users think that the nonexistent file exists but is of the wrong type. T
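The idea in the subject line, reporting the failing argument together with strerror, can be sketched as below. This is not the btrfs-progs code: `sketch_check_target()` is a hypothetical helper, and the exact wording of the first message is an assumption. Only the second message is taken verbatim from the snippet.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Check that 'path' names a regular file or block device.
 * On failure, writes a human-readable message into 'msg' and returns a
 * negative errno value; on success returns 0 and leaves 'msg' empty.
 */
static int sketch_check_target(const char *path, char *msg, size_t len)
{
	struct stat st;

	if (stat(path, &st) < 0) {
		int err = errno; /* save before any other call can clobber it */

		/* Message wording is an assumption for this sketch. */
		snprintf(msg, len, "ERROR: invalid argument: %s: %s",
			 path, strerror(err));
		return -err;
	}

	if (!S_ISREG(st.st_mode) && !S_ISBLK(st.st_mode)) {
		snprintf(msg, len,
			 "ERROR: not a block device or regular file: %s", path);
		return -EINVAL;
	}

	msg[0] = '\0';
	return 0;
}
```

Separating the stat() failure from the file-type check means a missing file is reported with the system's own reason, and the "wrong type" message is shown only when the path actually exists.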