System very unresponsive during btrfs device delete - can it be paused?

2016-03-29 Thread Pete
I am deleting a drive using btrfs device delete . The system is very unresponsive; it can be unresponsive for periods of around 10 seconds, for example with windows not refreshing and not responding to the keyboard. I suspect it is why printing is not working at present. Is there any way of pausing or can

Re: RAID Assembly with Missing Empty Drive

2016-03-29 Thread John Marrett
> I think it is best that you just repeat the fixing again on the real > disks and just make sure you have an uptodate/latest kernel+tools when > fixing the few damaged files. > With btrfs inspect-internal inode-resolve 257 > you can see what file(s) are damaged. I inspected the damaged files,

Re: System very unresponsive during btrfs device delete - can it be paused?

2016-03-29 Thread Pete
On 03/29/2016 12:47 PM, Pete wrote: > keyboard for example. Suspect it is why printing is not working at > present. Is there anyway of pausing or cancelling so I can get stuff ...or it could be due to pulling out the usb cable when adding a disk...

[PATCH 11/12] btrfs: introduce helper functions to perform hot replace

2016-03-29 Thread Anand Jain
Hot replace / auto replace is an important volume manager feature and is critical to data center operations, so that a degraded volume can be brought back to a healthy state as early as possible and without manual intervention. This modifies the existing replace code to suit the needs of auto replac

[PATCH 02/12] btrfs: Do per-chunk check for mount time check

2016-03-29 Thread Anand Jain
From: Qu Wenruo
Now use btrfs_check_degraded() to do the mount-time degraded check. With this patch, we can now mount in the following case:

  # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
  # wipefs -a /dev/sdc
  # mount /dev/sdb /mnt/btrfs -o degraded

As the single data chunk is only in

[PATCH 03/12] btrfs: Do per-chunk degraded check for remount

2016-03-29 Thread Anand Jain
From: Qu Wenruo Just as for the mount-time check, use the new btrfs_check_degraded() to do a per-chunk check. Signed-off-by: Qu Wenruo Btrfs: use btrfs_error instead of btrfs_err during remount Signed-off-by: Anand Jain --- fs/btrfs/super.c | 11 +++ 1 file changed, 7 insertions(+), 4 d

[PATCH 08/12] btrfs: add check not to mount a spare device

2016-03-29 Thread Anand Jain
Spare devices can be scanned but shouldn't be mountable. Signed-off-by: Anand Jain --- fs/btrfs/disk-io.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 7f02f1766037..b99329e37965 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c

[PATCH 10/12] btrfs: provide framework to get and put a spare device

2016-03-29 Thread Anand Jain
This adds functions to get and put a spare device from the list, so that the hot replace code can pick a spare device when needed. Signed-off-by: Anand Jain --- fs/btrfs/super.c | 9 + fs/btrfs/volumes.c | 37 + fs/btrfs/volumes.h | 2 ++ 3 files change
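The get/put pairing described in this patch can be sketched as a userspace model. This is not the patch's code: struct spare_dev, spare_get() and spare_put() are hypothetical names, and the spare list is assumed to be a plain array with an in-use flag rather than the kernel's device list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a global spare list; not the kernel's structures. */
struct spare_dev {
	const char *path;  /* block device path, e.g. "/dev/sdx" */
	bool in_use;       /* claimed by a running replace */
};

/* Claim the first free spare, or return NULL if none is available. */
struct spare_dev *spare_get(struct spare_dev *list, size_t n)
{
	assert(list != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!list[i].in_use) {
			list[i].in_use = true;
			return &list[i];
		}
	}
	return NULL;
}

/* Return a claimed spare to the pool, e.g. when the replace cannot start. */
void spare_put(struct spare_dev *dev)
{
	assert(dev != NULL && dev->in_use);
	dev->in_use = false;
}
```

A caller that claims a spare but then fails to start the replace would call spare_put() so the device stays available for the next failure.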

[PATCH 06/12] btrfs: introduce device dynamic state transition to offline or failed

2016-03-29 Thread Anand Jain
Need device forced offline/failed feature for the following reasons, 1) a. it can be reported that device has failed when it does b. close the device when it goes offline so that blocklayer can cleanup 2) identify the candidate for the auto replace 3) avoid further commit error reported ag

[PATCH 09/12] btrfs: support btrfs dev scan for spare device

2016-03-29 Thread Anand Jain
When the user or system calls the BTRFS_IOC_SCAN_DEV ioctl, this patch will make sure the device is added to the device list and set as spare. The same applies to BTRFS_IOC_DEVICES_READY, since the BTRFS_IOC_DEVICES_READY ioctl has been doing that by legacy. Signed-off-by: Anand Jain

[PATCH 07/12] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV

2016-03-29 Thread Anand Jain
Add the BTRFS_FEATURE_INCOMPAT_SPARE_DEV (400) flag to identify a spare device. Along with this, it adds a check in the mount context so that mounting a spare device fails, as spare devices aren't mountable. Signed-off-by: Anand Jain --- fs/btrfs/ctree.h | 4 +++- 1 file changed, 3 insertions(+), 1 dele

[PATCH 04/12] btrfs: Allow barrier_all_devices to do per-chunk device check

2016-03-29 Thread Anand Jain
From: Qu Wenruo The last user of num_tolerated_disk_barrier_failures is barrier_all_devices(), but it can easily be changed to the new per-chunk degradable check framework. Now btrfs_device will have two extra members, representing send/wait errors, set at write_dev_flush() time. And then check it

[PATCH 05/12] btrfs: Cleanup num_tolerated_disk_barrier_failures

2016-03-29 Thread Anand Jain
From: Qu Wenruo As we use the per-chunk degradable check, the global num_tolerated_disk_barrier_failures is now of no use, so clean it up. Signed-off-by: Qu Wenruo [Btrfs: resolve conflict to apply 'btrfs: Cleanup num_tolerated_disk_barrier_failures'] Signed-off-by: Anand Jain --- fs/btrfs/ctree

[PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace

2016-03-29 Thread Anand Jain
Thanks for various comments, tests and feedback. Background: Hot spare and Auto replace: Hot spare is predominately used to mitigate or narrow the time window of a storage in degraded mode during which any further disk failure might lead to a catastrophic data loss. Data center storage general

[PATCH 3/4] btrfs-progs: add fi show for spare

2016-03-29 Thread Anand Jain
Signed-off-by: Anand Jain --- cmds-filesystem.c | 4 1 file changed, 4 insertions(+) diff --git a/cmds-filesystem.c b/cmds-filesystem.c index 38404d29026e..0901c47e8679 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -351,6 +351,9 @@ static void print_one_uuid(struct btrfs_fs_devi

[PATCH v2 2/4] btrfs-progs: Introduce btrfs spare subcommand

2016-03-29 Thread Anand Jain
Adds a new subcommand so that a global spare device can be added. A separate subcommand is better, so that it can be enhanced to provide per-FSID spares in the future. btrfs spare add .. This will create a btrfs on the dev with the newly introduced flag, BTRFS_FEATURE_INCOMPAT_SPARE_DEV. And then calls btrfs_registe

[PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags

2016-03-29 Thread Anand Jain
Signed-off-by: Anand Jain --- ctree.h | 4 +++- volumes.c | 4 volumes.h | 2 ++ 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/ctree.h b/ctree.h index 5ab0f4a45a15..97cbd032fbb1 100644 --- a/ctree.h +++ b/ctree.h @@ -480,6 +480,7 @@ struct btrfs_super_block { #define BTRFS

[PATCH 4/4] btrfs-progs: add global spare device list to filesystem show

2016-03-29 Thread Anand Jain
This patch will add a list of spare devices to the filesystem show output, as shown in the example below. btrfs fi show Label: none uuid: 17f7d403-17d7-4f0a-b8ba-de673fdd3f56 Total devices 2 FS bytes used 15.88MiB devid 1 size 2.00GiB used 417.50MiB path /dev/sdc devid

[PATCH 12/12] btrfs: check device for critical errors and mark failed

2016-03-29 Thread Anand Jain
Write and Flush errors are considered as critical errors, upon which the device will be brought offline and marked as failed. Write and Flush errors are identified using device error statistics. Signed-off-by: Anand Jain btrfs: check for failed device and hot replace This patch creates casualty

[PATCH] btrfs: debug: procfs-devlist: introduce procfs interface for the device list for debugging

2016-03-29 Thread Anand Jain
From: Anand Jain This patch introduces a procfs interface, /proc/fs/btrfs/devlist, which as of now exports all the members of kernel fs_devices. The current /sys/fs/btrfs interface works when the fs is mounted, and is on the file directory hierarchy and also has the sysfs limitation max output of

Re: [PATCH] btrfs: debug: procfs-devlist: introduce procfs interface for the device list for debugging

2016-03-29 Thread Anand Jain
Sorry, the commit log should have been.. ** This patch is not for integration but to debug/visualize the btrfs device tree. ** This patch introduces a procfs interface, /proc/fs/btrfs/devlist, which as of now exports all the members of kernel fs_devices, including some of the patches in the ML.

[PATCH 01/12] btrfs: Introduce a new function to check if all chunks a OK for degraded mount

2016-03-29 Thread Anand Jain
From: Qu Wenruo Introduce a new function, btrfs_check_degradable(), to judge if all chunks in btrfs are OK for degraded mount. It provides the new basis for accurate btrfs mount/remount and even runtime degraded mount checks, rather than the old one-size-fits-all method. Signed-off-by: Qu Wenruo --- f
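A rough model of the per-chunk idea: instead of one global tolerance for the whole filesystem, each chunk is checked against the number of missing stripes its own profile can survive. All names here are hypothetical and the chunk layout is simplified to a list of stripe device indices; the tolerance table (0 for single/dup/raid0, 1 for raid1/raid10/raid5, 2 for raid6) reflects the usual btrfs redundancy levels but is not copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified profiles; tolerances follow the usual btrfs redundancy levels. */
enum example_profile { P_SINGLE, P_DUP, P_RAID0, P_RAID1, P_RAID10, P_RAID5, P_RAID6 };

#define EXAMPLE_MAX_STRIPES 8

/* Hypothetical chunk: a profile plus the device index of each stripe. */
struct example_chunk {
	enum example_profile profile;
	size_t num_stripes;
	int stripe_dev[EXAMPLE_MAX_STRIPES];
};

/* How many missing stripes a chunk of this profile can survive. */
static int max_tolerated(enum example_profile p)
{
	switch (p) {
	case P_RAID1:
	case P_RAID10:
	case P_RAID5:
		return 1;
	case P_RAID6:
		return 2;
	default:            /* SINGLE, DUP, RAID0 */
		return 0;
	}
}

/* True if every chunk still has enough stripes on present devices. */
bool example_check_degradable(const struct example_chunk *chunks, size_t n,
			      const bool *dev_missing)
{
	for (size_t i = 0; i < n; i++) {
		int missing = 0;

		assert(chunks[i].num_stripes <= EXAMPLE_MAX_STRIPES);
		for (size_t s = 0; s < chunks[i].num_stripes; s++)
			if (dev_missing[chunks[i].stripe_dev[s]])
				missing++;
		if (missing > max_tolerated(chunks[i].profile))
			return false;
	}
	return true;
}
```

With metadata in raid1 across two devices and data in single on the first device only, losing the second device still passes this check, which is the mkfs.btrfs/wipefs case described in PATCH 02/12. A single data chunk placed on the lost device would fail it.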

Re: Global hotspare functionality

2016-03-29 Thread Anand Jain
Hi Yauhen, Thanks! More below.. On 03/19/2016 03:39 AM, Yauhen Kharuzhy wrote: Hi all, I tried to get Anand's patchset for global hotspare functionality working. Now it's working for me, but I have met a number of issues while applying and testing the patches. I took the latest versions of the patchset

Re: Global hotspare functionality

2016-03-29 Thread Anand Jain
On 03/19/2016 09:17 AM, Yauhen Kharuzhy wrote: Issue 5: Race between close_ctree() and casualty_kthread(): This is fixed in V2, thanks for the report. - Anand close_ctree(): if (fs_info->casualty_kthread) kthread_stop(fs_info->casualty_kthread); casualty_kthrea

Re: [PATCH 11/12] btrfs: introduce helper functions to perform hot replace

2016-03-29 Thread kbuild test robot
Hi Anand, [auto build test ERROR on btrfs/next] [also build test ERROR on v4.6-rc1 next-20160329] [if your patch is applied to the wrong git tree, please drop us a note to help improving the system] url: https://github.com/0day-ci/linux/commits/Anand-Jain/btrfs-Introduce-a-new-function-to

Re: [PATCH] btrfs: debug: procfs-devlist: introduce procfs interface for the device list for debugging

2016-03-29 Thread kbuild test robot
Hi Anand, [auto build test ERROR on btrfs/next] [also build test ERROR on v4.6-rc1 next-20160329] [if your patch is applied to the wrong git tree, please drop us a note to help improving the system] url: https://github.com/0day-ci/linux/commits/Anand-Jain/btrfs-debug-procfs-devlist

Re: [PATCH v8 00/27][For 4.7] Btrfs: Add inband (write time) de-duplication framework

2016-03-29 Thread Alex Lyakas
Greetings Qu Wenruo, I have reviewed the dedup patchset found in the github account you mentioned. I have several questions. Please note that I am by no means criticizing your design or code. I just want to make sure that my understanding of the code is proper. 1) You mentioned in several em

Re: [PATCH v8 11/27] btrfs: dedupe: Introduce interfaces to resume and cleanup dedupe info

2016-03-29 Thread Alex Lyakas
Hi Qu, Wang, On Tue, Mar 22, 2016 at 3:35 AM, Qu Wenruo wrote: > Since we will introduce a new on-disk based dedupe method, introduce new > interfaces to resume previous dedupe setup. > > And since we introduce a new tree for status, also add disable handler > for it. > > Signed-off-by: Wang Xiao

Re: [PATCH v2 00/15] Introduce device state 'failed', Hot spare and Auto replace

2016-03-29 Thread Austin S. Hemmelgarn
On 2016-03-29 10:22, Anand Jain wrote: Thanks for various comments, tests and feedback. Background: Hot spare and Auto replace: Hot spare is predominately used to mitigate or narrow the time window of a storage in degraded mode during which any further disk failure might lead to a catastro

Re: Compression causes kernel crashes if there are I/O or checksum errors (was: RE: kernel BUG at fs/btrfs/volumes.c:5519 when hot-removing device in RAID-1)

2016-03-29 Thread Mitch Fossen
Hello, Your experience looks similar to an issue that I've been running into recently. I have a btrfs array in RAID0 with compression=lzo set. The machine runs fine for a while, then crashes at (seemingly) random with an error message in the journal about a stuck CPU and an issue with the kworker

Re: Global hotspare functionality

2016-03-29 Thread Yauhen Kharuzhy
On Tue, Mar 29, 2016 at 10:41:36PM +0800, Anand Jain wrote: > > >Issue 2. > >At start of autoreplacing drive by hotspare, kernel crashes in transaction > >handling code (inside of btrfs_commit_transaction() called by autoreplace > >initiating > >routines). I 'fixed' this by removing of closing of

Re: Global hotspare functionality

2016-03-29 Thread Yauhen Kharuzhy
Hi. I am testing hotspare v2 on kernel v4.4.5 (I will try the latest Chris' tree later), now with lockdep debugging enabled. At the start of replacement, a lockdep warning is displayed, because kstrdup() is called with GFP_NOFS inside an rcu_read_lock/unlock() block (GFP_NOFS can sleep). [ 1463.470875]

Re: Global hotspare functionality

2016-03-29 Thread Yauhen Kharuzhy
On Tue, Mar 29, 2016 at 10:41:36PM +0800, Anand Jain wrote: > > Hi Yauhen, > > > > >Issue 2. > >At start of autoreplacing drive by hotspare, kernel crashes in transaction > >handling code (inside of btrfs_commit_transaction() called by autoreplace > >initiating > >routines). I 'fixed' this by re

Re: Global hotspare functionality

2016-03-29 Thread Austin S. Hemmelgarn
On 2016-03-29 15:24, Yauhen Kharuzhy wrote: On Tue, Mar 29, 2016 at 10:41:36PM +0800, Anand Jain wrote: No. No. No, please don't do that, it would lead to trouble in handling slow devices. I purposely didn't do it. Hmm. Can you explain please? Sometimes admins may want to have autoreplaceme

Re: [PATCH] Btrfs: fix crash/invalid memory access on fsync when using overlayfs

2016-03-29 Thread Chris Mason
On Mon, Mar 21, 2016 at 05:38:44PM +, fdman...@kernel.org wrote: > From: Filipe Manana > > If the lower or upper directory of an overlayfs mount belong to a btrfs > file system and we fsync the file through the overlayfs' merged directory > we ended up accessing an inode that didn't belong to

attempt to mount after crash during rebalance hard crashes server

2016-03-29 Thread Warren, Daniel
Greetings all, I'm running 4.4.0 from deb sid. My server crashed during a balance after I had added 10 disks to the original 15. I have not been able to bring the FS up since; it causes a system crash. btrfs fi sh looks fine, but when I mount , it crashes the server with a NULL pointer dereference

Re: Global hotspare functionality

2016-03-29 Thread Chris Murphy
On Tue, Mar 29, 2016 at 1:59 PM, Austin S. Hemmelgarn wrote: > On 2016-03-29 15:24, Yauhen Kharuzhy wrote: >> >> On Tue, Mar 29, 2016 at 10:41:36PM +0800, Anand Jain wrote: >>> >>> >>> No. No. No, please don't do that, it would lead to trouble in handling >>> slow devices. I purposely didn't do

Re: attempt to mount after crash during rebalance hard crashes server

2016-03-29 Thread Chris Murphy
On Tue, Mar 29, 2016 at 2:21 PM, Warren, Daniel wrote: > Greetings all, > > I'm running 4.4.0 from deb sid > > My server crashed during a balance after I had added 10 disks to the > original 15, I have not been able to bring the FS up since, it causes > a system crash > > btrfs fi sh looks fine, b

[PATCH] btrfs: Reset IO error counters before start of device replacing

2016-03-29 Thread Yauhen Kharuzhy
If a device replace entry is found on disk at mount time and its num_write_errors stats counter has a non-zero value, then the replace operation will never finish and an -EIO error will be reported by btrfs_scrub_dev(), because this counter is never reset. # mount -o degraded /media/a4fb5c0a-21c5-4fe7-8d
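To illustrate why a stale counter blocks the replace, here is a hedged userspace model: example_scrub_result() stands in for the check that ends the copy step with -EIO, and example_reset_stats() for the reset this patch adds before the replace starts. The names, the counter layout and the single-counter failure rule are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counter slots, loosely modelled on btrfs device stats. */
enum { STAT_WRITE, STAT_READ, STAT_FLUSH, STAT_CORRUPTION, STAT_GENERATION, STAT_NR };

struct example_device {
	uint64_t stats[STAT_NR];
};

/* Stand-in for the copy step: any recorded write error makes it fail. */
int example_scrub_result(const struct example_device *dev)
{
	assert(dev != NULL);
	return dev->stats[STAT_WRITE] != 0 ? -EIO : 0;
}

/* Clear counters left over from before the remount, before replacing. */
void example_reset_stats(struct example_device *dev)
{
	assert(dev != NULL);
	memset(dev->stats, 0, sizeof(dev->stats));
}
```

Without the reset, a counter carried over from before the crash makes every resumed attempt fail the same way, which is why the replace never finishes.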

Re: attempt to mount after crash during rebalance hard crashes server

2016-03-29 Thread Patrik Lundquist
On 29 March 2016 at 22:46, Chris Murphy wrote: > On Tue, Mar 29, 2016 at 2:21 PM, Warren, Daniel > wrote: >> Greetings all, >> >> I'm running 4.4.0 from deb sid >> >> btrfs fi sh http://pastebin.com/QLTqSU8L >> kernel panic http://pastebin.com/aBF6XmzA > > Panic shows: > CPU: 0 PID: 153 Comm: kwo

[PATCH] btrfs-progs: fix unknown type name 'u64' in gccgo

2016-03-29 Thread Julio Montes
From: Julio Montes Signed-off-by: Julio Montes --- ioctl.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ioctl.h b/ioctl.h index cab9ec2..7c80807 100644 --- a/ioctl.h +++ b/ioctl.h @@ -94,7 +94,7 @@ struct btrfs_ioctl_vol_args_v2 { }; union {

Re: [PATCH 12/12] btrfs: check device for critical errors and mark failed

2016-03-29 Thread Yauhen Kharuzhy
On Tue, Mar 29, 2016 at 10:22:29PM +0800, Anand Jain wrote: > Write and Flush errors are considered as critical errors, > upon which the device will be brought offline and marked as > failed. Write and Flush errors are identified using device > error statistics. > > Signed-off-by: Anand Jain > >

Re: attempt to mount after crash during rebalance hard crashes server

2016-03-29 Thread Duncan
Warren, Daniel posted on Tue, 29 Mar 2016 16:21:28 -0400 as excerpted: > I'm running 4.4.0 from deb sid Correction. According to the kernel panic you posted at... http://pastebin.com/aBF6XmzA ... you're running kernel 3.16.something. You might be running btrfs-progs userspace 4.4.0, but on mo

Re: Global hotspare functionality

2016-03-29 Thread Yauhen Kharuzhy
Reproduced with mason's for-linus-4.6 branch also. 2016-03-29 12:47 GMT-07:00 Yauhen Kharuzhy : > On Tue, Mar 29, 2016 at 10:41:36PM +0800, Anand Jain wrote: >> >> Hi Yauhen, >> > >> > >> >Issue 2. >> >At start of autoreplacing drive by hotspare, kernel crashes in transaction >> >handling code (ins

PaX: size overflow detected

2016-03-29 Thread Alec Blayne
Hi, I got this warning on dmesg: PAX: size overflow detected in function btrfs_get_extent fs/btrfs/inode.c:6690 cicus.1228_386 min, count: 104, decl: len; num: 0; context: extent_map; Which was followed by this Call Trace: [ 354.375166] Call Trace: [ 354.375173] [] ? dump_stack+0x47/0x72 [

[RESEND][PATCH] btrfs: Add qgroup tracing

2016-03-29 Thread Mark Fasheh
This patch adds tracepoints to the qgroup code on both the reporting side (insert_dirty_extents) and the accounting side. Taken together it allows us to see what qgroup operations have happened, and what their result was. Signed-off-by: Mark Fasheh --- fs/btrfs/qgroup.c| 9 + in

Re: [PATCH v8 11/27] btrfs: dedupe: Introduce interfaces to resume and cleanup dedupe info

2016-03-29 Thread Qu Wenruo
Alex Lyakas wrote on 2016/03/29 19:31 +0200: Hi Qu, Wang, On Tue, Mar 22, 2016 at 3:35 AM, Qu Wenruo wrote: Since we will introduce a new on-disk based dedupe method, introduce new interfaces to resume previous dedupe setup. And since we introduce a new tree for status, also add disable han

Re: [PATCH v8 00/27][For 4.7] Btrfs: Add inband (write time) de-duplication framework

2016-03-29 Thread Qu Wenruo
Alex Lyakas wrote on 2016/03/29 19:22 +0200: Greetings Qu Wenruo, I have reviewed the dedup patchset found in the github account you mentioned. I have several questions. Please note that by all means I am not criticizing your design or code. I just want to make sure that my understanding of th

Re: [PATCH 12/12] btrfs: check device for critical errors and mark failed

2016-03-29 Thread Yauhen Kharuzhy
On Tue, Mar 29, 2016 at 10:22:29PM +0800, Anand Jain wrote: > Write and Flush errors are considered as critical errors, > upon which the device will be brought offline and marked as > failed. Write and Flush errors are identified using device > error statistics. > > Signed-off-by: Anand Jain > >

btrfs: page allocation failure

2016-03-29 Thread Jean-Denis Girard
Hi list, I just started to use send / receive for backups to another drive. That's a great feature, but unfortunately I'm getting page allocation failures, see below. My backup script does something like this for 11 sub-volumes: btrfs subvolume snapshot -r vol /snaps btrfs fi sync /snaps btr

Re: "bad metadata" not fixed by btrfs repair

2016-03-29 Thread Marc Haber
On Mon, Mar 28, 2016 at 09:51:43PM +0300, Nazar Mokrynskyi wrote: > However, despite those messages everything seems to work fine. In my case, trying to balance the filesystem results in the balance not finishing in hours, I/O performance going way down during the balance, and a plethora of kern

Re: "bad metadata" not fixed by btrfs repair

2016-03-29 Thread Marc Haber
On Tue, Mar 29, 2016 at 08:43:51AM +0200, Marc Haber wrote: > On Mon, Mar 28, 2016 at 03:35:32PM -0400, Austin S. Hemmelgarn wrote: > > As far as what the kernel is involved with, the easy way to check is if it's > > operating on a mounted filesystem or not. If it only operates on mounted > > file

Re: "bad metadata" not fixed by btrfs repair

2016-03-29 Thread Marc Haber
On Mon, Mar 28, 2016 at 02:46:54PM -0600, Chris Murphy wrote: > http://git.kernel.org/cgit/linux/kernel/git/kdave/btrfs-progs.git/tree/cmds-check.c > line 7722 discusses this error message and it looks like there's no > repair function for it yet; uncertain what problems can result from > this. Th