Re: [PATCH v2] Btrfs: return failure if btrfs_dev_replace_finishing() failed

2014-10-13 Thread Anand Jain
comments below.. On 10/13/14 12:42, Eryu Guan wrote: device replace could fail due to another running scrub process or any other errors btrfs_scrub_dev() may hit, but this failure doesn't get returned to userspace. The following steps could reproduce this issue mkfs -t btrfs -f

Re: [PATCH v2] Btrfs: return failure if btrfs_dev_replace_finishing() failed

2014-10-13 Thread Eryu Guan
On Mon, Oct 13, 2014 at 02:23:57PM +0800, Anand Jain wrote: comments below.. On 10/13/14 12:42, Eryu Guan wrote: device replace could fail due to another running scrub process or any other errors btrfs_scrub_dev() may hit, but this failure doesn't get returned to userspace. The

Re: [PATCH v2] Btrfs: return failure if btrfs_dev_replace_finishing() failed

2014-10-13 Thread Anand Jain
On 10/13/14 14:59, Eryu Guan wrote: On Mon, Oct 13, 2014 at 02:23:57PM +0800, Anand Jain wrote: comments below.. On 10/13/14 12:42, Eryu Guan wrote: device replace could fail due to another running scrub process or any other errors btrfs_scrub_dev() may hit, but this failure doesn't get

[PATCH 1/3] Btrfs: deal with convert_extent_bit errors to avoid fs corruption

2014-10-13 Thread Filipe Manana
When committing a transaction or a log, we look for btree extents that need to be durably persisted by searching for ranges in a io tree that have some bits set (EXTENT_DIRTY or EXTENT_NEW). We then attempt to clear those bits and set the EXTENT_NEED_WAIT bit, with calls to the function

[PATCH 3/3] Btrfs: avoid returning -ENOMEM in convert_extent_bit() too early

2014-10-13 Thread Filipe Manana
We try to allocate an extent state before acquiring the tree's spinlock just in case we end up needing to split an existing extent state into two. If that allocation failed, we would return -ENOMEM. However, our only single caller (transaction/log commit code), passes in an extent state that was

[PATCH 2/3] Btrfs: make find_first_extent_bit be able to cache any state

2014-10-13 Thread Filipe Manana
Right now the only caller of find_first_extent_bit() that is interested in caching extent states (transaction or log commit), never gets an extent state cached. This is because find_first_extent_bit() only caches states that have at least one of the flags EXTENT_IOBITS or EXTENT_BOUNDARY, and the

Re: What is the vision for btrfs fs repair?

2014-10-13 Thread Austin S Hemmelgarn
On 2014-10-10 18:05, Eric Sandeen wrote: On 10/10/14 2:35 PM, Austin S Hemmelgarn wrote: On 2014-10-10 13:43, Bob Marley wrote: On 10/10/2014 16:37, Chris Murphy wrote: The fail safe behavior is to treat the known good tree root as the default tree root, and bypass the bad tree root if it

Re: What is the vision for btrfs fs repair?

2014-10-13 Thread Austin S Hemmelgarn
On 2014-10-12 06:14, Martin Steigerwald wrote: Am Freitag, 10. Oktober 2014, 10:37:44 schrieb Chris Murphy: On Oct 10, 2014, at 6:53 AM, Bob Marley bobmar...@shiftmail.org wrote: On 10/10/2014 03:58, Chris Murphy wrote: * mount -o recovery Enable autorecovery attempts if a bad tree

Re: What is the vision for btrfs fs repair?

2014-10-13 Thread Rich Freeman
On Sun, Oct 12, 2014 at 6:14 AM, Martin Steigerwald mar...@lichtvoll.de wrote: Am Freitag, 10. Oktober 2014, 10:37:44 schrieb Chris Murphy: On Oct 10, 2014, at 6:53 AM, Bob Marley bobmar...@shiftmail.org wrote: On 10/10/2014 03:58, Chris Murphy wrote: * mount -o recovery Enable

Re: btrfs send and kernel 3.17

2014-10-13 Thread john terragon
Actually it seems strange that a send operation could corrupt the source subvolume or fs. Why would the send modify the source subvolume in any significant way? The only way I can find to reconcile your observations with mine is that maybe the snapshots get corrupted not by the send operation by

Re: btrfs balance segfault, kernel BUG at fs/btrfs/extent-tree.c:7727

2014-10-13 Thread Rich Freeman
On Thu, Oct 9, 2014 at 10:19 AM, Petr Janecek jane...@ucw.cz wrote: I have trouble finishing btrfs balance on a five-disk raid10 fs. I added a disk to a 4x3TB raid10 fs and ran btrfs balance start /mnt/b3, which segfaulted after a few hours, probably because of the BUG below. btrfs check does not

Re: 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931!

2014-10-13 Thread Rich Freeman
On Thu, Oct 2, 2014 at 3:27 AM, Tomasz Chmielewski t...@virtall.com wrote: Got this when running balance with 3.17.0-rc7: [173475.410717] kernel BUG at fs/btrfs/relocation.c:931! I just started a post on another thread with this exact same issue on 3.17.0. I started a balance after adding a

Re: btrfs send and kernel 3.17

2014-10-13 Thread David Arendt
On 10/13/2014 02:40 PM, john terragon wrote: Actually it seems strange that a send operation could corrupt the source subvolume or fs. Why would the send modify the source subvolume in any significant way? The only way I can find to reconcile your observations with mine is that maybe the

Re: btrfs send and kernel 3.17

2014-10-13 Thread Rich Freeman
On Sun, Oct 12, 2014 at 7:11 AM, David Arendt ad...@prnet.org wrote: This weekend I finally had time to try btrfs send again on the newly created fs. Now I am running into another problem: btrfs send returns: ERROR: send ioctl failed with -12: Cannot allocate memory In dmesg I see only the

Re: what is the best way to monitor raid1 drive failures?

2014-10-13 Thread Suman C
I had progs 3.12 and updated to the latest from git (3.16). With this update, btrfs fi show reports there is a missing device immediately after I pull it out. Thanks! I am using virtualbox to test this. So, I am detaching the drive like so: vboxmanage storageattach vm --storagectl controller

Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread David Arendt
From my own experience and based on what other people are saying, I think there is a random btrfs filesystem corruption problem in kernel 3.17 at least related to snapshots, therefore I decided to post using another subject to draw attention from people not concerned about btrfs send to it. More

Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread Rich Freeman
On Mon, Oct 13, 2014 at 4:27 PM, David Arendt ad...@prnet.org wrote: From my own experience and based on what other people are saying, I think there is a random btrfs filesystem corruption problem in kernel 3.17 at least related to snapshots, therefore I decided to post using another subject

Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread john terragon
I think I just found a consistent simple way to trigger the problem (at least on my system). And, as I guessed before, it seems to be related just to readonly snapshots: 1) I create a readonly snapshot 2) I do some changes on the source subvolume for the snapshot (I'm not sure changes are

Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread Rich Freeman
On Mon, Oct 13, 2014 at 4:48 PM, john terragon jterra...@gmail.com wrote: I think I just found a consistent simple way to trigger the problem (at least on my system). And, as I guessed before, it seems to be related just to readonly snapshots: 1) I create a readonly snapshot 2) I do some

Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread Rich Freeman
On Mon, Oct 13, 2014 at 4:55 PM, Rich Freeman r-bt...@thefreemanclan.net wrote: On Mon, Oct 13, 2014 at 4:48 PM, john terragon jterra...@gmail.com wrote: After rebooting (or remounting) I consistently have the corruption with the usual multitude of these in dmesg parent transid verify

Re: What is the vision for btrfs fs repair?

2014-10-13 Thread Josef Bacik
On 10/08/2014 03:11 PM, Eric Sandeen wrote: I was looking at Marc's post:

Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread john terragon
I'm using compress=no so compression doesn't seem to be related, at least in my case. Just read-only snapshots on 3.17 (although I haven't tried 3.16). John -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo

Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread David Arendt
As these two machines are running as servers for different purposes (yes, I know that btrfs is unstable and any corruption or data loss is at my own risk, therefore I have good backups), I want to reboot them no more than necessary. However I tried to bring my reboot times in relation with

Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread David Arendt
I'm also using no compression. On 10/13/2014 11:22 PM, john terragon wrote: I'm using compress=no so compression doesn't seem to be related, at least in my case. Just read-only snapshots on 3.17 (although I haven't tried 3.16). John

Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread Duncan
David Arendt posted on Mon, 13 Oct 2014 23:25:23 +0200 as excerpted: I'm also using no compression. On 10/13/2014 11:22 PM, john terragon wrote: I'm using compress=no so compression doesn't seem to be related, at least in my case. Just read-only snapshots on 3.17 (although I haven't tried

Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread Duncan
Rich Freeman posted on Mon, 13 Oct 2014 16:42:14 -0400 as excerpted: On Mon, Oct 13, 2014 at 4:27 PM, David Arendt ad...@prnet.org wrote: From my own experience and based on what other people are saying, I think there is a random btrfs filesystem corruption problem in kernel 3.17 at least

Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread Rich Freeman
On Mon, Oct 13, 2014 at 5:22 PM, john terragon jterra...@gmail.com wrote: I'm using compress=no so compression doesn't seem to be related, at least in my case. Just read-only snapshots on 3.17 (although I haven't tried 3.16). I was using lzo compression, and hence my comment about turning it

Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread john terragon
And another worrying thing I didn't notice before. Two snapshots have dates that do not make sense. root-b3 and root-b4 have been created Oct 14th (and btw root's modification time was also on Oct the 14th). So why do they show Oct 10th? And root-prov has actually been created on Oct 10 15:37, as

Re: what is the best way to monitor raid1 drive failures?

2014-10-13 Thread Anand Jain
On 10/14/14 03:50, Suman C wrote: I had progs 3.12 and updated to the latest from git (3.16). With this update, btrfs fi show reports there is a missing device immediately after I pull it out. Thanks! I am using virtualbox to test this. So, I am detaching the drive like so: vboxmanage

Re: [PATCH v2] Btrfs: return failure if btrfs_dev_replace_finishing() failed

2014-10-13 Thread Eryu Guan
On Mon, Oct 13, 2014 at 06:18:04PM +0800, Anand Jain wrote: On 10/13/14 14:59, Eryu Guan wrote: On Mon, Oct 13, 2014 at 02:23:57PM +0800, Anand Jain wrote: comments below.. On 10/13/14 12:42, Eryu Guan wrote: device replace could fail due to another running scrub process or any