Re: Adventures in btrfs raid5 disk recovery

2016-07-06 Thread Chris Murphy
On Wed, Jul 6, 2016 at 1:15 PM, Austin S. Hemmelgarn wrote: > On 2016-07-06 14:45, Chris Murphy wrote: >> I think it's statistically 0 people changing this from default. It's >> people with drives that have no SCT ERC support, used in raid1+, who >> happen to stumble upon this very obscure work a
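
For context, the two knobs being debated in this subthread are the drive's SCT ERC limit and the kernel's per-device SCSI command timer. A minimal sketch of inspecting and adjusting both (the device name /dev/sda is a placeholder):

    # Query the drive's SCT ERC setting, if the drive supports it (smartmontools)
    smartctl -l scterc /dev/sda
    # Cap error recovery at 7.0 seconds (units are 100 ms), where supported
    smartctl -l scterc,70,70 /dev/sda
    # For drives with no SCT ERC support, raise the kernel's command timer
    # (default 30 s) instead, e.g. to the 180 s discussed here
    echo 180 > /sys/block/sda/device/timeout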

Re: Adventures in btrfs raid5 disk recovery

2016-07-06 Thread Austin S. Hemmelgarn
On 2016-07-06 14:45, Chris Murphy wrote: On Wed, Jul 6, 2016 at 11:18 AM, Austin S. Hemmelgarn wrote: On 2016-07-06 12:43, Chris Murphy wrote: So does it make sense to just set the default to 180? Or is there a smarter way to do this? I don't know. Just thinking about this: 1. People who a

Re: Adventures in btrfs raid5 disk recovery

2016-07-06 Thread Chris Murphy
On Wed, Jul 6, 2016 at 11:18 AM, Austin S. Hemmelgarn wrote: > On 2016-07-06 12:43, Chris Murphy wrote: >> So does it make sense to just set the default to 180? Or is there a >> smarter way to do this? I don't know. > > Just thinking about this: > 1. People who are setting this somewhere will be

Re: Adventures in btrfs raid5 disk recovery

2016-07-06 Thread Austin S. Hemmelgarn
On 2016-07-06 12:43, Chris Murphy wrote: On Wed, Jul 6, 2016 at 5:51 AM, Austin S. Hemmelgarn wrote: On 2016-07-05 19:05, Chris Murphy wrote: Related: http://www.spinics.net/lists/raid/msg52880.html Looks like there is some traction to figuring out what to do about this, whether it's a udev

Re: Adventures in btrfs raid5 disk recovery

2016-07-06 Thread Chris Murphy
On Wed, Jul 6, 2016 at 5:51 AM, Austin S. Hemmelgarn wrote: > On 2016-07-05 19:05, Chris Murphy wrote: >> >> Related: >> http://www.spinics.net/lists/raid/msg52880.html >> >> Looks like there is some traction to figuring out what to do about >> this, whether it's a udev rule or something that happ

Re: Adventures in btrfs raid5 disk recovery

2016-07-06 Thread Austin S. Hemmelgarn
On 2016-07-05 19:05, Chris Murphy wrote: Related: http://www.spinics.net/lists/raid/msg52880.html Looks like there is some traction to figuring out what to do about this, whether it's a udev rule or something that happens in the kernel itself. Pretty much the only hardware setup unaffected by th

Re: Adventures in btrfs raid5 disk recovery

2016-07-05 Thread Chris Murphy
Related: http://www.spinics.net/lists/raid/msg52880.html Looks like there is some traction toward figuring out what to do about this, whether it's a udev rule or something that happens in the kernel itself. Pretty much the only hardware setups unaffected by this are those with enterprise or NAS drives.
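
A sketch of what such a udev rule might look like, modeled on the approach discussed on linux-raid (the file name and device match are illustrative, not a shipped rule):

    # /etc/udev/rules.d/60-scsi-timeout.rules (hypothetical)
    # If a drive cannot do time-limited error recovery, make the kernel's
    # command timer outlast the drive's internal recovery attempts.
    ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", \
      RUN+="/bin/sh -c 'echo 180 > /sys/block/%k/device/timeout'"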

Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Steven Haigh
On 29/06/16 04:01, Chris Murphy wrote: > Just wiping the slate clean to summarize: > > > 1. We have a consistent (~1 in 3, maybe 1 in 2) reproducible corruption > of *data extent* parity during a scrub with raid5. Goffredo and I have > both reproduced it. It's a big bug. It might still be useful if

Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Chris Murphy
Just wiping the slate clean to summarize: 1. We have a consistent (~1 in 3, maybe 1 in 2) reproducible corruption of *data extent* parity during a scrub with raid5. Goffredo and I have both reproduced it. It's a big bug. It might still be useful if someone else can reproduce it too. Goffredo, can
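
For anyone wanting to join the reproduction effort, a rough outline using loop devices (sizes, names, and the raid1 metadata choice are illustrative):

    # Three scratch devices, raid5 data
    truncate -s 2G d0.img d1.img d2.img
    DEVS=$(for f in d0.img d1.img d2.img; do losetup -f --show "$f"; done)
    mkfs.btrfs -f -d raid5 -m raid1 $DEVS
    mount "$(echo "$DEVS" | head -n1)" /mnt
    # Write a full-stripe file (128KiB of data on 3 devices), then scrub
    # repeatedly and compare the on-disk parity strip between runs
    dd if=/dev/urandom of=/mnt/a.bin bs=128K count=1 conv=fsync
    btrfs scrub start -B /mnt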

Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Steven Haigh
On 28/06/16 22:25, Austin S. Hemmelgarn wrote: > On 2016-06-28 08:14, Steven Haigh wrote: >> On 28/06/16 22:05, Austin S. Hemmelgarn wrote: >>> On 2016-06-27 17:57, Zygo Blaxell wrote: On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote: > On Mon, Jun 27, 2016 at 5:21 AM, Austin S

Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Austin S. Hemmelgarn
On 2016-06-28 08:14, Steven Haigh wrote: On 28/06/16 22:05, Austin S. Hemmelgarn wrote: On 2016-06-27 17:57, Zygo Blaxell wrote: On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote: On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn wrote: On 2016-06-25 12:44, Chris Murphy wrote:

Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Steven Haigh
On 28/06/16 22:05, Austin S. Hemmelgarn wrote: > On 2016-06-27 17:57, Zygo Blaxell wrote: >> On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote: >>> On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn >>> wrote: On 2016-06-25 12:44, Chris Murphy wrote: > On Fri, Jun 24, 2016 a

Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Austin S. Hemmelgarn
On 2016-06-27 17:57, Zygo Blaxell wrote: On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote: On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn wrote: On 2016-06-25 12:44, Chris Murphy wrote: On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn wrote: OK but hold on. During s

Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Austin S. Hemmelgarn
On 2016-06-27 23:17, Zygo Blaxell wrote: On Mon, Jun 27, 2016 at 08:39:21PM -0600, Chris Murphy wrote: On Mon, Jun 27, 2016 at 7:52 PM, Zygo Blaxell wrote: On Mon, Jun 27, 2016 at 04:30:23PM -0600, Chris Murphy wrote: Btrfs does have something of a work around for when things get slow, and th

Re: Adventures in btrfs raid5 disk recovery

2016-06-27 Thread Zygo Blaxell
On Mon, Jun 27, 2016 at 08:39:21PM -0600, Chris Murphy wrote: > On Mon, Jun 27, 2016 at 7:52 PM, Zygo Blaxell > wrote: > > On Mon, Jun 27, 2016 at 04:30:23PM -0600, Chris Murphy wrote: > >> Btrfs does have something of a work around for when things get slow, > >> and that's balance, read and rewri

Re: Adventures in btrfs raid5 disk recovery

2016-06-27 Thread Chris Murphy
On Mon, Jun 27, 2016 at 7:52 PM, Zygo Blaxell wrote: > On Mon, Jun 27, 2016 at 04:30:23PM -0600, Chris Murphy wrote: >> On Mon, Jun 27, 2016 at 3:57 PM, Zygo Blaxell >> wrote: >> > On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote: >> > If anything, I want the timeout to be shorter so

Re: Adventures in btrfs raid5 disk recovery

2016-06-27 Thread Zygo Blaxell
On Mon, Jun 27, 2016 at 04:30:23PM -0600, Chris Murphy wrote: > On Mon, Jun 27, 2016 at 3:57 PM, Zygo Blaxell > wrote: > > On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote: > > If anything, I want the timeout to be shorter so that upper layers with > > redundancy can get an EIO and ini

Re: Adventures in btrfs raid5 disk recovery

2016-06-27 Thread Chris Murphy
On Mon, Jun 27, 2016 at 3:57 PM, Zygo Blaxell wrote: > On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote: > >> It just came up again in a thread over the weekend on linux-raid@. I'm >> going to ask while people are paying attention if a patch to change >> the 30 second time out to some

Re: Adventures in btrfs raid5 disk recovery

2016-06-27 Thread Zygo Blaxell
On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote: > On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn > wrote: > > On 2016-06-25 12:44, Chris Murphy wrote: > >> On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn > >> wrote: > >> > >> OK but hold on. During scrub, it should rea

Re: Adventures in btrfs raid5 disk recovery

2016-06-27 Thread Henk Slager
On Mon, Jun 27, 2016 at 6:17 PM, Chris Murphy wrote: > On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn > wrote: >> On 2016-06-25 12:44, Chris Murphy wrote: >>> >>> On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn >>> wrote: >>> Well, the obvious major advantage that comes to min

Re: Adventures in btrfs raid5 disk recovery

2016-06-27 Thread Chris Murphy
For what it's worth I found btrfs-map-logical can produce mapping for raid5 (didn't test raid6) by specifying the extent block length. If that's omitted it only shows the device+mapping for the first strip. This example is a 3 disk raid5, with a 128KiB file all in a single extent. [root@f24s ~]#
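
For reference, the invocation being described is presumably along these lines (the logical address and length below are placeholders, not taken from the example):

    # -l: logical address of the extent; -b: number of bytes to map.
    # Omitting -b reports only the device and offset of the first strip.
    btrfs-map-logical -l 1103101952 -b 131072 /dev/sdb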

Re: Adventures in btrfs raid5 disk recovery

2016-06-27 Thread Chris Murphy
On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn wrote: > On 2016-06-25 12:44, Chris Murphy wrote: >> >> On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn >> wrote: >> >>> Well, the obvious major advantage of checksumming parity that comes to >>> mind for me is that it would let us

Re: Adventures in btrfs raid5 disk recovery

2016-06-27 Thread Austin S. Hemmelgarn
On 2016-06-25 12:44, Chris Murphy wrote: On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn wrote: Well, the obvious major advantage of checksumming parity that comes to mind for me is that it would let us scrub the parity data itself and verify it. OK but hold on. During scrub, it shoul

Re: Adventures in btrfs raid5 disk recovery

2016-06-26 Thread Zygo Blaxell
On Sun, Jun 26, 2016 at 01:30:03PM -0600, Chris Murphy wrote: > On Sun, Jun 26, 2016 at 1:54 AM, Andrei Borzenkov wrote: > > 26.06.2016 00:52, Chris Murphy writes: > >> Interestingly enough, so far I'm finding with full stripe writes, i.e. > >> 3x raid5, exactly 128KiB data writes, devid 3 is alway

Re: Adventures in btrfs raid5 disk recovery

2016-06-26 Thread Chris Murphy
On Sun, Jun 26, 2016 at 1:54 AM, Andrei Borzenkov wrote: > 26.06.2016 00:52, Chris Murphy writes: >> Interestingly enough, so far I'm finding with full stripe writes, i.e. >> 3x raid5, exactly 128KiB data writes, devid 3 is always parity. This >> is raid4. > > That's not what code suggests and what

Re: Adventures in btrfs raid5 disk recovery

2016-06-26 Thread Duncan
Andrei Borzenkov posted on Sun, 26 Jun 2016 10:54:16 +0300 as excerpted: > P.S. usage of "stripe" to mean "stripe element" actually adds to > confusion when reading code :) ... and posts (including patches, which I guess are code as well, just not applied yet). I've been noticing that in the "s

Re: Adventures in btrfs raid5 disk recovery

2016-06-26 Thread Andrei Borzenkov
26.06.2016 00:52, Chris Murphy writes: > Interestingly enough, so far I'm finding with full stripe writes, i.e. > 3x raid5, exactly 128KiB data writes, devid 3 is always parity. This > is raid4. That's not what code suggests and what I see in practice - parity seems to be distributed across all dis

Re: Adventures in btrfs raid5 disk recovery

2016-06-25 Thread Chris Murphy
Interestingly enough, so far I'm finding with full stripe writes, i.e. 3x raid5, exactly 128KiB data writes, devid 3 is always parity. This is raid4. So...I wonder if some of these slow cases end up with a bunch of stripes that are effectively raid4-like, and have a lot of parity overwrites, which

Re: Adventures in btrfs raid5 disk recovery

2016-06-25 Thread Chris Murphy
On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn wrote: > Well, the obvious major advantage of checksumming parity that comes to mind > for me is that it would let us scrub the parity data itself and verify it. OK but hold on. During scrub, it should read data, compute checksums *and* pari
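
Worth noting what "compute parity" means here: raid5 parity is the byte-wise XOR of the data strips, so verifying it amounts to redoing the XOR and comparing. A toy illustration with gawk (file names are invented; real strips are 64KiB on disk):

    # XOR two extracted data strips byte by byte to get the expected parity
    paste <(xxd -p -c1 strip0.bin) <(xxd -p -c1 strip1.bin) \
      | gawk '{ printf "%02x", xor(strtonum("0x" $1), strtonum("0x" $2)) }' \
      | xxd -r -p > parity.expected
    cmp parity.expected parity.ondisk && echo "parity strip is good"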

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Austin S. Hemmelgarn
On 2016-06-24 13:52, Chris Murphy wrote: On Fri, Jun 24, 2016 at 11:21 AM, Andrei Borzenkov wrote: 24.06.2016 20:06, Chris Murphy writes: On Fri, Jun 24, 2016 at 3:52 AM, Andrei Borzenkov wrote: On Fri, Jun 24, 2016 at 11:50 AM, Hugo Mills wrote: (meta)data and RAID56 parity is not data.

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Zygo Blaxell
On Fri, Jun 24, 2016 at 11:40:56AM -0600, Chris Murphy wrote: > On Fri, Jun 24, 2016 at 4:16 AM, Hugo Mills wrote: > > On Fri, Jun 24, 2016 at 12:52:21PM +0300, Andrei Borzenkov wrote: > >For data, say you have n-1 good devices, with n-1 blocks on them. > > Each block has a checksum in the met

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Chris Murphy
On Fri, Jun 24, 2016 at 11:21 AM, Andrei Borzenkov wrote: > 24.06.2016 20:06, Chris Murphy writes: >> On Fri, Jun 24, 2016 at 3:52 AM, Andrei Borzenkov >> wrote: >>> On Fri, Jun 24, 2016 at 11:50 AM, Hugo Mills wrote: >>> (meta)data and RAID56 parity is not par

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Chris Murphy
On Fri, Jun 24, 2016 at 4:16 AM, Hugo Mills wrote: > On Fri, Jun 24, 2016 at 12:52:21PM +0300, Andrei Borzenkov wrote: >> Yes, that is what I wrote below. But that means that RAID5 with one >> degraded disk won't be able to reconstruct data on this degraded disk >> because reconstructed extent co

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Chris Murphy
On Fri, Jun 24, 2016 at 4:16 AM, Andrei Borzenkov wrote: > On Fri, Jun 24, 2016 at 8:20 AM, Chris Murphy wrote: > >> [root@f24s ~]# filefrag -v /mnt/5/* >> Filesystem type is: 9123683e >> File size of /mnt/5/a.txt is 16383 (4 blocks of 4096 bytes) >> ext: logical_offset:physical_offs

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Andrei Borzenkov
24.06.2016 20:06, Chris Murphy writes: > On Fri, Jun 24, 2016 at 3:52 AM, Andrei Borzenkov wrote: >> On Fri, Jun 24, 2016 at 11:50 AM, Hugo Mills wrote: >> (meta)data and RAID56 parity is not data. >>> >>> Checksums are not parity, correct. However, every data block >>> (including, I think, the p

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Chris Murphy
On Fri, Jun 24, 2016 at 3:52 AM, Andrei Borzenkov wrote: > On Fri, Jun 24, 2016 at 11:50 AM, Hugo Mills wrote: > (meta)data and RAID56 parity is not data. >> >> Checksums are not parity, correct. However, every data block >> (including, I think, the parity) is checksummed and put into the csum >>

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Chris Murphy
On Fri, Jun 24, 2016 at 2:50 AM, Hugo Mills wrote: >Checksums are not parity, correct. However, every data block > (including, I think, the parity) is checksummed and put into the csum > tree. I don't see how parity is checksummed. It definitely is not in the csum tree. Two file systems, one

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Hugo Mills
On Fri, Jun 24, 2016 at 10:52:53AM -0600, Chris Murphy wrote: > On Fri, Jun 24, 2016 at 2:50 AM, Hugo Mills wrote: > > >Checksums are not parity, correct. However, every data block > > (including, I think, the parity) is checksummed and put into the csum > > tree. > > I don't see how parity

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Zygo Blaxell
On Fri, Jun 24, 2016 at 07:02:34AM +0300, Andrei Borzenkov wrote: > >> I don't read code well enough, but I'd be surprised if Btrfs > >> reconstructs from parity and doesn't then check the resulting > >> reconstructed data against its EXTENT_CSUM. > > > > I wouldn't be surprised if both things happen i

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Zygo Blaxell
On Thu, Jun 23, 2016 at 11:20:40PM -0600, Chris Murphy wrote: > [root@f24s ~]# filefrag -v /mnt/5/* > Filesystem type is: 9123683e > File size of /mnt/5/a.txt is 16383 (4 blocks of 4096 bytes) > ext: logical_offset:physical_offset: length: expected: flags: >0:0.. 0:

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Austin S. Hemmelgarn
On 2016-06-24 06:59, Hugo Mills wrote: On Fri, Jun 24, 2016 at 01:19:30PM +0300, Andrei Borzenkov wrote: On Fri, Jun 24, 2016 at 1:16 PM, Hugo Mills wrote: On Fri, Jun 24, 2016 at 12:52:21PM +0300, Andrei Borzenkov wrote: On Fri, Jun 24, 2016 at 11:50 AM, Hugo Mills wrote: On Fri, Jun 24, 2

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Austin S. Hemmelgarn
On 2016-06-24 01:20, Chris Murphy wrote: On Thu, Jun 23, 2016 at 8:07 PM, Zygo Blaxell wrote: With simple files changing one character with vi and gedit, I get completely different logical and physical numbers with each change, so it's clearly cowing the entire stripe (192KiB in my 3 dev raid5

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Hugo Mills
On Fri, Jun 24, 2016 at 01:19:30PM +0300, Andrei Borzenkov wrote: > On Fri, Jun 24, 2016 at 1:16 PM, Hugo Mills wrote: > > On Fri, Jun 24, 2016 at 12:52:21PM +0300, Andrei Borzenkov wrote: > >> On Fri, Jun 24, 2016 at 11:50 AM, Hugo Mills wrote: > >> > On Fri, Jun 24, 2016 at 07:02:34AM +0300, An

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Andrei Borzenkov
On Fri, Jun 24, 2016 at 1:16 PM, Hugo Mills wrote: > On Fri, Jun 24, 2016 at 12:52:21PM +0300, Andrei Borzenkov wrote: >> On Fri, Jun 24, 2016 at 11:50 AM, Hugo Mills wrote: >> > On Fri, Jun 24, 2016 at 07:02:34AM +0300, Andrei Borzenkov wrote: >> >> 24.06.2016 04:47, Zygo Blaxell пишет: >> >> >

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Hugo Mills
On Fri, Jun 24, 2016 at 12:52:21PM +0300, Andrei Borzenkov wrote: > On Fri, Jun 24, 2016 at 11:50 AM, Hugo Mills wrote: > > On Fri, Jun 24, 2016 at 07:02:34AM +0300, Andrei Borzenkov wrote: > >> 24.06.2016 04:47, Zygo Blaxell writes: > >> > On Thu, Jun 23, 2016 at 06:26:22PM -0600, Chris Murphy wro

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Andrei Borzenkov
On Fri, Jun 24, 2016 at 8:20 AM, Chris Murphy wrote: > [root@f24s ~]# filefrag -v /mnt/5/* > Filesystem type is: 9123683e > File size of /mnt/5/a.txt is 16383 (4 blocks of 4096 bytes) > ext: logical_offset:physical_offset: length: expected: flags: >0:0.. 3:293

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Andrei Borzenkov
On Fri, Jun 24, 2016 at 11:50 AM, Hugo Mills wrote: > On Fri, Jun 24, 2016 at 07:02:34AM +0300, Andrei Borzenkov wrote: >> 24.06.2016 04:47, Zygo Blaxell writes: >> > On Thu, Jun 23, 2016 at 06:26:22PM -0600, Chris Murphy wrote: >> >> On Thu, Jun 23, 2016 at 1:32 PM, Goffredo Baroncelli >> >> wro

Re: Adventures in btrfs raid5 disk recovery

2016-06-24 Thread Hugo Mills
On Fri, Jun 24, 2016 at 07:02:34AM +0300, Andrei Borzenkov wrote: > 24.06.2016 04:47, Zygo Blaxell writes: > > On Thu, Jun 23, 2016 at 06:26:22PM -0600, Chris Murphy wrote: > >> On Thu, Jun 23, 2016 at 1:32 PM, Goffredo Baroncelli > >> wrote: > >>> The raid5 write hole is avoided in BTRFS (and in

Re: Adventures in btrfs raid5 disk recovery

2016-06-23 Thread Chris Murphy
On Thu, Jun 23, 2016 at 8:07 PM, Zygo Blaxell wrote: >> With simple files changing one character with vi and gedit, >> I get completely different logical and physical numbers with each >> change, so it's clearly cowing the entire stripe (192KiB in my 3 dev >> raid5). > > You are COWing the entire
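
That behaviour is easy to observe directly; even an in-place one-byte overwrite lands at a new physical offset (paths here are illustrative):

    echo hello > /mnt/5/a.txt; sync
    filefrag -v /mnt/5/a.txt                      # note physical_offset
    printf 'j' | dd of=/mnt/5/a.txt conv=notrunc  # overwrite first byte in place
    sync
    filefrag -v /mnt/5/a.txt                      # new physical_offset: CoW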

Re: Adventures in btrfs raid5 disk recovery

2016-06-23 Thread Andrei Borzenkov
24.06.2016 04:47, Zygo Blaxell writes: > On Thu, Jun 23, 2016 at 06:26:22PM -0600, Chris Murphy wrote: >> On Thu, Jun 23, 2016 at 1:32 PM, Goffredo Baroncelli >> wrote: >>> The raid5 write hole is avoided in BTRFS (and in ZFS) thanks to the >>> checksum. >> >> Yeah I'm kinda confused on this poin

Re: Adventures in btrfs raid5 disk recovery

2016-06-23 Thread Zygo Blaxell
On Thu, Jun 23, 2016 at 05:37:09PM -0600, Chris Murphy wrote: > > I expect that parity is in this data block group, and therefore is > > checksummed the same as any other data in that block group. > > This appears to be wrong. Comparing the same file, one file only, on two new Btrfs volumes, on

Re: Adventures in btrfs raid5 disk recovery

2016-06-23 Thread Zygo Blaxell
On Thu, Jun 23, 2016 at 05:37:09PM -0600, Chris Murphy wrote: > > So in your example of degraded writes, no matter what the on disk > > format makes it discoverable there is a problem: > > > > A. The "updating" is still always COW so there is no overwriting. > > There is RMW code in btrfs/raid56.c

Re: Adventures in btrfs raid5 disk recovery

2016-06-23 Thread Zygo Blaxell
On Thu, Jun 23, 2016 at 06:26:22PM -0600, Chris Murphy wrote: > On Thu, Jun 23, 2016 at 1:32 PM, Goffredo Baroncelli > wrote: > > The raid5 write hole is avoided in BTRFS (and in ZFS) thanks to the > > checksum. > > Yeah I'm kinda confused on this point. > > https://btrfs.wiki.kernel.org/index

Re: Adventures in btrfs raid5 disk recovery

2016-06-23 Thread Zygo Blaxell
On Thu, Jun 23, 2016 at 09:32:50PM +0200, Goffredo Baroncelli wrote: > The raid write hole happens when a stripe is not completely written > on the platters: the parity and the related data mismatch. In this > case a "simple" raid5 may return wrong data if the parity is used to > compute the data.

Re: Adventures in btrfs raid5 disk recovery

2016-06-23 Thread Chris Murphy
On Thu, Jun 23, 2016 at 1:32 PM, Goffredo Baroncelli wrote: > > The raid5 write hole is avoided in BTRFS (and in ZFS) thanks to the checksum. Yeah I'm kinda confused on this point. https://btrfs.wiki.kernel.org/index.php/RAID56 It says there is a write hole for Btrfs. But defines it in terms o

Re: Adventures in btrfs raid5 disk recovery

2016-06-23 Thread Chris Murphy
On Wed, Jun 22, 2016 at 11:14 AM, Chris Murphy wrote: > > However, from btrfs-debug-tree from a 3 device raid5 volume: > > item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15621 itemsize 144 > chunk length 2147483648 owner 2 stripe_len 65536 > type DATA|RAID5 num_stripes 3 > stripe 0 d
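
That dump comes from the chunk tree; something like the following should reproduce it (the device name is a placeholder):

    # Tree 3 is the chunk tree; each CHUNK_ITEM lists stripe_len, num_stripes,
    # and the devid + offset of every stripe element
    btrfs-debug-tree -t 3 /dev/sdb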

Re: Adventures in btrfs raid5 disk recovery

2016-06-23 Thread Goffredo Baroncelli
On 2016-06-22 22:35, Zygo Blaxell wrote: >> I do not know the exact nature of the Btrfs raid56 write hole. Maybe a >> > dev or someone who knows can explain it. > If you have 3 raid5 devices, they might be laid out on disk like this > (e.g. with a 16K stripe width): > > Address: 0..16K

Re: Adventures in btrfs raid5 disk recovery

2016-06-22 Thread Zygo Blaxell
On Wed, Jun 22, 2016 at 11:14:30AM -0600, Chris Murphy wrote: > > Before deploying raid5, I tested these by intentionally corrupting > > one disk in an otherwise healthy raid5 array and watching the result. > > It's difficult to reproduce if no one understands how you > intentionally corrupted tha

Re: Adventures in btrfs raid5 disk recovery

2016-06-22 Thread Chris Murphy
On Mon, Jun 20, 2016 at 7:55 PM, Zygo Blaxell wrote: > On Mon, Jun 20, 2016 at 03:27:03PM -0600, Chris Murphy wrote: >> On Mon, Jun 20, 2016 at 2:40 PM, Zygo Blaxell >> wrote: >> > On Mon, Jun 20, 2016 at 01:30:11PM -0600, Chris Murphy wrote: >> >> >> For me the critical question is what does "so

Re: Adventures in btrfs raid5 disk recovery - update

2016-06-21 Thread Zygo Blaxell
TL;DR: Kernel 4.6.2 causes a world of pain. Use 4.5.7 instead. 'btrfs dev stat' doesn't seem to count "csum failed" (i.e. corruption) errors in compressed extents. On Sun, Jun 19, 2016 at 11:44:27PM -0400, Zygo Blaxell wrote: > Not so long ago, I had a disk fail in a b
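
The counters in question are the per-device ones; a quick way to watch them while testing (the mount point is illustrative):

    # corruption_errs is where csum failures should be counted; per the
    # report above, failures in compressed extents may not show up here
    btrfs device stats /mnt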

Re: Adventures in btrfs raid5 disk recovery

2016-06-20 Thread Zygo Blaxell
On Mon, Jun 20, 2016 at 09:55:59PM -0400, Zygo Blaxell wrote: > In this current case, I'm getting things like this: > > [12008.243867] BTRFS info (device vdc): csum failed ino 4420604 extent > 26805825306624 csum 4105596028 wanted 787343232 mirror 0 [...] > The other weird thing here is tha

Re: Adventures in btrfs raid5 disk recovery

2016-06-20 Thread Zygo Blaxell
On Mon, Jun 20, 2016 at 03:27:03PM -0600, Chris Murphy wrote: > On Mon, Jun 20, 2016 at 2:40 PM, Zygo Blaxell > wrote: > > On Mon, Jun 20, 2016 at 01:30:11PM -0600, Chris Murphy wrote: > > >> For me the critical question is what does "some corrupted sectors" mean? > > > > On other raid5 arrays, I

Re: Adventures in btrfs raid5 disk recovery

2016-06-20 Thread Chris Murphy
On Mon, Jun 20, 2016 at 2:40 PM, Zygo Blaxell wrote: > On Mon, Jun 20, 2016 at 01:30:11PM -0600, Chris Murphy wrote: >> For me the critical question is what does "some corrupted sectors" mean? > > On other raid5 arrays, I would observe a small amount of corruption every > time there was a system

Re: Adventures in btrfs raid5 disk recovery

2016-06-20 Thread Zygo Blaxell
On Mon, Jun 20, 2016 at 01:30:11PM -0600, Chris Murphy wrote: > On Mon, Jun 20, 2016 at 1:11 PM, Zygo Blaxell > wrote: > > On Mon, Jun 20, 2016 at 11:13:51PM +0500, Roman Mamedov wrote: > >> On Sun, 19 Jun 2016 23:44:27 -0400 > Seems difficult at best due to this: > >>The normal 'device delete' op

Re: Adventures in btrfs raid5 disk recovery

2016-06-20 Thread Chris Murphy
On Mon, Jun 20, 2016 at 1:11 PM, Zygo Blaxell wrote: > On Mon, Jun 20, 2016 at 11:13:51PM +0500, Roman Mamedov wrote: >> On Sun, 19 Jun 2016 23:44:27 -0400 >> Zygo Blaxell wrote: >> From a practical standpoint, [aside from not using Btrfs RAID5], you'd be >> better off shutting down the system, b

Re: Adventures in btrfs raid5 disk recovery

2016-06-20 Thread Zygo Blaxell
On Mon, Jun 20, 2016 at 11:13:51PM +0500, Roman Mamedov wrote: > On Sun, 19 Jun 2016 23:44:27 -0400 > Zygo Blaxell wrote: > From a practical standpoint, [aside from not using Btrfs RAID5], you'd be > better off shutting down the system, booting a rescue OS, copying the content > of the failing dis
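
The usual tool for that copy step is GNU ddrescue, roughly (device names and map path are assumptions):

    # Image the failing disk onto a fresh one; the map file lets the run
    # resume and retry bad areas later
    ddrescue -f /dev/sdOLD /dev/sdNEW /root/rescue.map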

Re: Adventures in btrfs raid5 disk recovery

2016-06-20 Thread Roman Mamedov
On Sun, 19 Jun 2016 23:44:27 -0400 Zygo Blaxell wrote: > It's not going well so far. Pay attention, there are at least four > separate problems in here and we're not even half done yet. > > I'm currently using kernel 4.6.2 with btrfs fixes forward-ported from > 4.5.7, because 4.5.7 has a number

Adventures in btrfs raid5 disk recovery

2016-06-19 Thread Zygo Blaxell
Not so long ago, I had a disk fail in a btrfs filesystem with raid1 metadata and raid5 data. I mounted the filesystem readonly, replaced the failing disk, and attempted to recover by adding the new disk and deleting the missing disk. It's not going well so far. Pay attention, there are at least