Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-11-05 Thread Goffredo Baroncelli
Resent because I don't see it in ml Hi Qu, On 2016-11-04 03:10, Qu Wenruo wrote: [...] > > I reproduced your problem and find that seems to be a problem of race. [...] [...]> > I digged a little further into the case 2) and found: > a) Kernel is scrubbing correct ran

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-11-03 Thread Qu Wenruo
At 06/25/2016 08:21 PM, Goffredo Baroncelli wrote: Hi all, following the thread "Adventures in btrfs raid5 disk recovery", I investigated a bit the BTRFS capability to scrub a corrupted raid5 filesystem. To test it, I first find where a file was stored, and then I tried to corrupt the data d

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-09-21 Thread Qu Wenruo
At 09/22/2016 11:07 AM, Christoph Anton Mitterer wrote: On Thu, 2016-09-22 at 10:08 +0800, Qu Wenruo wrote: And I don't see the necessary to csum the parity. Why csum a csum again? I'd say simply for the following reason: Imagine the smallest RAID5: 2x data D1 D2, 1x parity P If D2 is lost i

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-09-21 Thread Chris Murphy
On Wed, Sep 21, 2016 at 9:00 PM, Qu Wenruo wrote: >> Both copies are not scrubbed? Oh hell... > > > I was replying to the "--check-data-csum" of btrfsck. > > I mean, for --check-data-csum, it doesn't read the backup if the first data > can be read out without error. > > And if the first data is w

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-09-21 Thread Christoph Anton Mitterer
On Thu, 2016-09-22 at 10:08 +0800, Qu Wenruo wrote: > And I don't see the necessary to csum the parity. > Why csum a csum again? I'd say simply for the following reason: Imagine the smallest RAID5: 2x data D1 D2, 1x parity P If D2 is lost it could be recalculated via D1 and P. What if only (all)
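A minimal sketch of the parity arithmetic in the example above (2x data D1 D2, 1x parity P), assuming plain byte-wise XOR parity. The helper name `xor_blocks` and the 4-byte strip contents are hypothetical and only serve the illustration; this is not btrfs code.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical 4-byte strips; real strips are far larger.
d1 = b"\x11\x22\x33\x44"
d2 = b"\xaa\xbb\xcc\xdd"

# Parity for the smallest RAID5 layout: P = D1 xor D2.
p = xor_blocks(d1, d2)

# If D2 is lost, it can be recalculated from D1 and P.
rebuilt_d2 = xor_blocks(d1, p)
```

The same identity works in the other direction: D1 can be rebuilt from D2 and P.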

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-09-21 Thread Qu Wenruo
At 09/22/2016 10:44 AM, Chris Murphy wrote: On Wed, Sep 21, 2016 at 8:08 PM, Qu Wenruo wrote: At 09/21/2016 11:13 PM, Chris Murphy wrote: I understand some things should go in fsck for comparison. But in this case I don't see how it can help. Parity is not checksummed. The only way to kn

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-09-21 Thread Chris Murphy
On Wed, Sep 21, 2016 at 8:08 PM, Qu Wenruo wrote: > > > At 09/21/2016 11:13 PM, Chris Murphy wrote: >> I understand some things should go in fsck for comparison. But in this >> case I don't see how it can help. Parity is not checksummed. The only >> way to know if it's wrong is to read all of the

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-09-21 Thread Qu Wenruo
At 09/21/2016 11:13 PM, Chris Murphy wrote: On Wed, Sep 21, 2016 at 3:15 AM, Qu Wenruo wrote: At 09/21/2016 03:35 PM, Tomasz Torcz wrote: On Wed, Sep 21, 2016 at 03:28:25PM +0800, Qu Wenruo wrote: Hi, For this well-known bug, is there any one fixing it? It can't be more frustrating fi

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-09-21 Thread Chris Murphy
On Wed, Sep 21, 2016 at 3:15 AM, Qu Wenruo wrote: > > > At 09/21/2016 03:35 PM, Tomasz Torcz wrote: >> >> On Wed, Sep 21, 2016 at 03:28:25PM +0800, Qu Wenruo wrote: >>> >>> Hi, >>> >>> For this well-known bug, is there any one fixing it? >>> >>> It can't be more frustrating finding some one has al

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-09-21 Thread Chris Murphy
On Wed, Sep 21, 2016 at 1:28 AM, Qu Wenruo wrote: > Hi, > > For this well-known bug, is there any one fixing it? > > It can't be more frustrating finding some one has already worked on it after > spending days digging. > > BTW, since kernel scrub is somewhat scrap for raid5/6, I'd like to implemen

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-09-21 Thread Qu Wenruo
At 09/21/2016 03:35 PM, Tomasz Torcz wrote: On Wed, Sep 21, 2016 at 03:28:25PM +0800, Qu Wenruo wrote: Hi, For this well-known bug, is there any one fixing it? It can't be more frustrating finding some one has already worked on it after spending days digging. BTW, since kernel scrub is some

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-09-21 Thread Tomasz Torcz
On Wed, Sep 21, 2016 at 03:28:25PM +0800, Qu Wenruo wrote: > Hi, > > For this well-known bug, is there any one fixing it? > > It can't be more frustrating finding some one has already worked on it after > spending days digging. > > BTW, since kernel scrub is somewhat scrap for raid5/6, I'd like

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-09-21 Thread Qu Wenruo
Hi, For this well-known bug, is there any one fixing it? It can't be more frustrating finding some one has already worked on it after spending days digging. BTW, since kernel scrub is somewhat scrap for raid5/6, I'd like to implement btrfsck scrub support, at least we can use btrfsck to fix

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-08-19 Thread Philip Espunkt
> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote: >> >> most of the time, it seems that btrfs-raid5 is not capable to >> rebuild parity and data. Worse the message returned by scrub is >> incoherent by the status on the disk. The tests didn't fail every >> time; this complicate the diagnosis. Ho

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-07-18 Thread Goffredo Baroncelli
Hi On 2016-07-16 17:51, Jarkko Lavinen wrote: > On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote: >> Using "btrfs insp phy" I developed a script to trigger the bug. > > Thank you for the script and all for sharing the raid5 and scrubbing > issues. I have been using two raid5 arrays and ran scrub

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-07-17 Thread Jarkko Lavinen
On Sat, Jul 16, 2016 at 06:51:11PM +0300, Jarkko Lavinen wrote: > The modified script behaves very much like the original dd version. Not quite. The bad sector simulation works like old hard drives without error correction and bad block remapping. This changes the error behaviour. My script pri

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-07-16 Thread Jarkko Lavinen
On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote: > Using "btrfs insp phy" I developed a script to trigger the bug. Thank you for the script and all for sharing the raid5 and scrubbing issues. I have been using two raid5 arrays and ran scrub occasionally without any problems lately and been in

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Andrei Borzenkov
15.07.2016 19:29, Chris Mason wrote: > >> However I have to point out that this kind of test is very >> difficult to do: the file-cache could lead to read an old data, so please >> suggestion about how flush the cache are good (I do some sync, >> unmount the filesystem and perform "echo 3 >/proc/sy

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Goffredo Baroncelli
On 2016-07-15 06:39, Andrei Borzenkov wrote: > 15.07.2016 00:20, Chris Mason wrote: >> >> >> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote: >>> Hi All, >>> >>> I developed a new btrfs command "btrfs insp phy"[1] to further >>> investigate this bug [2]. Using "btrfs insp phy" I developed a scrip

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Chris Mason
On 07/15/2016 12:28 PM, Goffredo Baroncelli wrote: On 2016-07-14 23:20, Chris Mason wrote: On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote: Hi All, I developed a new btrfs command "btrfs insp phy"[1] to further investigate this bug [2]. Using "btrfs insp phy" I developed a script to trig

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Goffredo Baroncelli
On 2016-07-14 23:20, Chris Mason wrote: > > > On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote: >> Hi All, >> >> I developed a new btrfs command "btrfs insp phy"[1] to further >> investigate this bug [2]. Using "btrfs insp phy" I developed a >> script to trigger the bug. The bug is not always t

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Chris Mason
On 07/15/2016 11:10 AM, Andrei Borzenkov wrote: 15.07.2016 16:20, Chris Mason wrote: Interesting, thanks for taking the time to write this up. Is the failure specific to scrub? Or is parity rebuild in general also failing in this case? How do you rebuild parity without scrub as long as a

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Andrei Borzenkov
15.07.2016 16:20, Chris Mason wrote: >>> >>> Interesting, thanks for taking the time to write this up. Is the >>> failure specific to scrub? Or is parity rebuild in general also failing >>> in this case? >>> >> >> How do you rebuild parity without scrub as long as all devices appear to >> be pres

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Chris Mason
On 07/15/2016 12:39 AM, Andrei Borzenkov wrote: 15.07.2016 00:20, Chris Mason wrote: On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote: Hi All, I developed a new btrfs command "btrfs insp phy"[1] to further investigate this bug [2]. Using "btrfs insp phy" I developed a script to trigger th

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-14 Thread Andrei Borzenkov
15.07.2016 00:20, Chris Mason wrote: > > > On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote: >> Hi All, >> >> I developed a new btrfs command "btrfs insp phy"[1] to further >> investigate this bug [2]. Using "btrfs insp phy" I developed a script >> to trigger the bug. The bug is not always trigg

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-14 Thread Chris Mason
On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote: Hi All, I developed a new btrfs command "btrfs insp phy"[1] to further investigate this bug [2]. Using "btrfs insp phy" I developed a script to trigger the bug. The bug is not always triggered, but most of time yes. Basically the script cre

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-27 Thread Duncan
Steven Haigh posted on Mon, 27 Jun 2016 13:21:00 +1000 as excerpted: > I'd also recommend updates to the ArchLinux wiki - as for some reason I > always seem to end up there when searching for a certain topic... Not really btrfs related, but for people using popular search engines, at least, this

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-27 Thread Christoph Anton Mitterer
On Mon, 2016-06-27 at 07:35 +0300, Andrei Borzenkov wrote: > The problem is that current implementation of RAID56 puts exactly CoW > data at risk. I.e. writing new (copy of) data may suddenly make old > (copy of) data inaccessible, even though it had been safely committed > to > disk and is now in

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-26 Thread Andrei Borzenkov
27.06.2016 06:50, Christoph Anton Mitterer wrote: > > What should IMHO be done as well is giving a big fat warning in the > manpages/etc. that when nodatacow is used RAID recovery cannot produce > valid data (at least as long as there isn't checksumming implemented > for nodatacow). The problem i

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-26 Thread Christoph Anton Mitterer
On Sun, 2016-06-26 at 15:33 -0700, ronnie sahlberg wrote: > 1, a much more strongly worded warning in the wiki. Make sure there > are no misunderstandings > that they really should not use raid56 right now for new filesystems. I doubt most end users can be assumed to read the wiki... > 2, Instead

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-26 Thread Steven Haigh
On 2016-06-27 08:38, Hugo Mills wrote: On Sun, Jun 26, 2016 at 03:33:08PM -0700, ronnie sahlberg wrote: On Sat, Jun 25, 2016 at 7:53 PM, Duncan <1i5t5.dun...@cox.net> wrote: > Could this explain why people have been reporting so many raid56 mode > cases of btrfs replacing a first drive appearing

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-26 Thread Steven Haigh
On 2016-06-27 08:33, ronnie sahlberg wrote: On Sat, Jun 25, 2016 at 7:53 PM, Duncan <1i5t5.dun...@cox.net> wrote: Chris Murphy posted on Sat, 25 Jun 2016 11:25:05 -0600 as excerpted: Wow. So it sees the data strip corruption, uses good parity on disk to fix it, writes the fix to disk, recompu

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-26 Thread Hugo Mills
On Sun, Jun 26, 2016 at 03:33:08PM -0700, ronnie sahlberg wrote: > On Sat, Jun 25, 2016 at 7:53 PM, Duncan <1i5t5.dun...@cox.net> wrote: > > Could this explain why people have been reporting so many raid56 mode > > cases of btrfs replacing a first drive appearing to succeed just fine, > > but then

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-26 Thread ronnie sahlberg
On Sat, Jun 25, 2016 at 7:53 PM, Duncan <1i5t5.dun...@cox.net> wrote: > Chris Murphy posted on Sat, 25 Jun 2016 11:25:05 -0600 as excerpted: > >> Wow. So it sees the data strip corruption, uses good parity on disk to >> fix it, writes the fix to disk, recomputes parity for some reason but >> does i

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-26 Thread Chris Murphy
On Sun, Jun 26, 2016 at 3:20 AM, Goffredo Baroncelli wrote: > On 2016-06-26 00:33, Chris Murphy wrote: >> On Sat, Jun 25, 2016 at 12:42 PM, Goffredo Baroncelli >> wrote: >>> On 2016-06-25 19:58, Chris Murphy wrote: >>> [...] > Wow. So it sees the data strip corruption, uses good parity on dis

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-26 Thread Goffredo Baroncelli
On 2016-06-26 00:33, Chris Murphy wrote: > On Sat, Jun 25, 2016 at 12:42 PM, Goffredo Baroncelli > wrote: >> On 2016-06-25 19:58, Chris Murphy wrote: >> [...] Wow. So it sees the data strip corruption, uses good parity on disk to fix it, writes the fix to disk, recomputes parity for some

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-25 Thread Duncan
Chris Murphy posted on Sat, 25 Jun 2016 11:25:05 -0600 as excerpted: > Wow. So it sees the data strip corruption, uses good parity on disk to > fix it, writes the fix to disk, recomputes parity for some reason but > does it wrongly, and then overwrites good parity with bad parity? > That's fucked.
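For reference, the sequence quoted above (detect a corrupted data strip, rebuild it from parity, write the fix, recompute parity) can be modelled in a few lines. This is only a toy sketch of the expected outcome, not the kernel scrub code. It assumes a stripe with two data strips and one XOR parity strip, and uses `zlib.crc32` as a stand-in checksum; `xor_blocks`, `scrub_stripe`, and the strip contents are hypothetical names and values chosen for the illustration.

```python
import zlib


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def scrub_stripe(data, csums, parity):
    """Toy scrub of a stripe with two data strips and one parity strip.

    Returns (repaired_data, recomputed_parity, repaired_indices).
    """
    data = list(data)
    # Data strips are checksummed, so corruption is detected via the csum.
    bad = [i for i, blk in enumerate(data) if zlib.crc32(blk) != csums[i]]
    if len(bad) > 1:
        raise ValueError("more than one corrupted strip; cannot rebuild")
    for i in bad:
        # Rebuild the bad strip from the other data strip and the parity.
        data[i] = xor_blocks(data[1 - i], parity)
        if zlib.crc32(data[i]) != csums[i]:
            raise ValueError("rebuilt strip fails its checksum")
    # Parity recomputed from the repaired data.
    return data, xor_blocks(data[0], data[1]), bad


# Hypothetical stripe contents.
d1, d2 = b"good-data-1", b"good-data-2"
csums = [zlib.crc32(d1), zlib.crc32(d2)]
parity = xor_blocks(d1, d2)

# Corrupt the first data strip, then scrub.
repaired, new_parity, fixed = scrub_stripe([b"XXXX-data-1", d2], csums, parity)
```

In this model the recomputed parity is identical to the parity that was already on disk, since the repaired data equals the original data. The behaviour reported in this thread is that the on-disk parity ends up wrong after such a repair.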

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-25 Thread Chris Murphy
On Sat, Jun 25, 2016 at 12:42 PM, Goffredo Baroncelli wrote: > On 2016-06-25 19:58, Chris Murphy wrote: > [...] >>> Wow. So it sees the data strip corruption, uses good parity on disk to >>> fix it, writes the fix to disk, recomputes parity for some reason but >>> does it wrongly, and then overwri

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-25 Thread Goffredo Baroncelli
On 2016-06-25 19:58, Chris Murphy wrote: [...] >> Wow. So it sees the data strip corruption, uses good parity on disk to >> fix it, writes the fix to disk, recomputes parity for some reason but >> does it wrongly, and then overwrites good parity with bad parity? > > The wrong parity, is it valid f

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-25 Thread Chris Murphy
On Sat, Jun 25, 2016 at 11:25 AM, Chris Murphy wrote: > On Sat, Jun 25, 2016 at 6:21 AM, Goffredo Baroncelli > wrote: > >> 5) I check the disks at the offsets above, to verify that the data/parity is >> correct >> >> However I found that: >> 1) if I corrupt the parity disk (/dev/loop2), scrub d

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-25 Thread Chris Murphy
On Sat, Jun 25, 2016 at 6:21 AM, Goffredo Baroncelli wrote: > 5) I check the disks at the offsets above, to verify that the data/parity is > correct > > However I found that: > 1) if I corrupt the parity disk (/dev/loop2), scrub don't find any > corruption, but recomputed the parity (always cor