Don't hear about triple-parity RAID that often:
I agree completely. In fact, I have wondered (probably in these
forums) why we don't bite the bullet and make a generic raidzN,
where N is any number >= 0.
I agree, but raidzN isn't simple to implement, and it's potentially
difficult to get it to perform well. That said, it's something I intend
to bring to ZFS in the next year or so.
If memory serves, the second parity is calculated using Reed-Solomon
which implies that any number of parity devices is possible.
True; it's a degenerate case.
In fact, get rid of mirroring, because it is clearly a variant of
raidz with two devices. Want three-way mirroring? Call that raidz2
with three devices. The truth is that a generic raidzN would roll
up everything -- striping, mirroring, parity raid, double parity, etc.
-- into a single format with one parameter.
That's an interesting thought, but there are some advantages to
calling out mirroring, for example, as its own vdev type. As has been
pointed out, reading from either side of a mirror involves no
computation, whereas reading from, say, a 1+2 RAID-Z would involve
parity computation. That would complicate the calculus of balancing
read operations over the mirror devices.
Let's not stop there, though. Once we have any number of parity
devices, why can't I add a parity device to an array? That should
be simple enough with a scrub to set the parity. In fact, what is
to stop me from removing a parity device? Once again, I think the
code would make this rather easy.
With RAID-Z, stripes can be of variable width, meaning that, say, a
single row in a 4+2 configuration might have two stripes of 1+2. In
other words, there might not be enough space in the new parity device.
I did write up the steps that would be needed to support RAID-Z
expansion; you can find it here:
http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z
Ok, back to the real world. The one downside to triple parity is
that, as I recall, the code discovers a corrupt block by excluding it
from the stripe, reconstructing the stripe, and comparing the result
with the checksum. In other words, for a cost of X to compute a
stripe and P corrupt blocks, the cost of reading a stripe is
approximately X^P. More corrupt blocks would radically slow down the
system. With raidz2, the maximum number of corrupt blocks is two,
putting a cap on how costly the read can be.
Computing the additional parity of triple-parity RAID-Z is slightly
more expensive, but not much -- it's just bitwise operations.
Recovering from a read failure is identical (and performs identically)
to raidz1 or raidz2 until you've actually sustained three failures. In
that case, performance is slower as more computation is involved -- but
aren't you just happy to get your data back?
If there is silent data corruption, then and only then can you
encounter the O(n^3) algorithm you alluded to, but only as a last
resort. If we don't know which drives failed, we try to reconstruct
your data by assuming that one drive, then two drives, then three
drives are returning bad data. For raidz1 this is a linear operation;
for raidz2, quadratic; raidz3 is now cubic. There's really no way
around it. Fortunately, with proper scrubbing, encountering data
corruption in one stripe on three different drives is highly unlikely.
Adam
--
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss