Don't hear about triple-parity RAID that often:

I agree completely. In fact, I have wondered (probably in these forums), why we don't bite the bullet and make a generic raidzN, where N is any number >=0.

I agree, but raidzN isn't simple to implement and it's potentially difficult to get it to perform well. That said, it's something I intend to bring to
ZFS in the next year or so.

If memory serves, the second parity is calculated using Reed-Solomon coding, which implies that any number of parity devices is possible.

True; it's a degenerate case.
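To make the degenerate-case point concrete, here's a minimal sketch (mine, not the ZFS implementation) of Vandermonde-style parity over GF(2^8): row 0 is plain XOR, row 1 matches RAID-6's Q, and nothing in the arithmetic stops you from asking for more rows. The function names and the choice of generators are illustrative only:

```python
def gf_mul(a, b, poly=0x11d):
    """Multiply two bytes in GF(2^8), reducing by x^8+x^4+x^3+x^2+1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return r

def parity(data, nparity):
    """Compute nparity parity bytes for one column of data bytes.
    Row j weights data byte i by (2^j)^i: row 0 is plain XOR,
    row 1 is RAID-6's Q, row 2 would be a third (R) row, etc."""
    rows = []
    for j in range(nparity):
        g = 1
        for _ in range(j):
            g = gf_mul(g, 2)      # generator for this row: 2^j
        acc, coef = 0, 1
        for d in data:
            acc ^= gf_mul(coef, d)
            coef = gf_mul(coef, g)
        rows.append(acc)
    return rows
```

Generating extra rows is cheap; the hard part -- and one reason raidzN isn't trivial -- is proving a given coefficient set can actually recover from any N failures, and writing reconstruction code for every failure combination.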

In fact, get rid of mirroring, because it is clearly a variant of raidz with two devices. Want three-way mirroring? Call that raidz2 with three devices. The truth is that a generic raidzN would roll up everything -- striping, mirroring, parity raid, double parity, etc. -- into a single format with one parameter.

That's an interesting thought, but there are some advantages to calling out mirroring as its own vdev type. As has been pointed out, reading from either side of a mirror involves no computation, whereas reading from, say, a 1+2 RAID-Z group can involve more. This would complicate the calculus of balancing read operations over the mirror devices.
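To illustrate the read-balancing point: every side of a mirror holds a complete copy, so a toy scheduler can simply prefer the side with the fewest outstanding I/Os. This is a sketch with made-up names, not ZFS code:

```python
def pick_mirror_side(pending_ios):
    """Toy mirror read-balancing policy: each side holds a full copy,
    so just send the read to the side with the fewest outstanding I/Os.
    pending_ios[i] is the queue depth of mirror side i (hypothetical)."""
    return min(range(len(pending_ios)), key=lambda i: pending_ios[i])
```

In a raidz 1+2 group, by contrast, only the data copy can be read back verbatim; serving the read from a parity device means doing arithmetic first, which skews any such balancing decision.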

Let's not stop there, though. Once we have any number of parity devices, why can't I add a parity device to an array? That should be simple enough: a scrub could populate the new parity. In fact, what is to stop me from removing a parity device? Once again, I think the code would make this rather easy.

With RAID-Z, stripes can be of variable width, meaning that, say, a single row in a 4+2 configuration might contain two stripes of 1+2. In other words, there might not be enough space on the new parity device. I did write up the steps
that would be needed to support RAID-Z expansion; you can find it here:

  http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z

Ok, back to the real world. The one downside to triple parity is that, as I recall, the code discovers a corrupt block by excluding it from the stripe, reconstructing the stripe, and comparing the result with the checksum. In other words, for a cost of X to reconstruct a stripe and P corrupt blocks among n drives, the cost of reading a stripe grows roughly as X times n^P. More corrupt blocks would radically slow down the system. With raidz2, the maximum number of corrupt blocks would be two, putting a cap on how costly the read can be.

Computing the additional parity of triple-parity RAID-Z is slightly more expensive, but not much -- it's just bitwise operations. Recovering from a read failure is identical (and performs identically) to raidz1 or raidz2 until you actually have sustained three failures. In that case, performance is slower as more computation is involved -- but aren't you just happy to
get your data back?

If there is silent data corruption, then and only then can you encounter
the O(n^3) algorithm that you alluded to, and only as a last resort. If we don't know which drives failed, we try to reconstruct your data by assuming that one drive, then two drives, then three drives are returning bad data. For raidz1, this was a linear operation; for raidz2, quadratic; for raidz3, cubic. There's really no way around it. Fortunately, with proper scrubbing,
encountering data corruption in one stripe on three different drives is
highly unlikely.
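The search above can be sketched as follows: assume one drive is lying, then two, then three, reconstructing and re-checksumming each candidate. The candidate count is C(n,1) + C(n,2) + C(n,3), hence linear, quadratic, cubic. Here reconstruct and checksum_ok are hypothetical stand-ins, not ZFS functions:

```python
from itertools import combinations

def find_bad_drives(ndrives, nparity, reconstruct, checksum_ok):
    """Last-resort search for silently corrupt drives: assume first
    one, then two, ... up to nparity drives returned bad data.
    reconstruct(bad) rebuilds the stripe treating `bad` as failed;
    checksum_ok(stripe) verifies the result against the block
    checksum. The number of candidates tried is C(n,1)+...+C(n,p)."""
    for nbad in range(1, nparity + 1):
        for bad in combinations(range(ndrives), nbad):
            stripe = reconstruct(bad)
            if checksum_ok(stripe):
                return bad, stripe
    return None, None     # more corruption than parity can cover
```

Note that the checksum is what makes this search possible at all: parity alone can't tell you which combination was the lie.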

Adam

--
Adam Leventhal, Fishworks                        http://blogs.sun.com/ahl

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss