Don't hear about triple-parity RAID that often:
I agree completely. In fact, I have wondered (probably in these
forums) why we don't bite the bullet and make a generic raidzN,
where N is any number >= 0.
I agree, but raidzN isn't simple to implement, and it's potentially
difficult to get it to perform well. That said, it's something I intend
to bring to ZFS in the next year or so.
If memory serves, the second parity is calculated using Reed-Solomon
which implies that any number of parity devices is possible.
True; it's a degenerate case.
In fact, get rid of mirroring, because it is clearly a variant of
raidz with two devices. Want three-way mirroring? Call that raidz2
with three devices. The truth is that a generic raidzN would roll
up everything -- striping, mirroring, parity raid, double parity, etc.
-- into a single format with one parameter.
That's an interesting thought, but there are some advantages to
calling out mirroring, for example, as its own vdev type. As has been
pointed out, reading from either side of a mirror involves no
computation, whereas reading from, say, a 1+2 RAID-Z would involve
parity computation. That would complicate the calculus of balancing
read operations over the mirror devices.
Let's not stop there, though. Once we have any number of parity
devices, why can't I add a parity device to an array? That should
be simple enough with a scrub to set the parity. In fact, what is
to stop me from removing a parity device? Once again, I think the
code would make this rather easy.
With RAID-Z, stripes can be of variable width, meaning that, say, a
single row in a 4+2 configuration might have two stripes of 1+2. In
other words, there might not be enough space in the new parity device.
I did write up the steps that would be needed to support RAID-Z
expansion; you can find it here:
http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z
Ok, back to the real world. The one downside to triple parity is
that, as I recall, the code discovers a corrupt block by excluding it
from the stripe, reconstructing the stripe, and comparing the result
with the checksum. In other words, for a cost of X to compute a
stripe and P corrupt blocks, the cost of reading a stripe is
approximately X^P. More corrupt blocks would radically slow down the
system. With raidz2, the maximum number of corrupt blocks is two,
putting a cap on how costly the read can be.
Computing the additional parity of triple-parity RAID-Z is slightly
more expensive, but not much -- it's just bitwise operations.
Recovering from a read failure is identical (and performs identically)
to raidz1 or raidz2 until you've actually sustained three failures. In
that case, performance is slower as more computation is involved -- but
aren't you just happy to get your data back?
If there is silent data corruption, then and only then can you
encounter the O(n^3) algorithm you alluded to, but only as a last
resort. If we don't know which drives failed, we try to reconstruct
your data by assuming that one drive, then two drives, then three
drives are returning bad data. For raidz1 this is a linear operation;
for raidz2, quadratic; raidz3 is now cubic. There's really no way
around it. Fortunately, with proper scrubbing, encountering data
corruption in one stripe on three different drives is highly unlikely.
Adam
--
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss