2012-01-15 20:06, Peter Tribble wrote:
>>> (Try writing over one
>>> half of a zfs mirror with dd and watch it cheerfully repair your data
>>> without an actual error in sight.)
>> Are you certain it always works?
>> AFAIK, mirror reads are round-robined (which gives the parallel
>> read performance boost). Only if a read happens to hit the mismatched
>> copy would the block be reconstructed from the other copy.
>> And scrubs are one mechanism to force such reads of all copies
>> of all blocks and trigger reconstructions as needed.
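
To make the picture I have in mind explicit (a toy Python sketch of my
understanding, not ZFS code; all names below are made up): a normal read
verifies one copy against the block checksum and only falls back and
repairs if that copy fails, while a scrub deliberately reads every copy:

import hashlib
import random

def cksum(data):
    return hashlib.sha256(data).digest()

class MirrorBlock:
    def __init__(self, data):
        self.expected = cksum(data)
        self.sides = [bytearray(data), bytearray(data)]  # two mirror halves

    def read(self):
        # Normal read: try one side (round-robin/random); fall back and
        # self-heal only if the chosen side fails its checksum.
        order = random.sample(range(len(self.sides)), len(self.sides))
        bad = []
        for i in order:
            data = bytes(self.sides[i])
            if cksum(data) == self.expected:
                for j in bad:
                    self.sides[j][:] = data   # repair the damaged copy
                return data
            bad.append(i)
        raise IOError("no side passed its checksum")

    def scrub(self):
        # Scrub: read *every* side, so latent corruption is found even on
        # copies a normal read might never touch.
        good = self.read()
        for side in self.sides:
            if cksum(bytes(side)) != self.expected:
                side[:] = good

blk = MirrorBlock(b"important data")
blk.sides[0][:4] = b"\x00\x00\x00\x00"   # simulate dd over one half
blk.scrub()                              # repaired, no I/O error reported
assert all(cksum(bytes(s)) == blk.expected for s in blk.sides)
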
>> 1) How does raidzN protect against bit-rot when a component disk
>> has not visibly died, if it protects against it at all?
>> Or does it only help against "loud corruption", where the
>> disk reports a sector-access error or dies completely?
>> 2) Do the "leaf blocks" (the on-disk sectors or ranges of sectors
>> that belong to a raidzN stripe) have any ZFS checksums of
>> their own? That is, can ZFS determine which of the disks
>> produced invalid data and reconstruct the whole stripe from that?
> No, the checksum is against the whole stripe. And you do the
> combinatorial reconstruction to work out which part is bad.
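
Ah, I see. So, if I understand it right, something like this toy sketch
(single XOR parity and sha256 standing in for the real raidz parity and
block checksum; not actual ZFS code): no disk reports an error, the block
checksum fails, so each column is assumed bad in turn and the
reconstruction that makes the checksum pass wins:

import hashlib

def cksum(data):
    return hashlib.sha256(data).digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def reconstruct_raidz1(parity, columns, expected):
    # Fast path: the data as read already passes its checksum.
    data = b"".join(columns)
    if cksum(data) == expected:
        return data, None
    # Otherwise assume column i is the silently bad one, rebuild it from
    # parity plus the other columns, and test the block checksum again.
    for i in range(len(columns)):
        rebuilt = parity
        for j, col in enumerate(columns):
            if j != i:
                rebuilt = xor(rebuilt, col)
        candidate = columns[:i] + [rebuilt] + columns[i + 1:]
        data = b"".join(candidate)
        if cksum(data) == expected:
            return data, i      # column i was the one that rotted
    raise IOError("more bad columns than the parity can cover")

# Two data columns plus XOR parity; silently corrupt column 1.
cols = [b"AAAA", b"BBBB"]
parity = xor(cols[0], cols[1])
expected = cksum(b"".join(cols))
cols[1] = b"XXXX"
data, bad = reconstruct_raidz1(parity, cols, expected)
assert data == b"AAAABBBB" and bad == 1
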
Hmmm, in that case, how does ZFS know precisely which disks
hold which sector ranges of a variable-width stripe?
In Max Bruning's weblog I saw a reference to the kernel routine
vdev_raidz_map_alloc(). If there is no layer of pointers to the
data sectors on each physical vdev (a layer which, I had hoped,
might also carry per-sector checksums or ECCs), then that absence
seems to be a fundamental, unchangeable part of raidz. Is that true?
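
To illustrate what I mean by "no pointer layer": I imagine the layout
being a pure function of the block's offset and size plus the vdev
geometry, recomputable on every read, roughly like this toy (which is
only my guess at the flavour of vdev_raidz_map_alloc(), not its actual
algorithm; the names and the exact arithmetic are made up):

def toy_raidz_map(offset_bytes, size_bytes, ndisks, nparity, sector=512):
    # Everything below is derived from the block pointer and the vdev
    # geometry alone -- nothing extra has to be stored on disk.
    data_sectors = -(-size_bytes // sector)        # round up
    start = offset_bytes // sector                 # absolute sector index
    layout = []
    for s in range(nparity + data_sectors):
        kind = "parity" if s < nparity else "data"
        disk = (start + s) % ndisks                # simple rotation
        row = (start + s) // ndisks                # sector on that disk
        layout.append((kind, disk, row))
    return layout

# e.g. a 3.5 KB block at byte offset 0x22000 on a 5-disk raidz1:
for entry in toy_raidz_map(0x22000, 3584, ndisks=5, nparity=1):
    print(entry)

If the real mapping is indeed just arithmetic of that sort, then I guess
adding a per-column checksum layer would mean changing the on-disk
format, which would explain why it is "fundamental".
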
>> 2**) Alternatively, how does raidzN get into a situation like
>> "I know there is an error somewhere, but I don't know where"?
>> Does this signal simultaneous failures on different disks
>> of one stripe?
> If you have raidz1, and two devices give bad data, then you don't
> have enough redundancy to do the reconstruction. I've not seen this
> myself for random bitrot, but it's the sort of thing that can give you
> trouble if you lose a whole disk and then hit a bad block on another
> device during resilver.
> (Regular scrubs to identify and fix bad individual blocks before you have
> to do a resilver are therefore a good thing.)
That's what I did more or less regularly. Then one nice scrub
gave me such a condition... :(
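
If I model my case with the same sort of toy parity-and-checksum
arithmetic as above (repeated here so it stands on its own), the symptom
falls out naturally: with two silently bad columns in one raidz1 stripe,
no single-column repair makes the block checksum pass, so there is
nothing to tell ZFS which disk to blame:

import hashlib

def cksum(b):
    return hashlib.sha256(b).digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

cols = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor(xor(cols[0], cols[1]), cols[2])
expected = cksum(b"".join(cols))
cols[0], cols[2] = b"????", b"!!!!"          # two columns rot at once

repairable = False
for i in range(len(cols)):
    # Assume only column i is bad and rebuild it from single parity.
    rebuilt = parity
    for j in range(len(cols)):
        if j != i:
            rebuilt = xor(rebuilt, cols[j])
    trial = cols[:i] + [rebuilt] + cols[i + 1:]
    if cksum(b"".join(trial)) == expected:
        repairable = True
print("repairable with single parity:", repairable)   # -> False
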
>> How *do* some things get fixed then - is it only dittoed data
>> or metadata that can be salvaged from a second good copy on raidz?
> You can recover anything you have enough redundancy for. Which
> means everything, up to the redundancy of the vdev. Beyond that,
> you may be able to recover dittoed data (of which metadata is just
> one example) even if you've lost an entire vdev.
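
OK - so the ditto mechanism, as I picture it (toy sketch only; the
block-pointer structure and names below are invented, not the on-disk
format), is that one block pointer carries several copy locations, and a
read succeeds as long as any one of them still passes the block checksum:

import hashlib

def cksum(b):
    return hashlib.sha256(b).digest()

# Two copies of the same metadata block on two different top-level vdevs.
vdevs = {0: {0x1000: b"metadata block"}, 1: {0x8000: b"metadata block"}}
blkptr = {"copies": [(0, 0x1000), (1, 0x8000)],
          "cksum": cksum(b"metadata block")}

del vdevs[0]                         # lose an entire top-level vdev
for vdev_id, offset in blkptr["copies"]:
    copy = vdevs.get(vdev_id, {}).get(offset)
    if copy is not None and cksum(copy) == blkptr["cksum"]:
        print("recovered ditto copy from vdev", vdev_id)
        break
else:
    print("all ditto copies failed")
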
And now, with my one pool-level error and two raidz-level
errors, is it correct to conclude that attempts to read
both dittoed copies of pool:<metadata>:<0x0>, whatever
that is, have failed?
In particular, shouldn't the metadata redundancy (mirroring
and/or copies=2 on top of the raidz or its component disks)
point to the specific disks that held the block and failed
to return it correctly?
Thanks all for the replies,
//Jim Klimov
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss