>>>>> "as" == Andras Spitzer <wsen...@gmail.com> writes:

    as> So, you telling me that even if the SAN provides redundancy
    as> (HW RAID5 or RAID1), people still configure ZFS with either
    as> raidz or mirror?

There's some experience suggesting that, in the case where the storage
device or the FC mesh glitches or reboots while the ZFS host stays up,
you are less likely to lose the whole pool to ``ZFS-8000-72: The pool
metadata is corrupted and cannot be opened.  Destroy the pool and
restore from backup.'' if you have ZFS-level redundancy than if you
don't.
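
For illustration (pool and device names here are hypothetical), the
difference is just whether you hand ZFS more than one copy of the data
to work with, even though each LUN is already RAID-protected inside
the array:

    # one SAN LUN, no ZFS-level redundancy -- ZFS can detect but not repair:
    zpool create tank c2t0d0

    # ZFS-level mirror across two LUNs, ideally on separate arrays or paths:
    zpool create tank mirror c2t0d0 c3t0d0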

Note that this ``corrupt and cannot be opened'' is a different problem
from ``not being able to self-heal.''  When you need self-healing and
don't have it, you usually shouldn't lose the whole pool.  You should
get a message in 'zpool status' telling you the name of a file that
has unrecoverable errors.  Any attempt to read the file returns an I/O
error (not the marginal data).  Then you have to go delete that file
to clear the error, but otherwise the pool keeps working.  In this
self-heal case, if you'd had ZFS-layer redundancy, you'd get a count
in the checksum column of one device and wouldn't have to delete the
file; in fact, you wouldn't even know the name of the file that got
healed.
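
To make that concrete, here's a sketch of the unredundant case (pool
name and file path are made up, and the output is abbreviated from
memory rather than exact):

    zpool status -v tank
      ...
      errors: Permanent errors have been detected in the following files:
              /tank/some/damaged/file

    rm /tank/some/damaged/file
    zpool clear tank

With ZFS-layer redundancy you'd instead see a nonzero CKSUM count
against one device in the same 'zpool status' output, and nothing
listed under 'errors:'.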

Some people have been trying to blame the ``corrupt and cannot be
opened'' failures on bit-flips supposedly happening inside the storage
or the FC cloud, the same kind of bit flip that causes the other,
self-healable problem, but I don't buy it.  I think it's probably
cache sync / write barrier problems that are killing the unredundant
pools on SANs.
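
One way that failure mode can arise (a guess at the mechanism, not
something I've proven): ZFS issues cache-flush commands to order its
metadata and uberblock writes, and if the array drops or ignores
them, or if someone has turned them off for speed, e.g. on Solaris
in /etc/system:

    * tell ZFS not to send cache-flush commands to the devices;
    * only safe if the array's write cache is genuinely nonvolatile
    set zfs:zfs_nocacheflush = 1

then writes can land out of order across an array reset, which is
exactly the sort of thing that would turn into ``corrupt and cannot
be opened'' on a pool with no ZFS-level redundancy.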
