On 07/24/09 04:35 PM, Bob Friesenhahn wrote:
 Regardless, it [VirtualBox] has committed a crime.

But ZFS is a journalled file system! Any hardware can lose a flush;
it's just more likely in a VM, especially when anything Microsoft
is involved, and the whole point of journalling is to prevent things
like this happening. However the issue is moot since CR 6667683 is
being addressed. Here's a related thought - does it make sense to
mirror ZFS on iSCSI if the host drives are themselves ZFS mirrors?

The whole question of the requirement for ECC depends on your
tolerance for loss of files vs. errors in files. As Richard
Elling points out, there are other sources of error (e.g.,
no checking of PCI parity). But that isn't relevant to the ECC
on main memory question. You can disable checksumming, and then
ZFS is no worse in this regard than any other file system; bad
files get read and you either notice or you don't, but you won't
lose any because of fatal checksum errors and you still have all
the other great features of ZFS.

If you don't mirror, all bets are off. You should set copies=2 or
higher and cross your fingers. You should also disable file
checksumming in ZFS and in that sense degenerate to the behavior
of lesser file systems. However, mirroring doesn't buy you much
here because ZFS evidently doesn't double buffer the write before
calculating the checksum, so a stray bitflip can cause metadata or
data corruption, causing a mirrored file to have an unrecoverable
checksum failure (of course there are many other reasons to mirror).
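
To make that failure mode concrete, here is a toy model in Python. It is
only a sketch of the argument, not ZFS's actual write path: SHA-256
stands in for the real checksum, and write_mirror, read_mirror and
flip_bit are hypothetical names made up for the illustration. The
assumption is that the checksum is taken once from a single buffer and
that the same buffer is then sent to both sides of the mirror.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Stand-in checksum; SHA-256 is used purely for illustration.
    return hashlib.sha256(data).digest()


def flip_bit(data: bytes, bit: int) -> bytes:
    # Return a copy of data with one bit inverted.
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def write_mirror(buf: bytes, ram_flip_bit=None) -> dict:
    # The checksum is computed once, from the single in-memory buffer.
    stored_sum = checksum(buf)
    if ram_flip_bit is not None:
        # Assumed scenario: a bit flips in RAM after checksumming
        # but before the buffer is written out.
        buf = flip_bit(buf, ram_flip_bit)
    # Both sides of the mirror receive the same (possibly damaged) buffer.
    return {"sum": stored_sum, "sides": [buf, buf]}


def read_mirror(block: dict):
    # Return the first side whose contents match the stored checksum.
    for side in block["sides"]:
        if checksum(side) == block["sum"]:
            return side
    return None  # no side verifies: unrecoverable checksum failure


# Case 1: one disk corrupts its copy; the other side still verifies.
disk_error = write_mirror(b"hello zfs")
disk_error["sides"][0] = flip_bit(disk_error["sides"][0], 3)
print(read_mirror(disk_error) is not None)

# Case 2: the bit flips in memory, so both sides carry the same damage.
ram_error = write_mirror(b"hello zfs", ram_flip_bit=3)
print(read_mirror(ram_error) is None)
```

In the first case the mirror does its job. In the second, both copies
fail verification, so under these assumptions the mirror has nothing
good left to repair from.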

The real question is - what is the probability of this occurring?
IMO the typical SOHO user has a 1 in 10 to 1 in 100 chance of this
happening in a year of reasonably constant operation (a few dozen
writes/day). I believe that this can be mitigated by setting
copies=2. That is a good idea anyway if you have biggish disks: as
Richard Elling has pointed out in his excellent blogs, if you need
to resilver after a disk failure you have a rather large chance
of a disk read error causing file loss, and copies=2 mitigates
this too. Note that fixing CR 6667683 should hopefully eliminate any
possibility of losing an entire mirrored or raidz pool.
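
For a feel of how a per-year estimate like this adds up over time, here
is a small Python sketch. It assumes each year is independent and the
yearly probability stays constant, which is a simplification of mine
and not something measured; the 1 in 10 and 1 in 100 inputs are just
the rough guesses given above, and prob_at_least_one is a name made up
for the example.

```python
def prob_at_least_one(p_per_year: float, years: int) -> float:
    # Chance of at least one event in `years` years, assuming independent
    # years with the same per-year probability: 1 - (1 - p)^n.
    return 1.0 - (1.0 - p_per_year) ** years


# Per-year guesses from the text: 1 in 10 and 1 in 100.
for p in (0.1, 0.01):
    for years in (1, 5, 10):
        print(f"p/year={p:<5} years={years:<3} "
              f"P(at least one)={prob_at_least_one(p, years):.3f}")
```

Under those assumptions the 1 in 10 guess comes to roughly 41% over
five years, and the 1 in 100 guess to just under 10% over ten years.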

So, it seems to me ZFS has a definite dependency on ECC for reliable
operation. However, for non-commercial uses (i.e., less than an
hour or so a day of writes) the probability of losing a file is
fairly small and can be mitigated still further by setting copies=2.
But to eliminate the possibility entirely, you must have ECC. You
should also make sure that the buses have at least parity if not
ECC and that this is actually checked - maybe Richard can comment
on this since I believe he thinks this is a more likely source
of errors.

HTH -- Frank

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
