Re: [zfs-discuss] Data loss by memory corruption?

Nico Williams Wed, 18 Jan 2012 08:38:25 -0800

On Wed, Jan 18, 2012 at 4:53 AM, Jim Klimov <jimkli...@cos.ru> wrote:
> 2012-01-18 1:20, Stefan Ring wrote:
>> I don’t care too much if a single document gets corrupted – there’ll
>> always be a good copy in a snapshot. I do care however if a whole
>> directory branch or old snapshots were to disappear.
>
> Well, as far as this problem "relies" on random memory corruptions,
> you don't get to choose whether your document gets broken or some
> low-level part of the metadata tree ;)


Other filesystems tend to be much more tolerant of bit rot of all
types precisely because they have no block checksums: they usually
cannot detect the damage, so they just return whatever they read.

But I'd rather have ZFS -- *with* redundancy, of course, and with ECC.

It might be useful to have a way to recover from checksum mismatches
by involving a human.  I'm imagining a tool that tests whether
accepting a block's actual contents results in making data available
that the human thinks checks out, and if so, then rewriting that
block.  Some bit errors might simply result in meaningless metadata,
but in some cases this can be corrected (e.g., ridiculous block
addresses).  But if ECC takes care of the problem then why waste the
effort?  (Partial answer: because it'd be a very neat GSoC-type
project!)
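
To make the idea a bit more concrete, here is a rough sketch of the
decision at the heart of such a tool.  Everything in it is hypothetical:
the names checksum() and review_block() are made up for illustration,
SHA-256 merely stands in for whatever checksum the pool uses, and
nothing here touches real ZFS on-disk structures.  The human is modelled
as a confirm callback that looks at the block's actual contents and says
whether they check out.

```python
import hashlib
from typing import Callable, Optional


def checksum(block: bytes) -> bytes:
    """Stand-in checksum (SHA-256); an assumption, not ZFS's own code."""
    return hashlib.sha256(block).digest()


def review_block(block: bytes, expected: bytes,
                 confirm: Callable[[bytes], bool]) -> Optional[bytes]:
    """Return the checksum to record for this block, or None if rejected.

    A block that verifies keeps its existing checksum.  On a mismatch the
    human-supplied `confirm` callback inspects the actual contents; only
    if it approves do we return the checksum of what is really on disk,
    which is what a rewrite of the block would then record.
    """
    actual = checksum(block)
    if actual == expected:
        return expected   # block is fine, nothing to decide
    if confirm(block):
        return actual     # human accepts the contents as they are
    return None           # human rejects; leave the error in place
```

A real tool would also have to walk the block tree and rewrite the
parent block pointers, which is where most of the work (and risk) would
be; the sketch only covers the accept-or-reject step.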

> Besides, what if that document you don't care about is your account's
> entry in a banking system (as if they had no other redundancy and
> double-checks)? And suddenly you "don't exist" because of some EIOIO,
> or your balance is zeroed (or worse, highly negative)? ;)

This is why we have paper trails, logs, backups, redundancy at various
levels, ...

Nico
--
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
