[slightly different angle below...]

Nathan Kroenert wrote:
> Hey, Bob,
>
> Though I have already got the answer I was looking for here, I thought 
> I'd at least take the time to provide my point of view as to my *why*...
>
> First: I don't think any of us have forgotten the goodness that ZFS's 
> checksum *can* bring.
>
> I'm also keenly aware that we have some customers running HDS / EMC 
> boxes who disable the ZFS checksum by default because they 'don't want 
> to have files break due to a single bit flip...' and they really don't 
> care where the flip happens, and they don't want to 'waste' disks or 
> bandwidth allowing ZFS to do its own protection when they already pay 
> for it inside their zillion dollar disk box. (Some say waste, some call 
> it insurance... ;). Oracle users in particular seem to have this 
> mindset, though that's another thread entirely. :)
>   

If you look at the zfs-discuss archives, you will find anecdotes
of failing RAID arrays (yes, even expensive ones) and SAN switches
causing corruption that was detected by ZFS.  A telltale sign of
broken hardware is someone complaining that ZFS checksums are
broken, only to find out their hardware is at fault.

As for Oracle, modern releases of the Oracle database also have
checksumming enabled by default, so there is some merit to the
argument that ZFS checksums are redundant there.  IMNSHO, ZFS is
not being designed to replace Oracle's ASM.

> I'd suspect we don't hear people whining about single bit flips, because 
> they would not know if it's happening unless the app sitting on top had 
> its own protection. Or - if the error is obvious, or crashes their 
> system... Or if they were running ZFS, but at this stage, we cannot 
> delineate between single bit or massively crapped out errors, so what's 
> to say we are NOT seeing it?
>
> Also - Don't assume bit rot on disk is the only way we can get single 
> bit errors.
>
> Considering that until very recently (and quite likely even now to a 
> reasonable extent), most CPU's did not have data protection in *every* 
> place data transited through, single bit flips are still a very real 
> possibility, and becoming more likely as process shrinks continue. 
> Granted, on CPU's with Register Parity protection, undetected doubles 
> are more likely to 'slip under the radar', as registers are typically 
> protected with parity at best, if at all... A single bit in the parity 
> protected register will be detected, a double won't.
>   

It depends on the processor.  Most of the modern SPARC processors
have extensive error detection and correction inside.  But processors
are still different from memories in that the time a datum resides in
any single location is quite short.  We worry more about random data
loss when a datum is stored in one place for a long time, which
is why you see different sorts of data protection at the different layers
of a system design.  To put this in more mathematical terms, there is
a failure rate for each failure mode, but your exposure to each failure
mode is time bounded.
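To make the rate-times-exposure point concrete, here is a toy calculation (the FIT rate below is an illustrative assumption, not a measured value for any real part):

```python
def expected_upsets(fit_per_bit, bits, seconds):
    """Expected upsets = failure rate x exposure time.

    FIT = failures per 10^9 hours.  The rate used below is an
    illustrative assumption, not a measured value.
    """
    return fit_per_bit * 1e-9 * bits * (seconds / 3600.0)

# Same per-bit rate, wildly different exposure windows:
# a 64-bit datum parked in DRAM for a day vs. one transiting
# a register for a microsecond.
dram = expected_upsets(fit_per_bit=0.001, bits=64, seconds=86400)
reg = expected_upsets(fit_per_bit=0.001, bits=64, seconds=1e-6)
```

The exposure ratio is eleven orders of magnitude, which is why the long-residency layers (DRAM, disk) get the heavyweight protection.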

> It does seem that some of us are getting a little caught up in disks and 
> their magnificence in what they write to the platter and read back, and 
> overlooking the potential value of a simple (though potentially 
> computationally expensive) circus trick, which might, just might, make 
> your broken 1TB archive useful again...
>
> I don't think it's a good idea for us to assume that it's OK to 'leave 
> out' potential goodness for the masses that want to use ZFS in 
> non-enterprise environments like laptops / home PC's, or use commodity 
> components in conjunction with the Big Stuff... (Like white box PC's 
> connected to an EMC or HDS box... )
>
> Anyhoo - I'm glad we have pretty much already done this work once 
> before. It gives me hope that we'll see it make a comeback. ;)
>
> (And I look forward to Jeff & Co developing a hyper cool way of 
> generating 128000000 checksums using all 64 threads of a Niagara 2, 
> using the same source data in cache, so we don't need to hit memory, so 
> that it happens in the blink of an eye. or two. ok - maybe three... ;) 
> Maybe we could also use the SPU's as well... OK - So, I'm possibly 
> dreaming here, but hell, if I'm dreaming, why not dream big. :)
>   

I sense that the requested behaviour here is to be able to
get to the corrupted contents of a file, even if we know it
is corrupted.  I think this is a good idea because:

1. The block is what is corrupted, not necessarily my file.
   A single block may contain several files which are grouped
   together, checksummed, and written to disk.

2. The current behaviour of returning EIO when read()ing a
   file up to the (possible) corruption point is rather irritating,
   but probably the right thing to do.  Since we know which
   files are affected, we could write a savior, provided we can
   get some reasonable response other than EIO.
   As Jeff points out, I'm not sure that automatic repair is
   the right answer, but a manual savior might work better
   than restoring from backup.
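To illustrate what such a manual savior might look like, here is a sketch (the `salvage` function, the 128K block size, and the zero-fill policy are all my assumptions, not an existing tool) that copies a file block by block and zero-fills any block whose read() fails:

```python
import os

BLOCK = 128 * 1024  # assume 128K, the default ZFS recordsize

def salvage(src, dst):
    """Copy src to dst block by block, zero-filling blocks that EIO.

    Hypothetical sketch: this only helps if the filesystem fails just
    the corrupted reads, not every read past the corruption point.
    Returns the offsets of the blocks that could not be read.
    """
    bad_offsets = []
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        size = os.fstat(fin.fileno()).st_size
        offset = 0
        while offset < size:
            want = min(BLOCK, size - offset)
            try:
                fin.seek(offset)          # re-seek in case a prior read failed
                data = fin.read(want)
            except OSError:               # EIO from a corrupted block
                data = b'\0' * want       # fill the hole with zeros
                bad_offsets.append(offset)
            fout.write(data)
            offset += want
    return bad_offsets
```

The list of bad offsets tells you exactly which blocks to inspect or fetch from elsewhere, which is precisely the "reasonable response other than EIO" above.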

Note: some apps can handle partially missing files.  Others
do things like zip everything together (e.g. StarOffice), which
makes manual recovery difficult.

Also note: the checksums don't have enough information to
recreate the data when more than a few bits have changed.
A strong hash might, but I don't know of anyone using sha256.
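With a strong hash, though, the "computationally expensive circus trick" is conceptually simple: flip each bit in turn and re-hash until the checksum matches. A sketch, assuming a sha256 block checksum (the function name and interface are hypothetical):

```python
import hashlib

def repair_single_bitflip(block, want_digest):
    """Try to recover a block that fails its sha256 checksum by
    flipping each bit in turn and re-hashing.  Returns the repaired
    block, or None if no single-bit flip matches the checksum."""
    if hashlib.sha256(block).digest() == want_digest:
        return block                          # not corrupted after all
    buf = bytearray(block)
    for i in range(len(buf) * 8):
        buf[i // 8] ^= 1 << (i % 8)           # flip one bit
        if hashlib.sha256(buf).digest() == want_digest:
            return bytes(buf)                 # found the flipped bit
        buf[i // 8] ^= 1 << (i % 8)           # flip it back
    return None  # more than one bit differs; the hash can't locate them
```

A 128K block has about a million bits, so that's roughly a million sha256 calls per damaged block -- expensive, but cheap next to a restore from tape. Two or more flipped bits make the search space explode, which is why this only rescues single-bit rot.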

now, where was that intern hiding? ... :-)
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
