budy,
here are some links. Remember, the reason you now see corrupted files is that
ZFS detects the corruption. You probably had corruption earlier as well, but your
hardware never noticed it. This is called silent corruption. ZFS is designed to
detect and correct silent corruption, which no ordinary hardware is designed to do.

The thing is, ZFS does end-to-end checksumming. Is the data in RAM identical to
what ends up on disk? The data passes from RAM through the controller down to the
disk, and errors can be introduced in the hand-off between each stage. Normally
there are checksums within each stage (for example, on the disk itself), but no
checksum covering the whole chain from beginning to end, i.e. an end-to-end
checksum:
http://jforonda.blogspot.com/2007/01/faulty-fc-port-meets-zfs.html
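
To make that concrete, here is a tiny application-level sketch of the idea (toy
Python, not how ZFS actually does it; the path and helper names are made up): the
checksum is computed while the data is still in RAM and checked again after the
round trip through the controller and disk, so corruption anywhere along the
chain is caught.

import hashlib

def write_with_checksum(path, data):
    digest = hashlib.sha256(data).digest()     # checksum taken while the data is in RAM
    with open(path, "wb") as f:
        f.write(data)                          # data travels RAM -> controller -> disk
    return digest

def read_and_verify(path, digest):
    with open(path, "rb") as f:
        data = f.read()                        # data travels disk -> controller -> RAM
    if hashlib.sha256(data).digest() != digest:
        raise IOError("silent corruption somewhere along the chain")
    return data

payload = b"some application data"
d = write_with_checksum("/tmp/block.bin", payload)
assert read_and_verify("/tmp/block.bin", d) == payload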

CERN did a data integrity survey across 3000 hardware RAID systems and found
silent corruption:
http://storagemojo.com/2007/09/19/cerns-data-corruption-research/


In another CERN paper they say "such data corruption is found in all
solutions, no matter price (even very expensive Enterprise solutions)"! From
that paper (I cannot find the link right now):
"Conclusions
-silent corruptions are a fact of life
-first step towards a solution is detection
-elimination seems impossible
-existing datasets are at the mercy of Murphy
-correction will cost time AND money
-effort has to start now (if not started already)
-multiple cost-schemes exist
--trade time and storage space (à la Google)
--trade time and CPU power (correction codes)"

CERN writes: "checksumming - not necessarily enough"; you need to use
"end-to-end checksumming (ZFS has a point)".


See the specifications of a new enterprise SAS disk; typically they say
"one irrecoverable error in 10^15 bits". With today's large and fast RAID
arrays, you reach 10^15 bits in a short time.


Greenplum's database solution faces one such error every 15 minutes:
http://queue.acm.org/detail.cfm?id=1317400
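
A quick back-of-the-envelope calculation shows the time scale (the 1 GB/s
throughput below is just an assumed figure, not from any spec sheet); scale the
throughput up and the interval shrinks proportionally, which is how a big
installation can see such an error every few minutes:

bits_per_error = 10**15                     # "one irrecoverable error in 10^15 bits"
throughput = 1_000_000_000                  # assumed: 1 GB/s sustained array throughput
seconds = bits_per_error / (throughput * 8) # bits read per second = bytes * 8
print(seconds / 3600)                       # roughly 35 hours of sustained I/O at 1 GB/s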


Ordinary filesystems such as XFS, ReiserFS, JFS, etc. do not protect your
data, nor do they detect all errors (here is a link about the PhD thesis):
http://www.zdnet.com/blog/storage/how-microsoft-puts-your-data-at-risk/169


ZFS data integrity tested by researchers:
http://www.zdnet.com/blog/storage/zfs-data-integrity-tested/811?tag=rbxccnbzd1
(If they had run ZFS with redundancy (a mirror or raidz), ZFS would also have
corrected all of the artificially injected errors. As it was, ZFS only detected
all the errors, which by itself is very difficult to do. The first step is
detection; after that you can repair the errors.)
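
The detect-then-repair step is easy to sketch in a toy model (an assumed
two-copy mirror with made-up data; real ZFS keeps per-block checksums in the
block pointers): verify each copy against the expected checksum, return the
first good one, and rewrite the damaged copy from it.

import hashlib

def checksum(block):
    return hashlib.sha256(block).digest()

good = b"original block contents"
expected = checksum(good)                      # checksum of the intended data
copies = [bytearray(good), bytearray(good)]    # two-way mirror

copies[0][0] ^= 0xFF                           # silent bit flip in one copy

def read_block():
    for copy in copies:
        if checksum(bytes(copy)) == expected:  # detection: verify against the checksum
            good_copy = bytes(copy)
            for other in copies:               # repair: rewrite damaged copies from the good one
                other[:] = good_copy
            return good_copy
    raise IOError("all copies corrupt: detected, but not repairable")

assert read_block() == good
assert bytes(copies[0]) == good                # the damaged copy was healed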


Companies try to hide silent corruption:
http://www.enterprisestorageforum.com/sans/features/article.php/3704666


http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
"When a drive returns garbage, since RAID5 does not EVER check parity on read 
(RAID3 & RAID4 do BTW and both perform better for databases than RAID5 to boot) 
if you write a garbage sector back garbage parity will be calculated and your 
RAID5 integrity is lost! Similarly if a drive fails and one of the remaining 
drives is flaky the replacement will be rebuilt with garbage also propagating 
the problem to two blocks instead of just one."
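
A tiny sketch of the read path the quote complains about (toy Python, made-up
byte values): the RAID5 read simply returns whatever the drive hands back and
never cross-checks it against parity, so garbage reaches the application even
though the parity could have exposed the mismatch.

def raid5_read(blocks, i):
    return blocks[i]                 # ordinary read: parity is never consulted

def parity_ok(blocks, parity):
    x = 0
    for b in blocks:
        x ^= b
    return x == parity

blocks = [0x0A, 0x0B, 0x0C]          # made-up data blocks, one byte per drive
parity = 0x0A ^ 0x0B ^ 0x0C

blocks[2] = 0x77                     # drive 2 silently returns garbage
print(raid5_read(blocks, 2))         # 0x77 is delivered to the application, no error
print(parity_ok(blocks, parity))     # False: the mismatch was detectable all along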


http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
"The paper explains that the best RAID-6 can do is use probabilistic methods to 
distinguish between single and dual-disk corruption, eg. "there are 95% chances 
it is single-disk corruption so I am going to fix it assuming that, but there 
are 5% chances I am going to actually corrupt more data, I just can't tell". I 
wouldn't want to rely on a RAID controller that takes gambles :-)"
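
To see what that gamble looks like, here is a toy byte-level model of the
RAID-6 P/Q math in GF(2^8), the field used in the paper (all data values are
made up). Locating a corrupt block from the syndromes assumes exactly one block
is bad; an unlucky two-block corruption can produce syndromes that point at an
innocent third drive, and "repairing" that drive makes the stripe look
consistent again while three blocks are now wrong.

def gf_mul(a, b):                        # multiplication in GF(2^8), polynomial 0x11D
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
        b >>= 1
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):                           # brute-force inverse, good enough for a demo
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def syndromes(data, p, q):               # P/Q syndromes of the current stripe contents
    sp, sq = p, q
    for i, d in enumerate(data):
        sp ^= d
        sq ^= gf_mul(gf_pow(2, i), d)
    return sp, sq

def locate_single(sp, sq, ndisks):       # assumes exactly ONE data block is corrupt
    ratio = gf_mul(sq, gf_inv(sp))       # g^z = sq / sp
    for z in range(ndisks):
        if gf_pow(2, z) == ratio:
            return z
    return None

data = [0x11, 0x22, 0x33, 0x44]          # one byte per data disk, made-up values
p, q = 0, 0
for i, d in enumerate(data):
    p ^= d
    q ^= gf_mul(gf_pow(2, i), d)

# single corruption: correctly located and repaired
d1 = list(data)
d1[3] ^= 0x5A
sp, sq = syndromes(d1, p, q)
z = locate_single(sp, sq, 4)             # z == 3, the drive that was actually hit
d1[z] ^= sp                              # XOR the error back out
assert z == 3 and d1 == data

# double corruption engineered so the syndromes point at innocent drive 2
e1 = 0x37
e0 = gf_mul(e1, gf_mul(6, gf_inv(5)))    # chosen so that sq == g^2 * sp
d2 = list(data)
d2[0] ^= e0
d2[1] ^= e1
sp, sq = syndromes(d2, p, q)
z = locate_single(sp, sq, 4)             # z == 2, a drive that is perfectly fine
d2[z] ^= sp                              # "repair" drive 2
assert z == 2 and syndromes(d2, p, q) == (0, 0) and d2 != data
# the stripe now passes the P/Q check, yet three of four blocks hold wrong data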


Researchers write the following regarding hardware RAID:
http://www.cs.wisc.edu/adsl/Publications/parity-fast08.html
"We use the model checker to evaluate a number of different approaches found in 
real RAID systems, focusing on parity-based protection and single errors. We 
find holes in all of the schemes examined, where systems potentially expose
data to loss or return corrupt data to the user. In data loss scenarios, the
error is detected, but the data cannot be recovered, while in the rest, the 
error is not detected and therefore corrupt data is returned to the user. For 
example, we examine a combination of two techniques – block-level checksums 
(where checksums of the data block are stored within the same disk block as 
data and verified on every read) and write-verify (where data is read back 
immediately after it is written to disk and verified for correctness), and show 
that the scheme could still fail to detect certain error conditions, thus 
returning corrupt data to the user.

We discover one particularly interesting and general problem that we call 
parity pollution. In this situation, corrupt data in one block of a stripe 
spreads to other blocks through various parity calculations. We find a number 
of cases where parity pollution occurs, and show how pollution can lead to data 
loss. Specifically, we find that data scrubbing (which is used to reduce the 
chances of double disk failures) tends to be one of the main causes of parity
pollution."
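
Parity pollution is easy to demonstrate with plain XOR parity (a toy sketch,
made-up byte values): before the scrub, the corrupted block could still have
been rebuilt from parity; a scrub that trusts the data blocks and rewrites
parity from them destroys that last good copy.

def xor_parity(blocks):
    x = 0
    for b in blocks:
        x ^= b
    return x

stripe = [0x10, 0x20, 0x30]                  # made-up data blocks
parity = xor_parity(stripe)                  # parity matches the good data

stripe[1] = 0x99                             # block 1 is silently corrupted on disk

# before the scrub, the original block is still recoverable from parity:
assert parity ^ stripe[0] ^ stripe[2] == 0x20

# a scrub that trusts the data and rewrites parity from it "fixes" the
# mismatch by polluting the parity; the last good copy of 0x20 is gone:
parity = xor_parity(stripe)
assert parity ^ stripe[0] ^ stripe[2] == 0x99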


http://www.cs.wisc.edu/adsl/Publications/corruption-fast08.pdf
"Detecting and recovering from data corruption requires protection techniques 
beyond those provided by the disk drive. In fact, basic protection schemes such 
as RAID [13] may also be unable to detect these problems.
...
as we discuss later, checksums do not protect against all forms of corruption"



http://www.cs.wisc.edu/adsl/Publications/corrupt-mysql-icde10.pdf
"More reliable SCSI drives encounter fewer problems, but even within this 
expensive and carefully-engineered drive class, corruption still takes place."
...
Recent work has shown that even with sophisticated RAID protection strategies, 
the “right” combination of a single fault and certain repair activities (e.g., 
a parity scrub) can still lead to data loss [19]. Thus, while these schemes 
reduce the chances of corruption, the possibility still exists; any 
higher-level client of storage that is serious about managing data reliably 
must consider the possibility that a disk will return data in a corrupted form."