I think part of the problem is that a hash collision is nearly undetectable until you have restored the data and found it to be wrong.
We all know that 99.999% of what we back up is never restored. It just ages gracefully on media and is expired. If any of that 0.001% is restored and is damaged by a tape fault (and we've all had it happen), we can usually reach back to a different version or a different tape and get close enough to make the user go away and let us return to our coffee and surfing.

I think a big part of the worry about a hash collision is that the restore appears to succeed: the file comes back without error, and the damage won't be detected unless someone checksums the whole file or it's a binary (or similar) that simply refuses to run. Restoring from a different tape or a different version may not help either, depending on where the hash collision occurred and why. Every version may reference the same unchanging block, which restores incorrectly because of a false hash match.

I know the odds are astronomical, but even though the lottery odds are 150 million to one, I still see smiling faces on TV holding giant checks. It's a bet, like every other restore technique, and I'm going to make sure management fully understands the risks before we implement it here (which is likely).

-M

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Lightner
Sent: Monday, October 22, 2007 10:28 AM
To: Austin Murphy; [email protected]
Subject: Re: [Veritas-bu] Tapeless backup environments

This paper looks to be 5 years old (based on the newest references it cites; it actually cites others that go back nearly 10 years). It would be interesting to see the author's take on current deduplication offerings, to see whether the additional checks they perform beyond simple hashing are enough to allay those concerns.

One thing I haven't seen in all this discussion is anyone saying they've actually experienced data loss as a result of a commercial deduplication device. Can anyone here claim that?
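To put numbers on the odds argument, here is a minimal Python sketch. It assumes the fingerprint behaves like a uniform random function (the standard birthday-bound premise, not something any vendor guarantees), and the block count, the 160-bit width, and the `DedupStore`, `weak_fp` and `find_weak_collision` names are hypothetical, chosen only for illustration. The second half contrasts plain compare-by-hash with a byte-for-byte comparison on fingerprint match, one example of the kind of check "beyond simple hashing" asked about above; it does not describe how any commercial product actually works.

```python
from __future__ import annotations

import hashlib
import math
from typing import Callable


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Birthday-bound estimate of P(at least one collision) among num_blocks
    distinct blocks, ASSUMING the hash behaves like a uniform random function."""
    pairs = num_blocks * (num_blocks - 1) / 2.0
    # P ~= 1 - exp(-pairs / 2**bits); -expm1(-x) stays accurate when x is tiny.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


def sha256_fp(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def weak_fp(data: bytes) -> bytes:
    """Hypothetical, deliberately truncated 8-bit fingerprint, used ONLY so
    that a collision is easy to produce in a demo."""
    return hashlib.sha256(data).digest()[:1]


def find_weak_collision() -> tuple[bytes, bytes]:
    """Return two different blocks with the same weak_fp fingerprint.
    257 distinct blocks cannot fit in 256 fingerprints (pigeonhole)."""
    seen: dict[bytes, bytes] = {}
    for i in range(257):
        blk = b"block-%d" % i
        key = weak_fp(blk)
        if key in seen:
            return seen[key], blk
        seen[key] = blk
    raise AssertionError("unreachable by the pigeonhole principle")


class DedupStore:
    """Toy single-instance block store keyed by a fingerprint (hypothetical)."""

    def __init__(self, fingerprint: Callable[[bytes], bytes] = sha256_fp,
                 verify: bool = True) -> None:
        self.fingerprint = fingerprint
        self.verify = verify
        self.blocks: dict[bytes, bytes] = {}

    def put(self, data: bytes) -> bytes:
        key = self.fingerprint(data)
        stored = self.blocks.get(key)
        if stored is None:
            self.blocks[key] = data
        elif self.verify and stored != data:
            # Same fingerprint, different bytes: compare-by-hash alone would
            # silently dedupe this block against the wrong data.
            raise ValueError("fingerprint collision detected at backup time")
        return key

    def get(self, key: bytes) -> bytes:
        return self.blocks[key]


if __name__ == "__main__":
    # 2**40 blocks (about 4 PiB of 4 KiB blocks) under a 160-bit hash
    print(collision_probability(2**40, 160))  # ~4.1e-25
    print(1 / 150e6)                          # ~6.7e-09, the lottery odds above

    a, b = find_weak_collision()
    naive = DedupStore(fingerprint=weak_fp, verify=False)
    naive.put(a)
    key = naive.put(b)
    print(naive.get(key) == b)  # False: b "restores" as a, with no error

    careful = DedupStore(fingerprint=weak_fp, verify=True)
    careful.put(a)
    try:
        careful.put(b)
    except ValueError as exc:
        print("caught:", exc)
```

In the naive store the colliding block comes back as the wrong data with no error, and because the fingerprint is deterministic, every later backup of that block would dedupe the same way, which is the "every version" failure described above. The verifying store turns the same event into an error at backup time, at the cost of reading the stored block back whenever fingerprints match.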
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Austin Murphy
Sent: Monday, October 22, 2007 10:47 AM
To: [email protected]
Subject: Re: [Veritas-bu] Tapeless backup environments

Here is some required reading on the topic from Val Henson, a noted academic and storage guru:

An Analysis of Compare-by-hash
www.nmt.edu/~val/review/hash.pdf

Of particular interest is why hardware error rates can't be compared with deterministic software errors.

Austin
_______________________________________________
Veritas-bu maillist - [email protected]
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
