On 05/12/2013 09:16 AM, Andrew Hume wrote:

On May 12, 2013, at 6:20 AM, Edward Ned Harvey (lopser) wrote:

Without checking the internet, and before you listen to other people's anecdotes or anything, I'd like to hear your gut feeling; I want to know what your natural instinct is. What do you think about the reliability of the following tools?

every one of these tools is incomplete by itself.
(some might say i come from a paranoid viewpoint.)

anything that does not implement end-to-end checksumming is defective.
that, to me, implies
* a simple reliable path to capturing the checksum of the original contents
* a simple way to verify the checksum of the recovered contents.
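
A minimal sketch of those two steps, assuming SHA-256 as the checksum and a plain directory tree as both the backup source and the restore target. The function names (`build_manifest`, `verify_restore`) are hypothetical and not taken from any tool discussed in this thread.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files are hashed.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def verify_restore(manifest, restored_root):
    """Return relative paths that are missing or differ after a restore."""
    bad = []
    for rel_path, expected in sorted(manifest.items()):
        candidate = os.path.join(restored_root, rel_path)
        if not os.path.isfile(candidate) or file_sha256(candidate) != expected:
            bad.append(rel_path)
    return bad
```

The manifest would be captured before the backup tool runs and stored separately from the backup itself; after a restore, an empty list from `verify_restore` means every file matched, regardless of what the tool's return code said.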

i know of no backup system that does, or ever did.
that is why i have written a few backup systems myself.
sometimes, this means just putting wrappers around existing tools.
and while it doesn't hurt to check return codes, they are neither
necessary nor sufficient for correct results; i therefore mostly ignore them.

given the recent availability of erasure code systems, if i were doing this
now, i would devise an end-to-end scheme around erasure code tools,
which would give you enormous reliability.
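
To illustrate the idea, here is a sketch of the simplest possible erasure code: k data shards plus one XOR parity shard, which can rebuild any single lost shard. This is an assumption chosen for brevity, not the scheme the author had in mind; production erasure code tools typically use Reed-Solomon style codes that survive several losses. The SHA-256 check on the recovered bytes is what makes it end-to-end. All function names are hypothetical.

```python
import hashlib


def split_with_parity(data, k):
    """Split data into k equal-size shards plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = bytearray(shard_len)
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return shards + [bytes(parity)]


def recover(shards, original_len):
    """Rebuild the data when at most one shard (marked None) is missing."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost shard")
    if missing:
        shard_len = len(next(s for s in shards if s is not None))
        rebuilt = bytearray(shard_len)
        # XOR of all surviving shards equals the missing one.
        for shard in shards:
            if shard is not None:
                for i, byte in enumerate(shard):
                    rebuilt[i] ^= byte
        shards = list(shards)
        shards[missing[0]] = bytes(rebuilt)
    # Drop the parity shard and the zero padding.
    return b"".join(shards[:-1])[:original_len]


def recover_verified(shards, original_len, expected_sha256):
    """Recover, then confirm the result against the original checksum."""
    data = recover(shards, original_len)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("recovered data does not match original checksum")
    return data
```

The original length and digest would be recorded at backup time, alongside the shards but on separate media.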


I think checksumming has a place in backup/archive systems, but I'm not sure that end-to-end checksumming will allow sufficient scalability, at least with current filesystem technology. At $WORK, if we had to checksum each file on each filesystem we back up, I doubt we could complete our backups in our window, and it would likely have a significant impact on disk and network performance as well. We scan 300+ million files and 4+PB nightly, with about 10TB of changed data, and I doubt we could read and generate checksum data for each file. Even if we depend on inode information to find candidate files, and only checksum changed files based on that data, it would be a challenge.
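
The inode-based approach mentioned above can be sketched as follows, assuming that "inode information" means file size and modification time from `lstat`, and that SHA-256 is the checksum. The function names are hypothetical, and the backup window passed to `required_rate_mb_s` is an assumption, since the thread does not state one.

```python
import hashlib
import os
import stat


def snapshot(root):
    """Record (size, mtime_ns) for every regular file under root."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.lstat(full)  # metadata only; file contents are not read
            if stat.S_ISREG(info.st_mode):
                rel = os.path.relpath(full, root)
                state[rel] = (info.st_size, info.st_mtime_ns)
    return state


def changed_files(previous, current):
    """Paths that are new or whose size/mtime differ from the last run."""
    return sorted(p for p, meta in current.items() if previous.get(p) != meta)


def checksum_changed(root, previous, chunk_size=1 << 20):
    """Hash only the candidate files; return (new snapshot, digests)."""
    current = snapshot(root)
    digests = {}
    for rel in changed_files(previous, current):
        digest = hashlib.sha256()
        with open(os.path.join(root, rel), "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        digests[rel] = digest.hexdigest()
    return current, digests


def required_rate_mb_s(total_bytes, window_hours):
    """Sustained read rate (decimal MB/s) needed to hash total_bytes in the window."""
    return total_bytes / (window_hours * 3600) / 1e6
```

As a rough sense of scale, `required_rate_mb_s(10e12, 8)` puts 10TB of changed data over a hypothetical 8-hour window at about 347 MB/s of sustained reads, on top of the metadata scan of every file. Note also that selecting candidates by size and mtime cannot detect a file whose contents changed while its metadata did not, which is exactly the case end-to-end checksumming is meant to catch.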

What I think /could/ work, though, is if checksumming filesystems like ZFS could expose the checksum data to user applications (like backup clients), and then applications could use that data without generating it for themselves.

Skylar
