> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bertrand Augereau
>
> is there a way to compute very quickly some hash of a file in a zfs?
> As I understand it, everything is signed in the filesystem, so I'm
> wondering if I can avoid reading whole files with md5sum just to get a
> unique hash. Seems very redundant to me :)
If I understand right: although ZFS does checksum every block, those checksums don't correlate to hashes of files, for several reasons:

- Block boundaries don't line up with file boundaries. A small file may occupy only part of a block, and a large file is spread across many blocks, so no single block checksum corresponds to "the file."
- Blocks also contain non-file information (metadata), which is checksummed too.
- If you have compression enabled, block checksums are even less relevant to file hashes, because as far as I know ZFS checksums the compressed data, not the uncompressed data.
- Building file hashes out of block hashes is even more convoluted, because in general you can't compute hash(A+B) from hash(A) and hash(B) alone, although some algorithms allow something close to it (see the first sketch at the end of this message).

My advice would be: computing hashes is not very expensive, as long as you're only hashing data you were going to read for other reasons anyway. Specifically, I benchmarked several hash algorithms a while back and found ... I forget which ... that either adler32 or crc takes almost zero time to compute; that is, the CPU was very lightly utilized while hashing blocks at full disk speed. The weakness of adler32 and crc is that they're not cryptographic hashes: if a malicious person wants to corrupt a data stream while preserving the checksum, it's not difficult to do. They're fine as long as you can safely assume no malice. md5 is noticeably slower (though not as much slower as you might expect), and it is a cryptographic hash; probably not necessary for your needs. (A rough timing sketch is at the end of this message.)

And one more thing: no matter how strong your hash is, unless the hash is as large as the file itself, collisions can happen. If you care about your data, don't assume two files are identical just because their hashes match; always byte-level verify every block or file whose hash matches some other hash (the last sketch below shows one way to do that).
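To make the hash(A+B) point concrete, here is a minimal Python sketch (the block contents are made up for illustration; this doesn't touch ZFS at all). Feeding the blocks in order through one md5 object gives the same digest as hashing the concatenation, but the two finished digests md5(A) and md5(B) by themselves are not enough to reconstruct md5(A+B). Checksums like adler32 and crc32 can at least be carried forward as a running value, and zlib's C library even has a crc32_combine() that works from the two finished CRCs plus the length of B, though Python's standard library doesn't expose it.

    import hashlib
    import zlib

    # Two made-up "blocks" standing in for consecutive pieces of one file.
    a = b"first block of the file " * 1000
    b = b"second block of the file " * 1000

    # Streaming works: feeding the blocks in order into ONE md5 object
    # gives the same digest as hashing the concatenation in one shot.
    streamed = hashlib.md5()
    streamed.update(a)
    streamed.update(b)
    assert streamed.hexdigest() == hashlib.md5(a + b).hexdigest()

    # But the finished digests md5(a) and md5(b), on their own, don't let
    # you reconstruct md5(a + b) -- the internal running state is gone.
    print(hashlib.md5(a).hexdigest(), hashlib.md5(b).hexdigest())
    print(hashlib.md5(a + b).hexdigest())

    # adler32 and crc32 can at least be carried forward as a running value
    # (you still need the raw bytes of the second block, though):
    assert zlib.adler32(b, zlib.adler32(a)) == zlib.adler32(a + b)
    assert zlib.crc32(b, zlib.crc32(a)) == zlib.crc32(a + b)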
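On the relative cost of the algorithms, here is a rough timing sketch; it is not my original benchmark (which I no longer have), and the 64 MB in-memory buffer is just an assumption to keep disk speed out of the measurement. The standard-library zlib.adler32, zlib.crc32, and hashlib.md5 stand in for whatever implementations your application would actually use.

    import hashlib
    import os
    import time
    import zlib

    data = os.urandom(64 * 1024 * 1024)   # 64 MB of random data; adjust to taste

    def timed(label, fn):
        start = time.perf_counter()
        fn(data)
        print("%8s: %.3f s" % (label, time.perf_counter() - start))

    timed("adler32", zlib.adler32)
    timed("crc32", zlib.crc32)
    timed("md5", lambda d: hashlib.md5(d).digest())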
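And to illustrate the last point, treating a matching hash as a hint rather than proof, here is a small sketch of the kind of check I mean. The function names and the chunked comparison are mine, purely for illustration: two files are only reported as duplicates if their sizes match, their md5 digests match, and a byte-for-byte comparison confirms it.

    import hashlib
    import os

    def file_md5(path, chunk_size=1 << 20):
        """Hash a file in chunks so the whole file never sits in memory."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    def files_identical(path_a, path_b, chunk_size=1 << 20):
        """Byte-for-byte comparison; only called once the cheap checks pass."""
        with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
            while True:
                ca, cb = fa.read(chunk_size), fb.read(chunk_size)
                if ca != cb:
                    return False
                if not ca:  # both files ended at the same point
                    return True

    def duplicates(path_a, path_b):
        # Cheap checks first: size, then hash. A matching hash is only a
        # hint; the final word comes from the byte-level comparison.
        if os.path.getsize(path_a) != os.path.getsize(path_b):
            return False
        if file_md5(path_a) != file_md5(path_b):
            return False
        return files_identical(path_a, path_b)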