On May 21, 2011, at 11:56 AM, Eugen Leitl <eu...@leitl.org> wrote:

> ----- Forwarded message from Zooko O'Whielacronx <zo...@zooko.com> -----
> 
> From: Zooko O'Whielacronx <zo...@zooko.com>
> Date: Sat, 21 May 2011 12:50:19 -0600
> To: Crypto discussion list <cryptogra...@randombit.net>
> Subject: Re: [cryptography] rolling hashes, EDC/ECC vs MAC/MIC, etc.
> Reply-To: Crypto discussion list <cryptogra...@randombit.net>
> 
> Dear Nico Williams:
> 
> Thanks for the reference! Very cool.
> 
> What I would most want is for ZFS (and every other filesystem) to
> maintain a Merkle Tree over the file data with a good secure hash.
> Whenever a change to a file is made, the filesystem can update the
> Merkle Tree this with mere O(log(N)) work in the size of the file plus
> O(N) work in the size of the change. For a modern filesystem like ZFS
> which is already maintaining a checksum tree the *added* cost of
> maintaining the secure hash Merkle Tree could be minimal.

ZFS already tracks the blocks that have been written, and the time that
they were written. So we already know when something was writtem, though
that does not answer the question of whether the data was changed. I think
it is a pretty good bet that newly written data is different :-)

> 
> Then, the filesystem should make this Merkle Tree available to
> applications through a simple query.

Something like "zfs diff" ?

> 
> This would enable applications—without needing any further
> in-filesystem code—to perform a Merkle Tree sync, which would range
> from "noticeably more efficient" to "dramatically more efficient" than
> rsync or zfs send. :-)

Since ZFS send already has an option to only send the changed blocks,
I disagree with your assertion that your solution will be "dramatically
more efficient" than zsf send. We already know zfs send is much more
efficient than rsync for large file systems.
 -- richard

> 
> Of course it is only more efficient because we're treating the
> maintenance of the secure-hash Merkle Tree as free. There are two
> senses in which this is legitimate and it is almost free:
> 
> 1. Since the values get maintained persistently over the file's
> lifetime then the total computation required is approximately O(N)
> where N is the total size of all deltas that have been applied to the
> file in its life. (Let's just drop the logarithmic part for now,
> because see 2. below.)
> 
> Compare this to the cost of doing a fast, insecure CRC over the whole
> file such as in rsync. The cost of that is O(N) * K where N is the
> (then current) size of the file and K is the number of times you run
> rsync on that file.
> 
> The extreme case is if the file hasn't changed. Then for the
> application-level code to confirm that the file on this machine is the
> same as the file on that machine, it merely has to ask the filesystem
> for the root hash on each machine and transmit that root hash over the
> network. This is optimally fast compared to rsync, and unlike "zfs
> send|recv" it is optimally fast whenever the two files are identical
> even if they have both changed since the last time they were synced.
> 
> 2. Since the modern, sophisticated filesystem like ZFS is maintaining
> a tree of checksums over the data *anyway* you can piggy-back this
> computation onto that work, avoiding any extra seeks and minimizing
> extra memory access.
> 
> In fact, ZFS itself can actually use SHA-256 for the checksum tree,
> which would make it almost provide exactly what I want, except for:
> 
> 2. a. From what I've read, nobody uses the SHA-256 configuration in
> ZFS because it is too computationally expensive, so they use an
> insecure checksum (fletcher2/4) instead.
> 
> 2. b. I assume the shape of the resulting checksum tree is modified by
> artifacts of the ZFS layout instead of being a simple canonical shape.
> This is a show-stopper for this use case because if the same file data
> exists on a different system, and some software on that system
> computes a Merkle Tree over the data, it might come out with different
> hashes than the ZFS checksum tree, thus eliminating all of the
> performance benefits of this approach.
> 
> But, if ZFS could be modified to fix these problems or if a new
> filesystem would add a feature of maintaining a canonical,
> reproducible Merkle Tree, then it might be extremely useful.
> 
> Thanks to Brian Warner and Dan Shoutis for discussions about this idea.
> 
> Regards,
> 
> Zooko
> _______________________________________________
> cryptography mailing list
> cryptogra...@randombit.net
> http://lists.randombit.net/mailman/listinfo/cryptography
> 
> ----- End forwarded message -----
> -- 
> Eugen* Leitl <a href="http://leitl.org";>leitl</a> http://leitl.org
> ______________________________________________________________
> ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
> 8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Reply via email to