[zfs-discuss] Block unification in ZFS
Hello list.

I have a storage server running ZFS which is primarily used for storing on-site mirrors of source trees and interesting sites (textfiles.com and bitsavers.org, for example) and for backups of local machines. There are several (small) problems with the otherwise ideal picture:

- Some mirrors include sparse or slightly stale copies of others.
- Not all of the local machines are always networked (laptops), and their backups tend to have duplicated data wrt the rest of the system.
- My pre-ZFS backup tarballs are in a similar state.

Therefore, I wonder if something like block unification (which seems to be an old idea, though I know of it primarily through Venti[1]) would be useful to ZFS. Since ZFS checksums all of the data passing through it, it seems natural to hook those checksums and keep a hash table from checksum to block pointer. It would seem that one could write a shim vdev which used the ZAP and a host vdev to store this hash table, and which could inform the higher layers, when writing a block, that they should simply alias an earlier block (and increment its reference count -- already there for snapshots -- appropriately; naturally, if the block's reference count becomes zero, its checksum should be deleted from the hash).

The only (slight) complications that leap to mind are:

1. Strictly accounting for used space becomes a little more funny.
2. ZFS wide block pointers (ditto blocks) would have to somehow bypass block unification or risk missing the point. As far as I understand ZFS's on-disk structures[2], though, this is not a problem: one copy of the wide block could be stored in the unified vdev and the other two could simply be stored directly in the host vdev.
3. It's possible such an algorithm would miss identical blocks checksummed under different schemes. I think I'm OK with that.
4. Relatedly, one may want to expose a "check before unifying" option for those who are sufficiently paranoid to fear hash collisions deleting data.

Thoughts?
Is something like this already possible and I just don't know about it? :)

--nwf;

[1] http://plan9.bell-labs.com/sys/doc/venti.html
[2] I'm aware of http://opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf but if there's a more recent version available or if I've grossly mistaken something therein, please let me know.

P.S. This message is sent via opensolaris.org; I originally sent a slightly earlier version via SMTP and received a notice that a moderator would look at it, but that copy seems to have gotten lost.

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
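The write path proposed above can be sketched in a few lines. This is a toy in-memory model only (dicts standing in for the ZAP table and the host vdev; all names are hypothetical), not a claim about how ZFS would actually implement it, but it shows the checksum-to-block-pointer table, reference counting, and the optional "check before unifying" knob from point 4:

```python
import hashlib

class UnifyingStore:
    """Toy model of a block-unifying ("dedup") write path: a hash table
    maps block checksums to block ids with reference counts.  Purely
    illustrative; real ZFS dedup would live at the vdev/DMU layer."""

    def __init__(self, verify=False):
        self.blocks = {}      # block id -> bytes (stands in for the host vdev)
        self.by_sum = {}      # checksum -> block id (the unification table)
        self.refcount = {}    # block id -> reference count
        self.next_id = 0
        self.verify = verify  # "check before unifying", for the paranoid

    def write(self, data):
        csum = hashlib.sha256(data).digest()
        bid = self.by_sum.get(csum)
        if bid is not None and (not self.verify or self.blocks[bid] == data):
            self.refcount[bid] += 1        # alias the earlier block
            return bid
        bid, self.next_id = self.next_id, self.next_id + 1
        self.blocks[bid] = data
        self.by_sum[csum] = bid
        self.refcount[bid] = 1
        return bid

    def free(self, bid):
        self.refcount[bid] -= 1
        if self.refcount[bid] == 0:        # last reference gone: drop the
            del self.refcount[bid]         # block and delete its checksum
            data = self.blocks.pop(bid)    # from the hash, as proposed
            del self.by_sum[hashlib.sha256(data).digest()]
```

Writing the same data twice returns the same block id with a reference count of 2; freeing both references removes the block and its checksum entry.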
Re: [zfs-discuss] Block unification in ZFS
I was just thinking of a similar feature request: one of the things I'm doing is hosting VMs. I build a base VM with a standard setup in a dedicated filesystem; then, when I need a new instance, zfs clone and voila! -- ready to start tweaking for the needs of the new instance, using a fraction of the space. Until update time. It still saves space, but it would be nice if there were a way to identify the common blocks.

I realize it's a double whammy because VMs just look like big monolithic files to the base filesystem, whereas normally you might simply look for identical files to map together (though the regular clone mechanism seems to be block-based), but it's something to think about in the nice-to-haves...
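The monolithic-image point can be made concrete: once a clone diverges by even one block, whole-file matching finds nothing in common, while block-level matching still finds nearly everything. A toy sketch (fixed-size blocks, SHA-256 checksums; the block size and function names are made up for illustration):

```python
import hashlib

BLOCK = 4096  # hypothetical fixed block size for the toy model

def blocks(image):
    # split a (toy) disk image into fixed-size blocks
    return [image[i:i + BLOCK] for i in range(0, len(image), BLOCK)]

def shared_fraction(img_a, img_b):
    """Fraction of img_a's blocks whose checksum also occurs in img_b.
    Illustrates why block-level unification helps diverged VM clones
    even though the images no longer match as whole files."""
    sums_b = {hashlib.sha256(b).digest() for b in blocks(img_b)}
    hits = sum(1 for b in blocks(img_a)
               if hashlib.sha256(b).digest() in sums_b)
    return hits / len(blocks(img_a))
```

For a three-block image where an update rewrites the middle block, the two images are unequal as files, yet two-thirds of the blocks are still shared.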
Re: [zfs-discuss] Block unification in ZFS
> Therefore, I wonder if something like block unification (which seems to be an old idea, though I know of it primarily through Venti[1]) would be useful to ZFS. Since ZFS checksums all of the data passing through it, it seems natural to hook those checksums and have a hash table from checksum to block pointer. It would seem that one could write a shim vdev which used the ZAP and a host vdev to store this hash table and could inform the higher layers that, when writing a block, they should simply alias an earlier block (and increment its reference count -- already there for snapshots -- appropriately; naturally if the block's reference count becomes zero, its checksum should be deleted from the hash).

Deduplication has been discussed many times, but it is not trivial to implement. There are no reference counts for blocks. Blocks have a timestamp that is compared to the creation time of snapshots to work out whether a block can be freed when you destroy a snapshot.
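The birth-time rule described above can be sketched as a single predicate. This is a simplified toy version under stated assumptions (times modeled as plain integers standing in for transaction-group numbers, a single previous snapshot considered; the real logic in ZFS's DSL layer is more involved):

```python
def can_free_on_snapshot_destroy(block_birth, prev_snap_birth, still_live):
    """Toy version of the rule: with no per-block reference counts, a
    block can be freed when destroying a snapshot only if it was born
    after the previous (older) snapshot was taken -- so no earlier
    snapshot can reference it -- and the live filesystem no longer
    points at it."""
    return block_birth > prev_snap_birth and not still_live
```

A block born after the older snapshot and since overwritten in the live filesystem is freeable; a block that predates the older snapshot is not, because that snapshot still references it.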
Re: [zfs-discuss] Block unification in ZFS
See the long thread titled "ZFS deduplication", last active approximately two weeks ago.
Re: [zfs-discuss] Block unification in ZFS
Alan <alan at peak.org> writes:

> I was just thinking of a similar feature request: one of the things I'm doing is hosting VMs. I build a base VM with standard setup in a dedicated filesystem, then when I need a new instance, zfs clone and voila! ready to start tweaking for the needs of the new instance, using a fraction of the space.

This is OT, but FYI some virtualization apps have built-in support for exactly what you want: you can create disk images that share identical blocks between themselves. In Qemu/KVM this feature is called copy-on-write disk images:

$ qemu-img create -b base_image -f qcow2 new_image

Microsoft Virtual Server also has an equivalent feature, but I can't recall what it is called.

-marc