Group,

Let me add some thoughts for the future, from a non-ZFS developer, to this long and, I am sure, incomplete list. The items below are just food for thought.

I assume dedup operates at the block level, but I could easily be wrong. Aren't ZFS file blocks, when modified, read, copied, and the block pointer then updated to point to the new block? IMO, based on #3 below, I don't think you need to support block splitting, and if you did, what would prevent heavily modified files/objects from degrading to the smallest supported block size? Also, can't an app find holes now? (See the lseek(SEEK_HOLE) example at the end of this message.)

My (immediate) short list is:

1) How are you going to support backward compatibility to remove existing dups, whether the dups are located locally and/or network wide?

2) Other than additional code space and code complication, what level of performance degradation results from what must be some hash lookup, etc., added into the code fastpath? (A hypothetical sketch of such a lookup follows the quoted message below.)

3) With storage capacities/densities rising rapidly, and the ability to mirror data for disaster recovery and load balancing giving single-digit-millisecond LAN access times vs. WAN access times, how does a single administrator within the LAN determine the level of support/tradeoffs of this new feature within a global company?

4) What disk/file objects are considered inappropriate for dedup?

5) How will you support Direct I/O, or will you support it at all?

Mitchell Erblich
------------------------------

On Apr 19, 2009, at 1:15 AM, Daniel Carosone wrote:

> Awesome news, Jeff. I know you said you'd write about it later, but
> I want to pose these questions now for several reasons:
> - I'm excited and eager and can't wait :-)
> - There may be things we could do now to prepare existing data and
>   pools for easier dedup later
> - There may be useful hints in here for documentation, test cases,
>   further RFEs, etc.
>
> So, in no particular order:
> - will it use only the existing checksums, or an additional compare
>   or method?
> - will it depend on using a particular (e.g. stronger) checksum? would
>   it help to switch now to that checksum method so blocks written in
>   the meantime are "ready"? (I'm already concerned about the
>   fletcher2 implementation thread and will likely switch anyway)
> - will it dedup across the entire pool, or only within a dataset?
> - will it be enabled/disabled per dataset? (space vs. speed)
> - will it interact with copies= > 1? especially where dup blocks exist
>   between datasets that differ in copies= settings? I hope I'd get
>   new ditto blocks for the highest copies= referrer, but then what
>   about when that dataset is destroyed and there are more copies than
>   needed?
> - will it interact with compression (i.e., does it dedup source
>   blocks or on-disk blocks)? If I write the same files to datasets
>   with differing compression settings, how many copies do I store?
> - will it detect only whole blocks with the same alignment, or is
>   there something I can do to improve detection of smaller duplicate
>   blocks and split them?
> - will there be a way for me to examine files for their "dup
>   nature" (I'm thinking of something like seeking for holes) at the
>   app level, to use the information the fs has already discovered?
> - will it depend on bp-rewrite at all? (for delivery; I presume
>   bp-rewrite will be needed to dedup existing blocks, but is there an
>   implementation dependency that entangles these two somehow, such
>   that we need to wait for both?)
> - will zfs send be able to avoid sending multiple copies of dup data?
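P.S. On the fastpath cost in my question #2: the obvious design is a table keyed by each block's (strong) checksum, consulted once per block write. Below is a minimal, hypothetical sketch in C of what that lookup might look like. None of the names (ddt_key_t, ddt_lookup, etc.) are from the actual implementation, and the real thing would presumably have to be persistent rather than an in-memory hash table.

    #include <stdint.h>
    #include <string.h>

    /*
     * Hypothetical sketch only -- NOT actual ZFS code. A dedup table keyed
     * by the block's 256-bit checksum; a hit means an identical block is
     * already on disk, so the writer can bump a refcount and share it
     * instead of allocating a new block.
     */
    typedef struct ddt_key {
        uint64_t cksum[4];          /* e.g. SHA-256 of the block contents */
    } ddt_key_t;

    typedef struct ddt_entry {
        ddt_key_t key;
        uint64_t dva;               /* disk address of the existing copy */
        uint64_t refcnt;            /* how many block pointers share it */
        struct ddt_entry *next;     /* hash-chain link */
    } ddt_entry_t;

    #define DDT_BUCKETS (1 << 20)
    static ddt_entry_t *ddt_table[DDT_BUCKETS];

    /*
     * The per-write cost: one hash plus a (hopefully short) chain walk,
     * added to the write fastpath for every block.
     */
    static ddt_entry_t *
    ddt_lookup(const ddt_key_t *key)
    {
        uint64_t bucket = key->cksum[0] & (DDT_BUCKETS - 1);
        ddt_entry_t *de;

        for (de = ddt_table[bucket]; de != NULL; de = de->next) {
            if (memcmp(&de->key, key, sizeof (*key)) == 0)
                return (de);        /* duplicate: share this block */
        }
        return (NULL);              /* unique: allocate a new block */
    }

Even in this toy form the tradeoff is visible: the lookup itself is cheap, but the table needs one entry per unique block in the pool, so keeping it cached is where the real memory/performance question lies.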
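P.P.S. On "can't an app find holes now?": yes, on (Open)Solaris lseek(2) already supports SEEK_HOLE and SEEK_DATA, which is presumably the kind of interface Daniel has in mind for exposing "dup nature" as well. A small example that walks a file and prints its holes (error handling trimmed):

    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    /*
     * Print the holes in a file using lseek(SEEK_HOLE)/lseek(SEEK_DATA).
     * On filesystems that don't track holes, SEEK_HOLE just reports one
     * virtual hole at end-of-file, so we stop at st_size.
     */
    int
    main(int argc, char **argv)
    {
        struct stat st;
        off_t off = 0, hole, data;
        int fd;

        if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
            (void) fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return (1);
        }
        (void) fstat(fd, &st);

        for (;;) {
            hole = lseek(fd, off, SEEK_HOLE);
            if (hole < 0 || hole >= st.st_size)
                break;                  /* no real holes left */
            data = lseek(fd, hole, SEEK_DATA);
            if (data < 0)
                data = st.st_size;      /* hole runs to end-of-file */
            (void) printf("hole: [%lld, %lld)\n",
                (long long)hole, (long long)data);
            off = data;
        }
        (void) close(fd);
        return (0);
    }

An analogous probe for shared/deduped extents would let apps reuse whatever the fs has already discovered, which I read as the point of Daniel's question.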