Dan> ... It would still need a complex bp_rewrite.
Are you certain about that? For example, scrubbing/resilvering and fixing corrupt blocks with non-matching checksums is a post-processing operation which works on an existing pool and rewrites some blocks if needed. And it works without a bp_rewrite in place...

Basically, you'd need to ensure that a single TXG would include the updates to the DDT entry for the unique blocks found, the freeing of extra blocks with the same data (checksum), and the creation of "ditto" copies if a specified threshold is exceeded, where the dittos might point to one of the already existing extra blocks instead of freeing it.

What's more: if the offline dedup were modelled (or implemented) like scrubbing, it could be stopped at any point and then continued (or redone from the start, but with some blocks already deduped), with a cumulative effect between invocations. This would be acceptable for users with "bursty" writes, i.e. storing documents on a filer during their work-day. That is, you could schedule offline dedup to run, say, between midnight and 6am, and by the time workers come to the office some of their storage's disk space may have been recovered and the system is fast and responsive. The next night it continues and maybe recovers some more space...

Also, if the offline dedup were throttled like scrubs can be throttled now, it could run continuously in the background. Perhaps with a large enough ARC/L2ARC cache, it wouldn't even be a huge real-time performance degrader like it is now.

I can confirm Ed's findings: enabling dedup slows down writes on my system approximately 10x compared to writes into non-deduped datasets. A lot of CPU time goes to kernel calls (close to 50% on a dual-core), and much of the rest to disk I/O. At the moment my test system is down, so I can't quote specific numbers, but as I remember there were about 2-3Mb/s of writes to each of my 6 disks in raidz2 while the end-user throughput (according to rsync) was 1.8-2Mb/s overall. Writes to datasets without dedup could sustain 20-40Mb/s at least.

HTH,
//Jim
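
To make the single-TXG idea above a bit more concrete, here is a rough Python sketch of the bookkeeping only. It is a toy model, not ZFS code: the function name, the dict layouts, `ditto_threshold` and `max_groups` are all made up for illustration, and it deliberately says nothing about how existing block pointers would get repointed, which is the bp_rewrite question Dan raises. Each loop iteration stands in for one "transaction" that updates the DDT entry, optionally keeps one extra copy as a ditto, and frees the remaining duplicates.

```python
from collections import defaultdict


def dedup_pass(blocks, ddt, ditto_threshold=3, max_groups=None):
    """One resumable pass of a toy offline dedup (hypothetical model).

    blocks: dict block_id -> checksum; freed blocks are removed in place.
    ddt:    dict checksum -> {"primary": id, "dittos": [ids], "refcount": int};
            it persists between passes, which gives the cumulative effect.
    max_groups: stop after this many checksum groups (a crude throttle).
    Returns the list of block ids freed by this pass.
    """
    # Group all currently allocated blocks by checksum.
    groups = defaultdict(list)
    for block_id, checksum in sorted(blocks.items()):
        groups[checksum].append(block_id)

    freed = []
    done = 0
    for checksum, ids in groups.items():
        entry = ddt.get(checksum)
        kept = set() if entry is None else {entry["primary"], *entry["dittos"]}
        extras = [b for b in ids if b not in kept]
        if entry is not None and not extras:
            continue  # handled by an earlier pass, nothing new to do
        if max_groups is not None and done >= max_groups:
            break  # out of budget; the next pass picks up from here

        # Everything below stands in for "one TXG" in this toy model.
        refcount = (entry["refcount"] if entry else 0) + len(extras)
        if entry is None:
            entry = {"primary": extras.pop(0), "dittos": [], "refcount": 0}
        entry["refcount"] = refcount
        # Above the threshold, reuse an existing duplicate as the ditto
        # copy instead of freeing it.
        if refcount > ditto_threshold and not entry["dittos"] and extras:
            entry["dittos"].append(extras.pop(0))
        for b in extras:
            del blocks[b]
            freed.append(b)
        ddt[checksum] = entry
        done += 1
    return freed
```

Calling `dedup_pass` repeatedly with the same `blocks` and `ddt` mimics the nightly runs: a pass cut short by `max_groups` leaves a consistent state, and the next pass skips the groups that are already deduped.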
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss