Dan> ... It would still need a complex bp_rewrite.

Are you certain about that?
 
For example, scrubbing/resilvering and fixing corrupt blocks with
non-matching checksums is a post-processing operation that
works on an existing pool and rewrites some blocks as needed.
And it works without bp_rewrite in place...
 
Basically, you'd need to ensure that a single TXG includes the
update to the DDT entry for each unique block found, the freeing of
extra blocks with the same data (checksum), and the creation of "ditto"
copies when a specified reference-count threshold is exceeded - where
a ditto might simply reuse one of the already-existing extra blocks
instead of freeing it.
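To make the intent concrete, here is a toy model of one such pass - purely illustrative Python, not ZFS code; the DDT layout, the block representation and the threshold value are all assumptions of the sketch:

```python
# Illustrative model only -- not ZFS code. The DDT layout, block
# representation and threshold below are assumptions for the sketch.
DITTO_THRESHOLD = 3  # refcount at which an extra physical copy is retained


def offline_dedup_pass(blocks):
    """One pass over existing (block_id, checksum) pairs: create a DDT
    entry for each unique checksum, mark redundant copies to be freed,
    and retain a 'ditto' copy once an entry's refcount crosses the
    threshold -- reusing an already-existing duplicate instead of
    freeing it. All three effects belong in the same TXG."""
    ddt = {}     # checksum -> {"copies": [block_ids], "refcount": n}
    freed = []   # blocks that would be freed in that TXG
    for block_id, checksum in blocks:
        entry = ddt.get(checksum)
        if entry is None:
            ddt[checksum] = {"copies": [block_id], "refcount": 1}
            continue
        entry["refcount"] += 1
        if entry["refcount"] >= DITTO_THRESHOLD and len(entry["copies"]) < 2:
            # ditto: keep this existing physical copy rather than free it
            entry["copies"].append(block_id)
        else:
            freed.append(block_id)
    return ddt, freed
```

For instance, four blocks where three share a checksum would yield one freed block, one DDT entry with refcount 3, and a second retained copy serving as the ditto.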
 
What's more: if offline dedup were modelled (or implemented)
like scrubbing, it could be stopped at any point in its progress and
then continued (or redone from the start, but with some blocks already
deduped), with a cumulative effect across invocations.
This would be acceptable for users with "bursty" writes,
e.g. storing documents on a filer during their work day.
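The stop/continue behaviour amounts to checkpointing a cursor over the block set and carrying state between invocations - sketched below in illustrative Python (the state fields and `budget` parameter are hypothetical names, not anything from ZFS):

```python
# Sketch of a resumable, cumulative pass. The state fields and the
# `budget` parameter are hypothetical -- not ZFS interfaces.
def resumable_pass(blocks, state=None, budget=None):
    """Process up to `budget` (block_id, checksum) pairs starting from
    the saved cursor, then return updated state so the next invocation
    continues where this one stopped. The effect accumulates across
    invocations; restarting with the same state never re-frees blocks."""
    if state is None:
        state = {"cursor": 0, "seen": {}, "freed": []}
    end = len(blocks) if budget is None else min(len(blocks),
                                                 state["cursor"] + budget)
    for i in range(state["cursor"], end):
        block_id, checksum = blocks[i]
        if checksum in state["seen"]:
            state["freed"].append(block_id)   # duplicate of a seen block
        else:
            state["seen"][checksum] = block_id
    state["cursor"] = end                     # checkpoint for next run
    state["done"] = end == len(blocks)
    return state
```

Calling it twice with a small budget covers the whole set in two "nights", each run picking up exactly where the previous one stopped.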
 
That is, you could schedule offline dedup to run, say, between
0:00 and 6:00, and by the time workers come to the office some of
their storage's disk space may have been recovered while the system
stays fast and responsive. The next night it continues and perhaps
recovers some more space...
 
Also, if offline dedup were throttled the way scrubs can be
throttled now, it could run continuously in the background.
Perhaps with an ARC/L2ARC cache large enough, it wouldn't
even be the huge real-time performance degrader that dedup is now.
 
I can confirm Ed's findings that enabling dedup slows down
write speeds on my system roughly 10x compared to writes
into non-deduped datasets; much of the time is spent by the
CPU in kernel calls (close to 50% on a dual-core) and in disk
I/O. At the moment my test system is down, so I can't quote
exact numbers, but as I recall there were about 2-3 MB/s of
writes to each of my 6 disks in raidz2, while the end-user
throughput (according to rsync) was 1.8-2 MB/s overall.
Writes to datasets without dedup could sustain at least
20-40 MB/s.
 
HTH,
//Jim
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss