Jeremy Kitchen wrote:

On Nov 2, 2009, at 9:07 AM, Victor Latushkin wrote:

Enda O'Connor wrote:
It works at the pool-wide level, with the ability to exclude individual datasets. Conversely, if dedup is set to off on the top-level dataset, you can set it to on for lower-level datasets, i.e. you can include or exclude datasets depending on their contents.
So largefile will get deduped in the example below.

And you can use 'zdb -S' (which is a lot better now than it was before dedup) to see how much benefit there would be (without even turning dedup on):

forgive my ignorance, but what's the advantage of this new dedup over the existing compression option? Wouldn't full-filesystem compression naturally de-dupe?

See this for example:

Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     625K    9.9G   7.90G   7.90G     625K    9.9G   7.90G   7.90G
     2     9.8K    184M    132M    132M    20.7K    386M    277M    277M

"Allocated" is what is actually allocated on disk; "referenced" is what would be allocated on disk without deduplication. LSIZE denotes logical size, and PSIZE denotes physical size after compression.

The row with a reference count of 1 shows the same figures under both "allocated" and "referenced", which is expected: there is only one reference to each block.

But the row with a reference count of 2 shows a clear difference: without deduplication there would be 20.7 thousand blocks on disk, with a logical size totalling 386M and a physical size after compression of 277M. With deduplication there would be only 9.8 thousand blocks on disk (a dedup factor of over 2x!), with a logical size totalling 184M and a physical size of 132M.

So for these blocks, compression without deduplication takes 277M on disk; with deduplication it would be only 132M. Good savings!
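
To make the arithmetic explicit, here is a minimal Python sketch that recomputes the ratios from the PSIZE columns of the histogram above. It is only an illustration: the helper name dedup_ratio is made up, the figures are typed in by hand rather than parsed from zdb output, and it assumes the "G" suffix means 1024M.

```python
def dedup_ratio(referenced, allocated):
    """Referenced size divided by allocated size (hypothetical helper)."""
    return referenced / allocated

# PSIZE values in M, copied by hand from the simulated DDT histogram.
# Assumption: 1G = 1024M when converting the refcnt=1 row.
rows = [
    # (refcnt, allocated_psize_m, referenced_psize_m)
    (1, 7.90 * 1024, 7.90 * 1024),
    (2, 132.0, 277.0),
]

# Savings for the refcnt=2 row alone.
row2_ratio = dedup_ratio(277.0, 132.0)
row2_saved_m = 277.0 - 132.0

# Ratio across both rows of the histogram.
total_alloc = sum(alloc for _, alloc, _ in rows)
total_ref = sum(ref for _, _, ref in rows)
overall_ratio = dedup_ratio(total_ref, total_alloc)

print(f"refcnt=2 row: {row2_ratio:.2f}x, saves {row2_saved_m:.0f}M")
print(f"both rows:    {overall_ratio:.3f}x")
```

The refcnt=2 row comes out at roughly 2.1x, matching the 277M versus 132M figures; the second line puts that saving in the context of the whole histogram.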

Hope this helps,
victor
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
