On Sat, Jan 08, 2011 at 12:59:17PM -0500, Edward Ned Harvey wrote:
> Has anybody measured the cost of enabling or disabling verification?

Of course there is no easy answer:)

Let me first explain exactly how verification works.

You try to write a block. You see that the block is already in the dedup
table (it is already referenced). You read the existing block (maybe it is
in ARC or in L2ARC). You compare the block you read with the one you want
to write.

Based on the above:
1. If you have dedup on, but your blocks are not deduplicable at all,
   you will pay no price for verification, as there will be nothing to
   compare.
2. If your data is highly deduplicable you will verify often. The cost
   then depends on whether the data you need to read fits into your
   ARC/L2ARC or not.
   If it can be found in ARC, the impact will be small.
   If your pool is very large and you can't count on ARC to help, each
   write of a duplicate block will be turned into a read.
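
The cases above can be sketched with a toy model. This is not ZFS code:
the class DedupPool, its cache_capacity parameter (a crude stand-in for
ARC/L2ARC, counted in blocks) and the stats counters are all made up for
illustration, and a checksum collision is simply reported as unhandled.

```python
import hashlib


class DedupPool:
    """Toy model of a dedup write path with optional verification."""

    def __init__(self, verify, cache_capacity):
        self.verify = verify
        self.cache_capacity = cache_capacity  # hypothetical ARC size, in blocks
        self.table = {}     # checksum -> [stored block, reference count]
        self.cache = set()  # checksums of blocks that are currently cached
        self.stats = {"writes": 0, "compares": 0, "disk_reads": 0}

    def _cache_add(self, key):
        # No eviction: once the cache is full, new blocks are not cached.
        if len(self.cache) < self.cache_capacity:
            self.cache.add(key)

    def write(self, block):
        key = hashlib.sha256(block).digest()
        entry = self.table.get(key)
        if entry is None:
            # Not in the dedup table: ordinary write, nothing to verify.
            self.table[key] = [block, 1]
            self.stats["writes"] += 1
            self._cache_add(key)
            return
        if self.verify:
            if key not in self.cache:
                # The existing block has to be read back from disk.
                self.stats["disk_reads"] += 1
                self._cache_add(key)
            self.stats["compares"] += 1
            if entry[0] != block:
                raise NotImplementedError("collision not modelled here")
        # Duplicate: only the reference count changes, no data is written.
        entry[1] += 1


# Case 1: nothing is deduplicable, so verification never runs.
unique = DedupPool(verify=True, cache_capacity=0)
for i in range(1000):
    unique.write(i.to_bytes(4, "big") * 32)

# Case 2: everything is a duplicate, with and without a usable cache.
zeros_cached = DedupPool(verify=True, cache_capacity=16)
zeros_cold = DedupPool(verify=True, cache_capacity=0)
for _ in range(1000):
    zeros_cached.write(bytes(128))
    zeros_cold.write(bytes(128))

for name, pool in [("unique", unique), ("cached", zeros_cached),
                   ("cold", zeros_cold)]:
    print(name, pool.stats)
```

In this model the unique workload never compares anything, the cached
duplicate workload compares on every write without touching the disk, and
the cold one pays one disk read per duplicate write.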

Also note an interesting property of dedup: if your data is highly
deduplicable you can actually improve performance by avoiding data
writes (and just increasing the reference count).

Let me show you three degenerate tests to compare the options.
I'm writing 64GB of zeros to a pool with dedup turned off, with dedup
turned on, and with dedup+verification turned on (I use the SHA256
checksum everywhere):

        # zpool create -O checksum=sha256 tank ada{0,1,2,3}
        # time sh -c 'dd if=/dev/zero of=/tank/zero bs=1m count=65536; sync; zpool export tank'
        254,11 real         0,07 user        40,80 sys

        # zpool create -O checksum=sha256 -O dedup=on tank ada{0,1,2,3}
        # time sh -c 'dd if=/dev/zero of=/tank/zero bs=1m count=65536; sync; zpool export tank'
        154,60 real         0,05 user        37,10 sys

        # zpool create -O checksum=sha256 -O dedup=sha256,verify tank ada{0,1,2,3}
        # time sh -c 'dd if=/dev/zero of=/tank/zero bs=1m count=65536; sync; zpool export tank'
        173,43 real         0,02 user        38,41 sys

In the second and third tests the data is of course in ARC, so no extra
reads are needed and the difference comes only from the data comparison:
verification is about 12% slower (173,43 vs 154,60 seconds real).
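
For reference, the percentages can be recomputed from the "real" times
above. This is only arithmetic on the three reported numbers (decimal
commas converted to points); the variable names are made up.

```python
# "real" times of the three runs, in seconds
no_dedup = 254.11
dedup = 154.60
dedup_verify = 173.43

# How much slower verification is compared to plain dedup.
verify_overhead = (dedup_verify / dedup - 1) * 100

# How much less wall-clock time each dedup run took than the run without dedup.
dedup_saving = (1 - dedup / no_dedup) * 100
dedup_verify_saving = (1 - dedup_verify / no_dedup) * 100

print(f"verify vs dedup:    +{verify_overhead:.0f}%")
print(f"dedup vs none:      -{dedup_saving:.0f}%")
print(f"dedup+verify vs none: -{dedup_verify_saving:.0f}%")
```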

This is of course a silly test, but as you can see dedup (even with
verification) is much faster than the no-dedup case. Then again, this data
is highly deduplicable:)

        # zpool list
        NAME   SIZE  ALLOC   FREE    CAP  DEDUP       HEALTH  ALTROOT
        tank   149G  8,58M   149G     0%  524288.00x  ONLINE  -
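
The DEDUP ratio matches the number of blocks written, assuming the dataset
used the default 128K recordsize (the commands above do not set it, so
this is an assumption) and that every block of zeros maps to the same
dedup table entry:

```python
# Total bytes written by dd: bs=1m count=65536
written = 65536 * 1024 * 1024

# Assumed default ZFS recordsize of 128K; not stated in the test above.
recordsize = 128 * 1024

# Every block is identical, so all of them reference a single stored block.
blocks = written // recordsize
print(blocks)  # 524288
```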

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
p...@freebsd.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
