On Sat, Jan 08, 2011 at 12:59:17PM -0500, Edward Ned Harvey wrote:
> Has anybody measured the cost of enabling or disabling verification?

Of course there is no easy answer :)

Let me first explain exactly how verification works. You try to write a block. You see that the block is already in the dedup table (it is already referenced). You read the existing block (maybe it is in ARC or in L2ARC). You compare the block you read with the one you want to write.

Based on the above:

1. If you have dedup on, but your blocks are not deduplicable at all, you will pay no price for verification, as there will be nothing to compare.

2. If your data is highly deduplicable, you will verify often. The cost then depends on whether the data you need to read fits into your ARC/L2ARC. If it can be found in ARC, the impact will be small. If your pool is very large and you can't count on the ARC to help, each write of an already-referenced block will turn into a read.

Also note an interesting property of dedup: if your data is highly deduplicable, you can actually improve performance by avoiding data writes (and just increasing the reference count).

Let me show you three degenerate tests to compare the options. I'm writing 64GB of zeros to a pool with dedup turned off, with dedup turned on, and with dedup+verification turned on (I use the SHA256 checksum everywhere):

	# zpool create -O checksum=sha256 tank ada{0,1,2,3}
	# time sh -c 'dd if=/dev/zero of=/tank/zero bs=1m count=65536; sync; zpool export tank'
	      254,11 real         0,07 user        40,80 sys

	# zpool create -O checksum=sha256 -O dedup=on tank ada{0,1,2,3}
	# time sh -c 'dd if=/dev/zero of=/tank/zero bs=1m count=65536; sync; zpool export tank'
	      154,60 real         0,05 user        37,10 sys

	# zpool create -O checksum=sha256 -O dedup=sha256,verify tank ada{0,1,2,3}
	# time sh -c 'dd if=/dev/zero of=/tank/zero bs=1m count=65536; sync; zpool export tank'
	      173,43 real         0,02 user        38,41 sys

In the second and third tests the data is of course in ARC, so the difference between them comes only from the data comparison (no extra reads are needed), and verification is 12% slower.
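
The write path described above can be sketched as a toy model. This is only an illustration, not ZFS code: the class `DedupPool`, its counters, and the pluggable `checksum` parameter are hypothetical names introduced here, and the handling of a block that fails the comparison (storing it as new data) is an assumption of the sketch, since the text above does not cover that case.

```python
import hashlib


def sha256(block: bytes) -> bytes:
    """Default checksum, matching checksum=sha256 in the tests above."""
    return hashlib.sha256(block).digest()


class DedupPool:
    """Toy in-memory model of a dedup table (hypothetical, not ZFS code)."""

    def __init__(self, verify: bool = False, checksum=sha256):
        self.verify = verify
        self.checksum = checksum
        self.ddt = {}           # checksum -> [stored block, reference count]
        self.data_writes = 0    # blocks that had to be written out
        self.verify_reads = 0   # existing blocks read back for comparison

    def write(self, block: bytes) -> None:
        key = self.checksum(block)
        entry = self.ddt.get(key)
        if entry is None:
            # Not in the dedup table: an ordinary write, nothing to compare.
            self.ddt[key] = [block, 1]
            self.data_writes += 1
            return
        if self.verify:
            # Already referenced: read the existing block (from ARC, L2ARC
            # or disk in real life) and compare it with the new one.
            self.verify_reads += 1
            if entry[0] != block:
                # Assumption of this sketch: on a checksum collision the
                # block is written as new data instead of being shared.
                self.data_writes += 1
                return
        # Duplicate: skip the data write, just bump the reference count.
        entry[1] += 1


if __name__ == "__main__":
    pool = DedupPool(verify=True)
    for _ in range(8):
        pool.write(bytes(128 * 1024))   # eight identical blocks of zeros
    print(pool.data_writes, pool.verify_reads)
```

With identical blocks, only the first write stores data and every later write costs one comparison; with blocks that never repeat, `verify_reads` stays at zero, which is case 1 above.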

This is of course a silly test, but as you can see, dedup (even with verification) is much faster than the no-dedup case. Then again, this data is highly deduplicable :)

	# zpool list
	NAME   SIZE  ALLOC   FREE    CAP  DEDUP       HEALTH  ALTROOT
	tank   149G  8,58M   149G     0%  524288.00x  ONLINE  -

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
p...@freebsd.org                          http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!