Re: [zfs-discuss] Dedup performance hit

Erik Trimble Sun, 13 Jun 2010 22:19:19 -0700

Hernan F wrote:

Hello, I tried enabling dedup on a filesystem, and moved files into it to take 
advantage of it. I had about 700GB of files and left it for some hours. When I 
returned, only 70GB were moved.
I checked zpool iostat, and it showed about 8MB/s R/W performance (the old and 
new zfs filesystems are in the same pool). So I disabled dedup for a few 
seconds and instantly the performance jumped to 80MB/s.
It's an Athlon64 x2 machine with 4GB RAM, it's only a fileserver (4x1TB SATA
for ZFS). arcstat.pl shows 2G for arcsz, top shows 13% CPU during the 8MB/s
transfers.
Is this normal behavior? Should I always expect such low performance, or is
there anything wrong with my setup?
Thanks in advance,
Hernan

You are severely RAM limited. In order to do dedup, ZFS has to maintain
a catalog of every single block it writes and the checksum for that
block. This is called the Dedup Table (DDT for short).

So, during the copy, ZFS has to (a) read a block from the old
filesystem, (b) check the current DDT to see if that block exists and
(c) either write the block to the new filesystem (and add an appropriate
DDT entry for it), or write a metadata update with the dedup block
reference.
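
As a rough illustration of steps (a) through (c), here is a minimal Python
sketch of a dedup write path. It is a toy model, not ZFS code: the
ToyDedupStore class, its dict-based table, and the use of SHA-256 are all
assumptions made for the example.

```python
import hashlib


class ToyDedupStore:
    """Toy model of a dedup write path (hypothetical, not ZFS internals)."""

    def __init__(self):
        self.ddt = {}     # checksum -> reference count
        self.blocks = {}  # checksum -> stored block data

    def write_block(self, data: bytes) -> bool:
        """Store a block; return True if a new block was physically written."""
        checksum = hashlib.sha256(data).hexdigest()
        if checksum in self.ddt:
            # Duplicate: only bump the reference count (a metadata update).
            self.ddt[checksum] += 1
            return False
        # New block: write the data and add a table entry for it.
        self.blocks[checksum] = data
        self.ddt[checksum] = 1
        return True


store = ToyDedupStore()
results = [store.write_block(b) for b in (b"aaaa", b"bbbb", b"aaaa")]
print(results)            # which writes stored a new block
print(len(store.blocks))  # unique blocks actually stored
```

Note that every write, duplicate or not, needs a table lookup first, which
is why the table has to be fast to reach.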


Likely, you have two problems:

(1) I suspect your source filesystem has lots of blocks (that is, it's
likely made up of smaller-sized files). Lots of blocks means lots of
seeking back and forth to read all those blocks.

(2) Lots of blocks also means lots of entries in the DDT. It's trivial
to overwhelm a 4GB system with a large DDT. If the DDT can't fit in
RAM, then it has to get partially refreshed from disk.
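
To get a feel for how quickly the DDT can outgrow 4GB, here is a
back-of-the-envelope estimate in Python. The 320 bytes per entry is a
commonly quoted rule of thumb rather than a figure from this thread, and
the block sizes are illustrative assumptions. It also assumes the worst
case, where every block is unique.

```python
def ddt_size_bytes(data_bytes: int, avg_block_bytes: int,
                   bytes_per_entry: int = 320) -> int:
    """Estimate in-core DDT size, assuming one entry per (unique) block.

    bytes_per_entry=320 is an assumed rule-of-thumb value, not a measured one.
    """
    entries = data_bytes // avg_block_bytes
    return entries * bytes_per_entry


GIB = 1024 ** 3
data = 700 * GIB  # roughly the amount of data being moved

for block_kib in (128, 8):
    size = ddt_size_bytes(data, block_kib * 1024)
    print(f"{block_kib:>3} KiB blocks: ~{size / GIB:.1f} GiB of DDT")
```

Under these assumptions, large blocks keep the table under 2 GiB, while
small blocks push it to several times the machine's total RAM.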


Thus, here's what's likely going on:

(1)  ZFS reads a block and its checksum from the old filesystem
(2)  it checks the DDT to see if that checksum exists

(3) finding that the entire DDT isn't resident in RAM, it starts a cycle
to read the rest of the (potential) entries from the new filesystem's
metadata. That is, it tries to reconstruct the DDT from disk, which
involves a HUGE amount of random seek reads on the new filesystem.

In essence, since you likely can't fit the DDT in RAM, each block read
from the old filesystem forces a flurry of reads from the new
filesystem, which eats up the IOPS that your single pool can provide.
It thrashes the disks.

Your solution is to either buy more RAM, or find something you can use
as an L2ARC cache device for your pool. Ideally, it would be an SSD.
However, in this case, a plain hard drive would do OK (NOT one already
in a pool). To add such a device, you would do:

'zpool add tank cache mycachedevice'
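
The thrashing described above can be illustrated with a small simulation.
This is a toy LRU cache in Python, not a model of the real ARC; the
function name and the sizes are made up for the example. Because checksums
are effectively random, lookups have no locality, so the hit rate ends up
at roughly the fraction of the table that fits in the cache.

```python
import random
from collections import OrderedDict


def ddt_hit_rate(table_entries: int, cache_entries: int,
                 lookups: int = 50_000, seed: int = 0) -> float:
    """Fraction of random checksum lookups served from an LRU cache."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    for _ in range(lookups):
        key = rng.randrange(table_entries)  # checksums look random
        if key in cache:
            hits += 1
            cache.move_to_end(key)
        else:
            # Miss: in the real system this would be a random disk read.
            cache[key] = True
            if len(cache) > cache_entries:
                cache.popitem(last=False)  # evict least recently used
    return hits / lookups


print(ddt_hit_rate(10_000, 10_000))  # whole table fits in the cache
print(ddt_hit_rate(10_000, 1_000))   # only a tenth of it fits
```

Every miss in this model stands for a seek on the pool's disks, which is
why more RAM or a cache device helps so much.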





--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
