Sorry for the late answer.

Approximately it's 150 bytes per individual block, so increasing the
blocksize is a good idea.
Also, when the L1 and L2 ARC are not enough, the system will start issuing
disk IOPS, and RAIDZ is not very effective for random IOPS, so it's likely
that when your DRAM is not enough your performance will suffer.
You may choose to use RAID 10, which is a lot better under random loads.
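As a rough, purely illustrative sketch of what that ~150 bytes per block works
out to at different block sizes (Python; the per-entry size is only an
approximation, and it assumes the worst case of every block being unique):

# Rough sketch only: estimate dedup-table overhead assuming ~150 bytes
# of metadata per unique block (worst case: every block in the pool is unique).
def ddt_overhead_gb(pool_bytes, block_bytes, bytes_per_entry=150):
    blocks = pool_bytes / block_bytes
    return blocks * bytes_per_entry / 2**30

pool = 1.7 * 2**40  # ~1,7 TB of usable storage, as in the original post
for bs_kb in (4, 8, 16, 128):
    gb = ddt_overhead_gb(pool, bs_kb * 2**10)
    print(f"{bs_kb:>4}K blocks -> ~{gb:6.1f} GB of dedup table")

With 4K blocks that is roughly 64 GB of table, with 128K blocks about 2 GB,
which is why the larger blocksize helps so much.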
Mertol 




Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +902123352222
Email mertol.ozyo...@sun.com



-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of erik.ableson
Sent: Thursday, January 21, 2010 6:05 PM
To: zfs-discuss
Subject: [zfs-discuss] Dedup memory overhead

Hi all,

I'm going to be trying out some tests using b130 for dedup on a server with
about 1,7 TB of usable storage (14x146 in two raidz vdevs of 7 disks). What
I'm trying to get a handle on is how to estimate the memory overhead
required for dedup on that amount of storage.  From what I gather, the dedup
hash keys are held in ARC and L2ARC and as such are in competition for the
available memory.

So the question is how much memory or L2ARC would be necessary to ensure
that I'm never going back to disk to read out the hash keys. Better yet
would be some kind of algorithm for calculating the overhead, e.g. an
average block size of 4K means a hash key for every 4K stored, and a hash
occupies 256 bits. An associated question is then how the ARC handles
competition between hash keys and regular ARC functions.

Based on these estimations, I think that I should be able to calculate the
following:
1,7             TB usable storage
1740,8          GB
1782579,2       MB
1825361100,8    KB
4               KB average block size
456340275,2     blocks
256             bits per hash key
1,16823E+11     bits of hash key overhead
14602888806,4   bytes of hash key overhead
14260633,6      KB
13926,4         MB
13,6            GB hash key overhead
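
To make that chain of conversions reproducible, here is the same calculation
as a few lines of Python (illustrative only; same assumptions of a 4 KB
average block size and one 256-bit hash key per block):

# Re-check of the table above: 1,7 TB pool, 4 KB average block size,
# one 256-bit hash key per block.
pool_kb   = 1.7 * 1024**3        # 1825361100,8 KB
blocks    = pool_kb / 4          # 456340275,2 blocks
hash_bits = blocks * 256         # ~1,16823E+11 bits
print(f"{blocks:.1f} blocks -> {hash_bits / 8 / 1024**3:.1f} GB of hash keys")  # ~13,6 GB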

Of course, the big question in all this will be the average block size - or
better yet, being able to analyze an existing datastore to see just how many
blocks it uses and what the current distribution of block sizes is. I'm
currently playing around with zdb, with mixed success, to extract this kind
of data. The figures above are also a worst-case scenario, since they assume
really small blocks and 100% of available storage in use - highly unlikely.

# zdb -ddbb siovale/iphone
Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects

    ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0,
flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  57.0K    64K   77.34  DMU dnode
         1    1    16K     1K  1.50K     1K  100.00  ZFS master node
         2    1    16K    512  1.50K    512  100.00  ZFS delete queue
         3    2    16K    16K  18.0K    32K  100.00  ZFS directory
         4    3    16K   128K   408M   408M  100.00  ZFS plain file
         5    1    16K    16K  3.00K    16K  100.00  FUID table
         6    1    16K     4K  4.50K     4K  100.00  ZFS plain file
         7    1    16K  6.50K  6.50K  6.50K  100.00  ZFS plain file
         8    3    16K   128K   952M   952M  100.00  ZFS plain file
         9    3    16K   128K   912M   912M  100.00  ZFS plain file
        10    3    16K   128K   695M   695M  100.00  ZFS plain file
        11    3    16K   128K   914M   914M  100.00  ZFS plain file
 
Now, if I'm understanding this output properly, object 4 is composed of
128KB blocks with a total size of 408MB, meaning that it uses 3264 blocks.
Can someone confirm (or correct) that assumption? Also, I note that each
object  (as far as my limited testing has shown) has a single block size
with no internal variation.
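
Something along these lines (a rough sketch only; it assumes the column
layout shown above and that each object really does use a single block size)
would tally blocks as lsize / dblk:

# Rough sketch: tally per-object block counts from `zdb -ddbb` output by
# dividing each object's lsize by its dblk. Assumes the column layout shown
# above and that every object uses a single, fixed block size.
import re, sys

UNIT = {'K': 2**10, 'M': 2**20, 'G': 2**30, 'T': 2**40}

def to_bytes(field):
    m = re.match(r'([\d.]+)([KMGT]?)', field)
    return float(m.group(1)) * UNIT.get(m.group(2), 1)

for line in sys.stdin:
    cols = line.split()
    # expecting: object, lvl, iblk, dblk, dsize, lsize, %full, type
    if len(cols) >= 7 and cols[0].isdigit():
        dblk, lsize = to_bytes(cols[3]), to_bytes(cols[5])
        if dblk:
            print(f"object {cols[0]:>3}: ~{lsize / dblk:.0f} blocks of {cols[3]}")

Piping the zdb output above through that gives, for example, ~3264 blocks of
128K for object 4, which matches the arithmetic.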

Interestingly, all of my zvols seem to use fixed size blocks - that is,
there is no variation in the block sizes - they're all the size defined on
creation with no dynamic block sizes being used. I previously thought that
the -b option set the maximum size, rather than fixing all blocks.  Learned
something today :-)

# zdb -ddbb siovale/testvol
Dataset siovale/testvol [ZVOL], ID 45, cr_txg 4717890, 23.9K, 2 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    1    16K    64K      0    64K    0.00  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop

# zdb -ddbb siovale/tm-media
Dataset siovale/tm-media [ZVOL], ID 706, cr_txg 4426997, 240G, 2 objects

    ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0,
flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    5    16K     8K   240G   250G   97.33  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop
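
As a concrete worked example of the overhead formula against that last zvol
(my arithmetic, not zdb output): the 8K volblocksize and ~250G of logical
data fix the block count, and therefore the hash-key footprint, directly:

# tm-media zvol above: 8K volblocksize, ~250G logical size, 256-bit hash keys
blocks = 250 * 2**30 / (8 * 2**10)                 # 32768000 blocks
print(f"{blocks:,.0f} blocks -> ~{blocks * 32 / 2**30:.2f} GB of hash keys")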

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
