On 12/1/2011 4:59 PM, Ragnar Sundblad wrote:
I am sorry if these are dumb questions. If there are explanations
available somewhere for those questions that I just haven't found, please
let me know! :-)
1. It has been said that when DDT entries, some 376 bytes or so each, are
rolled out to the L2ARC, there are still some 170 bytes in the ARC to reference
them (or rather the ZAP objects, I believe). In some places it sounds as if
those 170 bytes refer to ZAP objects that each contain several DDT entries;
in other places it sounds as if each DDT entry in the L2ARC needs its own
170-byte reference in the ARC. What is the real story here?
Yup - the latter. Each entry (not just a DDT entry, but any cached buffer) in
the L2ARC requires a pointer record in the ARC, so the DDT entries held in the
L2ARC still consume ARC space. It's a bad situation.
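To get a feel for the scale, here is a rough sizing sketch in C. The 376-byte
entry and ~170-byte ARC reference are just the figures quoted above, and the
pool size is an arbitrary example, so treat the output as a back-of-envelope
estimate rather than anything authoritative:

#include <stdio.h>

/* Rough sizing sketch: the per-entry figures come from the discussion
 * above, not from the ZFS source, so treat them as approximations. */
#define DDT_ENTRY_BYTES 376     /* DDT entry held in L2ARC, as quoted */
#define ARC_REF_BYTES   170     /* ARC reference needed per L2ARC entry */

int main(void)
{
    /* Example pool: 10 TB of unique data at 128 KB recordsize. */
    unsigned long long unique_blocks =
        10ULL * 1024 * 1024 * 1024 * 1024 / (128 * 1024);

    unsigned long long l2arc_bytes = unique_blocks * DDT_ENTRY_BYTES;
    unsigned long long arc_bytes   = unique_blocks * ARC_REF_BYTES;

    printf("unique blocks:     %llu\n", unique_blocks);
    printf("DDT in L2ARC:  ~%llu MB\n", l2arc_bytes >> 20);
    printf("ARC references: ~%llu MB\n", arc_bytes >> 20);
    return 0;
}

Even with the DDT itself pushed out to the L2ARC, the per-entry references
keep a footprint of many gigabytes in RAM for a pool of that size.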
2. Deletion with dedup enabled is a lot heavier, for some reason I don't
understand. It is said that the DDT entries have to be updated for each
deleted reference to a block. Since ZFS already has a mechanism for sharing
blocks (for example with snapshots), I don't understand why the DDT has to
contain any more block references at all, or why deletion should be much harder
just because there are checksums (DDT entries) tied to those blocks, and even
if the DDT does have to be updated, why that would be much harder than the
other block-sharing mechanism. If anyone could explain this (or give me a
pointer to an explanation), I'd be very happy!
Remember that, with dedup enabled, each block can potentially be part of a
very large number of files. So when you delete a file, you have to look up the
DDT entry FOR EACH BLOCK IN THAT FILE and make the appropriate update
(decrement its reference count, and free the block only when the count reaches
zero). It's essentially the same problem that destroying snapshots has: for
each block you delete, you have to find and update the metadata that tracks
the other users of that block. Dedup and snapshot deletion share the same
problem; it's just usually worse for dedup, since there is a much larger
number of blocks whose entries have to be updated.
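To make the per-block bookkeeping concrete, here is a small self-contained C
sketch. It is not the ZFS implementation - the names are made up, and a linked
list stands in for the real AVL-tree/ZAP-backed DDT - but the refcount logic
is the point:

/* Toy illustration of the per-block bookkeeping that dedup needs.
 * NOT the real ZFS code: made-up names, and a linked list where ZFS
 * uses an AVL tree / ZAP object. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct ddt_entry {
    char             checksum[33];  /* stand-in for a SHA-256 digest */
    unsigned long    refcnt;        /* block pointers referencing this block */
    struct ddt_entry *next;
};

static struct ddt_entry *ddt_head;  /* the "dedup table" */

/* Writing a block: if the checksum is already known, just bump the
 * refcount instead of allocating new space. */
static void write_block(const char *checksum)
{
    struct ddt_entry *de;
    for (de = ddt_head; de != NULL; de = de->next) {
        if (strcmp(de->checksum, checksum) == 0) {
            de->refcnt++;           /* duplicate: share the existing block */
            return;
        }
    }
    de = calloc(1, sizeof(*de));    /* first copy: allocate a new entry */
    strncpy(de->checksum, checksum, sizeof(de->checksum) - 1);
    de->refcnt = 1;
    de->next = ddt_head;
    ddt_head = de;
}

/* Freeing a block: look up its DDT entry, decrement the refcount, and
 * only release the space when nobody references it any more.  In real
 * ZFS this lookup can be a random read from disk if the entry is not
 * cached in ARC/L2ARC. */
static void free_block(const char *checksum)
{
    struct ddt_entry **link = &ddt_head;
    while (*link != NULL) {
        struct ddt_entry *de = *link;
        if (strcmp(de->checksum, checksum) == 0) {
            if (--de->refcnt == 0) {
                *link = de->next;   /* last reference: really free it */
                free(de);
                printf("block %s freed on disk\n", checksum);
            } else {
                printf("block %s still has %lu reference(s)\n",
                       checksum, de->refcnt);
            }
            return;
        }
        link = &de->next;
    }
}

int main(void)
{
    write_block("abc123");  /* file A writes the block */
    write_block("abc123");  /* file B writes an identical block: deduped */

    free_block("abc123");   /* delete file A: refcount drops to 1 */
    free_block("abc123");   /* delete file B: refcount hits 0, space freed */
    return 0;
}

Deleting a large file or dataset repeats that lookup-and-update cycle for
every block it referenced, which is where the random-read load on the DDT
comes from.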
The problem is that you really need the entire DDT in some form of high-speed,
random-access storage for things to be efficient. If you have to go out to
spinning disk to fetch the proper DDT entry every time you delete a block,
your IOPS limit is going to get hammered hard.
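A rough back-of-envelope example (the numbers are purely illustrative):
deleting 1 TB of deduped data at a 128 KB recordsize means on the order of
8 million blocks. If most of those DDT lookups miss the cache and each miss
costs one random read on a disk doing roughly 100-150 IOPS, that is well over
a dozen hours of seek time just to walk the DDT, which is why deletes and
dataset destroys crawl when the table does not fit in ARC/L2ARC.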
3. I, as many others, would of course like to be able to have very large
datasets deduped without needing enormous amounts of RAM.
Since the DDT is an AVL tree, couldn't that entire tree be cached on, for
example, an SSD and be searched there without necessarily having to keep
any of it in RAM? That would probably require some changes to the DDT
lookup code, and some mechanism to gather up the tree so it can be lifted
over to the SSD cache, and some other stuff, but it still sounds - with
my very basic (non-)understanding of ZFS - like a not too overwhelming change.
L2ARC typically sits on an SSD, and the DDT is usually held there, if
the L2ARC device exists. There does need to be serious work on changing
how the DDT in the L2ARC is referenced, however; the ARC memory
requirements for DDT-in-L2ARC definitely need to be removed (which
requires a non-trivial rearchitecting of dedup). There are some other
changes that have to happen for Dedup to be really usable.
Unfortunately, I can't see anyone around willing to do those changes,
and my understanding of the code says that it is much more likely that
we will simply remove and replace the entire dedup feature rather than
trying to fix the existing design.
4. Now and then people mention that the problem with bp_rewrite has been
explained, on this very mailing list I believe, but I haven't found that
explanation. Could someone please give me a pointer to that description
(or perhaps explain it again :-) )?
Thanks for any enlightenment!
/ragge
bp_rewrite is the name of a proposed (and as yet unimplemented) feature,
block pointer rewriting. It would allow ZFS to change the physical location
on media of an existing block of ZFS data: that is, bp_rewrite is what ZFS
needs in order to change the physical layout of data on media without
changing the logical arrangement of that data. It is what things like pool
shrinking, vdev removal, and on-line rebalancing/defragmentation would be
built on.
It's been the #1 most-wanted feature of ZFS since I can remember,
probably for 10 years now.
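Purely as a conceptual sketch (bp_rewrite does not exist, so every type and
function below is a hypothetical stand-in, not a real ZFS interface), the
operation would look something like this in spirit:

/* Conceptual sketch of block-pointer rewriting.  These types and
 * functions are made-up stand-ins that only illustrate the idea:
 * move the data, update the pointer, keep the logical contents
 * identical. */
#include <stdio.h>

struct dva {                    /* "data virtual address": where a copy lives */
    unsigned int  vdev;         /* which device */
    unsigned long offset;       /* where on that device */
};

struct blkptr {                 /* simplified stand-in for a ZFS block pointer */
    struct dva    dva;          /* current physical location of the data */
    char          data[64];     /* pretend payload, to keep the demo in RAM */
};

/* The hard part in real life: every copy of every block pointer that
 * references this block (files, snapshots, clones, dedup entries) has
 * to be found and updated consistently, which is why bp_rewrite is
 * such a large project. */
static void bp_rewrite(struct blkptr *bp, struct dva new_location)
{
    /* 1. Copy the data to its new physical location (simulated here). */
    /* 2. Update the block pointer to reference the new location.      */
    bp->dva = new_location;
    /* 3. The logical contents (bp->data here) are untouched.          */
}

int main(void)
{
    struct blkptr bp = { { 0, 0x1000 }, "hello, unchanged logical data" };
    struct dva    elsewhere = { 2, 0x9f000 };   /* e.g. a less-full vdev */

    printf("before: vdev %u offset 0x%lx: %s\n",
           bp.dva.vdev, bp.dva.offset, bp.data);
    bp_rewrite(&bp, elsewhere);
    printf("after:  vdev %u offset 0x%lx: %s\n",
           bp.dva.vdev, bp.dva.offset, bp.data);
    return 0;
}

In this toy the rewrite is trivial because there is only one pointer; in a
real pool the same block can be referenced from a great many places, all of
which would have to be updated consistently.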
-Erik