On 5/6/2011 5:46 PM, Richard Elling wrote:
On May 6, 2011, at 3:24 AM, Erik Trimble <erik.trim...@oracle.com> wrote:

Casper and Richard are correct - RAM starvation seriously impacts snapshot or 
dataset deletion when a pool has dedup enabled.  The reason behind this is that 
ZFS needs to scan the entire DDT to check to see if it can actually delete each 
block in the to-be-deleted snapshot/dataset, or if it just needs to update the 
dedup reference count.
AIUI, the issue is not that the DDT is scanned; it is an AVL tree for a reason. The issue
is that each reference update means that one small bit of data is changed. If the
reference is not already in ARC, then a small, probably random read is needed. If you
have a typical consumer disk, especially a "green" disk, and have not tuned
zfs_vdev_max_pending, then that itty-bitty read can easily take more than 100
milliseconds(!). Consider that you can have thousands or millions of reference updates to
do during a zfs destroy, and the math gets ugly. This is why fast SSDs make good dedup
candidates.
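
To put rough numbers on "the math gets ugly", here is a quick back-of-the-envelope
calculation; the reference counts, cache hit rate, and per-read latencies below are
assumed illustrative values, not measurements:

# Illustrative arithmetic only; reference counts and latencies are assumptions.
def destroy_days(refs, read_ms, arc_hit_ratio):
    """Days to update `refs` DDT references when each ARC miss costs one
    random read of `read_ms` milliseconds."""
    misses = refs * (1.0 - arc_hit_ratio)
    return misses * (read_ms / 1000.0) / 86400.0

print(destroy_days(10000000, 100, 0.2))   # 10M refs, slow disk: ~9.3 days
print(destroy_days(10000000, 0.2, 0.2))   # same refs, fast SSD: ~27 minutes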
Just out of curiosity - I'm assuming that a delete works like this:

    (1) find the list of blocks associated with the file to be deleted
    (2) using the DDT, find out if any other files are using those blocks
    (3) delete/update any metadata associated with the file (dirents, ACLs, etc.)
    (4) for each block in the file:
        (4a) if the DDT indicates there ARE other files using this block, update the DDT entry to decrement the refcount
        (4b) if the DDT indicates there AREN'T any other files, move the physical block to the free list and delete the DDT entry


In a bulk delete scenario (not just snapshot deletion), I'd presume #1 above almost always causes random I/O to disk, as all the relevant metadata for every to-be-deleted file is unlikely to be stored in ARC. If you can't fit the DDT in ARC/L2ARC, #2 above would require you to pull in the remainder of the DDT info from disk, right? #3 and #4 can be batched up, so they don't hurt that much.

Is that a (roughly) correct deletion methodology? Or can someone give a more accurate view of what's actually going on?
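
To make my mental model concrete, here is a tiny, purely illustrative Python sketch of
how I imagine steps (2) and (4) working; the dict-as-DDT and all of the names are my
own invention, not ZFS internals:

# Illustrative sketch only -- not the real ZFS code paths.
# Models the DDT as a dict keyed by block checksum, refcount as the value.

def destroy_file(file_blocks, ddt, free_list):
    """file_blocks: checksums of blocks owned by the file being deleted.
       ddt: dict mapping checksum -> refcount.
       free_list: collects checksums whose last reference went away."""
    for cksum in file_blocks:            # step (4): walk every block
        refs = ddt[cksum]                # step (2): each lookup may be a random
                                         #   read if the entry isn't in ARC/L2ARC
        if refs > 1:                     # (4a) other files still use the block:
            ddt[cksum] = refs - 1        #      just decrement the refcount
        else:                            # (4b) last reference:
            free_list.append(cksum)      #      free the block...
            del ddt[cksum]               #      ...and drop the DDT entry

# Tiny demo: two files share block "b"; only the deleted file owns "a".
ddt = {"a": 1, "b": 2}
free_list = []
destroy_file(["a", "b"], ddt, free_list)
print(ddt)        # {'b': 1}
print(free_list)  # ['a']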



If it can't store the entire DDT in either the ARC or L2ARC, it will be forced
to do considerable I/O to disk as it brings in the appropriate DDT entries.
In the worst case, insufficient ARC/L2ARC space can increase deletion times by many
orders of magnitude: days, weeks, or even months to complete a deletion.
I've never seen months, but I have seen days, especially for low-perf disks.
I've seen an estimate of 5 weeks for removing a snapshot on a 1TB dedup pool made up of 1 disk.

Not an optimal set up.

:-)
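
For a rough sense of why a 1TB single-disk dedup pool lands in that range, here is a
back-of-the-envelope sketch; the average block size, per-entry DDT footprint, and seek
time are all assumed rule-of-thumb values, not measurements from that system:

# Back-of-the-envelope only; every constant below is an assumption.
pool_bytes      = 1 * 2**40    # 1 TB of deduped data
avg_block_size  = 32 * 2**10   # assume ~32 KB average block size
ddt_entry_bytes = 320          # oft-quoted rule of thumb per in-core DDT entry
seek_sec        = 0.1          # ~100 ms per random read on a slow disk

entries      = pool_bytes // avg_block_size        # 33,554,432 DDT entries
ddt_ram_gib  = entries * ddt_entry_bytes / 2**30   # ~10 GiB of DDT -- more ARC
                                                   # than most boxes have spare
destroy_days = entries * seek_sec / 86400.0        # ~39 days if every entry costs
                                                   # one random read -- i.e. weeks
print(entries, ddt_ram_gib, destroy_days)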

If dedup isn't enabled, snapshot and data deletion is very light on RAM 
requirements, and generally won't need to do much (if any) disk I/O.  Such 
deletion should take milliseconds to a minute or so.
Yes, perhaps a bit longer for recursive destruction, but everyone here knows 
recursion is evil, right? :-)
  -- richard
You, my friend, have obviously never worshipped at the Temple of the Lambda Calculus, nor been exposed to the Holy Writ that is "Structure and Interpretation of Computer Programs" (http://mitpress.mit.edu/sicp/full-text/book/book.html).

I sentence you to a semester of 6.001 problem sets, written by Prof Sussman sometime in the 1980s.

(yes, I went to MIT.)

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
