On 5/6/2011 5:46 PM, Richard Elling wrote:
On May 6, 2011, at 3:24 AM, Erik Trimble<erik.trim...@oracle.com> wrote:
Casper and Richard are correct - RAM starvation seriously impacts snapshot or
dataset deletion when a pool has dedup enabled. The reason behind this is that
ZFS needs to scan the entire DDT to check whether it can actually delete each
block in the to-be-deleted snapshot/dataset, or whether it just needs to update
the dedup reference count.
AIUI, the issue is not that the DDT is scanned - it is an AVL tree for a reason. The issue
is that each reference update means that one small bit of data is changed. If the
reference is not already in the ARC, then a small, probably random read is needed. If you
have a typical consumer disk, especially a "green" disk, and have not tuned
zfs_vdev_max_pending, then that itty bitty read can easily take more than 100
milliseconds(!). Consider that you can have thousands or millions of reference updates to
do during a zfs destroy, and the math gets ugly. This is why fast SSDs make good dedup
candidates.
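To make "the math gets ugly" concrete, here is a back-of-envelope sketch. The ~100 ms
per random read and the million-update count are just the figures from the paragraph
above, not measurements from any particular pool:

```python
# Rough seek-bound estimate for a dedup-enabled zfs destroy.
# Assumes every reference update misses the ARC and costs one
# random read at ~100 ms (the untuned "green" disk case above).
ms_per_read = 100
updates = 1_000_000          # a million reference updates

total_seconds = updates * ms_per_read / 1000
total_hours = total_seconds / 3600
print(f"~{total_hours:.1f} hours for one destroy")  # ~27.8 hours
```

Even at a more typical 10 ms seek, that is still nearly three hours for a single
destroy, which is why keeping the DDT in ARC/L2ARC (or on fast SSDs) matters so much.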
Just out of curiosity - I'm assuming that a delete works like this:
(1) find list of blocks associated with file to be deleted
(2) using the DDT, find out if any other files are using those blocks
(3) delete/update any metadata associated with the file (dirents,
ACLs, etc.)
(4) for each block in the file
(4a) if the DDT indicates there ARE other files using this
block, update the DDT entry to change the refcount
(4b) if the DDT indicates there AREN'T any other files, move
the physical block to the free list, and delete the DDT entry
In a bulk delete scenario (not just snapshot deletion), I'd presume #1
above almost always causes a random I/O request to disk, as all the
relevant metadata for every to-be-deleted file is unlikely to be
stored in the ARC. If you can't fit the DDT in ARC/L2ARC, #2 above would
require you to pull in the remainder of the DDT info from disk, right?
#3 and #4 can be batched up, so they don't hurt that much.
Is that a (roughly) correct deletion methodology? Or can someone give a
more accurate view of what's actually going on?
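The hypothesized flow in steps 1-4 above can be sketched in Python pseudocode. This is
purely illustrative - `ddt` and `free_list` are stand-in names, and the real ZFS DDT is
an on-disk AVL-tree-backed structure keyed by block checksum, not a Python dict:

```python
def destroy_file(file_blocks, ddt, free_list):
    """Hypothetical dedup-aware delete, following steps 4a/4b above.

    ddt maps block checksum -> reference count; free_list collects
    blocks whose last reference is going away. Illustrative sketch
    only, not an actual ZFS interface.
    """
    for checksum in file_blocks:          # step 4: walk the file's blocks
        refs = ddt.get(checksum, 0)
        if refs > 1:                      # step 4a: other files still use it
            ddt[checksum] = refs - 1      # just decrement the refcount
        else:                             # step 4b: last reference is gone
            ddt.pop(checksum, None)       # drop the DDT entry...
            free_list.append(checksum)    # ...and free the physical block
    return free_list
```

For example, `destroy_file(["a", "b"], {"a": 2, "b": 1}, [])` would decrement the
refcount for "a" and free only "b". The pain point from the earlier discussion is the
`ddt.get` lookup: in real ZFS each one that misses the ARC is a random disk read.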
If ZFS can't store the entire DDT in either the ARC or L2ARC, it will be forced
to do considerable I/O to disk as it brings in the appropriate DDT entries.
Worst case, insufficient ARC/L2ARC space can increase deletion times by many
orders of magnitude - e.g., days, weeks, or even months to do a deletion.
I've never seen months, but I have seen days, especially for low-perf disks.
I've seen an estimate of 5 weeks for removing a snapshot on a 1TB dedup
pool made up of 1 disk.
Not an optimal setup.
:-)
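For a sense of scale on whether a DDT fits in ARC/L2ARC, here is a rough sizing sketch.
The ~320 bytes per in-core DDT entry is a commonly cited ZFS rule of thumb, not an exact
figure, and the 128 KiB average block size is an assumption - real pools vary widely:

```python
# Rough in-core DDT footprint for 1 TiB of unique deduped data.
# ~320 bytes/entry is a commonly cited ZFS estimate; actual entry
# size and average block size depend on the pool.
pool_bytes = 1 * 2**40            # 1 TiB of deduped data
block_size = 128 * 2**10          # assumed 128 KiB average block
bytes_per_entry = 320             # rule-of-thumb in-core entry size

entries = pool_bytes // block_size
ddt_bytes = entries * bytes_per_entry
print(f"{entries} entries, ~{ddt_bytes / 2**30:.1f} GiB of DDT")
```

That works out to roughly 2.5 GiB of DDT per TiB at 128 KiB blocks - and several times
more with smaller average block sizes - which shows how easily the DDT outgrows RAM on
the 1 TB single-disk pool described above.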
If dedup isn't enabled, snapshot and data deletion is very light on RAM
requirements, and generally won't need to do much (if any) disk I/O. Such
deletion should take milliseconds to a minute or so.
Yes, perhaps a bit longer for recursive destruction, but everyone here knows
recursion is evil, right? :-)
-- richard
You, my friend, have obviously never worshipped at the Temple of the
Lambda Calculus, nor been exposed to the Holy Writ that is "Structure and
Interpretation of Computer Programs"
(http://mitpress.mit.edu/sicp/full-text/book/book.html).
I sentence you to a semester of 6.001 problem sets, written by Prof
Sussman sometime in the 1980s.
(yes, I went to MIT.)
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss