On Dec 29, 2009, at 12:34 AM, Brent Jones wrote:
On Sun, Dec 27, 2009 at 1:35 PM, Brent Jones <br...@servuhome.net> wrote:
On Sun, Dec 27, 2009 at 12:55 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:
Brent,

I found out about that bug a couple of weeks ago, but it was filed against v111 and we're at v130. I have also searched the ZFS part of this forum and really couldn't find much about this issue.

The other issue I noticed: contrary to the statements I read, which claim that other operations continue to work while ZFS is busy destroying a big dataset, that doesn't seem to be the case. While destroying the 3 TB dataset, the other zvol that had been exported via iSCSI stalled as well, and that's really bad.

Cheers,
budy


I just tested your claim, and you appear to be correct.

I created a couple dummy ZFS filesystems, loaded them with about 2TB,
exported them via CIFS, and destroyed one of them.
The destroy took the usual amount of time (about 2 hours), and, quite to
my surprise, all I/O on the ENTIRE zpool stalled.
I don't recall seeing this prior to snv_130; in fact, I know I would have
noticed it, as we create and destroy large ZFS filesystems very
frequently.
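
Roughly, the sequence looked like this (pool and dataset names below are
just placeholders, not my exact commands):

  zfs create tank/test1
  zfs create tank/test2
  zfs set sharesmb=on tank/test1    # export via CIFS
  zfs set sharesmb=on tank/test2
  # ... load roughly 2TB of data into the two filesystems ...
  zfs destroy tank/test2            # all I/O on the pool stalls here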

So it seems the original issue I reported many months back has
actually gained some new negative impacts  :(

I'll try to escalate this with my Sun support contract, but Sun
support still isn't very familiar/clued in about OpenSolaris, so I
doubt I will get very far.

Cross-posting to zfs-discuss also, as others may have seen this and
know of a solution/workaround.



--
Brent Jones
br...@servuhome.net


I did some more testing, and it seems this is 100% reproducible ONLY
if the file system and/or entire pool had compression or de-dupe
enabled at one point.
It doesn't seem to matter whether de-dupe/compression was enabled for 5
minutes or for the entire life of the pool: once either has been turned
on in snv_130, doing any kind of mass change (like deleting a big file
system) will hang ALL I/O for a significant amount of time.
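
A minimal sequence that seems to trigger it (names are placeholders; the
point is that dedup only has to have been on while the data was written):

  zfs create -o dedup=on tank/big
  # ... write a large amount of data into tank/big ...
  zfs set dedup=off tank/big    # turning it off again doesn't help
  zfs destroy tank/big          # run 'zpool iostat tank 5' in another
                                # window; all pool I/O stalls until done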

I don't believe compression matters.  But dedup can really make a big
difference.  When you enable dedup, the deduplication table (DDT) is
created to keep track of the references to blocks. When you remove a
file, the reference counter needs to be decremented for each block
in the file. When a DDT entry has a reference count of zero, the block
can be freed. When you destroy a file system (or dataset) which has
dedup enabled, then all of the blocks written since dedup was enabled
will need to have their reference counters decremented. This workload
looks like a small, random read followed by a small write. With luck, the
small, random read will already be loaded in the ARC, but you can't
escape the small write (though they should be coalesced).
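
If you want a rough idea of how many DDT entries are involved, and therefore
how many of these small updates a destroy will generate, zdb can print the
dedup table statistics, e.g. (pool name is just an example):

  zdb -DD tank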

Bottom line, rm or destroy of deduplicated files or datasets will create
a flurry of small, random I/O to the pool. If you use devices in the pool which are not optimized for lots of small, random I/O, then this activity
will take a long time.
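
One way to soften the read half of that workload is to give the DDT a faster
place to be cached, for example an SSD cache (L2ARC) device; the device name
below is purely illustrative:

  zpool add tank cache c4t0d0

The frees still have to be written to the main pool devices, so this is a
mitigation rather than a fix.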

...which brings up a few interesting questions:

Does it make sense to remove deduplicated files?

How do we schedule automatic snapshot removal?

I filed an RFE on a method to address this problem.  I'll pass along
the CR if or when it is assigned.
 -- richard

