> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> So here's what I'm going to do. With arc_meta_limit at 7680M, of which
> 100M was consumed "naturally," that leaves me 7580 to play with. Call it
> 7500M. Divide by 412 bytes, it means I'll hit a brick wall when I reach a
> little over 19M blocks. Which means if I set my recordsize to 32K, I'll
> hit that limit around 582G disk space consumed. That is my hypothesis,
> and now beginning the test.
Well, this is interesting. With 7580MB theoretically available for the DDT in ARC, the expectation was that 19M DDT entries would finally max out the ARC, at which point I'd jump off a performance cliff and start seeing a flood of pool reads killing my write performance. In reality, what I saw was:

* Up to a million blocks, the performance difference with/without dedup was basically negligible: write time with dedup = 1x write time without dedup.
* After a million blocks, the dedup write time consistently reached 2x the native write time. This happened when my ARC became full of user data (not metadata).
* As the number of unique blocks in the pool increased, the dedup write time gradually deviated further from the non-dedup write time: 2x, 3x, 4x. I saw a consistent 4x longer write time with dedup enabled once the pool reached 22.5M blocks.
* And then it jumped off a cliff. At 24M blocks I collected the last usable datapoint: a 28x slower write with dedup (4966 sec to write 3G, versus 178 sec), and for the first time, a nonzero rm time. All the way up till then, even with dedup, the rm time was zero; now it was 72 sec.
* I waited another 6 hours and never got another datapoint. So I found the limit where the pool becomes unusably slow.

At a cursory look, you might say this supported the hypothesis. You might say "24M compared to 19M, that's not too far off. The difference could be accounted for by using the 376-byte size of ddt_entry_t instead of the 412-byte size apparently measured, which would adjust the hypothesis to 21.1M blocks." But I don't think that's quite fair, because my arc_meta_used never got above 5,159, and I never saw the massive read overload that was predicted to be the cause of failure. In fact, starting from 0.4M-0.5M blocks (early, early, early on) onward, I always had 40-50 reads for every 250 writes, right to the bitter end. And my ARC is full of user data, not metadata.
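For reference, the revised figure mentioned above can be reworked the same way, assuming the 376-byte sizeof(ddt_entry_t) and the full 7580M of arc_meta headroom:

```python
# Revised hypothesis: 376 bytes per DDT entry (sizeof(ddt_entry_t)) rather
# than the 412 bytes apparently measured; 7580M of arc_meta headroom assumed.
MIB = 2 ** 20
revised_limit = (7580 * MIB) // 376
print(f"{revised_limit / 1e6:.1f}M blocks")  # ~21.1M, still short of the 24M observed
```

So even the optimistic entry size only moves the predicted wall to ~21.1M blocks, which is why the 24M observed (with arc_meta_used well under the limit) doesn't fit the original hypothesis cleanly.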
So the conclusions I'm drawing are:

(1) If you don't tweak arc_meta_limit and you enable dedup, you're toast. But if you do tweak arc_meta_limit, you might reasonably expect dedup to perform 3x to 4x slower on unique data... And based on results that I haven't talked about yet here, dedup performs 3x to 4x faster on duplicate data. So if you have 50% or higher duplicate data (dedup ratio 2x or higher), and you have plenty of memory and tweak it, then your performance with dedup could be comparable to, or even faster than, running without dedup. Of course, depending on your data patterns and usage patterns. YMMV.

(2) The above is pretty much the best you can do if your server is going to be a "normal" server, handling both reads and writes. Because the data and the metadata are both stored in the ARC, the data has a tendency to push the metadata out. But consider a special use case: suppose you only care about write performance and saving disk space. For example, suppose you're the destination server of a backup policy. You only do writes, so you don't care about keeping data in cache; you want to enable dedup to save cost on backup disks; you only care about keeping metadata in ARC. If you set primarycache=metadata... I'll go test this now. The hypothesis is that my arc_meta_used should actually climb up to the arc_meta_limit before I start hitting any disk reads, so my write performance with/without dedup should be pretty much equal up to that point. I'm sacrificing the potential read benefit of caching data in ARC in order to (hopefully) gain write performance, so that writes can be just as fast with dedup enabled as disabled. In fact, if there's much duplicate data, the dedup write performance in this case should be significantly better than without dedup.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss