On Thu, May 26, 2011 at 10:25:04AM -0400, Edward Ned Harvey wrote:
> (2) Now, in a pool with 2.4M unique blocks and dedup enabled (no verify), a
> test file requires 10m38s to write and 2m54s to delete, but with dedup
> disabled it only requires 0m40s to write and 0m13s to delete exactly the
> same file.  So ... 13x performance degradation.  
> 
> zpool iostat is indicating the disks are fully utilized doing writes.  No
> reads.  During this time, it is clear the only bottleneck is write iops.
> There is still oodles of free mem.  I am not near arc_meta_limit, nor c_max.
> The cpu is 99% idle.  It is write iops limited.  Period.

Ok.

> Assuming DDT maintenance is the only disk write overhead that dedup adds, I
> can only conclude that with dedup enabled, and a couple million unique
> blocks in the pool, the DDT must require substantial maintenance.  In my
> case, something like 12 DDT writes for every 1 actual intended new unique
> file block write.

Where did that number come from?  Are there actually 13x as many IOs, or is
that just extrapolated from elapsed time?  It won't be anything like a
linear extrapolation, especially if the heads are thrashing.

Note that DDT blocks have their own allocation metadata to be updated
as well.

Try to get a number for actual total IOs and scaling factor.
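
Something like this, run during both the dedup and non-dedup writes, should
give you raw counts to compare (pool name 'tank' is just a placeholder):

    # per-vdev read/write operations, sampled every 5 seconds
    zpool iostat -v tank 5

    # total physical IOs per device over the run, via the DTrace io provider
    dtrace -n 'io:::start { @[args[1]->dev_statname] = count(); }'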

> For the heck of it, since this machine has no other purpose at the present
> time, I plan to do two more tests.  And I'm open to suggestions if anyone
> can think of anything else useful to measure: 
> 
> (1) I'm currently using a recordsize of 512b, because the intended purpose
> of this test has been to rapidly generate a high number of new unique
> blocks.  Now just to eliminate the possibility that I'm shooting myself in
> the foot by systematically generating a worst case scenario, I'll try to
> systematically generate a best-case scenario.  I'll push the recordsize back
> up to 128k, and then repeat this test with writes slightly smaller than 128k.
> Say, 120k. That way there should be plenty of room available for any write
> aggregation the system may be trying to perform.
> 
> (2) For the heck of it, why not.  Disable ZIL and confirm that nothing
> changes.  (Understanding so far is that all these writes are async, and
> therefore ZIL should not be a factor.  Nice to confirm this belief.)

Good tests. See how the IO expansion factor changes with block size.
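
For the record, something along these lines should set those up (dataset
name 'tank/test' is a placeholder; sync=disabled assumes a build recent
enough to have the sync property, otherwise it's the old zil_disable
tunable):

    # test (1): larger recordsize for subsequent writes
    zfs set recordsize=128k tank/test

    # test (2): disable the ZIL for this dataset
    zfs set sync=disabled tank/test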

(3) Experiment with the maximum number of concurrent outstanding I/Os
allowed per disk (I forget the specific tunable OTTOMH).  If the load
really is ~100% async write, this might well be a case where raising
that figure lets the disk firmware maximise throughput, without the
latency impact that can occur otherwise (and which leads to the usual
recommendation to lower the limit in general cases).
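
If I'm remembering the right knob, it's zfs_vdev_max_pending; something
like this should let you experiment (the value 35 is just a starting
guess):

    # raise the per-vdev queue depth at runtime
    echo zfs_vdev_max_pending/W0t35 | mdb -kw

    # or persistently, in /etc/system:
    set zfs:zfs_vdev_max_pending = 35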

(4) See if a (much) longer txg sync interval helps.  Multiple DDT
entries can live in the same block, and a longer interval may allow
these writes to coalesce.
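
Assuming the interval is still controlled by zfs_txg_timeout (in
seconds), something like:

    # stretch the txg sync interval to 30 seconds at runtime
    echo zfs_txg_timeout/W0t30 | mdb -kw

    # or persistently, in /etc/system:
    set zfs:zfs_txg_timeout = 30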

--
Dan.
