On Thu, Sep 24, 2009 at 11:29 PM, James Lever <j...@jamver.id.au> wrote:
>
> On 25/09/2009, at 11:49 AM, Bob Friesenhahn wrote:
>
> The commentary says that normally the COMMIT operations occur during
> close(2) or fsync(2) system call, or when encountering memory pressure.  If
> the problem is slow copying of many small files, this COMMIT approach does
> not help very much since very little data is sent per file and most time is
> spent creating directories and files.
>
> The problem appears to be slog bandwidth exhaustion due to all data being
> sent via the slog creating a contention for all following NFS or locally
> synchronous writes.  The NFS writes do not appear to be synchronous in
> nature - there is only a COMMIT being issued at the very end, however, all
> of that data appears to be going via the slog and it appears to be inflating
> to twice its original size.
> For a test, I just copied a relatively small file (8.4MB in size).  Looking
> at a tcpdump analysis using wireshark, there is a SETATTR which ends with a
> V3 COMMIT and no COMMIT messages during the transfer.
> iostat output that matches looks like this:
> slog write of the data (17MB appears to hit the slog)
[snip]
> then a few seconds later, the transaction group gets flushed to primary
> storage, writing nearly 11.4MB, which is in line with RAIDZ2 (expect around
> 10.5MB; 8.4/8*10):
[snip]
> So I performed the same test with a much larger file (533MB) to see what it
> would do, being larger than the NVRAM cache in front of the SSD.  Note that
> after the second second of activity the NVRAM is full and only allowing in
> about the sequential write speed of the SSD (~70MB/s).
[snip]
> Again, the slog wrote about double the file size (1022.6MB) and a few
> seconds later, the data was pushed to the primary storage (684.9MB with an
> expectation of 666MB = 533MB/8*10) so again about the right number hit the
> spinning platters.
[snip]
> Can anybody explain what is going on with the slog device in that all data
> is being shunted via it and why about double the data size is being written
> to it per transaction?

By any chance do you have copies=2 set?

That would make ZFS write two copies of every block, which could explain
the slog seeing roughly double the file size.
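If copies=2 is in play, the doubling falls straight out of the arithmetic.
A quick sketch (the pool name "tank" is a placeholder; run the zfs command
on the live system):

```shell
# Check whether any dataset has copies > 1 (placeholder pool name):
#   zfs get -r -o name,value copies tank

# With copies=2, each block is written twice, so an 8.4 MB file would
# account for roughly 16.8 MB of slog traffic -- close to the observed 17 MB.
file_mb=8.4
slog_mb=$(awk -v f="$file_mb" 'BEGIN { printf "%.1f", f * 2 }')
echo "expected slog traffic: ${slog_mb} MB"
```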

Also, try setting zfs_write_limit_override to the size of the NVRAM
cache (or half of it, depending on how long the cache takes to flush):

echo zfs_write_limit_override/W0t268435456 | mdb -kw
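For reference, the 0t prefix tells mdb the value is decimal, and
268435456 bytes is 256 MB. A small sketch (the read-back command is an
assumption about the variable's width and must be run as root on the
live system):

```shell
# 0t prefix = decimal in mdb; 256 MB in bytes:
bytes=$((256 * 1024 * 1024))
echo "zfs_write_limit_override/W0t${bytes}"

# To read the current value back (assumed format character; run as root):
#   echo zfs_write_limit_override/D | mdb -k
```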

Also set the PERC flush interval to, say, 1 second.

As a side note, a slog device will not help much with large sequential
writes, because those are throughput-bound rather than latency-bound;
slog devices really shine when you have lots of small synchronous
writes. For sequential loads, a RAIDZ2 with the ZIL spread across the
pool will provide much higher throughput than a single SSD. An example
of a workload that benefits from a slog is ESX over NFS, which issues a
COMMIT for each block written; a standard media server, on the other
hand, will not benefit (though an L2ARC would help there).
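One rough way to tell which camp a workload falls in is to count ZIL
commits and compare against the throughput they represent. A hedged
sketch (the DTrace probe assumes the OpenSolaris ZFS module and must be
run as root; the commit size and rate below are made-up illustrative
numbers):

```shell
# Count ZIL commits for 10 seconds (run as root on the live system):
#   dtrace -n 'fbt::zil_commit:entry { @commits = count(); } tick-10s { exit(0); }'

# Back-of-envelope: many small commits are latency-bound and love a slog.
# E.g. 5000 commits/s of 8 KB each is only ~39 MB/s of raw throughput:
commit_kb=8
commits_per_sec=5000
mb_per_sec=$(awk -v k="$commit_kb" -v n="$commits_per_sec" \
    'BEGIN { printf "%.0f", k * n / 1024 }')
echo "${mb_per_sec} MB/s in small sync writes"
```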

It really comes down to analyzing the workload.

-Ross
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
