I saw something similar when writing over NFS (at significantly lower throughput than the system can sustain locally): effectively all data, even asynchronous writes, was being written to the ZIL. I eventually traced this (with help from Richard Elling and others on this list), at least partially, to the Linux NFS client issuing commit requests before ZFS was ready to write the asynchronous data out in a txg. I tried adjusting zfs_write_limit_override to get more data onto the normal vdevs faster, but this reduced performance (perhaps a tunable that stops ZFS from throttling writes when it hits the write limit would fix that), and it didn't noticeably lighten the load on the ZIL devices. I decided to live with the default behavior, since my main bottleneck is Ethernet anyway, and the projected lifespan of the ZIL devices was fairly long given our workload.
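For anyone who wants to run the same kind of lifespan projection, here is a rough sketch of the arithmetic. The endurance rating, write rate, and duty cycle below are made-up placeholders, not numbers from my system, and the function name is just for illustration; plug in the vendor's rated endurance and the write rate you observe on your own log device.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def log_device_lifespan_years(endurance_tb_written, avg_write_mb_per_s, duty_cycle=1.0):
    """Estimate how many years a log device lasts before its rated
    write endurance is used up.

    endurance_tb_written: rated endurance in terabytes written (10^12 bytes)
    avg_write_mb_per_s:   average write rate to the device while busy (10^6 bytes/s)
    duty_cycle:           fraction of the time the device is actually being written
    """
    if avg_write_mb_per_s <= 0 or not (0 < duty_cycle <= 1):
        raise ValueError("write rate must be positive and duty cycle in (0, 1]")
    bytes_per_year = avg_write_mb_per_s * 1e6 * duty_cycle * SECONDS_PER_YEAR
    return endurance_tb_written * 1e12 / bytes_per_year


if __name__ == "__main__":
    # Hypothetical device: 1000 TB rated endurance, written at 100 MB/s
    # for 10% of the time.
    years = log_device_lifespan_years(1000, 100, duty_cycle=0.1)
    print(f"estimated lifespan: {years:.1f} years")
```

This ignores write amplification inside the SSD, so treat the result as an upper bound rather than a prediction.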
I did find that setting logbias=throughput on a ZFS filesystem caused it to act as though the ZIL devices weren't there, which actually reduced commit times under continuous streaming writes. This was mostly due to having more throughput available for the same amount of data to commit, in large chunks. However, the zilstat script also reported less data being written to ZIL blocks under this condition (these blocks are allocated from normal vdevs when there is no ZIL device, or when logbias=throughput is set), so perhaps there is more to the story. Since logbias is set per filesystem rather than poolwide, this could help if you have different workloads on different datasets. Obviously, small synchronous writes to that filesystem will take a large hit from this setting.

It would be nice if ZFS had a feature that directed small commits to ZIL blocks on log devices but behaved like logbias=throughput for large commits. It would probably need manual tuning, but it would treat SSD log devices more gently and improve performance for large contiguous writes.

If you can't configure ZFS to write less data to the ZIL, I think a RAM-based ZIL device would be a good way to get throughput higher (with fewer worries about flash endurance, etc.).

Tim

On Wed, Oct 3, 2012 at 1:28 PM, Schweiss, Chip <c...@innovates.com> wrote:

> I'm in the planning stages of a rather large ZFS system to house
> approximately 1 PB of data.
>
> I have only one system with SSDs for L2ARC and ZIL. The ZIL seems to be
> the bottleneck for large bursts of data being written. I can't confirm
> this for sure, but when throwing enough data at my storage pool and the
> write latency starts rising, the ZIL write speed hangs close to the max
> sustained throughput I've measured on the SSD (~200 MB/s).
>
> When empty, without L2ARC or ZIL, the pool was tested with Bonnie++ and
> showed ~1300 MB/s serial read and ~800 MB/s serial write speed.
>
> How can I determine for sure that my ZIL is my bottleneck? If it is the
> bottleneck, is it possible to keep adding mirrored pairs of SSDs to the ZIL
> to make it faster? Or should I be looking for a DDR drive, ZeusRAM, etc.?
>
> Thanks for any input,
> -Chip
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
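For reference, the per-dataset logbias idea from my reply could be scripted roughly as follows. This is only a sketch: the dataset names are hypothetical, the helper name is mine, and the script just prints the `zfs set` commands instead of running them, so you can review them before applying anything.

```python
import shlex

# The two values the logbias property accepts.
VALID_LOGBIAS = ("latency", "throughput")


def logbias_commands(plan):
    """Turn a {dataset: logbias value} mapping into `zfs set` argument lists.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    commands = []
    for dataset, bias in sorted(plan.items()):
        if bias not in VALID_LOGBIAS:
            raise ValueError(f"unsupported logbias value for {dataset}: {bias!r}")
        commands.append(["zfs", "set", f"logbias={bias}", dataset])
    return commands


if __name__ == "__main__":
    # Hypothetical layout: streaming data bypasses the log devices,
    # while the small-sync-write dataset keeps the default behavior.
    plan = {
        "tank/streaming": "throughput",
        "tank/db": "latency",
    }
    for argv in logbias_commands(plan):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the latency-sensitive datasets on logbias=latency matters, since small synchronous writes take a large hit under logbias=throughput.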