On 10/04/12 05:30, Schweiss, Chip wrote:
Thanks for all the input.  It seems information on the performance of the ZIL is sparse and scattered.  I've spent significant time researching this over the past day.  I'll summarize what I've found.  Please correct me if I'm wrong.
  • The ZIL can have any number of SSDs attached, either mirrored or individually.  ZFS will stripe across these in a RAID-0 or RAID-10 fashion depending on how you configure them.

The ZIL code chains blocks together, and these are allocated round-robin among slog devices or,
if none exist, among the main pool devices.
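
As a hedged sketch of the layouts being described (the pool name tank and the c#t#d# device names are placeholders):

    # One mirrored slog: log blocks are mirrored, not striped.
    zpool add tank log mirror c0t0d0 c0t1d0

    # Two independent slogs: the ZIL allocates blocks round-robin
    # across them (the "raid0"-like case).
    zpool add tank log c1t0d0 c1t1d0

    # Two mirrored pairs: the "raid10"-like case.
    zpool add tank log mirror c2t0d0 c2t1d0 mirror c3t0d0 c3t1d0

    # Verify the resulting log vdev layout.
    zpool status tank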

  • To determine the true maximum streaming performance of the ZIL, set sync=disabled; this uses only the in-RAM ZIL.  It gives up power protection for synchronous writes.

There is no RAM ZIL. If sync=disabled, then all writes are asynchronous and are written
as part of the periodic ZFS transaction group (txg) commit that occurs every 5 seconds.
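
For anyone trying this, the property takes three values; a minimal sketch, with tank/fs as a placeholder dataset:

    zfs get sync tank/fs             # 'standard' is the default
    zfs set sync=disabled tank/fs    # treat all writes as asynchronous
    zfs set sync=always tank/fs      # push every write through the ZIL
    zfs set sync=standard tank/fs    # honor O_DSYNC/fsync as requested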

  • Many SSDs do not help protect against power failure because they have their own RAM cache for writes.  This effectively makes the SSD useless for this purpose and potentially introduces a false sense of security.  (These SSDs are fine for L2ARC.)

The ZIL code issues a write-cache flush to all devices it has written before returning
from the system call. I've heard that not all devices obey the flush, but we consider
those broken hardware. I don't have a list of devices to avoid.
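
One rough smoke test, though not a proof: a device that ignores cache flushes tends to report implausibly low synchronous-write latency. A sketch assuming GNU dd (for oflag=dsync) and a placeholder file path:

    # ~1000 separate 4 KB synchronous writes; elapsed time / 1000
    # approximates the per-write sync latency.
    time dd if=/dev/zero of=/tank/fs/syncfile bs=4k count=1000 oflag=dsync

If the implied latency is far lower than the medium could plausibly deliver, treat the drive with suspicion.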


  • Mirroring SSDs is only helpful if one SSD fails at the time of a power failure.  This leaves several unanswered questions.  How good is ZFS at detecting that an SSD is no longer a reliable write target?  The chance of silent data corruption is well documented for spinning disks.  What chance of data corruption does this introduce, with up to 10 seconds of data written on SSD?  Does ZFS read the ZIL during a scrub to determine if our SSD is returning what we write to it?

If the ZIL code gets a block write failure, it will force the txg to commit before returning.
How hard it tries to write the block will depend on the drivers and the IO subsystem.
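
Whatever a scrub does or does not verify on the slog, the log devices' error counters are at least visible alongside everything else; a sketch with a placeholder pool name:

    zpool scrub tank
    zpool status -v tank    # log vdevs appear with READ/WRITE/CKSUM counters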


  • Zpool versions 19 and higher should be able to survive a ZIL failure, losing only the uncommitted data.  However, I haven't seen good enough information that I would necessarily trust this yet.

This has been available for quite a while and I haven't heard of any bugs in this area.
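
For anyone checking where their pool stands, a sketch with a placeholder pool and device name:

    zpool get version tank    # surviving slog failure needs version >= 19
    zpool upgrade -v          # lists what each pool version added
    # Version 19 also allows removing a log device outright:
    zpool remove tank c0t0d0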

  • Several threads seem to suggest a ZIL throughput limit of 1GB/s with SSDs.  I'm not sure if that is current, but I can't find any reports of better performance.  I would suspect that a DDRdrive or ZeusRAM as ZIL would push past this.

1GB/s seems very high, but I don't have any numbers to share.
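
Rather than relying on secondhand figures, per-vdev bandwidth (log devices included) can be watched live during a sync-heavy workload; a sketch with a placeholder pool name:

    zpool iostat -v tank 1    # per-vdev ops and bandwidth at 1-second intervals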


Anyone care to post their performance numbers on current hardware with E5 processors and RAM-based ZIL solutions?

Thanks to everyone who has responded and contacted me directly on this issue.

-Chip

On Thu, Oct 4, 2012 at 3:03 AM, Andrew Gabriel <andrew.gabr...@cucumber.demon.co.uk> wrote:
Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Schweiss, Chip

How can I determine for sure that my ZIL is my bottleneck?  If it is the
bottleneck, is it possible to keep adding mirrored pairs of SSDs to the ZIL to
make it faster?  Or should I be looking for a DDR drive, ZeusRAM, etc.

Temporarily set sync=disabled
Or, depending on your application, leave it that way permanently.  For the work I do, most systems I support at most locations run with sync=disabled.  It all depends on the workload.

Noting of course that this means that in the case of an unexpected system outage or loss of connectivity to the disks, synchronous writes since the last txg commit will be lost, even though the applications believe they are secured to disk. (The ZFS filesystem won't be corrupted, but it will look as though it has been wound back by up to 30 seconds when you reboot.)

This is fine for some workloads, such as those where you would start again with fresh data, and those which can look closely at the data to see how far they got before being rudely interrupted, but not for those which rely on the POSIX semantics of synchronous writes/syncs, meaning data is secured on non-volatile storage when the function returns.

--
Andrew
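
For completeness, a sketch of the temporary A/B test suggested above, with tank/fs as a placeholder dataset:

    zfs set sync=standard tank/fs    # baseline: run the workload, note throughput
    zfs set sync=disabled tank/fs    # rerun the identical workload
    # A large improvement implicates the ZIL/slog path; little change
    # means the bottleneck is elsewhere (network, pool vdevs, CPU).
    zfs set sync=standard tank/fs    # restore sync semantics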


