Re: [zfs-discuss] Adding ZIL to pool questions

Edward Ned Harvey Sun, 01 Aug 2010 15:44:18 -0700

> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Gregory Gee
> 
> Edward, disabling ZIL might be ok, but let me characterize what my home
> server does and tell me if disabling ZIL is ok.


You should understand what it all means, and make your own choice.

For "sync" writes, an application tells the OS to write something to disk,
and the function call blocks (waits) until the data has been committed to
nonvolatile storage.

For "async" writes, an application tells the OS to write something to disk,
and the OS is permitted to buffer the write in RAM.  The application
continues doing other things, even if the data is not yet committed to
nonvolatile storage.
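To make the sync/async distinction concrete, here is a minimal Python sketch. It is my own illustration using ordinary POSIX-style file calls, nothing ZFS-specific, and the function names are hypothetical.  async_write() returns as soon as the OS has the data buffered.  sync_write() additionally calls os.fsync(), which blocks until the OS reports the data committed to stable storage (assuming the disks actually honor cache flushes).

```python
import os
import tempfile


def async_write(path: str, data: bytes) -> None:
    # Hypothetical helper. Buffered write: once the file is closed the OS has
    # the data, but it may still sit in RAM (page cache), not on disk.
    with open(path, "wb") as f:
        f.write(data)


def sync_write(path: str, data: bytes) -> None:
    # Hypothetical helper. Sync write: the caller is blocked until the OS
    # says the data is on nonvolatile storage.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's userspace buffer to the OS
        os.fsync(f.fileno())   # block until the OS commits it to the device
```

Usage: sync_write(os.path.join(tempfile.mkdtemp(), "x"), b"data") only returns after the fsync completes; async_write() with the same arguments can return while the data is still only in RAM.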

In ZFS, many async transactions can be aggregated into a single transaction
group (TXG).  ZFS chooses when to flush the TXG to disk based on many
factors, optimized for performance, but never waits longer than 30 sec.
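A toy model of TXG aggregation, as a hypothetical Python sketch.  This is not ZFS code: ToyTxgPool and TXG_TIMEOUT_SEC are made-up names, and the real flush heuristics and timeout differ between ZFS versions.  Async writes collect in an open TXG in RAM and reach "disk" together as one group; a crash drops whatever is still open.

```python
from dataclasses import dataclass, field

# Hypothetical constant matching the 30 sec cap described above; real ZFS
# tunables vary by version.
TXG_TIMEOUT_SEC = 30.0


@dataclass
class ToyTxgPool:
    """Toy model, not ZFS code: async writes accumulate in an open TXG in
    RAM and are committed to 'disk' as a single group."""
    disk: dict = field(default_factory=dict)      # last committed state
    open_txg: dict = field(default_factory=dict)  # pending writes, RAM only
    txg_opened_at: float = 0.0
    txg_number: int = 0

    def write_async(self, now: float, key: str, value: str) -> None:
        if not self.open_txg:
            self.txg_opened_at = now   # first write opens a new TXG
        self.open_txg[key] = value     # later writes to a key overwrite earlier ones
        self.maybe_sync(now)

    def maybe_sync(self, now: float, force: bool = False) -> None:
        # Commit when forced (standing in for the performance heuristics)
        # or once the open TXG has reached the timeout.
        if self.open_txg and (force or now - self.txg_opened_at >= TXG_TIMEOUT_SEC):
            self.disk.update(self.open_txg)
            self.open_txg = {}
            self.txg_number += 1

    def crash(self) -> None:
        # Anything still in the open TXG is lost.
        self.open_txg = {}
```

The point of the model is the at-risk window: everything in open_txg between commits is exactly what a crash can take away.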

In ZFS, sync transactions are first written to the ZIL, so the OS can unblock
the application right away; after that they become async transactions just
like all the other async transactions.  After an ungraceful crash, the OS
checks the ZIL to see if anything was requested to be written but not
actually written.  If any unplayed entries exist, they are replayed now,
before the filesystem is mounted.
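The ZIL flow above, reduced to a toy Python model.  The names are hypothetical, this is not ZFS code, and it ignores details such as separate log devices.  A sync write is appended to the on-disk log before returning, then sits in the open TXG like any async write; after a crash, the log is replayed before the filesystem is "mounted".

```python
from dataclasses import dataclass, field


@dataclass
class ToyZilPool:
    """Toy model of the ZIL, not ZFS code."""
    disk: dict = field(default_factory=dict)      # state committed by TXGs
    zil: list = field(default_factory=list)       # on-disk intent log, survives a crash
    open_txg: dict = field(default_factory=dict)  # RAM only, lost on a crash

    def write_sync(self, key, value):
        self.zil.append((key, value))  # logged durably before the caller is unblocked
        self.open_txg[key] = value     # from here on it is just another async write

    def write_async(self, key, value):
        self.open_txg[key] = value     # not logged; only the next TXG commit saves it

    def commit_txg(self):
        self.disk.update(self.open_txg)
        self.open_txg = {}
        self.zil.clear()  # log records now covered by the committed TXG

    def crash_and_mount(self):
        self.open_txg = {}             # RAM contents are gone
        for key, value in self.zil:    # replay unplayed log records, in order
            self.disk[key] = value
        self.zil.clear()
```

Note that the async write issued before the crash is simply gone, while the sync write survives via replay.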

In other filesystems and operating systems, it's critical to honor the
"sync" mode behavior, because transactions such as file creation and removal
are sync operations.  So in other systems, not honoring the "sync" behavior
could result in a corrupt filesystem, or in corrupt data where a later write
was committed to disk before an earlier write.  ZFS is immune to those
problems.  ZFS keeps an in-memory snapshot of what the filesystem looks like
as a whole, and it only commits to disk a newer snapshot of the whole
filesystem (it doesn't commit individual file-based operations the way other
filesystems and OSes do).  Because the commit of a new TXG is an atomic
operation, it is impossible to boot up and find ZFS in a corrupt or
inconsistent state.
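The atomic-commit argument can be illustrated with a toy copy-on-write model.  This is hypothetical Python, not how ZFS actually lays out blocks: each commit builds a complete new tree, and the only in-place update is a single root-pointer swap, so a crash before the swap leaves the old, consistent tree in place.

```python
from types import MappingProxyType


class ToyCowPool:
    """Toy copy-on-write model, not ZFS code. Trees are never modified in
    place; committing means building a new tree and swapping one pointer."""

    def __init__(self):
        self.root = MappingProxyType({})  # read-only view of the current tree
        self.pending = {}                 # changes buffered in RAM

    def write(self, key, value):
        self.pending[key] = value

    def commit(self, crash_before_swap=False):
        # Build the whole new tree in "fresh blocks"; the old tree is untouched.
        new_tree = MappingProxyType({**self.root, **self.pending})
        if crash_before_swap:
            # Crash: RAM is lost and the new tree is orphaned, but root still
            # points at the old tree, which is complete and consistent.
            self.pending = {}
            return
        self.root = new_tree  # the single atomic step
        self.pending = {}
```

A reader of root only ever sees a whole old tree or a whole new tree, never a mix of the two.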

During a crash, up to 30 sec of async writes are at risk.  Anything which
was in a TXG not yet flushed to disk is lost.

So ... If you honor sync writes ... and some NFS client issues a sync write
... and the server reboots ungracefully ... then after reboot, the client
will see things as they were expected to be.

If you don't honor sync writes (ZIL disabled), it's possible for an
ungraceful reboot to come up with a filesystem in a state older than what
your NFS clients expect.  So it's probably a good idea to reboot, or at least
remount, your NFS clients along with the server reboot, just to get them all
into a consistent state.
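The two cases side by side in a toy Python model (hypothetical function names, not ZFS code).  The client's expectation is the last committed state plus every sync write the server acknowledged since then.  With sync writes honored, the recovered state matches that; with the ZIL disabled, the server comes back with the older committed state.

```python
def recovered_state(committed, acked_since_last_txg, zil_enabled):
    """Toy model, not ZFS code.
    committed: state in the last TXG that reached disk.
    acked_since_last_txg: (key, value) sync writes acknowledged to clients
    after that TXG. Returns the state after an ungraceful reboot."""
    state = dict(committed)
    if zil_enabled:
        # Honored sync writes were logged before being acknowledged: replay them.
        for key, value in acked_since_last_txg:
            state[key] = value
    # With the ZIL disabled those writes only ever existed in RAM.
    return state


def client_expects(committed, acked_since_last_txg):
    """What an NFS client believes is on stable storage."""
    expected = dict(committed)
    expected.update(acked_since_last_txg)
    return expected
```

With committed = {"f": "v1"} and acked = [("f", "v2")], the ZIL-enabled case recovers {"f": "v2"} as the client expects, while the ZIL-disabled case recovers {"f": "v1"}, which is older than what the client was told.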

If you are using NFS to export some VMs, and some other compute servers are
acting as the "heads" for those VMs ... well, VMs are naturally "sync" mode
machines, because whenever an application inside the guest OS requests a
sync write, the guest OS is going to issue a sync write to the host OS.

I would not recommend NFS as the backend to host files for a VM guest.  I
would recommend iSCSI, which will perform more natively and with less
overhead.

In either case, NFS or iSCSI, if you disable the ZIL, just make sure to
reboot your VM guests too if the server has an ungraceful reboot.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
