[zfs-discuss] Fwd: [zfs-code] Transaction consistency of ZFS

Zhu Han Tue, 08 Dec 2009 04:33:44 -0800

Answer from another guru...

nxyyt wrote:


> This question is forwarded from ZFS-discussion. I hope a developer can
> shed some light on it.
>
> I'm a newbie to ZFS. I have a specific question about the COW transactions
> of ZFS.
>
> Does ZFS keep writes to the same file sequentially consistent across a
> power outage or server crash?
>
> Assume following scenario:
>
> My application has only a single thread and it appends data to the file
> continuously. Suppose at time t1 it appends a buffer named A to the file.
> At time t2, which is later than t1, it appends a buffer named B to the
> file. If the server crashes after t2, is it possible that buffer B has
> been flushed to disk but buffer A has not?
>
> My application only appends to the file, with no truncation or overwrite.
> Does ZFS guarantee that data written to a file in sequential or causal
> order is flushed to disk in the same order?
>
> If the uncommitted write operations on a single file are always bound to
> the same open transaction group, and all transaction groups are committed
> in sequential order, I think the answer should be YES. In other words: is
> there only one open transaction group at any time, and are transaction
> groups committed in order for a single pool?
>
>
> Hope anybody can help me clarify it. Thank you very much!
>
>

Assuming you are using synchronous write semantics, the system call that
performs the write will NEVER return UNTIL the data has been written to
stable media (which, in the case of ZFS, might be an SSD-based ZIL rather
than the actual backing hard disks). That is the whole point of a
synchronous write.
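
As a rough illustration of the synchronous case, here is a minimal Python
sketch that opens a file with the POSIX O_SYNC flag so that each write()
call blocks until the data is reported stable. The function name
append_sync and the file name in the demo are made up for this example,
and it assumes a Unix-like system where os.O_SYNC is available and the
storage stack honors it.

```python
import os
import tempfile


def append_sync(path, payload):
    """Append payload (bytes) to path using synchronous write semantics.

    Hypothetical helper: with O_SYNC, os.write() should not return until
    the data has been handed to stable storage.
    """
    flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        written = 0
        # os.write() may write fewer bytes than requested, so loop.
        while written < len(payload):
            written += os.write(fd, payload[written:])
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "app.log")  # illustrative file name
        append_sync(log, b"A")  # returns only once A is stable
        append_sync(log, b"B")  # so B cannot be on disk without A
```

The cost of this approach is that every write waits for the storage
device, which is why a fast ZIL device matters for this kind of workload.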

If, however, you are doing async writes, or are never closing the file
handle (essentially doing a streaming write, which it sounds like you are),
you have no guarantee that the data will make it to stable storage at any
given instant. An fsync() is required to guarantee a commit; fflush() only
moves stdio buffers into the kernel and does not by itself reach stable
storage. For your type of write, however, where you are constantly
appending through the same file handle, you can count on previous writes
committing before subsequent ones: IF B has made it to stable storage, THEN
A will also be there. There is no guarantee that A makes it at all; it is
just that B never makes it without A having done so already.
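
For the streaming-append case, a small Python sketch of forcing a commit
point might look like the following. The helper name append_durably and
the file name are invented for illustration. The sketch assumes a
Unix-like system: flush() pushes Python's user-space buffer into the
kernel, and os.fsync() then asks the kernel to commit the file to stable
storage.

```python
import os
import tempfile


def append_durably(f, payload):
    """Append payload to an open binary file object and force a commit.

    Hypothetical helper: flush() alone only reaches the kernel; the
    os.fsync() call is what requests a write to stable storage.
    """
    f.write(payload)
    f.flush()             # user-space buffer -> kernel
    os.fsync(f.fileno())  # kernel -> stable storage


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "stream.log")  # illustrative file name
        with open(path, "ab") as f:
            append_durably(f, b"A")  # A is committed before we go on
            append_durably(f, b"B")  # B is committed after A
```

Calling fsync() after every append is the conservative choice; an
application that can tolerate losing the last few records could call it
less often and rely on the ordering property described above.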

I'm not 100% sure, but if you have uncommitted writes A (at t1) and B (at
t2), both against the same file, and C (at t2) against a different file,
there is no guarantee that A commits before C. The only guarantee is that A
commits before, or at the same time as, B.

Don't count on there being a single transaction group for a single file. If
there are, say, 5 data writes pending on your file, you may see 1-3
committed at once while 4-5 wait (those two might be committed together or
separately).

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
