>So you're saying that while the OS is building txg's to write to disk, the
>OS will never reorder the sequence in which individual write operations get
>ordered into the txg's.  That is, an application performing a small sync
>write, followed by a large async write, will never have the second operation
>flushed to disk before the first.  Can you support this belief in any way?

The question is not how the writes are ordered but whether an earlier
write can be in a later txg.  A transaction group is committed atomically.

In http://arc.opensolaris.org/caselog/PSARC/2010/108/mail I asked a similar
question to make sure I understood it correctly, and the answer was:

(Lines marked ">" are my questions; the answers are from Neil Perrin.)

        > Is there a partial order defined for all filesystem operations?

        File system operations will be written in order for all settings of
        the sync flag.

        > Specifically, will ZFS guarantee that when fsync()/O_DATA happens
        > on a file,
   
        (I assume by O_DATA you meant O_DSYNC).

        > that later transactions will not be in an earlier transaction group?
        > (Or is this already the case?)
          
        This is already the case.


So what I had assumed, and what you made me doubt, is indeed true: later
transactions cannot be committed in an earlier txg.



>If that's true, if there's no increased risk of data corruption, then why
>doesn't everybody just disable their ZIL all the time on every system?

For an application running on the file server, there is no difference.
When the system panics, you know that data might be lost, and the
application dies as well. (The snapshot and the last valid uberblock are
equally valid.)

But for an application on an NFS client, without the ZIL data will be lost
while the NFS client believes the data is written, and it will not try
again. With the ZIL, when the NFS server says that data is written, it is
actually on stable storage.

>The reason to have a sync() function in C/C++ is so you can ensure data is
>written to disk before you move on.  It's a blocking call, that doesn't
>return until the sync is completed.  The only reason you would ever do this
>is if order matters.  If you cannot allow the next command to begin until
>after the previous one was completed.  Such is the situation with databases
>and sometimes virtual machines.  

So the question is: when will your data be invalid?

What happens with the data when the system dies before the fsync() call?
What happens with the data when the system dies after the fsync() call?
What happens with the data when the system dies after more I/O operations?

With the ZIL disabled, you call fsync(), but after a crash you may find the
data as it was before the call to fsync(). That could already happen
before fsync() was called, so I assume your application can actually
recover from that situation.

Casper

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
