> So you're saying that while the OS is building txg's to write to disk, the
> OS will never reorder the sequence in which individual write operations get
> ordered into the txg's. That is, an application performing a small sync
> write, followed by a large async write, will never have the second operation
> flushed to disk before the first. Can you support this belief in any way?
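To make the scenario in the question concrete, here is a minimal sketch in Python, whose os.open(), os.write() and os.fsync() are thin wrappers around the POSIX calls of the same name. The helper name sync_then_async, the record contents and the file path are made up for illustration, and nothing in it is specific to ZFS; it only shows the sequence of calls an application would issue.

```python
import os


def sync_then_async(path, big_len=1 << 20):
    """Small synchronous write followed by a large asynchronous one.

    Hypothetical helper for illustration only; returns the final file size.
    """
    # Fall back to O_SYNC on platforms that do not define O_DSYNC.
    dsync = getattr(os, "O_DSYNC", os.O_SYNC)

    # 1. Small sync write: with O_DSYNC, write() returns only after the
    #    system reports the data as being on stable storage.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | dsync, 0o600)
    try:
        os.write(fd, b"commit-record\n")
    finally:
        os.close(fd)

    # 2. Large async write: write() may return while the data is still
    #    only in memory, to be written out some time later.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        view = memoryview(b"x" * big_len)
        while view:                      # handle short writes
            written = os.write(fd, view)
            view = view[written:]
        # 3. fsync() blocks until everything written so far is stable.
        os.fsync(fd)
    finally:
        os.close(fd)

    return os.stat(path).st_size
```

A crash between steps 1 and 3 is the interesting case: the small record has been acknowledged as stable, while the large write may or may not have reached disk.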
The question is not how the writes are ordered but whether an earlier write
can land in a later txg than a write issued after it. A transaction group is
committed atomically.

In http://arc.opensolaris.org/caselog/PSARC/2010/108/mail I asked a similar
question to make sure I understood it correctly. In the exchange below,
the lines marked ">" are mine (Casper) and the answers are from Neil Perrin:

    > Is there a partial order defined for all filesystem operations?

    File system operations will be written in order for all settings of
    the sync flag.

    > Specifically, will ZFS guarantee that when fsync()/O_DATA happens on a
    > file,

    (I assume by O_DATA you meant O_DSYNC).

    > that later transactions will not be in an earlier transaction group?
    > (Or is this already the case?)

    This is already the case.

So what I assumed was true, and what you made me doubt, is apparently still
true: later transactions cannot be committed in an earlier txg.

> If that's true, if there's no increased risk of data corruption, then why
> doesn't everybody just disable their ZIL all the time on every system?

For an application running on the file server itself, there is no difference.
When the system panics, you know that data might be lost, and the application
dies as well. (The snapshot and the last valid uberblock are equally valid.)

But for an application on an NFS client, without the ZIL data will be lost
while the NFS client believes the data has been written, and it will not try
again. With the ZIL, when the NFS server says that data is written, it is
actually on stable storage.

> The reason to have a sync() function in C/C++ is so you can ensure data is
> written to disk before you move on. It's a blocking call, that doesn't
> return until the sync is completed. The only reason you would ever do this
> is if order matters. If you cannot allow the next command to begin until
> after the previous one was completed. Such is the situation with databases
> and sometimes virtual machines.

So the question is: when will your data be invalid?
What happens with the data when the system dies before the fsync() call?
What happens with the data when the system dies after the fsync() call?
What happens with the data when the system dies after more I/O operations?

With the ZIL disabled, you call fsync(), but after a crash you may still
find the data as it was before the call to fsync(). That situation could
already occur before (a crash just ahead of the fsync() call leaves you in
the same state), so I assume you can actually recover from it.

Casper

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss