Joerg Schilling wrote:
> James Andrewartha <jam...@daa.com.au> wrote:
> > Recently there's been discussion [1] in the Linux community about how 
> > filesystems should deal with rename(2), particularly in the case of a crash.
> > After a crash, ext4 was found to truncate files that had been written with
> > open("foo.tmp"), write(), close() and then rename("foo.tmp", "foo"). This is
> > because ext4 uses delayed allocation and may not write the contents to disk
> > immediately, but commits metadata changes quite frequently. So when
> > rename("foo.tmp","foo") is committed to disk, it has a length of zero which
> > is later updated when the data is written to disk. This means after a crash,
> > "foo" is zero-length, and both the new and the old data have been lost, which
> > is undesirable. This doesn't happen when using ext3's default settings
> > because ext3 writes data to disk before metadata (which has performance
> > problems, see Firefox 3 and fsync[2])
> >
> > Ted Ts'o's (the main author of ext3 and ext4) response is that applications
> > which perform open(),write(),close(),rename() in the expectation that they
> > will either get the old data or the new data, but not no data at all, are
> > broken, and instead should call open(),write(),fsync(),close(),rename().
>
> The only guaranteed way to have the file "new" in a stable state on the
> disk is to call:
> 
> f = open("new", O_WRONLY|O_CREAT|O_TRUNC, 0666);
> write(f, "dat", size);
> fsync(f);
> close(f);

AFAIUI, the ZFS transaction group maintains write ordering, at least to the
extent that write()s to the file would be in the ZIL ahead of the rename()
metadata updates.

So I think the atomicity is maintained without requiring the application to 
call fsync() before closing the file.  If the TXG is applied and the rename() 
is included, then the file writes have been too, so foo would have the new 
contents.  If the TXG containing the rename() isn't complete and on the ZIL 
device at crash time, foo would have the old contents.
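
For filesystems that don't give you that ordering, the portable sequence from
the quoted discussion (write to a temporary file, fsync(), close(), then
rename() over the target) might look roughly like the sketch below.  The
helper name replace_file() and its parameters are made up for illustration,
and error handling is kept minimal (no EINTR retry, for instance).

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any library): replace `path` with `len`
 * bytes of `data`, going through the temporary name `tmp`, so that a
 * crash should leave `path` with either the old or the new contents.
 * Returns 0 on success, -1 on failure.
 */
static int replace_file(const char *path, const char *tmp,
                        const void *data, size_t len)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is written. */
    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Force the data to stable storage before the rename can commit. */
    if (fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }

    /* Atomically replace the old file with the fully written new one. */
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

A call would be something like replace_file("foo", "foo.tmp", buf, buflen).
Note that this only covers the file contents; making the rename() itself
durable would additionally need an fsync() on the containing directory.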

So POSIX doesn't require the OS to sync the file contents on close() for local
files, the way NFS clients flush on close?  How odd.

--Joe

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
