Hi all,

Recently there's been discussion [1] in the Linux community about how
filesystems should deal with rename(2), particularly in the case of a crash.
ext4 was found to leave files zero-length after a crash when they had been
written with open("foo.tmp"), write(), close() and then
rename("foo.tmp", "foo"). This is because ext4 uses delayed allocation and
may not write the file contents to disk immediately, but commits metadata
changes quite frequently. So when rename("foo.tmp", "foo") is committed to
disk, the file still has a length of zero on disk; the length is only
updated later, when the data is written. This means that after a crash in
that window, "foo" is zero-length and both the new and the old data have
been lost, which is undesirable. This doesn't happen with ext3's default
settings because ext3 writes data to disk before metadata (which has
performance problems of its own, see Firefox 3 and fsync [2]).

Ted Ts'o's (a principal developer of ext3 and ext4) response is that
applications which perform open(), write(), close(), rename() in the
expectation that they will get either the old data or the new data, but
never no data at all, are broken, and should instead call open(), write(),
fsync(), close(), rename(). Most other people are arguing that POSIX says
rename(2) is atomic, and that while POSIX doesn't specify crash recovery,
returning no data at all after a crash is clearly wrong, and excessive use
of fsync is overkill and counter-productive (Ted later proposed a
"yes-I-really-mean-it" flag for fsync). I've omitted a lot of detail, but I
think this is the core of the argument.
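
For concreteness, here is a minimal sketch of the two sequences in Python,
whose os module wraps the same system calls. The function name replace_file
and its sync parameter are made up for this illustration; they are not from
any of the linked discussions. With sync=False it is the sequence that hit
the zero-length problem on ext4, and with sync=True it is the sequence Ted
recommends. Running it cannot show what a given filesystem does after a
crash; it only pins down the order of the calls.

```python
import os


def replace_file(path, data, sync=True):
    """Replace `path` with `data` using a temporary file and rename().

    sync=False: open(), write(), close(), rename()
    sync=True:  open(), write(), fsync(), close(), rename()
    (hypothetical helper, for illustration only)
    """
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # write() may write fewer bytes than requested, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        if sync:
            # Ask the kernel to push the file's data to stable storage
            # before the rename can be committed.
            os.fsync(fd)
    finally:
        os.close(fd)
    # rename() atomically switches the name "path" over to the new file,
    # as seen by running processes; what survives a crash is the question.
    os.rename(tmp, path)
```

For example, replace_file("foo", b"new contents") performs the fsync
variant, and replace_file("foo", b"new contents", sync=False) performs the
variant under dispute.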

My question is: how does ZFS deal with open(), write(), close(), rename()
in the case of a crash? Will it always return either the new data or the
old data, or will it sometimes return no data? Is returning no data
defensible, either under POSIX or under common sense? Comments about other
filesystems, e.g. UFS, are also welcome. As a counter-point, XFS (written by
SGI) is notorious for data loss after a crash, but its authors defend the
behaviour as POSIX-compliant.

Note this is purely a technical discussion - I'm not interested in replies
saying ?FS is a better filesystem in general, or on GPL vs CDDL licensing.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781?comments=all
    http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/
    http://lwn.net/Articles/323169/
    http://mjg59.livejournal.com/108257.html
    http://lwn.net/Articles/323464/
    http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
    http://lwn.net/Articles/323752/ *
    http://lwn.net/Articles/322823/ *
* are currently subscriber-only, email me for a free link if you'd like to
read them
[2] http://lwn.net/Articles/283745/

-- 
James Andrewartha | Sysadmin
Data Analysis Australia Pty Ltd | STRATEGIC INFORMATION CONSULTANTS
97 Broadway, Nedlands, Western Australia, 6009
PO Box 3258, Broadway Nedlands, WA, 6009
T: +61 8 9386 3304 | F: +61 8 9386 3202 | I: http://www.daa.com.au
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
