It gets worse when you throw in journaled filesystems. The problem Linux
faces is that there are many filesystems to pick from, so the kernel/system
needs to be a bit wishy-washy and leave some implementation details up to
other code. I think if you're using ext3 (in data=journal or data=ordered
mode), fsync will commit to disk, but that is mostly a function of how the
journal works and not of fsync itself. This task is much easier for Apple
and MS because they control both the filesystems and their interaction with
the kernel/system.
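
To make the difference concrete, here is a rough sketch in C of what an
application can do today. It asks for F_FULLFSYNC where the platform defines
it (MacOS X, per the Apple post quoted below) and falls back to plain fsync()
everywhere else. The name durable_sync is made up for this example, and the
fallback is only as strong as whatever the OS, filesystem and drive actually
do with fsync().

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: push fd's data as far toward stable storage as
 * this platform lets us. Returns 0 on success, -1 on error (errno set). */
int durable_sync(int fd)
{
#ifdef F_FULLFSYNC
    /* MacOS X: also ask the drive to flush its own write cache. */
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
    /* F_FULLFSYNC can fail on filesystems that do not support it;
     * fall through to plain fsync() in that case. */
#endif
    /* Elsewhere: fsync() hands the data to the device. Whether it has
     * reached the platters depends on the drive's write cache. */
    return fsync(fd);
}
```

You would call durable_sync(fd) at the point where you would otherwise call
fsync(fd), for example right after writing a commit record.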


On 5/18/05, Ludvig Strigeus <[EMAIL PROTECTED]> wrote:
> 
> I believe you are wrong. Read the bottom of:
> 
> http://www.fourteenminutes.com/code/avantslash/live/avantify.cgi?url=05/05/13/0529252
> 
> From Linux --
> NOTES
> In case the hard disk has write cache enabled, the data may not really
> be on permanent storage when fsync/fdatasync return.
> 
> There is an ATA FLUSH command used to flush the disk controller's disk
> cache. FlushFileBuffers on Windows uses this, F_FULLFSYNC on MacOS
> uses this. fsync() on Linux does not seem to use this.
> 
> Read the bottom of:
> http://lists.apple.com/archives/darwin-dev/2005/Feb/msg00087.html
> 
> Quote:
> Let me explain in more detail. With fsync() even though the OS
> writes the data through to the disk and the disk says "yes I wrote
> the data", the data is not actually on permanent storage. Unless
> you explicitly disable it, all disks have a write buffer which holds
> data you've written. The disk buffers the data you wrote until it
> decides to flush it to the platters (and the writes may not be in
> the order you wrote them). If you lose power or the system crashes
> before the data is written, you can wind up in a situation where only
> some of your data is actually on disk. What is worse is that even if
> you write blocks A, B and C, call fsync() and then write block D you
> may find after rebooting that blocks A and D are on disk but B and C
> are not (in fact any ordering of A, B, C, and D is possible).
> 
> ...
> On MacOS X fsync() behaves the same as it does
> on all Unices. That's not good enough if you really care about data
> integrity and so we also provide the F_FULLFSYNC fcntl. As far as I
> know, MacOS X is the only OS to provide this feature for apps that
> need to truly guarantee their data is on disk.
> 
> 
> /Ludvig
> 
> On 5/18/05, Reid Thompson wrote:
> > Ludvig Strigeus wrote:
> > > Stuff below relates to IDE drives.
> > >
> > > > On Linux, the fsync() call doesn't actually force that the data
> > > > reaches the physical disk platters. It just makes sure that the
> > > > data is sent to the cache on the disk.
> >
> > > I do not believe the above is exactly correct -- fsync relies on the
> > > response of the device (disk). The problem is that IDE disks report
> > > to fsync that the data is written when in actuality it still
> > > remains in cache. This is a "failure", or at least "playing loosely
> > > with what to do with fsync calls", on the part of the IDE disk
> > > manufacturers.
> >
> > Having said that, the below note is probably not correct either --
> > unless windows has some 'extra' ability to know that when the IDE drive
> > returns "written to media" (but data is still in cache) that the IDE
> > drive is lying.
> >
> >
> > > > On Windows, FlushFileBuffers() forces the disk to actually write
> > > > the data to the physical disk.
> > >
> > > > This explains why sqlite on Linux is at least 5 times faster than on
> > > > Windows. It also means that Sqlite is not ACID when using Linux with
> > > > IDE drives.
> > >
> >
> > reid
> >
>
