On 1 February 2016 at 18:58, Simon Slavin <slavins at bigfraud.org> wrote:

>
> On 1 Feb 2016, at 9:23am, bm.email01 at gmail.com wrote:
>
> > ---
> > No, SQLite does not. On COMMIT it fsyncs the database file and unlinks
> the
> > journal[1], but does not fsync the directory.
> > ---
> >
> > Since that can cause the last transaction to be lost, despite Sqlite
> returning a 'Commit successful' code to the application, doesn't that mean
> that Sqlite is _not_ truly 'Durable' (per ACID)?
>
> 1) The fault here is the file system not SQLite.  This one particular file
> system has this strange requirement that you fsync the directory.  SQLite
> is ACID.  It's the file system that messes it up.  SQLite cannot defend
> against untrustworthy middleware -- sooner or later a programmer has to
> trust their hardware is doing what it says it's doing.
>

I take your point, but as Florian pointed out it's not just one file
system; its a somewhat well known quirk of POSIX fsync.
http://blog.httrack.com/blog/2013/11/15/everything-you-always-wanted-to-know-about-fsync/

It's a bit disingenuous to say "the file system requires a particular dance
to ensure an unlink persists on disk, but the dance is excessively complex
so we're going to skip it and blame any problems on the file system." The
fact remains that sqlite returned SQLITE_OK for a transaction that never
succeeded, because it didn't take the steps required for the i/o to hit
disk. However...


> 3) For both the above problems SQLite still neither loses data nor
> corrupts the database file.  If the program crashes the file is
> automatically closed.  When the SQLite API is used to reopen the file the
> unexpected close will be discovered and corrected for.
>

I agree with this 100%. SQLite absolutely provides consistency here, which
I would argue is the most important. In an actual power-loss scenario, what
is an application going to do between receiving SQLITE_OK and before losing
power that possibly matters? If the disk cache hasn't been flushed by the
time the power drops, then nothing else you've done in the meantime will
have hit disk either.

Note that the OP's situation is very unusual in that a controlled shutdown
is initiated after sqlite returns SQLITE_OK, but that shutdown *does not
sync the filesystem*! That is the only reason for the 30 second window. In
usual circumstances, my feeling is that the window between "power failing
in the middle of sqlite_exec" and "power failing after sqlite_exec has
returned but before the unlinked journal is synced to disk" is so small
that sqlite is within its rights to blur the two.

-Rowan

Reply via email to