[sqlite] WAL checkpoints versus fsync.

Scott Hess Wed, 30 Mar 2011 06:53:56 -0700

[From a thread about WAL checkpoints...]

On Wed, Mar 30, 2011 at 6:28 AM, Richard Hipp <[email protected]> wrote:
> On Wed, Mar 30, 2011 at 8:46 AM, Mikael <[email protected]> wrote:
>> Therefore I now wish to check with you, is SQLite implemented to somehow
>> make checkpoints not be able to corrupt the main database file ever? (Say a
>> checkpoint went halfway and then the OS ran out of memory, killed all
>> processes and rebooted.)
>
> Yes.  The WAL is not cleared or deleted until after the checkpoint
> completes.  If there is an OS crash or power loss in the middle of a
> checkpoint, then the checkpoint will automatically start over again the next
> time the database file is opened.
>
> This is where correct functioning of fsync() is critical.  We use fsync() as
> a write barrier operation.  We require that all I/O operations that occur
> before the fsync() (writes to the database file) must complete prior to any
> I/O operations that occur after the fsync() (clearing the WAL file).  If
> fsync() is working correctly, then everything is safe.  Unfortunately, on
> many consumer-grade disk drives nowadays, fsync() does not work correctly
> :-(  But you say that fsync is working correctly on your system, so it
> shouldn't be a problem for you.


Aha!  I've been wondering about this (and too lazy/busy to really dig
into the code).  This implies that if you have a platform which is
known to have periodic corruptions of the "fsync lied" sort, then
using WAL narrows but does not entirely close the window.  Since it
relies on fsync working at the time of checkpoint, you can tune the
frequency to reduce occurance.

I've seen some other storage systems work around this (to some extent)
by keeping overlap between merges of the log files.  By tuning that
overlap to represent a reasonable timeframe (like 15 minutes), if the
fsync fails chances are the _previous_ fsync succeeded, simply by
virtue of the system having been up for that 15 minutes, giving it
more time to have written out those blocks.  Unfortunately, in the
cases I'm aware of this worked by virtue of the system using a
collection of immutable files to represent the database, so it was
easy to keep the overlapped data and figure things out at startup from
the manifest.  I can't offhand think of a way to apply it to
update-in-place on the main SQLite database, but maybe someone else is
smarter than me?

-scott
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

[sqlite] WAL checkpoints versus fsync.

Reply via email to