On 10/11/2012 11:38 PM, Nico Williams wrote:
On Wed, Oct 10, 2012 at 12:48 PM, Richard Hipp <d...@sqlite.org> wrote:
Could you list the requirements of such a light weight barrier?
i.e. what would it need to do minimally, what's different from
fsync/fdatasync ?

For SQLite, the write barrier needs to involve two separate inodes.  The
requirement is this:

...

Note also that when fsync() works as advertised, SQLite transactions are
ACID.  But when fsync() is reduced to a write barrier, we lose the D
(durable) and transactions are only ACI.  In our experience, nobody really
cares very much about durability across a power loss.  People are mainly
interested in Atomic, Consistent, and Isolated.  If you take a power loss
and then after reboot you find the 10 seconds of work prior to the power
loss is missing, nobody much cares about that as long as all of the prior
work is still present and consistent.

There is something you can do: combine a COW on-disk format, laid out
so that partially committed transactions can be detected and rolled
back to the last known-good root, with backgrounded fsync()s (i.e.,
issued in a separate thread, without waiting for the fsync() to
complete).
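
A minimal sketch of that idea, in Python, may help make it concrete. It
is not taken from SQLite or from any real filesystem: the record layout
(length plus CRC32), the file name, and the helper names append_commit,
fsync_in_background, recover and demo are all assumptions made up for
illustration. Commits are only ever appended, fsync() runs on another
thread, and recovery keeps every record up to the first one that fails
its checksum.

```python
import os
import struct
import tempfile
import threading
import zlib

# Hypothetical record layout: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def append_commit(fd, payload):
    """Append one commit record; existing data is never overwritten."""
    os.write(fd, HEADER.pack(len(payload), zlib.crc32(payload)) + payload)


def fsync_in_background(fd):
    """Start fsync() on another thread so the writer does not wait for it."""
    worker = threading.Thread(target=os.fsync, args=(fd,))
    worker.start()
    return worker


def recover(path):
    """Return the payloads of all intact records before the first bad one."""
    with open(path, "rb") as f:
        data = f.read()
    good, pos = [], 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # torn or partially written commit: roll back to here
        good.append(payload)
        pos = start + length
    return good


def demo():
    """Write three commits, simulate a torn fourth one, then recover."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "store.log")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        workers = []
        try:
            for payload in (b"root-1", b"root-2", b"root-3"):
                append_commit(fd, payload)
                workers.append(fsync_in_background(fd))
            # Simulated power loss: the header claims 6 bytes, only 3 arrive.
            os.write(fd, HEADER.pack(6, zlib.crc32(b"root-4")) + b"roo")
        finally:
            for worker in workers:
                worker.join()  # do not close fd under a running fsync()
            os.close(fd)
        return recover(path)
```

Calling demo() should give back the three complete payloads and drop
the torn fourth record. The writer never waits on fsync(), so the most
recent commits may be lost on power failure, but whatever recover()
returns is a consistent prefix.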

SQLite WAL mode comes close to that if you run your checkpoints
in the background. Following a power failure, those transactions that
have been checkpointed to the database file are assumed to have been
synced. Then SQLite uses checksums to determine the subset of
transactions in the WAL file that are intact.

I say close, because if you keep on writing to the db while the
checkpoint is running you end up with the WAL file growing indefinitely.
So it doesn't quite work.
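
For reference, here is a rough sketch of that setup using Python's
sqlite3 module. The pragmas (journal_mode, synchronous,
wal_autocheckpoint, wal_checkpoint) are real SQLite pragmas; the helper
names open_writer, checkpoint and demo, the table, and the file name
are assumptions for illustration only. It does not address the WAL
growth problem described above: the demo stops writing before it
checkpoints.

```python
import os
import sqlite3
import tempfile
import threading


def open_writer(path):
    """Open a connection in WAL mode that leaves checkpointing to others."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL").fetchall()
    # In WAL mode, NORMAL syncs around checkpoints rather than on commit.
    conn.execute("PRAGMA synchronous=NORMAL")
    # Turn off automatic checkpoints; a background thread runs them instead.
    conn.execute("PRAGMA wal_autocheckpoint=0").fetchall()
    return conn


def checkpoint(path, out):
    """Run one passive checkpoint on a private connection."""
    conn = sqlite3.connect(path)
    try:
        # The single result row is (busy, wal_frames, frames_checkpointed).
        out.append(conn.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchall()[0])
    finally:
        conn.close()


def demo():
    """Commit a few transactions, then checkpoint from another thread."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.db")
        writer = open_writer(path)
        try:
            with writer:
                writer.execute("CREATE TABLE t(x)")
            for i in range(10):
                with writer:  # one transaction per row
                    writer.execute("INSERT INTO t VALUES (?)", (i,))
            out = []
            worker = threading.Thread(target=checkpoint, args=(path, out))
            worker.start()
            worker.join()
            return out[0]
        finally:
            writer.close()
```

With no concurrent readers or writers, the row returned by demo() would
be expected to report busy as 0 and the same count for WAL frames and
checkpointed frames.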

Omitting the D in ACID changes everything. With the D in, you need to
fsync() after every transaction. Without it, you need to fsync() before
reclaiming space (i.e. when overwriting old data with new - you need
to be sure that the old data will not be required following recovery
from a power failure, which means an fsync()).
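
One way to picture the second case is a free list split in two. This is
only a sketch under assumptions: the class SpaceReclaimer, its fields
and the block numbers are hypothetical and do not describe how SQLite
manages space. Blocks freed by a commit sit in a pending list, and
fsync() is called only when one of them is about to be reused.

```python
import os
import tempfile


class SpaceReclaimer:
    """Tracks freed blocks that must survive until the next fsync()."""

    def __init__(self, fd):
        self.fd = fd
        self.pending = []   # freed since the last fsync(); recovery may need them
        self.reusable = []  # safe to overwrite
        self.syncs = 0      # number of fsync() calls issued so far

    def commit(self, freed_blocks):
        """Commit without fsync(): atomic and consistent, not yet durable."""
        self.pending.extend(freed_blocks)

    def allocate(self):
        """Return a block that is safe to overwrite, or None if there is none."""
        if not self.reusable and self.pending:
            # The newer data must be on disk before the old copies can go.
            os.fsync(self.fd)
            self.syncs += 1
            self.reusable, self.pending = self.pending, []
        return self.reusable.pop() if self.reusable else None


def demo():
    """Two commits free three blocks; reusing them costs a single fsync()."""
    with tempfile.TemporaryFile() as f:
        reclaimer = SpaceReclaimer(f.fileno())
        reclaimer.commit([1, 2])
        reclaimer.commit([3])
        blocks = [reclaimer.allocate() for _ in range(3)]
        return blocks, reclaimer.syncs
```

In the demo, two commits complete with no fsync() at all, and the three
allocations that follow share one.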

Dan.

