Re: [sqlite] WAL, synchronous=1 and durability

Daniel Frimerman Tue, 11 Sep 2012 09:03:41 -0700

I fully agree - my tests (involving many hard resets) confirm this.  If I
could just expound on some of your statements for clarity, for others.
This is my understanding gained from testing and reading the posts -- I did
not see the code.


> In both synchronous=NORMAL and synchronous=FULL the
> *-wal file may contain more than one transaction.
Because data is only written to the database file at "checkpoints" which
can be done manually or automatically by SQLite (current default set to
1000 pages).
So the -wal file will always contain up to 1000 pages (the default) before
getting flushed.
The difference between these 2 is that with NORMAL the writes are not
guaranteed, whereas with FULL the writes are guaranteed.

> The WAL file includes a running checksum so that if a power failure
> occurs, the next client to read the database can determine the prefix
> of the WAL file that can be considered trustworthy (as the power
> failure may have corrupted more recently written parts of the file).
To clarify if it is not obvious: this recover happens for OFF, NORMAL or
FULL when the DB is openned.

> In both NORMAL mode, we do the following:
>  1. Write a bunch of transactions into the WAL file.
In FULL mode, the above is sync'ed, although I don't know whether
individual writes to the WAL file are sync'ed, or when all the data
belonging to the commit is written the WAL is sync'ed.
In NORMAL mode this is not done and that is why it is much faster (at least
15 times faster in my basic tests).

The steps below are part of a checkpoint that apply to NORMAL and FULL.
This is when the WAL file reached it's page limit, or called manually by
the user.  This step might be redundant for FULL mode, but that's not
important.
>  2. fsync() the WAL file.
>  3. Copy the data from the WAL file into the database file.
>  4. fsync() the database file.

> Synchronous=FULL mode is different. During step 1 above in FULL
> mode, SQLite calls fsync() on the WAL file after writing each
> transaction to it - before the users COMMIT command returns.
Since someone else kindly pointed out that COMMIT can require several
writes -- is each and every write required for a COMMIT synchronised to
prevent corruption? Or does SQLite utilise the same idea as with NORMAL
mode using checksums to prevent corruption? Perhaps that is why WAL+FULL is
faster than non-WAL+FULL? The more sequential writes in WAL mode (as
explained in the manual) must help here.  If something like checksums is
used to prevent corruption, maybe that also can be applied to non-WAL
journal modes, as an option to speed up their NORMAL mode?


@Simon Slavin
> Though if your OS and hardware does perform as documented then
> neither mode will actually lose data or cause database corruption.
This doesn't make sense; OS's do I/O caching, data corruption is always a
possibility then.
> FULL) Every COMMIT will take more time, but a power-failure can lose a
> maximum of one transaction.
Someone already corrected this; power-failure will result in no loss of
transaction, not even one.
I think I know why you said that.  In non-WAL and NORMAL, there are
intermediate writes related to 1 commit,
Also, as for network setups, docs on the WAL clearly state that the DB
should not be used over a network.
Other than that, I enjoyed the information about hard drives (not)
supporting ACID.  Good to know.

@Keith Chew (original poster)
> I am using EXT3 with barrier=1 and write-cache disabled from the HDD...
I believe there will be no difference in NORMAL or FULL modes, when
measuring performance, durability or consistency (data corruption).
SQLite will be probably be making pointless calls to sync the I/O buffers.
The steps above (1-4) show the difference between NORMAL and FULL: in FULL
every write to the WAL file is sync'ed, the rest is the same.  So in your
case of hardware cache being turned off, every write to the WAL file will
be physical anyway.
Well this excessive clearing up is giving me a callus :-)



In my WAL tests with NORMAL mode and caching enabled, the longer you waited
(we are talking up to ten seconds), the less data was lost after a hard
reboot, so by no means all data is lost, some will make it to the file, and
thanks to the checksums (as pointed out by Dan) the data will be recovered
to the best of SQLite's ability.

Well these are my 10 pence.

On Tue, Sep 11, 2012 at 5:52 AM, Dan Kennedy <danielk1...@gmail.com> wrote:

> On 09/11/2012 11:12 AM, Keith Chew wrote:
>
>> Hi Jay
>>
>>>
>>>    In WAL mode that's only half incorrect.  Your description of NORMAL
>>>    seems correct, but FULL should be fully durable.  The WAL file may
>>>    need to be checkpointed on startup, but the if the commit happens in
>>>    WAL/FULL, you should have full durability.
>>>
>>>
>> This is the reason for my question. As far as I can see, the
>> difference between NORMAL and FULL in WAL mode is that FULL has one
>> transaction in WAL to be checkpointed, and NORMAL has multiple. Since
>> both of them need to checkpoint for durability, how is it that FULL
>> can guarantee durability for 1 transaction, but not NORMAL for
>> multiple? From a development point of view, I would imagine that both
>> will use the same checkpoint routine, but yet I cannot understand why
>> FULL has a guarantee and NORMAL does not..
>>
>
> In WAL mode, when a transaction is written to disk, the modified
> pages are appended to the *-wal file. Later on, during a checkpoint, all
> the modified pages in the *-wal file are copied back into the
> database file. In both synchronous=NORMAL and synchronous=FULL the
> *-wal file may contain more than one transaction.
>
> The WAL file includes a running checksum so that if a power failure
> occurs, the next client to read the database can determine the prefix
> of the WAL file that can be considered trustworthy (as the power
> failure may have corrupted more recently written parts of the file).
>
> In both NORMAL mode, we do the following:
>
>   1. Write a bunch of transactions into the WAL file.
>   2. fsync() the WAL file.
>   3. Copy the data from the WAL file into the database file.
>   4. fsync() the database file.
>
> If a power failure occurs at any time, the next process to read the
> database scans the WAL file and attempts to read as many transactions
> as possible. If the checksum fails at any point, it stops reading.
>
> So you can lose data. Say a power failure occurs between steps 1
> and 2 above. If your WAL had 5 unsynced transactions in it then
> following recovery you may find that none, some or all of them have
> survived, depending on how much of the WAL file actually made it
> to disk before the power failed.
>
> Synchronous=FULL mode is different. During step 1 above in FULL
> mode, SQLite calls fsync() on the WAL file after writing each
> transaction to it - before the users COMMIT command returns.
>
> In this case if the same power failure occurs there is no chance
> that any successfully committed transactions will be lost, as they
> are guaranteed to have made it to disk.
>
> So, in WAL mode, both sync=NORMAL and sync=FULL prevent database
> corruption. However if a power failure occurs in NORMAL mode, some
> recently committed transactions may have disappeared following
> recovery.
>
> Dan.
>
>
>
>
>
>
>
>
> ______________________________**_________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**users<http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users>
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] WAL, synchronous=1 and durability

Reply via email to