Richard, Ryan,
Thanks for this. We were dimly aware of WAL but until now hadn’t
needed to use it.
We’ve done a quick check with it and it *seems* to work on a test
database. We’ve all read the docs again and paid attention to
https://www.sqlite.org/wal.html#bigwal
To test whether it works, we started our long-running analytics query,
which takes around 8 mins on our test machine. We then sped up the rate
at which we update our database with external data. In the real world an
update comes along every 3-5 mins; in our test system we queue them up
so they arrive every 6-10 secs, around 30x quicker. The updates are real
data, around 3-5MB in size.
We monitored the -wal and -shm files created as we threw data into the
database.
The -wal file grew larger and larger until it hit 224MB and then stayed
constant; the -shm file is only 1.8MB and seems to stay at that size. We
can also see that the main SQLite database is NOT updated (or at least
the timestamp isn't) whilst we are running the updates in WAL mode.
This appears to be correct, as the updates would be in the -wal file.
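For reference, our monitoring was nothing more sophisticated than
polling the file sizes, along these lines (a minimal sketch; the
database path is hypothetical):

```python
import os
import time

DB = "test.db"  # hypothetical path; adjust to the real database file


def sizes(db_path):
    """Sizes in bytes of the main database, -wal and -shm files (0 if absent)."""
    out = {}
    for suffix in ("", "-wal", "-shm"):
        path = db_path + suffix
        out[suffix or "db"] = os.path.getsize(path) if os.path.exists(path) else 0
    return out


# Poll a few times while the updates are running.
for _ in range(3):
    print(sizes(DB))
    time.sleep(0.2)
```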
The time taken for each update seems a little slower (possibly 10%), but
since the data is real and variable in size, that might just be our
subjective impression.
Once the long-running read-only analytics query completes, the main
SQLite database appears to get updated (or at least the timestamp on the
file is updated) while we are still updating with our test data and the
-wal file is still being used.
Once we stop updating with our test data, the -wal and -shm files
disappear (as expected).
A quick check of the database appears to show it's correct.
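The "quick check" was essentially PRAGMA integrity_check, along these
lines (a sketch assuming Python's sqlite3 module; the filename is
hypothetical):

```python
import sqlite3

# Hypothetical test database path.
conn = sqlite3.connect("test.db")

# PRAGMA integrity_check walks the whole database and returns the
# single row ('ok',) if no corruption is found.
result = conn.execute("PRAGMA integrity_check").fetchone()
print(result)  # ('ok',) on a healthy database

conn.close()
```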
One question though, the size of the -wal file worries us.
https://www.sqlite.org/wal.html#bigwal states
```
Avoiding Excessively Large WAL Files

In normal cases, new content is appended to the WAL file until the WAL
file accumulates about 1000 pages (and is thus about 4MB in size), at
which point a checkpoint is automatically run and the WAL file is
recycled. The checkpoint does not normally truncate the WAL file
(unless the journal_size_limit pragma is set). Instead, it merely
causes SQLite to start overwriting the WAL file from the beginning.
This is done because it is normally faster to overwrite an existing
file than to append. When the last connection to a database closes,
that connection does one last checkpoint and then deletes the WAL and
its associated shared-memory file, to clean up the disk.
```
We have not set the journal_size_limit, and yet we have a -wal file
which is 224MB in size, somewhat larger than 4MB. We are running
3.8.2 2013-12-06 14:53:30 27392118af4c38c5203a04b8013e1afdb1cebd0d
which does not appear to include the change made in 3.11.0 that keeps
the WAL file proportional to the size of the transaction. From the same
page of the manual:
```
Very large write transactions.

A checkpoint can only complete when no other transactions are running,
which means the WAL file cannot be reset in the middle of a write
transaction. So a large change to a large database might result in a
large WAL file. The WAL file will be checkpointed once the write
transaction completes (assuming there are no other readers blocking
it) but in the meantime, the file can grow very big.

As of SQLite version 3.11.0, the WAL file for a single transaction
should be proportional in size to the transaction itself. Pages that
are changed by the transaction should only be written into the WAL
file once. However, with older versions of SQLite, the same page might
be written into the WAL file multiple times if the transaction grows
larger than the page cache.
```
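Since the behaviour depends on the library version, a quick way to
confirm which SQLite a Python process is actually linked against (a
sketch; other language bindings will differ):

```python
import sqlite3

# sqlite_version is the version of the SQLite library the module is
# linked against, not the version of the Python bindings themselves.
print(sqlite3.sqlite_version)

# Versions before 3.11.0 may write the same page into the WAL file
# multiple times within a single large transaction.
major, minor = (int(x) for x in sqlite3.sqlite_version.split(".")[:2])
if (major, minor) < (3, 11):
    print("WAL file may grow well beyond the transaction size on this version")
```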
We think that using WAL mode works for us, and indeed inspection seems
to indicate it does, but the size of the -wal file appears to be far
larger than we would expect. Is there a problem here? It doesn't appear
to be, but we would welcome any comments.
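For what it's worth, if the 224MB file ever does become a real problem,
our reading of the docs is that it could be capped roughly along these
lines (a sketch only, not something we have tried in production; the
filename is hypothetical, and we note the TRUNCATE checkpoint mode
apparently needs a newer SQLite than our 3.8.2):

```python
import sqlite3

conn = sqlite3.connect("feeds.db")  # hypothetical database file

# Switch to WAL mode (persistent; recorded in the database header).
conn.execute("PRAGMA journal_mode=WAL")

# Ask checkpoints to truncate the -wal file back down to ~16MB
# instead of leaving it at its high-water mark.
conn.execute("PRAGMA journal_size_limit=16777216")

# ... updates happen here ...

# Force a checkpoint once no readers block it; TRUNCATE also resets
# the -wal file to zero length on success.
busy, log_pages, ckpt_pages = conn.execute(
    "PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
print(busy, log_pages, ckpt_pages)

conn.close()
```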
Thanks for taking the time to reply.
Rob
On 6 Aug 2016, at 22:35, R Smith wrote:
On 2016/08/06 10:50 PM, Rob Willett wrote:
Our understanding of this is that many processes can READ the
database at the same time but NO process can INSERT/UPDATE if another
is reading. We had thought that one process can write and multiple
processes can read. Our reading (no pun intended) now of this
paragraph from the manual is that you cannot write if one or more
processes is reading. Have we understood this correctly? If so is
there an easy way to get around this?
The Write-Ahead-Log (WAL) journal mode will help you. It basically
allows a writer to write to the WAL log instead of the main database,
so that any number of readers can still do their thing reading the
database (and the parts of the WAL journal that are already committed,
or even parts still in progress if you use "read_uncommitted" mode).
SQLite then pushes committed data into the DB file based on
checkpoints, which you can invoke directly or set up to happen every so
often.
This is the new way to do things and the way you should always use
unless you have a specific reason not to (which might include file
restrictions, needing read-only-ness, separate speedy DBs that don't
fsync() so much, etc.)
More information here:
https://www.sqlite.org/wal.html
Your DB is quite big and it seems you write often, so please take
special note of this section:
https://www.sqlite.org/wal.html#bigwal
HTH - Cheers,
Ryan
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users