Richard, Ryan,

Thanks for this. We were dimly aware of WAL but until now hadn’t needed to use it.

We’ve done a quick check with it and it *seems* to work on a test database. We’ve all re-read the docs and paid particular attention to https://www.sqlite.org/wal.html#bigwal

To test it, we started our long-running analytics query, which takes around 8 mins on our test machine. We then sped up the rate at which we update the database with external data: in the real world an update arrives every 3-5 mins, but in our test system we queue them up so that one arrives every 6-10 secs, roughly 30x faster. The updates are real data, around 3-5MB each.

We monitored the -wal and -shm files created as we threw data into the database.

The -wal file gets larger and larger until it hits 224MB and then stays constant; the -shm file is only 1.8MB and seems to stay at that size. We can also see that the main SQLite database file is NOT updated (or at least its timestamp isn’t) whilst we are running the updates in WAL mode. This appears to be correct, as the updates would be in the -wal file.
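For what it's worth, the behaviour above can be reproduced in miniature (this is a sketch, not our real schema; paths, table names and sizes are made up): a long-lived read transaction pins a snapshot, so automatic checkpoints cannot reset the -wal file and it grows with every committed update.

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "test.db")

writer = sqlite3.connect(db, isolation_level=None)  # autocommit mode
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE updates (id INTEGER PRIMARY KEY, payload BLOB)")

# Stand-in for the long-running analytics query: an open read
# transaction that pins a snapshot of the database.
reader = sqlite3.connect(db, isolation_level=None)
reader.execute("BEGIN")
reader.execute("SELECT count(*) FROM updates").fetchone()

# Stand-in for the external updates arriving every few seconds; each
# commit appends frames to the -wal file, which cannot be reset while
# the reader's snapshot is open.
for _ in range(200):
    writer.execute("INSERT INTO updates (payload) VALUES (randomblob(4096))")

wal_size = os.path.getsize(db + "-wal")  # grows while the reader is open
reader.execute("COMMIT")
reader.close()
writer.close()  # the last connection to close checkpoints and removes the -wal file
```

Ending the read transaction and closing the last connection is what finally lets the -wal file be checkpointed and deleted, matching what we saw.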

The time taken for each update seems a little slower (perhaps 10%), but since the data is real and variable in size, that might just be our subjective impression.

Once the long-running read-only analytics query completes, the main SQLite database appears to get updated (or at least the timestamp on the file is updated) as we continue applying our test updates, and the -wal file is still being used.

Once we stop applying our test updates, the -wal and -shm files disappear (as expected).

A quick check of the database appears to show it’s correct.
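As a slightly firmer check than eyeballing the file, SQLite's built-in integrity check walks every page, cell and index; for example (the path here is a placeholder, not our real database):

```python
import sqlite3

# PRAGMA integrity_check returns a single row containing 'ok' if the
# whole database is internally consistent, or a list of problems if not.
conn = sqlite3.connect("test.db")
result = conn.execute("PRAGMA integrity_check").fetchall()
conn.close()
print(result)  # [('ok',)] on a healthy database
```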

One question, though: the size of the -wal file worries us. https://www.sqlite.org/wal.html#bigwal states:

```
Avoiding Excessively Large WAL Files

In normal cases, new content is appended to the WAL file until the WAL file accumulates about 1000 pages (and is thus about 4MB in size) at which point a checkpoint is automatically run and the WAL file is recycled. The checkpoint does not normally truncate the WAL file (unless the journal_size_limit pragma is set). Instead, it merely causes SQLite to start overwriting the WAL file from the beginning. This is done because it is normally faster to overwrite an existing file than to append. When the last connection to a database closes, that connection does one last checkpoint and then deletes the WAL and its associated shared-memory file, to clean up the disk.
```

We have not set journal_size_limit, and we have a -wal file which is 224MB in size, somewhat larger than 4MB. We are running:

3.8.2 2013-12-06 14:53:30 27392118af4c38c5203a04b8013e1afdb1cebd0d

which does not appear to include the change in 3.11.0 that keeps the WAL file proportional to the size of the transaction. From the same page of the manual:

```
Very large write transactions.

A checkpoint can only complete when no other transactions are running, which means the WAL file cannot be reset in the middle of a write transaction. So a large change to a large database might result in a large WAL file. The WAL file will be checkpointed once the write transaction completes (assuming there are no other readers blocking it) but in the meantime, the file can grow very big.

As of SQLite version 3.11.0, the WAL file for a single transaction should be proportional in size to the transaction itself. Pages that are changed by the transaction should only be written into the WAL file once. However, with older versions of SQLite, the same page might be written into the WAL file multiple times if the transaction grows larger than the page cache.
```
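If it helps the discussion, one way to bound the -wal file between runs of the analytics query might look like the sketch below ("traffic.db" is a placeholder path, not our real file). As far as we can tell, `journal_size_limit` and `PRAGMA wal_checkpoint(RESTART)` are both available in 3.8.2; `wal_checkpoint(TRUNCATE)` appears to need 3.8.8 or later.

```python
import sqlite3

conn = sqlite3.connect("traffic.db", isolation_level=None)  # autocommit mode
conn.execute("PRAGMA journal_mode=WAL")
# After a successful checkpoint, truncate the -wal file back to at most
# this many bytes instead of leaving it at its high-water mark.
conn.execute("PRAGMA journal_size_limit = 8388608")  # 8 MB
conn.execute(
    "CREATE TABLE IF NOT EXISTS updates (id INTEGER PRIMARY KEY, payload BLOB)")

# ... apply updates here ...

# Once the long-running reader has finished, force a checkpoint so the
# -wal file can actually be reset. busy=1 would mean a reader blocked it.
busy, wal_frames, moved = conn.execute(
    "PRAGMA wal_checkpoint(RESTART)").fetchone()
conn.close()
```

In other words, the checkpoint only bounds the file once no long-lived read transaction is holding the WAL open, which seems consistent with what we observed.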

We think that using WAL mode works for us, and indeed inspection seems to indicate it does, but the -wal file appears to be far larger than we would expect. Is there a problem here? It doesn’t appear to be, but we would welcome any comments.

Thanks for taking the time to reply.

Rob


On 6 Aug 2016, at 22:35, R Smith wrote:

On 2016/08/06 10:50 PM, Rob Willett wrote:

Our understanding of this is that many processes can READ the database at the same time but NO process can INSERT/UPDATE if another is reading. We had thought that one process can write and multiple processes can read. Our reading (no pun intended) now of this paragraph from the manual is that you cannot write if one or more processes is reading. Have we understood this correctly? If so is there an easy way to get around this?

The Write-Ahead-Log (WAL) journal mode will help you. It basically allows a writer to write to the WAL log instead of the main database so that any number of readers can still do their thing reading the database (and the parts of the WAL journal that are already committed, or even parts still in progress if you use "read_uncommitted" mode). SQLite then pushes committed data into the DB file based on checkpoints which you can invoke directly or set up to happen every so often.

This is the new way to do things and what you should always use unless you have a specific reason not to (which might include file restrictions, needing read-only-ness, separate speedy DBs that don't fsync() so much, etc.)

More information here:
https://www.sqlite.org/wal.html

Your DB is quite big and it seems you write often, so please take special note of this section:
https://www.sqlite.org/wal.html#bigwal


HTH - Cheers,
Ryan
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users