On 2016/08/07 8:55 AM, Rob Willett wrote:
Richard, Ryan,
Thanks for this. We were dimly aware of WAL but until now hadn’t
needed to use it.
We’ve done a quick check with it and it *seems* to work on a test
database. We’ve all read the docs again and paid attention to
https://www.sqlite.org/wal.html#bigwal
To test if it works we started our long running analytics query, on
our test machine it takes around 8 mins. We then speed up the rate of
updating our database with external data. In the real world an update
comes along every 3-5 mins, in our test system we queue them up so we
have them every 6-10 secs so they are around 60x quicker. The updates
are real data around 3-5MB in size.
We monitored the -wal and the .shm files created as we throw data in
the database.
The .wal file gets larger and larger until it hits 224MB and then
stays constant, the .shm file is only 1.8MB and seems to stay at that
size. We can also see that the main sqlite database is NOT updated (or
at least the timestamp isn’t) whilst we are running the updates in WAL
mode. This appears to be correct as the updates would be in the -wal
file.
I'm truncating this post for brevity - but basically your concern about
the size (voiced later in the post) is not a concern. What happens is
the stated 4MB is simply 1000 pages x 4KB default page size - your page
size might be a lot bigger (and should be set higher looking at your DB
size and data entry sizes - I think it is "nicer" if, at a minimum, a
complete row can fit on a page). Further, the WAL for your version of
SQLite will grow with copies of data and multiple inserts in it because
of the long-running query not allowing push-back check points for the
time - and looking at your insert frequency and size, your WAL size
seems pretty normal. (If you manage it wrong, it will fill up Terrabytes
- this is the situation you want to avoid, but I think you've got it
sorted).
The Documentation simply describes the normal situation, which yours isn't.
Also, on a point of satisfying my OCD... going on your quoted averages -
5 minutes vs. 10 secs is a 30:1 ratio (5 mins = 300 seconds vs. 10 secs)
so the speed is only 30x faster, not 60) - And before anyone asks, yes
the low end of the ranges given is 3mins (180s) vs. 6 secs which is also
a 30:1 ratio. Even if I take the opposite range extremes (5 mins vs.
6s) I still only get 50x speedup. LoL - Sorry, I'll shush now :)
As an aside, I think Richard posted a small study of testing multiple DB
ops with varying page sizes and varying hardware page sizes, and
basically, IIRC, the Jury was out on best size in the general case with
8192 seeming to be a good standard and the idea that the page size
should try match the underlying OS page size for best performance turned
out to be a bit of a "sometimes maybe", but the point was made that
every implementation should experiment to find the optimum size. That
said, my memory cannot be trusted - could someone re-post that or point
us to an on-line page somewhere? Thanks!
Cheers,
Ryan
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users