On Wednesday, 31 October, 2018 13:22, Gerlando Falauto 
<gerlando.fala...@gmail.com> wrote:

>first of all let me tell you I cannot thank you enough for going
>through all this trouble for me!

No problem.  I still do not really know whether the answer is correct; however, it 
does let you see how much data is actually being moved around.

>It did not occur to me it would help to run checkpoints manually.
>There's a couple of things I don't understand though.

>- After a checkpoint runs to completion (assuming no other process is
>reading or writing), what is the value of walpages (i.e. *pnLog)? Is
>it the number of pages *originally* in the logfile (i.e. the same as
>copypages), or is it the number of pages *remaining* in the logfile
>(i.e. zero)? Also, does it refer to the number of *valid* pages or the
>number of pages allocated by the file (i.e. filesize divided by pagesize)?

The table generated by the passive checkpoints looks like this:

sqlite> select * from tracer order by timestamp desc limit 5;
timestamp   status      walpages    copypages
----------  ----------  ----------  ----------
1541032800  0           1462        1462
1541032740  0           1457        1457
1541032680  0           1452        1452
1541032620  0           1454        1454
1541032560  0           1449        1449

I presume that walpages means the number of frames (pages) in the WAL file BEFORE 
the checkpoint, and that copypages is the number of those frames copied into the 
database (the difference being the number left in the WAL file).  If not all pages 
can be copied (for example, because a read transaction is blocking them), I would 
expect that they will eventually be copied.  Therefore copypages is really the only 
useful number: all those pages must have been written to the WAL and must 
eventually be copied into the main database.  The fact that some pages could not be 
copied at some particular instant is not really useful for determining the total 
amount of data moved.  Whatever is in copypages had to be written to the WAL and 
has to be written to the database (written twice, erasing the equivalent number of 
pages).  Exactly when that happens is not important, I don't think.
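
For reference, here is a minimal sketch of how those two numbers can be collected 
into a tracer table like the one above, assuming Python's sqlite3 module (the file 
name and exact schema here are just illustrative, not necessarily what I use):

import sqlite3, time

con = sqlite3.connect("data.db")   # hypothetical database file name
con.execute("""CREATE TABLE IF NOT EXISTS tracer(
                 timestamp INTEGER, status INTEGER,
                 walpages INTEGER, copypages INTEGER)""")

# PRAGMA wal_checkpoint(PASSIVE) returns one row of three columns:
# busy (nonzero if some frames could not be checkpointed), the WAL size
# in frames, and the number of frames copied into the main database.
busy, log_frames, ckpt_frames = con.execute(
    "PRAGMA wal_checkpoint(PASSIVE)").fetchone()

con.execute("INSERT INTO tracer VALUES (?,?,?,?)",
            (int(time.time()), busy, log_frames, ckpt_frames))
con.commit()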

>- How often are you running checkpoints? Do you expect this variable
>to have an impact on the end result?

Currently I let it run once per minute with the following set for the database 
right after it is opened:

pragma cache_size=262144;            -- 1 GiB of cache at the 4 KiB page size
pragma journal_mode=wal;             -- set WAL mode
pragma wal_autocheckpoint=0;         -- disable autocheckpoint
pragma journal_size_limit=8388608;   -- truncate the WAL to 8 MB on checkpoint if possible
pragma synchronous=NORMAL;           -- only force a sync on checkpoint, not on each commit
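
In application code that amounts to something like the following right after the 
connection is opened (a sketch assuming Python's sqlite3; the file name is just an 
example):

import sqlite3

con = sqlite3.connect("data.db")                  # hypothetical file name
con.execute("pragma cache_size=262144")           # 1 GiB of cache at the 4 KiB page size
con.execute("pragma journal_mode=wal")            # returns 'wal' on success
con.execute("pragma wal_autocheckpoint=0")        # disable automatic checkpoints
con.execute("pragma journal_size_limit=8388608")  # truncate the WAL to 8 MB on checkpoint
con.execute("pragma synchronous=NORMAL")          # sync on checkpoint, not on every commit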

I don't think that the frequency of checkpointing will have much effect on the 
overall rate of change; however, it does affect the size of each checkpoint and the 
amount of data that could be lost on a system failure (everything written since the 
last checkpoint).

>Anyway, I'll definitely try the same approach on the current schema
>(which is *way* different than your super-optimized version) just to
>see if it is more or less consistent with the trend reported by
>smartmontools. If that's the case, that makes for a much more
>convenient metric to estimate performance and compare implementations.

I just have an asynchronous generator that puts the incoming data on a queue, and I 
sit waiting on the queue for stuff to process, logging the checkpoint statistics 
into the same database.  That is, I wait up to 250 ms for something to arrive; if 
nothing does, I "commit" if a transaction is open (which results in a commit about 
once per second), then check whether the Unix epoch time is a multiple of 60 
(time % 60 == 0) and, if so, run a passive wal_checkpoint and record the results, 
then go back to blocking on the queue.  If I did get data from either wait, I open 
a transaction if necessary and insert the data.  Repeat forever.  The checkpoint 
log lives in the same database and takes up only a few bytes, but you then always 
have data available showing how many database pages are being moved around.  The 
tables are "trimmed" (old data deleted) each time a transaction is opened.
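
In rough outline the loop looks something like the sketch below (assuming Python 
with a standard queue; trim_tables and insert_row are hypothetical stand-ins for 
the actual cleanup and INSERT logic, not my real code):

import queue, sqlite3, time

def run(db_path, q):                       # q is filled by the data generator
    # isolation_level=None so the BEGIN/COMMIT below are issued explicitly
    con = sqlite3.connect(db_path, isolation_level=None)
    in_txn = False
    while True:
        try:
            row = q.get(timeout=0.25)      # wait up to 250 ms for data
        except queue.Empty:
            if in_txn:                     # nothing arrived: commit the open transaction
                con.execute("COMMIT")
                in_txn = False
            if int(time.time()) % 60 == 0: # once a minute, passive checkpoint
                busy, log_frames, ckpt_frames = con.execute(
                    "PRAGMA wal_checkpoint(PASSIVE)").fetchone()
                con.execute("INSERT INTO tracer VALUES (?,?,?,?)",
                            (int(time.time()), busy, log_frames, ckpt_frames))
            row = q.get()                  # block until the next row arrives
        if not in_txn:
            con.execute("BEGIN")
            trim_tables(con)               # hypothetical placeholder: delete old rows
            in_txn = True
        insert_row(con, row)               # hypothetical placeholder: the actual INSERT(s)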

Also, the page size of the database is 4K and each row is 4K or a bit more.  I 
wonder whether compressing the data somehow, or increasing the database page size 
so that each row fits on a single page, would make any significant difference.
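
For what it's worth, the page size can only be changed while the database is in a 
rollback journal mode and only takes effect on a VACUUM, so an experiment along 
those lines would be roughly (a sketch, assuming an 8 KiB target page size and a 
throwaway copy of the database):

import sqlite3

# hypothetical copy of the database to experiment on; autocommit mode so VACUUM works
con = sqlite3.connect("copy-of-data.db", isolation_level=None)
con.execute("PRAGMA journal_mode=delete")   # page size cannot be changed while in WAL mode
con.execute("PRAGMA page_size=8192")        # takes effect on the next VACUUM
con.execute("VACUUM")                       # rebuild the database with 8 KiB pages
con.execute("PRAGMA journal_mode=wal")      # switch back to WAL afterwards
print(con.execute("PRAGMA page_size").fetchone()[0])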

>So really, thanks a lot!

No problem.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.



