On Wednesday, 31 October, 2018 13:22, Gerlando Falauto <gerlando.fala...@gmail.com> wrote:
> first of all let me tell you I cannot thank you enough for going
> through all this trouble for me!

No problem. I still do not really know whether the answer is correct; however, it does let you get data about how much data is actually being moved around.

> It did not occur to me it would help to run checkpoints manually.
> There's a couple of things I don't understand though.
> - After a checkpoint runs to completion (assuming no other process is
> reading nor writing), what is the value of walpages (i.e. *pnLog)? Is
> it the number of pages *originally* in the logfile (i.e. the same as
> copypages), or is it the number of pages *remaining* in the logfile
> (i.e. zero)? Also, does it refer to the number of *valid* pages or the
> number of pages allocated by the file (i.e. filesize divided by pagesize)?

The table generated by the passive checkpoints looks like this:

sqlite> select * from tracer order by timestamp desc limit 5;
timestamp   status      walpages    copypages
----------  ----------  ----------  ----------
1541032800  0           1462        1462
1541032740  0           1457        1457
1541032680  0           1452        1452
1541032620  0           1454        1454
1541032560  0           1449        1449

I presume that walpages means the number of frames (pages) in the WAL file BEFORE the checkpoint, and that copypages is the number of those frames copied into the database (the difference being the number left in the WAL file). If not all pages can be copied (for example because a read transaction is blocking them) then I would expect them to be copied eventually. Therefore copypages is really the only useful number: all of those pages must have been written to the WAL and must eventually be copied into the main database, so the fact that some of them could not be copied at some particular instant is not really useful for determining the actual total amount of data moved. Whatever is counted in copypages had to be written to the WAL and has to be written to the database (written twice, erasing the equivalent number of pages). Exactly when that happens is not important, I don't think.

> - How often are you running checkpoints? Do you expect this variable
> to have an impact on the end result?

Currently I let it run once per minute, with the following set for the database right after it is opened:

pragma cache_size=262144;          # 1 Gigabyte
pragma journal_mode=wal;           # Set WAL mode
pragma wal_autocheckpoint=0;       # Disable autocheckpoint
pragma journal_size_limit=8388608; # Truncate the WAL on checkpoint, if possible, to 8 MB
pragma synchronous=NORMAL;         # Only force sync on checkpoint, not on each commit

I don't think the frequency of checkpointing will have much of an effect on the overall rate of change; however, it does affect the size of each checkpoint and the amount of data (everything since the last checkpoint) that could be lost on system failure.

> Anyway, I'll definitely try the same approach on the current schema
> (which is *way* different than your super-optimized version) just to
> see if it is more or less consistent with the trend reported by
> smartmontools. If that's the case, that makes for a much more
> convenient metric to estimate performance and compare implementations.

I just have an asynchronous generator that puts the incoming data in a queue; I simply sit waiting on the queue for stuff to process, and log the checkpoint statistics into the same database. That is, I wait up to 250 ms for something to arrive; if nothing does, I "commit" if a transaction is open (which results in a commit once per second), then check whether the unix epoch time is a multiple of 60 (time % 60 == 0) and, if so, do a passive wal_checkpoint and record the results, then go back to waiting for data. If I did get data from one or the other of the waits, I open a transaction if necessary and insert the data. Repeat forever.
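A rough Python sketch of that loop (using the standard sqlite3 module) might look something like the following. The "data.db" filename, the incoming queue, the "samples" table and the once-per-minute guard are just placeholders for the real data source and schema, and the trimming of old data is left out:

import queue
import sqlite3
import time

# Stand-in for the asynchronous generator that feeds incoming rows.
incoming = queue.Queue()

db = sqlite3.connect('data.db', isolation_level=None)  # autocommit; BEGIN/COMMIT issued explicitly below
db.execute('pragma cache_size=262144')            # 1 Gigabyte of page cache at the 4K page size
db.execute('pragma journal_mode=wal')             # Set WAL mode
db.execute('pragma wal_autocheckpoint=0')         # Disable autocheckpoint
db.execute('pragma journal_size_limit=8388608')   # Truncate the WAL on checkpoint, if possible, to 8 MB
db.execute('pragma synchronous=NORMAL')           # Only force sync on checkpoint, not on each commit

db.execute('create table if not exists tracer'
           ' (timestamp integer, status integer, walpages integer, copypages integer)')
db.execute('create table if not exists samples (data)')  # placeholder for the real schema

in_transaction = False
last_checkpoint = 0

while True:
    try:
        row = incoming.get(timeout=0.25)          # wait up to 250 ms for something to arrive
    except queue.Empty:
        row = None

    if row is None:
        if in_transaction:                        # nothing arrived, so commit the open transaction
            db.execute('commit')
            in_transaction = False
        now = int(time.time())
        if now % 60 == 0 and now != last_checkpoint:
            # passive checkpoint returns (busy, walpages, copypages); log it in the tracer table
            busy, walpages, copypages = db.execute('pragma wal_checkpoint(PASSIVE)').fetchone()
            db.execute('insert into tracer values (?,?,?,?)', (now, busy, walpages, copypages))
            last_checkpoint = now
        continue

    if not in_transaction:                        # data arrived: open a transaction if necessary
        db.execute('begin')
        in_transaction = True
    db.execute('insert into samples values (?)', (row,))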
The checkpoint log is in the same database and only takes up a few bytes, but you then always have data available to see how many database pages are being moved around. The tables are "trimmed" (old data deleted) each time a transaction is opened.

Also, the page size of the database is 4K and each row is 4K or a bit more. I wonder whether either compressing the data somehow or increasing the database page size so that each row fits on a single page would make any significant difference.

> So really, thanks a lot!

No problem.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.