I'm using C++Builder 10.2 Tokyo on Windows 10 with SQLite 3.23.1. I was working
on code that stored RowIDs in a vector: I sized the vector beforehand, then
timed how long it took to store the RowIDs returned by an SQLite query. By
accident I sized the vector too big (by a factor of 10) and found the timings
doubled (sometimes tripled), despite the fact that it was still only stepping
through the same query. We're talking going from 20 secs to 40-60 secs.

At first I thought 'memory problem', BUT as I'm on an MS Surface Pro 4 with
16 GB RAM and a 512 GB SSD, running a 64-bit release build, I wondered how that
could be. After all, the mistakenly sized vector only used 8 GB (1 billion
int64_t elements at 8 bytes each).
I've managed to whittle the problem down to the following console application.
The 'Big' table in the code below has just under 112 million records and
contains 2 integer columns and 4 text columns (full_name_1, full_name_2,
sort_name_1, sort_name_2). It has one (unique) index on the 2 integer columns.
I did a VACUUM on the database beforehand. Apart from background tasks, the
console app was the only one running.
#include <vcl.h>
#include <windows.h>
#pragma hdrstop
#pragma argsused
#include <tchar.h>
#include <stdio.h>
#include <time.h>
#include <conio.h>
#include <iostream>
#include <vector>
#include "sqlite3.h"

std::vector<int64_t> v;
const int Size[]={112000000,1000000000}; // 112 million, 1 billion

int _tmain(int argc, _TCHAR* argv[])
{
    sqlite3 *DB;
    sqlite3_open("c:/SQLiteData/Formbook.db",&DB);
    sqlite3_stmt *stmt;
    sqlite3_prepare_v2(DB,"select RowID from Big order by RowID",-1,&stmt,NULL);
    for (int i=0; i<2; i++)
    {
        v.resize(Size[i]);
        // NB the exact same code is executed whether i is 0 or 1. The only
        // thing that changes is the size() of v, and v isn't even used in
        // the timed code below.
        clock_t Start=clock();
        while (sqlite3_step(stmt)==SQLITE_ROW) {}
        // The above just steps through stmt (111,724,900 steps to be exact).
        std::cout << 1.0*(clock()-Start)/CLOCKS_PER_SEC << std::endl;
        sqlite3_reset(stmt);
    }
    sqlite3_finalize(stmt);
    sqlite3_close(DB);
    getch();
    return 0;
}
5 sets of timings (secs) were as follows:

    i==0      i==1
    17.610    24.172
    20.344    24.594
    19.953    24.375
    19.891    23.594
    19.938    25.516
I can't understand why the second pass (executing the exact same code) takes
an average of 4.8 secs longer.

To add to the puzzle:
I tried making v an int64_t*, replacing the resize with v=new int64_t[Size[i]]
and adding delete [] v at the end of the inner block. Pretty much the same
thing as far as memory goes, yet the average timings for i==0 and i==1 were
almost identical, at around 17.4 secs.
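The raw-array version of the loop looked essentially like this (a sketch, with
the global vector replaced by a plain pointer and the same setup as the full
program above):

    // Raw-array variant of the timing loop. NB new int64_t[n] leaves the
    // elements uninitialized, whereas vector::resize() value-initializes
    // (zero-fills) every new element.
    int64_t *v; // replaces the global std::vector<int64_t> v
    for (int i=0; i<2; i++)
    {
        v=new int64_t[Size[i]];
        clock_t Start=clock();
        while (sqlite3_step(stmt)==SQLITE_ROW) {}
        std::cout << 1.0*(clock()-Start)/CLOCKS_PER_SEC << std::endl;
        sqlite3_reset(stmt);
        delete [] v;
    }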
I tried replacing the SQLite-related code with non-SQLite code, e.g. populating
v up to v.begin()+112000000 using rand(). Again the timing anomaly disappeared.
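The replacement timed section was along these lines (my reconstruction; the
exact fill doesn't matter, and rand() needs <stdlib.h>):

    // Non-SQLite replacement for the timed section: write to the first
    // 112 million elements of v instead of stepping through the query.
    clock_t Start=clock();
    for (int64_t j=0; j<112000000; j++)
        v[j]=rand();
    std::cout << 1.0*(clock()-Start)/CLOCKS_PER_SEC << std::endl;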
I swapped Size[0] with Size[1] so that the resize went from large to small. The
timings on each pass were then similar to each other, but both were around the
24.xxx mark associated with the i==1 pass.
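That is, just the initializer flipped:

    const int Size[]={1000000000,112000000}; // sizes swapped: 1 billion first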
I copied the Big table from its current db to a new db where it was the only
table, then redid the timings using the new db, but there was no real change.

    i==0      i==1
    17.594    25.672
    20.563    24.406
    21.219    23.843
    20.484    25.343
    20.562    25.172
Lastly, I tried gradually lowering Size[1] from 1 billion to 200 million in
steps of 200 million. There was little difference in the results at
Size[1]=800,000,000, but at Size[1]=600,000,000 the gap was down to under a
second, and at Size[1]=400,000,000 the timings for i==0 and i==1 were pretty
much the same.
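That amounted to re-running the program with different values for the big
size; as a compact sketch of the idea:

    // Time a full pass of the query with the vector at decreasing sizes.
    // Swapping with a temporary is used so the vector's allocation is
    // actually released between sizes (resize() alone keeps the capacity).
    const int Steps[]={1000000000,800000000,600000000,400000000,200000000};
    for (int k=0; k<5; k++)
    {
        v.resize(Steps[k]);
        clock_t Start=clock();
        while (sqlite3_step(stmt)==SQLITE_ROW) {}
        std::cout << Steps[k] << ": "
                  << 1.0*(clock()-Start)/CLOCKS_PER_SEC << std::endl;
        sqlite3_reset(stmt);
        std::vector<int64_t>().swap(v); // free the memory before the next size
    }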
While that last test suggests it must be a memory issue, it raises the question
of why there was no difference in timings for i==0 and i==1 when an array was
used rather than a vector. Surely the memory requirements are very similar?

The timing anomaly was still apparent when Size[1] was 800 million, so why,
when you've got 16 GB of RAM, does a 6.4 GB vector cause any problems? What has
it got to do with SQLite, if anything? Why was sqlite3_step slowed down?
Any suggestions appreciated.