Hello,

 

I'm currently working on a project which is very likely to use SQLite as
data storage. Since performance is important for us, I already found on
the SQLite website many ways to optimise the code (always working in
transactions where possible, using a page size of 4096 since it's
running on the Windows platform, using integer primary keys, ...). But
I have one problem that I "solved" in an unorthodox way; it works, but
maybe you have a better solution than mine?
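
(For reference, this is roughly how I apply the settings mentioned above
through the C API; it is only a sketch, the file name is an example and
error checking is omitted:)

    #include <sqlite3.h>

    void setupDatabase()
    {
        sqlite3 *db = 0;
        sqlite3_open("data.db", &db);      /* example file name */

        /* page_size only takes effect before the first table is created */
        sqlite3_exec(db, "PRAGMA page_size = 4096;", 0, 0, 0);

        /* group the writes in one transaction */
        sqlite3_exec(db, "BEGIN;", 0, 0, 0);
        /* ... CREATE TABLE / INSERT statements ... */
        sqlite3_exec(db, "COMMIT;", 0, 0, 0);

        sqlite3_close(db);
    }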

 

I've been doing some tests with a ~100 MB database, in which I have three
tables: one for structured data (Objects, 2000 entries), one for the
blobs we have to store (ObjectBlobs: ID INTEGER PRIMARY KEY, Data BLOB)
(8000 entries), and one which links the structured data and the blobs
(8000 entries). As you can imagine, each Object has 4 blobs linked to
it; the blobs can be quite large (let's say up to 1 MB).

My (C++) application just has to read the table "Objects", and one of
the Blobs for each Object.
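
In code, the reading part looks roughly like this (only a sketch: the
link table and its column names are examples from memory, not
necessarily the real ones):

    #include <sqlite3.h>

    void loadBlobs(sqlite3 *db, int wantedKind)   /* which of the 4 blobs */
    {
        sqlite3_stmt *stmt = 0;
        sqlite3_prepare_v2(db,
            "SELECT b.Data FROM Objects o "
            "JOIN ObjectsToBlobs ob ON ob.ObjectID = o.ID "
            "JOIN ObjectBlobs b ON b.ID = ob.BlobID "
            "WHERE ob.Kind = ?;",
            -1, &stmt, 0);
        sqlite3_bind_int(stmt, 1, wantedKind);

        while (sqlite3_step(stmt) == SQLITE_ROW) {
            const void *data = sqlite3_column_blob(stmt, 0);
            int size = sqlite3_column_bytes(stmt, 0);
            /* copy the blob before the next sqlite3_step() invalidates it */
            (void)data; (void)size;
        }
        sqlite3_finalize(stmt);
    }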

 

Now the first time I run my application, it takes quite a long time
(30s) to load the blobs. But if I re-run the app, it only takes 1s to
load them. It's clearly a disk caching issue: if I copy huge files to
the disk between two runs, it again takes 30s to load the blobs (i.e.
the DB is no longer in the disk cache). Profiling the application
indicates that sqlite::winRead() is the bottleneck.

 

I then had the following idea: SQLite is probably reading the file
randomly, depending on where the data lies. If I can force the DB file
to be cached, everything should be fine. So before connecting to the
database, I first read it sequentially (using a C++ ifstream) until the
end of the file. This solves the problem perfectly, even though I still
notice a difference (3s to read the file on the 1st run, 0.2s later).
But 3s is acceptable, whereas 30s was worrying me.
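
Concretely, the warm-up code looks roughly like this (a sketch; the
function name and the buffer size are arbitrary):

    #include <fstream>
    #include <vector>

    void warmUpFileCache(const char *path)
    {
        std::ifstream in(path, std::ios::binary);
        std::vector<char> buf(1 << 20);        /* read in 1 MB chunks */
        while (in.read(&buf[0], buf.size()))
            ;   /* data is thrown away; we only want the OS to cache the file */
    }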

 

I hope I have explained the situation clearly enough, and now ask you: is
this the only way to do it? I find the trick a bit nasty and don't like
it; maybe I missed something? Before you ask: I tried to VACUUM the DB,
but it only reduced the timing to 16s, which was still too slow for our
requirements.

Tests with a larger DB (it can get much bigger than my example) and on
different machines tend to confirm my theory.

 

Thanks in advance (and a big thank-you for SQLite, which is really nice
and easy to use!),

 

Julien Renggli

 

P.S.: Some technical information:

- SQLite v3.3.16 (we will upgrade to the latest version later), C API

- Windows XP SP2

- Timings on Pentium 4 3.4GHz, 2GB RAM

 
