On 2 Jul 2015, at 11:16pm, Rob Willett <rob.sqlite at robertwillett.com> wrote:
> We process this XML feed and pull out chunks of the XML, process it and
> update our database with it. This database is currently 16GB in size. Our
> concern is the time taken to process each file every five minutes. It has
> been getting steadily longer and longer. It started out at around 6 seconds
> and is now around 22-24 seconds. Whilst we expect processing time to get
> bigger, we are getting concerned about the performance and decided to have a
> look.
This suggests fragmentation. There can be fragmentation at various levels for
a SQLite database. Fortunately it's simple to get rid of them. First, take
some timings.
Using the '.dump' and '.read' commands from the command-line tool to create a
new database. It will be created with each row in each table in primary-key
order. Once you've done that defragment the disk the database is stored on
using your OS tools (if possible). Once you've done that take the same timings
and see whether anything improved.
> The database (DB) we use is only ever read and written by a single process,
> we do not network the database
Don't forget that write access to the database require exclusive access to the
entire database. You could implement some fancy multi-processing scheme only
to find that it's defeated by the SQLite library.
> The actual SQL called 5,000 times is
>
> ?select Id,Calculation,Peak,Red,Amber,Green from RAG where text = ??
[...]
>
> The RAG table schema is
>
> CREATE TABLE "RAG" (
> "Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
> "Count" integer NOT NULL DEFAULT 0,
> "Text" TEXT,
> "Peak" integer,
> "Calculation" integer NOT NULL DEFAULT 0,
> "Red" integer DEFAULT 0,
> "Amber" integer DEFAULT 0,
> "Green" integer DEFAULT 0,
> "BayesAttributes" TEXT
> );
>
> It has four indexes on it
>
> CREATE UNIQUE INDEX "RAG_Idx1" ON RAG ("Text" COLLATE NOCASE ASC);
Do you ever depend on any indexing on the "Text" column which is not COLLATE
NOCASE ASC ? If not, then you should be able to speed up your search by
defining the column the way you think of it. So in your table definition use
"Text" TEXT COLLATE NOCASE ASC,
You may find that this immediately speeds up the search. Or you may find that
you may have to change your SELECT to
SELECT Id,Calculation,Peak,Red,Amber,Green FROM RAG WHERE text = ? ORDER BY
Text COLLATE NOCASE ASC
Simon.