On 2 Jul 2015, at 11:16pm, Rob Willett <rob.sqlite at robertwillett.com> wrote:

> We process this XML feed and pull out chunks of the XML, process it and 
> update our database with it. This database is currently 16GB in size. Our 
> concern is the time taken to process each file every five minutes. It has 
> been getting steadily longer and longer. It started out at around 6 seconds 
> and is now around 22-24 seconds. Whilst we expect processing time to get 
> bigger, we are getting concerned about the performance and decided to have a 
> look.

This suggests fragmentation.  There can be fragmentation at various levels for 
a SQLite database.  Fortunately it's simple to get rid of them.  First, take 
some timings.

Using the '.dump' and '.read' commands from the command-line tool to create a 
new database.  It will be created with each row in each table in primary-key 
order.  Once you've done that defragment the disk the database is stored on 
using your OS tools (if possible).  Once you've done that take the same timings 
and see whether anything improved.

> The database (DB) we use is only ever read and written by a single process, 
> we do not network the database

Don't forget that write access to the database require exclusive access to the 
entire database.  You could implement some fancy multi-processing scheme only 
to find that it's defeated by the SQLite library.

> The actual SQL called 5,000 times is 
> 
> ?select Id,Calculation,Peak,Red,Amber,Green from RAG where text = ??

[...]
> 
> The RAG table schema is 
> 
> CREATE TABLE "RAG" (
>        "Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
>        "Count" integer NOT NULL DEFAULT 0,
>        "Text" TEXT,
>        "Peak" integer,
>        "Calculation" integer NOT NULL DEFAULT 0,
>        "Red" integer DEFAULT 0,
>        "Amber" integer DEFAULT 0,
>        "Green" integer DEFAULT 0,
>        "BayesAttributes" TEXT
> );
> 
> It has four indexes on it 
> 
> CREATE UNIQUE INDEX "RAG_Idx1" ON RAG ("Text" COLLATE NOCASE ASC);

Do you ever depend on any indexing on the "Text" column which is not COLLATE 
NOCASE ASC ?  If not, then you should be able to speed up your search by 
defining the column the way you think of it.  So in your table definition use

         "Text" TEXT COLLATE NOCASE ASC,

You may find that this immediately speeds up the search.  Or you may find that 
you may have to change your SELECT to

SELECT Id,Calculation,Peak,Red,Amber,Green FROM RAG WHERE text = ? ORDER BY 
Text COLLATE NOCASE ASC

Simon.

Reply via email to