On 2 Jul 2015, at 11:16pm, Rob Willett <rob.sqlite at robertwillett.com> wrote:
> We process this XML feed and pull out chunks of the XML, process it and > update our database with it. This database is currently 16GB in size. Our > concern is the time taken to process each file every five minutes. It has > been getting steadily longer and longer. It started out at around 6 seconds > and is now around 22-24 seconds. Whilst we expect processing time to get > bigger, we are getting concerned about the performance and decided to have a > look. This suggests fragmentation. There can be fragmentation at various levels for a SQLite database. Fortunately it's simple to get rid of them. First, take some timings. Using the '.dump' and '.read' commands from the command-line tool to create a new database. It will be created with each row in each table in primary-key order. Once you've done that defragment the disk the database is stored on using your OS tools (if possible). Once you've done that take the same timings and see whether anything improved. > The database (DB) we use is only ever read and written by a single process, > we do not network the database Don't forget that write access to the database require exclusive access to the entire database. You could implement some fancy multi-processing scheme only to find that it's defeated by the SQLite library. > The actual SQL called 5,000 times is > > ?select Id,Calculation,Peak,Red,Amber,Green from RAG where text = ?? [...] > > The RAG table schema is > > CREATE TABLE "RAG" ( > "Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, > "Count" integer NOT NULL DEFAULT 0, > "Text" TEXT, > "Peak" integer, > "Calculation" integer NOT NULL DEFAULT 0, > "Red" integer DEFAULT 0, > "Amber" integer DEFAULT 0, > "Green" integer DEFAULT 0, > "BayesAttributes" TEXT > ); > > It has four indexes on it > > CREATE UNIQUE INDEX "RAG_Idx1" ON RAG ("Text" COLLATE NOCASE ASC); Do you ever depend on any indexing on the "Text" column which is not COLLATE NOCASE ASC ? If not, then you should be able to speed up your search by defining the column the way you think of it. So in your table definition use "Text" TEXT COLLATE NOCASE ASC, You may find that this immediately speeds up the search. Or you may find that you may have to change your SELECT to SELECT Id,Calculation,Peak,Red,Amber,Green FROM RAG WHERE text = ? ORDER BY Text COLLATE NOCASE ASC Simon.