On 3 Jul 2015, at 11:35am, Rob Willett <rob.sqlite at robertwillett.com> wrote:

> CREATE TABLE "RAG" (
>         "Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
>         "Count" integer NOT NULL DEFAULT 0,
>         "Text" TEXT COLLATE NOCASE ,
>         "Peak" integer,
>         "Calculation" integer NOT NULL DEFAULT 0,
>         "Red" integer DEFAULT 0,
>         "Amber" integer DEFAULT 0,
>         "Green" integer DEFAULT 0,
>         "BayesAttributes" TEXT
> );

[...]

> CREATE UNIQUE INDEX "RAG_Idx1" ON RAG ("Text" COLLATE NOCASE ASC);
> CREATE UNIQUE INDEX "RAG_Idx2" ON RAG ("Calculation" COLLATE NOCASE ASC, 
> "Text" COLLATE NOCASE ASC);
> CREATE INDEX "Rag_Idx3" ON RAG ("Calculation" COLLATE NOCASE ASC, 
> "BayesAttributes" COLLATE NOCASE ASC);

To explain further, now that you have declared your "Text" column as being 
COLLATE NOCASE, NOCASE is now the default collation for it, and all sorting and 
matching on Text will be NOCASE unless you state otherwise.  So it's not 
necessary to state COLLATE NOCASE when you create indexes on it or mention it 
elsewhere.  It won't do any harm, though.  Shout out to Clemens for his 
corrections to my earlier post.

I notice that "Calculation" is an INTEGER column, and that "BayesAttributes" is 
a TEXT column but in "Rag_Ids3" you have used "Calculation" COLLATE NOCASE.  I 
was wondering whether you had meant to use "BayesAttributes" TEXT COLLATE 
NOCASE in your table definition instead, and rewrite the index accordingly.  
There doesn't seem to be any point in having "BayesAttributes" be 
case-sensitive.

Since you've significantly re-done your table definition and indexes, now is a 
good time to run ANALYZE.  It may or may not help, but it will overwrite data 
which is no longer corrent.

> The second biggest improvement was the 10 secs saved by dumping the database 
> and rebuilding from the dump file. We may have got a better increase if we 
> did that first and then the COLLATE NOCASE which would probably then have a 
> lesser increase :) 

While the improvement from rewriting the table definition will be permanent, 
the improvement from defragmentation will gradually reduce as the database gets 
changed in a fragmented way.  So you may find out !

However, it should not be necessary to dump and rebuild your database on a 
monthly basis.  SQLite is unusually efficient at dealing with fragmented 
database pages, and OS X, of course, has background defragmentation built in.  
I doubt that any difference .dump & .read would make is worth doing that amount 
of clunky data processing on a regular basis.  However, if you do decide to do 
it, you should do an ANALYZE before using the newly remade database.  Including 
when you do this to your production database after implementing the changes 
you've decided on in this thread.

Delighted we could make such a big difference.

Simon.

Reply via email to