[sqlite] Slow real world performance - Any suggestions please (warning long)

Keith Medcalf Sat, 04 Jul 2015 13:13:38 -0400

> Clearly, in this case, using COLLATE NOCASE in the table
> definition is the right thing to do. Under what conditions
> would using it in the index instead be the right
> thing to do?


When you want the default to be case sensitive, and only use collate nocase 
when you specifically specify that you wish the operation to utilize that 
collation for that *ONE* specific operation in which you have specified it.  
For example:

create table junk
(
  data text
);

create index junkindex on junk (data collate nocase);

select data from junk where data = 'yoohoo' collate nocase order by data;

will result in the where clause using the index to do case insensitive 
searching, but the order by clause will not use the index and will cause an 
extra sort step because it requires a case sensitive sort.

select data from junk where data = 'yahoo' order by data collate nocase;

on the other hand will do a scan of the entire index, in order, in order to get 
the "right" result order.  For each row the data will be compared, in a case 
sensitive manner, with the column data.

select data from junk where data = 'yahoo' collate nocase order by data collate 
nocase;

will search the index and return the rows in order.  Assuming that the table 
has a billion rows and that there are a million rows satisfying the query 
constraints for each case above, the first two could take several hours to 
execute.  The later one will use the index and only the index and will take a 
few milliseconds.

The reason for putting the "collate nocase" on the column definition in the 
table is so that it applied to the column in the table.  Then you do not need 
to specify such foolishness as:

select a from x where g in (select b from y where x = 'test' collate nocase) 
collate nocase order by g collate nocase;

where if you forget the "collate nocase" anywhere where it is needed your case 
insensitive indexes are not used and the time taken to satisfy your query 
increases exponentially.

Never in 50 years have I come across an instance where the collation needs 
separate application (for example in an index) rather than application to every 
operation by applying the collation using a column definition.  Either the 
underlying data is case insensitive (or some other collation) or it is not.  I 
have never seen real data stored in a column that is case sensitive (or some 
other collation) in one place, but of different treatment somewhere else.  
Except of course when that was done in error (or an ill-conceived 
implementation by someone not understanding the nature of the data and the 
system requirements).

[sqlite] Slow real world performance - Any suggestions please (warning long)

Reply via email to