[sqlite] Possible substr() optimization?

R.Smith Thu, 23 Jul 2015 21:49:52 +0200


On 2015-07-23 04:56 PM, Igor Tandetnik wrote:
> On 7/23/2015 10:47 AM, Bernardo Sulzbach wrote:
>>> select * from tbl1 where substr(col1,1,1) in ('A','a') and col1 like 
>>> 'a_c%';
>>
>> Just adding to Igor's answer:
>>   col1 between 'a' and 'b' or col1 between 'A' and 'B'
>
> That's not quite the same. BETWEEN is inclusive of both ends. ('b' 
> BETWEEN 'a' and 'b') is true.


To add:

I think someone suggested before to do things like  BETWEEN 'a' AND 
'azzzzzzzzzzzz' or similar because 'a' < 'aa' so alphabetical this 
should be correct. Of course if you wish to catch all words starting 
with 'a' the best is to supply the highest standard-printable character 
after the a, which in the normal ASCII set should be ~ so that BETWEEN 
'a' AND 'a~' should catch all normal ASCII strings, and then the 
extended set goes to 254 (which is hard to type on a keyboard) but you 
can do BETWEEN 'a' AND 'a'||char(254) etc.
ref: http://www.asciitable.com/   [1]

Of course once you get to UNICODE strings, the highest valid character 
that can follow 'a' becomes more complicated and encoding-dependent.

Therefore it is (to me) always the best to do: LIKE 'a%' - which the 
SQLite Query Planner knows to optimize and use against an index - plus 
keep using ci type collations (NOCASE in SQLite).

"WHERE substring(X,1,Y)" could similarly be optimized I guess, though 
that would fall in a category of use-cases that should be exceedingly 
rare (but if easy to do, can't be harmful to add).


Cheers,
Ryan


[1] For fun, notice how the characters [, /, ], ^ and _ follows after 
the capitals but before the lowercase letters, which ends up in some 
unexpected sorting cases where the following are all true:
'A' < '_A'  but 'a' > '_a'
or put another way:
'JohnAnderson' < 'John_Anderson' - but - lowercase('JohnAnderson') > 
lowercase('John_Anderson')
whereas...
'JohnAnderson' > 'John Anderson' - and - lowercase('JohnAnderson') > 
lowercase('John Anderson')

also...
'{A}' > '[b]' > '(z)'
'{A{B}' < '{A|B}' but '{A}B}' > '{A|B}'

[sqlite] Possible substr() optimization?

Reply via email to