On 19 Mar 2014, at 3:36pm, Alex Loukissas <a...@maginatics.com> wrote:

> Thanks everyone for your comments. IIUC, the correct way of going about
> what I want to do is to use BINARY collation on the column I'm interested
> in and when I want to do unicode-aware case-insensitive lookups, they
> should look something like SELECT * FROM table WHERE LOWER(col_name) =
> LOWER(key), correct? It seems like with ICU support, LOWER( ) will call
> u_foldCase under the covers, which is what I want.

This solution suggests that a good compromise for handling Unicode is a hashing 
function: one that maps every string to a canonical key, so that two strings 
compare equal exactly when their keys are equal.  It would be equivalent to 
NOCASE, but for Unicode characters, and instead of just removing case it would 
also remove accents and various other 'hints'.
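
To make that concrete, here is a rough sketch in Python rather than C, using 
only the standard library.  The function name fold_key, the SQL name FOLD_KEY 
and the table are all made up for illustration, and this is not what the ICU 
extension does internally: it decomposes with NFKD, drops combining marks, and 
then case-folds.

```python
import sqlite3
import unicodedata


def fold_key(text):
    """Hypothetical folding function: strips accents, then removes case."""
    if text is None:
        return None  # keep SQL NULL as NULL
    # NFKD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (the accents and similar 'hints').
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is Python's Unicode-aware case folding (e.g. 'ß' -> 'ss').
    return stripped.casefold()


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it as FOLD_KEY(x).
conn.create_function("FOLD_KEY", 1, fold_key)
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("Éclair",), ("Zoë",), ("Bob",)],
)

# The column keeps its BINARY collation; folding happens only in the lookup.
rows = conn.execute(
    "SELECT name FROM people WHERE FOLD_KEY(name) = FOLD_KEY(?)",
    ("ECLAIR",),
).fetchall()
```

A lookup written this way cannot use an ordinary index on the column, so for 
large tables one would probably store the folded key in its own indexed column 
or use an index on the expression.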

How it would handle the various Unicode characters which have no equivalent in 
any alphabet, I have no idea.  It would be reasonable for all 'Right Arrow' 
characters to have the same hash, but how much data about Unicode would it take 
to do that?  Maybe it should just leave all non-alphabetic characters as they 
are.
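
The "leave non-alphabetic characters alone" option could look something like 
the Python sketch below.  The name fold_letters_only is hypothetical, and using 
str.isalpha() as the test for "alphabetic" is my assumption, not anything 
SQLite or ICU defines.

```python
import unicodedata


def fold_letters_only(text):
    """Hypothetical: fold case and accents on letters, pass the rest through."""
    # Compose first, so 'e' + combining acute is seen as the one letter 'é'.
    composed = unicodedata.normalize("NFC", text)
    out = []
    for ch in composed:
        if ch.isalpha():
            # Letters: decompose, drop combining marks, then case-fold.
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(
                c for c in decomposed if not unicodedata.combining(c)
            )
            out.append(base.casefold())
        else:
            # Arrows, digits, punctuation and so on are left exactly as-is.
            out.append(ch)
    return "".join(out)


plain_arrow = fold_letters_only("\u2192")   # RIGHTWARDS ARROW
double_arrow = fold_letters_only("\u21d2")  # RIGHTWARDS DOUBLE ARROW
mixed = fold_letters_only("Café \u2192")
```

With this approach the different right arrows keep different keys, which avoids 
needing any table of "equivalent" symbols at the cost of treating them as 
distinct.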

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
