>I think my problem is in using the LIKE expression on non-ASCII strings.
>The database encoding is UTF-8. When the data in the "base" column (see my
>first message for the structure) consists of English (ASCII) characters,
>LIKE works correctly, but when I try to run it on strings containing UTF-8
>characters outside the ASCII range, it doesn't work.

As far as I understand it (please correct me if this is wrong), SQLite 
stores and retrieves data "as supplied".  If you hand it ANSI strings, 
they are stored verbatim and can be retrieved verbatim.

But, and that's a big BUT (no pun intended), if you use any function 
that acts on the data, such as LIKE, it reads the data as UTF-* (replace 
* with the encoding you used when creating the database) and replaces 
badly formed UTF-* sequences with U+FFFD, the Unicode replacement 
character.  From then on, the changed data no longer compares equal to 
the original string, since some characters are destroyed in the process.  
I suppose indexing with anything other than the default BINARY collation 
is likely to produce erroneous results as well.
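
To check whether the stored text really is well-formed UTF-8, something 
like the following Python sketch might help.  It is only an illustration 
using the standard sqlite3 module: the function name is made up, and the 
table and column names are passed in by the caller (they are inserted 
into the SQL unescaped, so use it only with names you trust).  It asks 
the driver for the raw bytes of each value and tries to decode them 
itself.

```python
import sqlite3

def find_invalid_utf8(conn, table, column):
    """Return the rowids whose text in `column` is not well-formed UTF-8.

    Hypothetical helper, not part of SQLite or the sqlite3 module.
    """
    old_factory = conn.text_factory
    conn.text_factory = bytes  # return TEXT values as raw, undecoded bytes
    try:
        bad = []
        query = f'SELECT rowid, "{column}" FROM "{table}"'
        for rowid, raw in conn.execute(query):
            if not isinstance(raw, bytes):
                continue  # NULLs and numbers cannot be malformed text
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(rowid)
        return bad
    finally:
        conn.text_factory = old_factory  # restore the caller's setting
```

Any rowid it returns points at a value that was stored verbatim in some 
other encoding, which is exactly the kind of data that LIKE and friends 
would mangle.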

The solution is to build a true UTF-8 (or UTF-16) database.  The 
simplest way could be (again, corrections are welcome) to rebuild the 
database from the original data once it has been converted to valid 
UTF-*.  If your only data source is now the database itself (the input 
data is no longer available), you could try to dump it with the 
command-line utility and work from there.
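
If the only source left is the database itself, the conversion could 
also be scripted.  The sketch below is a rough illustration, not a 
tested migration tool: the function name is hypothetical, it copies a 
single text column into a one-column table of the same name, and it 
assumes you know the legacy encoding (passed as source_encoding, for 
example "cp1251").  Values that already decode as UTF-8 are kept as 
they are, which is a heuristic and can guess wrong on short strings.

```python
import sqlite3

def copy_as_utf8(src, dst, table, column, source_encoding):
    """Copy one text column from src to dst, re-encoding it as UTF-8.

    Hypothetical helper; table/column names are used unescaped.
    """
    old_factory = src.text_factory
    src.text_factory = bytes  # read the stored bytes without decoding
    try:
        dst.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ("{column}" TEXT)')
        for (raw,) in src.execute(f'SELECT "{column}" FROM "{table}"'):
            if isinstance(raw, bytes):
                try:
                    value = raw.decode("utf-8")  # already valid, keep it
                except UnicodeDecodeError:
                    # Assumption: everything else is in the legacy encoding
                    value = raw.decode(source_encoding)
            else:
                value = raw  # NULL or numeric, copy unchanged
            dst.execute(
                f'INSERT INTO "{table}" ("{column}") VALUES (?)', (value,)
            )
        dst.commit()
    finally:
        src.text_factory = old_factory  # restore the caller's setting
```

Called with two open connections, e.g. copy_as_utf8(old, new, "words", 
"base", "cp1251"), it leaves the old database untouched, so the result 
can be checked before anything is thrown away.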




