Doug Reeder <reeder...@gmail.com> wrote:
> Does SQLite treat strings as sequences of opaque 16-bit values, except
> for the wildcard operators for LIKE and GLOB?  Does it care about
> surrogate code points?  Does it care about FDD0 to FDEF ?

SQLite knows something about surrogate pairs: it has to, in order to convert 
correctly between UTF-16 and UTF-8. I'm not sure whether " x LIKE '_' " would 
match a string that consists of a single surrogate pair, in other words, 
whether a surrogate pair counts as one character or as two. You can figure 
this out experimentally, if you care.
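
One way to run that experiment is sketched below, using Python's bundled 
sqlite3 module. The helper name like_matches is made up for this example. 
Note the assumption: Python binds strings to SQLite as UTF-8, so this only 
shows how LIKE treats a character that would need a surrogate pair in UTF-16, 
not how a UTF-16 API call behaves.

```python
import sqlite3

def like_matches(value, pattern):
    """Return SQLite's result for `value LIKE pattern` (1 or 0)."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()[0]
    finally:
        conn.close()

# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so it is
# encoded as a surrogate pair in UTF-16.
supplementary = "\U0001D11E"

print("LIKE '_' :", like_matches(supplementary, "_"))
print("LIKE '__':", like_matches(supplementary, "__"))
```

If the first line prints 1 and the second prints 0, the character counts as 
one for the purposes of '_'.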

SQLite knows about some properties of characters in the ASCII range. E.g. LIKE 
is case-insensitive by default, and " 'A' LIKE 'a' " is true for plain vanilla 
Latin A, but not for, say, Cyrillic A or Greek alpha or Italian A with grave.
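
A quick check of this, again as a sketch with Python's sqlite3 module (the 
name like_results is invented for the example). It assumes the SQLite library 
that Python links against has not been built with ICU or had LIKE overridden; 
in such a build the non-ASCII rows may come out differently.

```python
import sqlite3

# (uppercase, lowercase) pairs, written as escapes to keep the file ASCII.
pairs = [
    ("A", "a"),            # Latin A
    ("\u0410", "\u0430"),  # Cyrillic A
    ("\u0391", "\u03b1"),  # Greek alpha
    ("\u00c0", "\u00e0"),  # A with grave
]

def like_results(pairs):
    """Evaluate `upper LIKE lower` in SQLite for each pair."""
    conn = sqlite3.connect(":memory:")
    try:
        return [conn.execute("SELECT ? LIKE ?", p).fetchone()[0] for p in pairs]
    finally:
        conn.close()

for (upper, lower), result in zip(pairs, like_results(pairs)):
    print(f"{ascii(upper)} LIKE {ascii(lower)} -> {result}")
```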

All other characters are treated as opaque bits to be shuffled around in the 
out-of-the-box configuration. However, it's possible to install custom 
collations and custom implementations of LIKE, GLOB and MATCH that are more 
aware of the properties of Unicode characters. The ICU extension does just 
that:

http://www.sqlite.org/cvstrac/fileview?f=sqlite/ext/icu/README.txt
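
To illustrate the general mechanism (this is not how the ICU extension is 
implemented), here is a minimal sketch that overrides the two-argument like() 
function and registers a collation from Python. The names unicode_like, 
casefold_compare, open_unicode_aware and unicode_nocase are all invented for 
the example. It relies on Python's own case rules, handles text values only, 
and ignores the ESCAPE clause.

```python
import re
import sqlite3

def unicode_like(pattern, value):
    """Replacement for like(Y, X), which SQLite calls for `X LIKE Y`."""
    if pattern is None or value is None:
        return None
    # Translate the LIKE pattern to a regex: % -> .*, _ -> ., rest literal.
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in str(pattern)
    )
    # For str patterns, re.IGNORECASE uses Unicode case rules in Python 3.
    flags = re.IGNORECASE | re.DOTALL
    return 1 if re.fullmatch(regex, str(value), flags) else 0

def casefold_compare(a, b):
    """Collation callback: negative, zero or positive, ignoring case."""
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)

def open_unicode_aware(path=":memory:"):
    """Open a connection with the overrides above installed."""
    conn = sqlite3.connect(path)
    conn.create_function("like", 2, unicode_like)
    conn.create_collation("unicode_nocase", casefold_compare)
    return conn

conn = open_unicode_aware()
row = conn.execute("SELECT ? LIKE ?", ("\u0410", "\u0430")).fetchone()
print("Cyrillic A LIKE a:", row[0])
```

The overrides are per connection, so they have to be installed every time the 
database is opened. Replacing like() also means SQLite can no longer use an 
index to speed up LIKE queries on that connection.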

-- 
Igor Tandetnik

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
