Doug Reeder <reeder...@gmail.com> wrote:
> Does SQLite treat strings as sequences of opaque 16-bit values, except
> for the wildcard operators for LIKE and GLOB? Does it care about
> surrogate code points? Does it care about FDD0 to FDEF?

SQLite knows something about surrogate pairs: this knowledge is required in order to convert properly between UTF-16 and UTF-8. I'm not sure whether " x LIKE '_' " would match a string that consists of a single surrogate pair, in other words, whether a surrogate pair counts as one character or as two. You can figure this out experimentally, if you care.

SQLite also knows about some properties of characters in the ASCII range. E.g. LIKE is case-insensitive by default, and " 'A' LIKE 'a' " is true for plain vanilla Latin A, but not for, say, Cyrillic A or Greek alpha or Italian A with a grave accent.

In the out-of-the-box configuration, all other characters are treated as opaque bits to be shuffled around. However, it's possible to install custom collations and custom implementations of LIKE, GLOB and MATCH that are more aware of the properties of Unicode characters. The ICU extension does just that:

http://www.sqlite.org/cvstrac/fileview?f=sqlite/ext/icu/README.txt

-- 
Igor Tandetnik
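
P.S. For anyone who wants to run the experiment suggested above, here is a minimal sketch using Python's standard sqlite3 module. It is only an illustration under some assumptions: the helper name like() is made up for this example, Python hands strings to SQLite as UTF-8 (so the non-BMP character below is not literally stored as a UTF-16 surrogate pair), and the non-ASCII results depend on how the SQLite library you are linked against was built (for instance, whether ICU is compiled in). For that reason the script prints what it finds rather than assuming an answer.

```python
import sqlite3

# In-memory database, using whatever SQLite library this Python links against.
conn = sqlite3.connect(":memory:")


def like(value, pattern):
    """Hypothetical helper: evaluate `value LIKE pattern`, returning 1 or 0."""
    return conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()[0]


# Plain ASCII letters: LIKE is case-insensitive by default.
ascii_result = like("A", "a")

# Cyrillic capital A vs. small a: depends on the build (e.g. ICU or not).
cyrillic_result = like("\u0410", "\u0430")

# One character outside the BMP (U+1D11E, a surrogate pair in UTF-16)
# against the single-character wildcard.
astral_result = like("\U0001D11E", "_")

print("SQLite version:", sqlite3.sqlite_version)
print("'A' LIKE 'a'             ->", ascii_result)
print("Cyrillic A LIKE a        ->", cyrillic_result)
print("U+1D11E LIKE '_'         ->", astral_result)
```

To test the UTF-16 case specifically, one would presumably need to issue PRAGMA encoding = 'UTF-16' on a fresh database before creating any tables and compare the results; I have not tried that here.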