> No, I meant exactly UCS-2. Because UCS-2 guarantees that all symbols
> are represented by 2 bytes when UTF-16 does not. And I had an
> understanding that Doug said about this 16-bit guarantee. Also if
> we're talking about encoding where any character can be represented by
> a single variable of type wchar_t then we can talk only about UCS-2 or
> UCS-4, not about UTF-* variants. Though of course someone can talk
> about UTF-16 keeping in mind and relying on the fact that he will not
> ever deal with characters not fitting into 2 bytes in UTF-16 encoding
> and thus he effectively will work with UCS-2.
I did not see Doug mention a fixed-length guarantee anywhere. Granted, UCS-2 had the considerable advantage of a fixed-length codepoint representation. Nonetheless, this does not imply that a single "character" is represented by exactly one codepoint, unless the application declares conformance to ISO 10646-1 level 1, where only precomposed "starter" characters are allowed (for instance, all combining codepoints are forbidden). It is most probably safe to pass such data to another Unicode application, but the reverse is not true: a perfectly valid Unicode sequence containing only low (plane 0) codes may be ill-formed with respect to UCS at that level, for instance because it contains combining codepoints.
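To make the distinction concrete, here is a small Python sketch (an illustration only, not something posted in this thread). It compares codepoint counts with UTF-16 code unit counts for three sample strings. The helper names utf16_units, fits_ucs2 and has_combining are hypothetical. In particular, has_combining only approximates the level 1 restriction by testing for a nonzero canonical combining class; the actual list of combining characters is defined by ISO 10646 itself and is not reproduced here.

```python
import unicodedata


def utf16_units(text: str) -> int:
    """Count the 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def fits_ucs2(text: str) -> bool:
    """True if every codepoint lies in plane 0 and is not a surrogate."""
    return all(
        ord(ch) <= 0xFFFF and not 0xD800 <= ord(ch) <= 0xDFFF
        for ch in text
    )


def has_combining(text: str) -> bool:
    """Rough stand-in for the level 1 restriction (an assumption):
    flag any codepoint with a nonzero canonical combining class."""
    return any(unicodedata.combining(ch) != 0 for ch in text)


samples = {
    "precomposed": "\u00e9",   # LATIN SMALL LETTER E WITH ACUTE
    "decomposed": "e\u0301",   # 'e' followed by COMBINING ACUTE ACCENT
    "astral": "\U0001d11e",    # MUSICAL SYMBOL G CLEF, outside plane 0
}

for label, s in samples.items():
    # Columns: label, codepoints, UTF-16 units, fits UCS-2, has combining
    print(label, len(s), utf16_units(s), fits_ucs2(s), has_combining(s))
```

The decomposed sample is the interesting one: it stays entirely in plane 0 and needs one 16-bit unit per codepoint, so it is representable in UCS-2, yet it takes two codepoints to express one perceived character. The astral sample shows the other case, a single codepoint that needs two UTF-16 code units.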