´¯¯¯
>No, I meant exactly UCS-2, because UCS-2 guarantees that every symbol
>is represented by 2 bytes, whereas UTF-16 does not. My understanding
>was that Doug was talking about this 16-bit guarantee. Also, if
>we're talking about an encoding where any character can be represented
>by a single variable of type wchar_t, then we can talk only about UCS-2
>or UCS-4, not about the UTF-* variants. Of course, someone can talk
>about UTF-16 while relying on the fact that he will never deal with
>characters that don't fit into 2 bytes in UTF-16, and thus he will
>effectively be working with UCS-2.
`---

I didn't see Doug mention a fixed-length guarantee anywhere.

Granted, UCS-2 had the considerable advantage of a fixed-length 
codepoint representation.
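
To make the fixed-length point concrete, here is a minimal Python
sketch. The helper names utf16_code_units and fits_in_ucs2 are my own,
purely for illustration. It counts the 16-bit code units UTF-16 needs
for a string and checks whether every code point stays inside plane 0,
which is the only range UCS-2 can represent.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to encode text as UTF-16."""
    # "utf-16-le" emits no byte order mark, so bytes / 2 == code units.
    return len(text.encode("utf-16-le")) // 2


def fits_in_ucs2(text: str) -> bool:
    """True if every code point is in plane 0 and is not a surrogate."""
    return all(
        ord(ch) <= 0xFFFF and not (0xD800 <= ord(ch) <= 0xDFFF)
        for ch in text
    )


euro = "\u20ac"        # EURO SIGN, plane 0
clef = "\U0001D11E"    # MUSICAL SYMBOL G CLEF, plane 1

print(utf16_code_units(euro), fits_in_ucs2(euro))
print(utf16_code_units(clef), fits_in_ucs2(clef))
```

The plane 0 character takes one code unit; the plane 1 character takes
a surrogate pair, so it cannot be held in a single 16-bit variable.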

Nonetheless, this doesn't imply that a single "character" is represented
by exactly one codepoint, unless the application declares conformance
to ISO 10646-1 level 1, where only precomposed "starter" characters
are allowed (for instance, all combining codepoints are forbidden).

It is most probably safe to pass such level 1 data to another Unicode
application, but the reverse is not true: a perfectly valid Unicode
sequence containing only low (plane 0) codes may be ill-formed wrt UCS.
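
As a rough illustration of that asymmetry, the Python sketch below
tests a string against an approximation of the level 1 restriction.
The function name is_level1_like is hypothetical, and treating "has a
non-zero canonical combining class" as "is a combining codepoint" is
my simplifying assumption, not the standard's exact definition.

```python
import unicodedata


def is_level1_like(text: str) -> bool:
    """Approximate level 1 check: plane 0 only, no combining marks.

    Assumption: a codepoint counts as "combining" when its canonical
    combining class is non-zero. The standard's own list may differ.
    """
    return all(
        ord(ch) <= 0xFFFF and unicodedata.combining(ch) == 0
        for ch in text
    )


precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

# Both strings are valid Unicode and use only plane 0 codes.
print(is_level1_like(precomposed))
print(is_level1_like(decomposed))

# NFC normalization composes this particular pair back into one code.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

The decomposed form is one user-visible character spread over two
codepoints, both in plane 0, and it is the kind of sequence a level 1
consumer would reject. Normalizing to NFC helps only where a
precomposed form exists.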





_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
