Re: Another Querry

Doug Ewell Tue, 23 Nov 2004 22:08:01 -0800

Harshal Trivedi <harshal dot trivedi at gmail dot com> wrote:

> How can I determine the end of a UCS-2/UCS-4 string while encoding it in a C
> program?
> A normal C string ends with '\0' (ASCII NUL) as the terminating
> character. What symbol, pattern, or character in UCS-2 or UCS-4
> substitutes for that ASCII NUL as the termination symbol?


You wouldn't normally use the ordinary C string type (an array of char)
to hold a UTF-16 (not UCS-2, please) or UCS-4 string.  It's not meant for
that, for exactly the reason your question implies: incidental zero bytes
will cause premature termination of the string, because almost all C
implementations use an 8-bit char and the standard string functions treat
a single zero byte as the terminator.
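
To make the premature-termination problem concrete, here is a minimal
sketch.  The byte array and the function name are made up for
illustration; the bytes are the text "AB" in little-endian UTF-16,
followed by a 16-bit zero unit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sample data: "AB" as UTF-16LE (41 00 42 00), then a
   16-bit terminator (00 00).  Four payload bytes, six bytes in all. */
const char utf16le_ab[] = { 0x41, 0x00, 0x42, 0x00, 0x00, 0x00 };

/* What a byte-oriented routine sees: strlen() stops at the first zero
   byte, which here is the high-order half of the code unit for 'A'. */
size_t bytes_seen_by_strlen(void)
{
    return strlen(utf16le_ab);
}
```

On this data strlen() reports 1 byte rather than the 4 payload bytes,
so everything after the first byte is silently lost.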

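The solution is either to use UTF-8, or to use "wide character" strings
based on 16-bit (or, less likely, 32-bit) "character" units.  In UTF-8
no byte is zero except the encoding of U+0000 itself, so '\0' still
works as the terminator; a wide string is conventionally terminated by a
single zero-valued unit of its own width.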
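
As a rough sketch of the wide-unit approach, the functions below count
code units up to a zero-valued unit of the same width.  The function
names and the choice of uint16_t/uint32_t from <stdint.h> are my own
assumptions for illustration; wchar_t, or C11's char16_t and char32_t,
would serve the same purpose.

```c
#include <stddef.h>
#include <stdint.h>

/* Count 16-bit code units before the terminating 0x0000 unit.
   A surrogate pair counts as two units, not one character. */
size_t utf16_units(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Count 32-bit units before the terminating 0x00000000 unit. */
size_t utf32_units(const uint32_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

For example, the array { 0x0041, 0x00E9, 0xD834, 0xDD1E, 0x0000 } holds
four 16-bit units, the last two being the surrogate pair for U+1D11E.
The zero bytes inside 0x0041 and 0x00E9 no longer matter, because only
a whole unit equal to zero ends the string.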

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
