Harshal Trivedi <harshal dot trivedi at gmail dot com> wrote:

> How can I determine the end of a UCS-2/UCS-4 string while encoding it
> in a C program?
> A normal C string ends with '\0' - ASCII NULL as the terminating
> character. What symbol, pattern or character in UCS-2 or UCS-4
> substitutes for that ASCII NULL as the termination symbol?

You wouldn't normally use the ordinary C string type to encode a UTF-16 (not UCS-2, please) or UCS-4 string. It isn't meant for that, for exactly the reason your question implies: incidental zero bytes will cause premature termination of the string, because almost all C implementations treat an ordinary string as a sequence of 8-bit units ending at the first zero byte. The solution is either to use UTF-8, or to use "wide character" strings based on 16-bit (or, less likely, 32-bit) "character" units.

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
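
To illustrate the point about zero bytes, here is a minimal sketch, assuming the UTF-16 text is held in an array of uint16_t code units and ends with a single zero unit (0x0000), the 16-bit analogue of '\0'. The helper names utf16_len and bytes_seen_by_strlen are hypothetical, made up for this example, and are not part of any standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: count the UTF-16 code units that precede the
   terminating 0x0000 unit. Surrogate pairs count as two units. */
size_t utf16_len(const uint16_t *s)
{
    assert(s != NULL);

    size_t n = 0;
    while (s[n] != 0)   /* stop at a whole zero unit, not a zero byte */
        n++;
    return n;
}

/* Hypothetical helper: what strlen() reports when the same buffer is
   misread as an ordinary byte string. It stops at the first zero
   byte, which for ASCII-range text is the high byte of a code unit. */
size_t bytes_seen_by_strlen(const uint16_t *s)
{
    return strlen((const char *)s);
}
```

For the two-unit string { 0x0048, 0x0069, 0x0000 } ("Hi"), utf16_len returns 2, while bytes_seen_by_strlen returns 1 on a little-endian machine and 0 on a big-endian one. The same idea should carry over to 32-bit units by swapping uint16_t for uint32_t.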

