Also, a few keyboard layouts generate text that is partly decomposed, for ease of typing (e.g., Vietnamese).
Deborah Goldsmith
Internationalization, Unicode liaison
Apple Computer, Inc.
[EMAIL PROTECTED]
On Aug 23, 2004, at 11:51 AM, Doug Ewell wrote:
Problem with accented characters
William Tay wrote:
Can anyone explain why an accented character is sometimes represented as a base character plus its accent? For example, the utf-8 representation for é is 65 CC 81, which is the utf-8 representation for e and the accent, instead of C3 A9? I find that this is how MacOS X represents accented characters.
The two characters U+0065 and U+0301 (é) are canonically equivalent to the single character U+00E9 (é). That is, the two-character combining sequence is supposed to be considered equivalent to the single precomposed character. Apparently MacOS X, or at least one application running under it, does use the combining sequence.
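To make the two byte sequences concrete, here is a minimal sketch in Python (chosen only for brevity; the variable names are illustrative and not from the thread). It encodes both forms to UTF-8 and shows that a plain comparison treats them as different strings, even though they are canonically equivalent.

```python
# Decomposed form: base letter U+0065 followed by combining acute U+0301.
decomposed = "e\u0301"
# Precomposed form: the single character U+00E9.
precomposed = "\u00e9"

# Encode each form to UTF-8 and render the bytes as upper-case hex.
decomposed_hex = decomposed.encode("utf-8").hex(" ").upper()
precomposed_hex = precomposed.encode("utf-8").hex(" ").upper()

print(decomposed_hex)    # 65 CC 81
print(precomposed_hex)   # C3 A9

# A code-point-by-code-point comparison does not see them as equal.
print(decomposed == precomposed)   # False
```

The same holds for a byte-wise comparison such as memcmp in C: without normalization the two sequences simply differ.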
How can a C application that receives such utf-8 encoded characters handle them correctly? Appreciate your comments.
It must understand normalization. See TUS 4.0, section 5.6 for more information.
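As a rough illustration of what "understanding normalization" means in practice, the sketch below normalizes incoming UTF-8 to NFC before comparing. It is written in Python, using the standard unicodedata module, purely to keep the example short; a C application would need a Unicode library that provides normalization (ICU is one such library). The helper names to_nfc and canonically_equal are hypothetical, not part of any API mentioned in this thread.

```python
import unicodedata


def to_nfc(raw: bytes) -> bytes:
    """Decode UTF-8, apply Normalization Form C, and re-encode as UTF-8."""
    text = raw.decode("utf-8")
    return unicodedata.normalize("NFC", text).encode("utf-8")


def canonically_equal(a: bytes, b: bytes) -> bool:
    """Compare two UTF-8 byte strings after normalizing both to NFC."""
    return to_nfc(a) == to_nfc(b)


# The two sequences from the question above.
combining_sequence = bytes([0x65, 0xCC, 0x81])   # e + U+0301
precomposed_char = bytes([0xC3, 0xA9])           # U+00E9

print(canonically_equal(combining_sequence, precomposed_char))   # True
print(to_nfc(combining_sequence).hex(" ").upper())               # C3 A9
```

Normalizing once at the input boundary, then storing and comparing only the normalized form, is one common design choice; normalizing to NFD instead would work equally well as long as it is applied consistently.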
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/