Problem with accented characters

William Tay wrote:

> Can anyone explain why an accented character is sometimes represented
> as a base character plus its accent?  For example, the utf-8
> representation for é is 65 CC 81, which is the utf-8 representation
> for e and the accent, instead of C3 A9?  I find that this is how MacOS
> X represents accented characters.

The two characters U+0065 and U+0301 (é) are canonically equivalent to
the single character U+00E9 (é).  That is, the two-character combining
sequence is supposed to be considered equivalent to the single
precomposed character.  Apparently MacOS X, or at least one application
running under it, does use the combining sequence.
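
To make the byte values above concrete, here is a minimal sketch of a UTF-8 encoder in C.  The function name utf8_encode is my own and not part of any standard library; it is only meant to show where 65 CC 81 and C3 A9 come from.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper (hypothetical name): encode one Unicode scalar
   value as UTF-8.  Returns the number of bytes written (1 to 4), or 0
   if cp is a surrogate or lies beyond U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                        /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Encoding U+0065 and then U+0301 gives the bytes 65 and CC 81; encoding U+00E9 gives C3 A9.  The two byte sequences differ even though the text they represent is canonically equivalent.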

> How can a C application that receives such utf-8 encoded characters
> handle them correctly?  Appreciate your comments.

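It must understand normalization, that is, it must convert both forms
to one agreed form before comparing or otherwise processing them.  See
The Unicode Standard (TUS) 4.0, section 5.6 for more information.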
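
As a rough illustration only, the sketch below composes a base letter followed by U+0301 into the precomposed letter for five lowercase vowels.  The names toy_pair, toy_pairs and toy_compose_utf8 are hypothetical, the table is deliberately tiny, and this is not Unicode normalization: it ignores every other combining mark, canonical reordering and the rest of the composition data.  A real application would more likely rely on an existing normalization library such as ICU.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy table: base letter + U+0301 COMBINING ACUTE ACCENT
   (UTF-8: CC 81) mapped to the precomposed letter.  Real normalization
   needs the full Unicode data, not five entries. */
struct toy_pair {
    const char *decomposed;
    const char *composed;
};

static const struct toy_pair toy_pairs[] = {
    { "a\xCC\x81", "\xC3\xA1" },   /* U+00E1 */
    { "e\xCC\x81", "\xC3\xA9" },   /* U+00E9 */
    { "i\xCC\x81", "\xC3\xAD" },   /* U+00ED */
    { "o\xCC\x81", "\xC3\xB3" },   /* U+00F3 */
    { "u\xCC\x81", "\xC3\xBA" },   /* U+00FA */
};

/* Copy the NUL-terminated UTF-8 string `in` to `out`, replacing each
   sequence found in toy_pairs with its precomposed form.  Matching on
   bytes is safe here because an ASCII byte never occurs inside a
   multi-byte UTF-8 sequence.  Returns the output length (excluding the
   terminating NUL), or (size_t)-1 if `out` is too small. */
static size_t toy_compose_utf8(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    while (*in != '\0') {
        const char *src = in;       /* default: copy one byte unchanged */
        size_t src_len = 1;
        size_t consumed = 1;

        for (size_t i = 0; i < sizeof toy_pairs / sizeof toy_pairs[0]; i++) {
            size_t dlen = strlen(toy_pairs[i].decomposed);
            if (strncmp(in, toy_pairs[i].decomposed, dlen) == 0) {
                src = toy_pairs[i].composed;
                src_len = strlen(src);
                consumed = dlen;
                break;
            }
        }
        if (n + src_len >= out_size)
            return (size_t)-1;      /* no room for data plus NUL */
        memcpy(out + n, src, src_len);
        n += src_len;
        in += consumed;
    }
    if (out_size == 0)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

With this sketch, the input bytes 63 61 66 65 CC 81 ("cafe" plus the combining accent) come out as 63 61 66 C3 A9, so the result compares equal with strcmp to a string that was stored in precomposed form to begin with.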

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/


