> Jon Hanna wrote:
>
> >>imported UTF-8 sequences like [U+0065][U+0303] <e, tilde> get
> >>remapped internally to [U+1ebd] LATIN SMALL LETTER E WITH TILDE.
> >>
> >>Is this kind of behavior what one would expect?
> >
> >That's conformant, if it causes problems with any other process (including
> >other processes that are part of the system in question) then that other
> >process isn't complying with conformance clause C9.
>
And Eric Muller asked:

> But what if U+1ebd is not part of the repertoire supported by that other
> process?

Ah, but there is "support" and then there is "support". A conformant implementation can pick and choose the repertoire it supports for some text processes, e.g. for display. No font is required to support display of *all* Unicode characters, and that could perfectly well apply to U+1EBD.

However, implementations don't get to pick and choose so easily when it comes to aspects of the standard such as encoding forms and normalization. You can't, for example, recognize that <U+006E, U+0303> is canonically equivalent to U+00F1 (ñ), but claim *not* to recognize that <U+0065, U+0303> is likewise canonically equivalent to U+1EBD, simply because U+1EBD is not in a range that your implementation chooses to "interpret" for display. Such broken, partial recognition of canonical equivalence would make an implementation of normalization non-conformant.

That is also why most implementations should depend on library code for normalization, where the library code specifically claims to be a conformant implementation of normalization and handles *all* Unicode characters correctly.

--Ken
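To make the point about relying on library normalization concrete, here is a minimal sketch using Python's standard `unicodedata` module as one example of such a library. The choice of Python and the helper name `canonically_equivalent` are illustrative assumptions, not anything from the thread. The helper uses the rule from UAX #15 that two strings are canonically equivalent exactly when their NFD forms are identical. The sketch also shows that NFC composes <e, tilde> to U+1EBD in exactly the same way it composes <n, tilde> to U+00F1, with no special treatment for either character's range.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: canonical equivalence means identical NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Decomposed sequences from the discussion: base letter + U+0303 COMBINING TILDE.
n_tilde = "\u006E\u0303"  # <n, tilde>
e_tilde = "\u0065\u0303"  # <e, tilde>

# NFC composes both sequences; U+1EBD is handled just like U+00F1.
for seq in (n_tilde, e_tilde):
    composed = unicodedata.normalize("NFC", seq)
    print(f"U+{ord(composed):04X}", unicodedata.name(composed))

# Both pairs are recognized as canonically equivalent.
print(canonically_equivalent(n_tilde, "\u00F1"))
print(canonically_equivalent(e_tilde, "\u1EBD"))
```

Comparing NFD forms avoids any per-character or per-range special-casing, which is the kind of partial recognition described above as non-conformant.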