> Jon Hanna wrote:
> 
> >>imported UTF-8 sequences like [U+0065][U+0303] <e, tilde> get remapped
> >>internally to [U+1ebd] LATIN SMALL LETTER E WITH TILDE.
> >>
> >>Is this kind of behavior what one would expect?
> >>    
> >>
> >
> >That's conformant, if it causes problems with any other process (including
> >other processes that are part of the system in question) then that other
> >process isn't complying with conformance clause C9.
> >  
> >

And Eric Muller asked:
 
> But what if U+1ebd is not part of the repertoire supported by that other 
> process?

Ah, but there is "support" and then there is "support".

A conformant implementation can pick and choose the repertoire it
supports for some text processes, e.g. for display. No font is
required to support display of *all* Unicode characters, and
that could perfectly well apply to U+1EBD.

However, implementations don't get to pick and choose so easily
about aspects of the standard such as encoding forms and normalization.
You can't, for example, recognize that <U+006E, U+0303> is canonically
equivalent to U+00F1 (ñ), but claim *not* to recognize that 
<U+0065, U+0303> is likewise canonically equivalent to U+1EBD, simply
because U+1EBD is not in a range that your implementation chooses
to "interpret" for display. Such broken, partial recognition of
canonical equivalence would constitute a non-conformant implementation
of normalization. That is also why most implementations should depend
on library code for normalization, where the library code specifically
claims to be a conformant implementation of normalization -- and
handles *all* Unicode characters correctly.
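
As a rough illustration of the point above, here is a minimal sketch
that assumes Python's standard unicodedata module stands in for the
"library code". The helper name canonically_equivalent is made up for
this example and is not part of any library. It shows both sequences
being treated alike, regardless of what the process could display:

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings by their NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The two decomposed sequences discussed in this thread.
n_tilde_decomposed = "\u006E\u0303"  # <n, combining tilde>
e_tilde_decomposed = "\u0065\u0303"  # <e, combining tilde>

# NFC composes both of them; the library does not skip U+1EBD just
# because some font or display process lacks a glyph for it.
n_tilde_nfc = unicodedata.normalize("NFC", n_tilde_decomposed)
e_tilde_nfc = unicodedata.normalize("NFC", e_tilde_decomposed)

print(f"U+{ord(n_tilde_nfc):04X}")  # U+00F1
print(f"U+{ord(e_tilde_nfc):04X}")  # U+1EBD

# Equivalence holds in both directions for both characters.
print(canonically_equivalent(n_tilde_decomposed, "\u00F1"))
print(canonically_equivalent(e_tilde_decomposed, "\u1EBD"))
```

A process that cannot display U+1EBD can still compare text this way.
Display support and equivalence handling are separate questions.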

--Ken


