David J. Perry scripsit:

> U+03AC and U+1F71 both have canonical decompositions to U+03B1 followed
> by U+0301.  (There are other similar pairs in the Greek blocks.)  If an
> application applies normalisation form C both decompose to the same
> string; will the resulting recomposed character be 03AC or 1F71?  I
> suspect the former, but I'd like to know if this is correct and if so,
> how this is determined.

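U+1F71 is what is called a singleton: its canonical decomposition is a
single character (U+03AC, which in turn decomposes to U+03B1 U+0301).
Singletons are excluded from recomposition, so normalization form C
yields U+03AC in both cases, as you suspected.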
Such characters are essentially clones that arrived in Unicode either
for roundtrippability or (in this case) because of a misunderstanding,
namely the belief that TONOS and OXIA were distinct accents.
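
For anyone who wants to check this against an implementation, here is a
small sketch using Python's standard unicodedata module.  The names
TONOS, OXIA and codepoints are my own, chosen for illustration, and the
result depends on the Unicode data shipped with your Python build, though
these particular mappings are covered by the normalization stability
policy.

```python
import unicodedata

TONOS = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS
OXIA = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA


def codepoints(s):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


for ch in (TONOS, OXIA):
    print(
        codepoints(ch),
        "| mapping:", unicodedata.decomposition(ch),            # one-step mapping from UnicodeData.txt
        "| NFD:", codepoints(unicodedata.normalize("NFD", ch)),  # full canonical decomposition
        "| NFC:", codepoints(unicodedata.normalize("NFC", ch)),  # decompose, then recompose
    )
```

Both characters should report the same NFD and NFC forms; only the
one-step mapping differs, and the single-character mapping for U+1F71 is
what marks it as a singleton.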

-- 
John Cowan           http://www.ccil.org/~cowan              [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
        --_The Hobbit_