Thanks a lot for the explanations.

KW> There is no good reason to invent composite combining marks
KW> involving two accents together. (In fact, there are good reasons
KW> *not* to do so.) The few that exist, e.g. U+0344, cause
KW> implementation problems and are discouraged from use.

What are those problems?  As long as they have canonical
decompositions, won't such precomposed characters be discarded at
normalisation time, hopefully during I/O?

(I'm not arguing in favour of precomposed characters; I'm just saying
that my gut instinct is that we have to deal with normalisation
anyway, and hence they don't complicate anything further; I'd be
curious to hear why you think otherwise.)
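For what it's worth, this is easy to check for U+0344 with Python's
standard `unicodedata` module (just an illustration of the point, not
a claim about any particular I/O layer).  Because U+0344 has a
canonical decomposition, both NFD and NFC replace it with the two
separate combining marks:

```python
import unicodedata

# U+0344 COMBINING GREEK DIALYTIKA TONOS canonically decomposes to
# U+0308 COMBINING DIAERESIS + U+0301 COMBINING ACUTE ACCENT.
precomposed = "\u0344"

nfd = unicodedata.normalize("NFD", precomposed)
# NFC never recomposes U+0344: the decomposition starts with a
# non-starter (U+0308), so there is no base character to compose onto.
nfc = unicodedata.normalize("NFC", precomposed)

print([f"U+{ord(c):04X}" for c in nfd])  # ['U+0308', 'U+0301']
print(nfc == nfd)                         # True
```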

>> As far as I can tell, there is nothing in the Unicode database that
>> relates a ``modifier letter'' to the associated punctuation mark.

KW> Correct. They are viewed as distinct classes.
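This can be confirmed with Python's `unicodedata` module.  The
modifier letters below have general category Lm and no decomposition
mapping, so not even compatibility normalisation (NFKC) relates them
to the punctuation or symbol they resemble.  The pairs are just
examples I picked:

```python
import unicodedata

# (modifier letter, look-alike punctuation/symbol) pairs, chosen for illustration
pairs = [
    ("\u02BC", "\u2019"),  # MODIFIER LETTER APOSTROPHE vs RIGHT SINGLE QUOTATION MARK
    ("\u02C6", "\u005E"),  # MODIFIER LETTER CIRCUMFLEX ACCENT vs CIRCUMFLEX ACCENT
]

for mod, punct in pairs:
    # An empty decomposition string means no mapping of any kind exists.
    print(unicodedata.name(mod), unicodedata.category(mod),
          repr(unicodedata.decomposition(mod)))
    print(unicodedata.name(punct), unicodedata.category(punct))
    # NFKC leaves the modifier letter as it is.
    print(unicodedata.normalize("NFKC", mod) == punct)  # False
```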

>> does anyone [have] a map from mathematical characters to the
>> Geometric Shapes, Misc. symbols and Dingbats that would be useful
>> for rendering?

KW> As opposed to the characters themselves? I'm not sure what you
KW> are getting at here.

The user invokes a search for ``f o g'' (the composite of g with f),
and she typed the composition sign as U+25CB WHITE CIRCLE.  The document does contain the
required formula, but encoded with U+2218 RING OPERATOR.  The user's
input was arguably incorrect, but I hope you'll agree that the search
should match.
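Since neither character has a decomposition, normalisation alone will
not make this search match.  The application needs its own folding
table.  A minimal sketch, assuming a hand-compiled table (`LOOKALIKES`
and `search_key` are hypothetical names, and the single entry is only
an example):

```python
import unicodedata

# Hypothetical hand-compiled table: look-alike character -> representative.
# Entries for modifier letters and the like could be added the same way.
LOOKALIKES = {
    "\u25CB": "\u2218",  # WHITE CIRCLE -> RING OPERATOR
}

def search_key(text: str) -> str:
    """Normalise, then fold look-alikes so that equivalent strings compare equal."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(LOOKALIKES.get(ch, ch) for ch in text)

document = "f \u2218 g"  # formula as encoded in the document
query = "f \u25CB g"     # what the user typed

print(unicodedata.normalize("NFKC", query) in document)  # False
print(search_key(query) in search_key(document))         # True
```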

I'm rendering a document that contains U+2218.  The current font
doesn't contain a glyph associated with this code point, but it has a
perfectly good glyph for U+25CB.  The rendering software should
silently use the latter.
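The renderer could consult a similar table when the font lacks a
glyph.  A sketch, assuming the font's coverage is available as a set
of code points (`font_coverage`, `FALLBACKS` and `pick_codepoint` are
hypothetical; a real renderer would read this from the font's cmap):

```python
from typing import Optional, Set

# Hypothetical coverage of the current font: WHITE CIRCLE but no RING OPERATOR.
font_coverage = {ord("f"), ord("g"), ord(" "), 0x25CB}

# Hypothetical fallback table: code point -> acceptable substitutes, best first.
FALLBACKS = {
    0x2218: [0x25CB],  # RING OPERATOR -> WHITE CIRCLE
}

def pick_codepoint(cp: int, coverage: Set[int]) -> Optional[int]:
    """Return cp if the font covers it, else the first covered substitute, else None."""
    if cp in coverage:
        return cp
    for alt in FALLBACKS.get(cp, []):
        if alt in coverage:
            return alt
    return None  # caller would draw .notdef or try another font

print(hex(pick_codepoint(0x2218, font_coverage)))  # 0x25cb
```

Keeping substitutes as an ordered list lets the table prefer a closer
match when the font happens to contain several candidates.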

Analogous examples can be made for the ``modifier letters''.

I'll mention that I do understand why these are encoded separately[1],
and I do understand why and how they will behave differently in a
number of situations.  I am merely noting that there are applications
(useful-in-practice search, rendering) where they may be identified or
at least related, and I am wondering whether people have already
compiled the data necessary to do so.

Thanks again,

                                        Juliusz

[1] Offtopic: I have mixed feelings about the inclusion of STIX.  On the
one hand it's great to at last have a standardised encoding for math
characters; on the other, I feel it is based on encoding principles
very different from those of the rest of Unicode.
