Thanks a lot for the explanations.

KW> There is no good reason to invent composite combining marks
KW> involving two accents together. (In fact, there are good reasons
KW> *not* to do so.) The few that exist, e.g. U+0344, cause
KW> implementation problems and are discouraged from use.

What are those problems? As long as they have canonical
decompositions, won't such precomposed characters be discarded at
normalisation time, hopefully during I/O? (I'm not arguing in favour
of precomposed characters; I'm just saying that my gut instinct is
that we have to deal with normalisation anyway, and hence they don't
complicate anything further. I'd be curious to hear why you think
otherwise.)

>> As far as I can tell, there is nothing in the Unicode database that
>> relates a ``modifier letter'' to the associated punctuation mark.

KW> Correct. They are viewed as distinct classes.

>> does anyone [have] a map from mathematical characters to the
>> Geometric Shapes, Misc. symbols and Dingbats that would be useful
>> for rendering?

KW> As opposed to the characters themselves? I'm not sure what you
KW> are getting at here.

Two examples.

The user invokes a search for ``f o g'' (the composite of g with f),
and she enters U+25CB WHITE CIRCLE. The document does contain the
required formula, but encoded with U+2218 RING OPERATOR. The user's
input was arguably incorrect, but I hope you'll agree that the search
should match.

I'm rendering a document that contains U+2218. The current font
doesn't contain a glyph associated with this codepoint, but it has a
perfectly good glyph for U+25CB. The rendering software should
silently use the latter.

Analogous examples can be made for the ``modifier letters''.

I'll mention that I do understand why these are encoded
separately[1], and I do understand why and how they will behave
differently in a number of situations. I am merely noting that there
are applications (useful-in-practice search, rendering) where they
may be identified or at least related, and I am wondering whether
people have already compiled the data necessary to do so.

Thanks again,

                                        Juliusz

[1] Offtopic: I have mixed feelings on the inclusion of STICS. On the
one hand it's great to at last have a standardised encoding for math
characters; on the other I feel it is based on very different
encoding principles from the rest of Unicode.
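
P.S. To make the two points above concrete, here is a minimal Python sketch. It assumes the standard `unicodedata` module. The table `SEARCH_FOLD` and the helpers `fold_for_search` and `loose_find` are hypothetical names made up for illustration; the table is hand-written and is not taken from the Unicode database, which is exactly the data I am asking about. The first part shows U+0344 disappearing under canonical normalisation; the second shows that normalisation alone does not relate U+25CB to U+2218, so a search has to apply some extra folding.

```python
import unicodedata

# U+0344 has a canonical decomposition, so NFD replaces it with
# U+0308 COMBINING DIAERESIS followed by U+0301 COMBINING ACUTE ACCENT.
dialytika_tonos = unicodedata.normalize("NFD", "\u0344")

# Hypothetical, hand-written folding table (an assumption for this
# sketch, NOT data from the Unicode Character Database).
SEARCH_FOLD = {
    "\u25CB": "\u2218",  # WHITE CIRCLE -> RING OPERATOR
    "\u02BC": "\u0027",  # MODIFIER LETTER APOSTROPHE -> APOSTROPHE
}


def fold_for_search(text: str) -> str:
    """Normalise to NFD, then map each character through SEARCH_FOLD."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(SEARCH_FOLD.get(ch, ch) for ch in decomposed)


def loose_find(needle: str, haystack: str) -> int:
    """Return the index of needle in the *folded* haystack, or -1.

    The index refers to the folded text, which need not line up with
    offsets in the original string once normalisation changes lengths.
    """
    return fold_for_search(haystack).find(fold_for_search(needle))
```

With this table, `loose_find("f \u25CB g", "Consider f \u2218 g here")` finds the formula even though the query used WHITE CIRCLE, whereas a plain `str.find` on the unfolded strings does not. A renderer looking for a fallback glyph would want the same table read in the opposite direction.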