> 2011/11/14 Philippe Verdy <verd...@wanadoo.fr>:
>> And arguably, I have also wanted this since long, instead of the hacks
>> introduced by the so called "double" diacritics and "half" diacritics
>> that break the character identity of those diacritics and also
>> introduce encoding ambiguities.
>>
>> In fact, those things would have been encoded since long if Unicode
>> and ISO 10646 had extended their character model to cover a broader
>> range of "structured character clusters".
>>
>> Two format characters (with combining class 0 for the purpose of
>> normalizations) would have been enough for most applications:
>> - U+xxx0 BEGIN EXTENDED CLUSTER (BEC)
>> - U+xxx1 END EXTENDED CLUSTER (EEC)
>> And then you would have encoded the standard diacritics after the
>> sequence enclosed by these characters, for example cartouches (using
>> an enclosing diacritic).
>>
>> A third format control would have been used as well to specify that
>> two clusters (simple letters or letters with simple diacritics, and
>> including extended clusters) would stack vertically instead of
>> horizontally. With this third one, the basic structure would be
>> encodable really as plain-text.
>>
>> Yes this would have not worked with today's OpenType specifications,
>> but this would have been the place for extending those specifications
>> and not something blocking the encoding process. I am still convinced
>> that this should not be part of an "upper-layer standard", which is
>> not interoperable, and complicates the integration of those
>> pseudo-encoded texts.
>>
>> Once the structure is encoded as such, there is still the possibility
>> to create a linear graphical representation as a reasonably readable
>> fallback exhibiting the structure unambiguously, even if the text
>> renderer cannot produce the 2D layout (you just need to make those
>> format controls visible by themselves with a glyph, or some other
>> means offered in the text renderer, including with colors or various
>> style effects).
We don't need new special characters, new half-characters, or new canonical combining classes (ccc) as I proposed above. No! We already have the interlinear annotation characters! For the Cyrillic number 123 (РКГ under a titlo), it is possible to use something like:

U+FFF9 INTERLINEAR ANNOTATION ANCHOR, РКГ, U+FFFA INTERLINEAR ANNOTATION SEPARATOR, U+0483 COMBINING CYRILLIC TITLO, U+FFFB INTERLINEAR ANNOTATION TERMINATOR

This way, titlos with supralinear letters (like SLOVO TITLO and TVERDO TITLO, see http://ru.wikipedia.org/wiki/Титло) can be implemented as well.

The only open question is the correct processing of annotation chunks that start with a non-starter. I mean that a chunk of a multi-part annotation that begins with a combining character, and therefore has no base character of its own, should (in the best implementations) use the preceding chunk as its base.
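A minimal Python sketch of how such a sequence could be built and taken apart, using only the standard `unicodedata` module. The helper names `annotate` and `resolve_bases` are hypothetical, and the rule that a chunk beginning with a non-starter inherits the preceding chunk (the base text, for the first chunk) as its base is just the behaviour suggested in this message, not something the Unicode Standard specifies for the annotation characters.

```python
import unicodedata
from typing import List, Optional, Tuple

# Interlinear annotation characters from the Unicode Specials block.
ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR
SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR
TITLO = "\u0483"       # COMBINING CYRILLIC TITLO


def annotate(base: str, *chunks: str) -> str:
    """Build ANCHOR base (SEPARATOR chunk)* TERMINATOR (hypothetical helper)."""
    return ANCHOR + base + "".join(SEPARATOR + c for c in chunks) + TERMINATOR


def resolve_bases(seq: str) -> List[Tuple[str, Optional[str]]]:
    """Split one annotation sequence into (chunk, inherited_base) pairs.

    Assumed policy: a chunk whose first character is a non-starter
    (canonical combining class != 0) has no base of its own, so it
    inherits the preceding chunk as its base; other chunks get None.
    """
    if not (seq.startswith(ANCHOR) and seq.endswith(TERMINATOR)):
        raise ValueError("expected a single anchor...terminator sequence")
    base, *chunks = seq[1:-1].split(SEPARATOR)
    pairs: List[Tuple[str, Optional[str]]] = []
    previous = base
    for chunk in chunks:
        nonstarter = bool(chunk) and unicodedata.combining(chunk[0]) != 0
        pairs.append((chunk, previous if nonstarter else None))
        previous = chunk
    return pairs


# Cyrillic number 123: РКГ (U+0420 U+041A U+0413) under a titlo.
number_123 = annotate("\u0420\u041A\u0413", TITLO)
print([f"U+{ord(ch):04X}" for ch in number_123])
# prints ['U+FFF9', 'U+0420', 'U+041A', 'U+0413', 'U+FFFA', 'U+0483', 'U+FFFB']
print(resolve_bases(number_123))
```

Here the single annotation chunk is the bare titlo, whose canonical combining class is nonzero, so under the assumed rule it attaches to РКГ; a chunk starting with an ordinary letter would be laid out on its own.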