> 2011/11/14 Philippe Verdy <verd...@wanadoo.fr>:
>> And arguably, I have also wanted this for a long time, instead of
>> the hacks introduced by the so-called "double" diacritics and "half"
>> diacritics that break the character identity of those diacritics and
>> also introduce encoding ambiguities.
>>
>> In fact, those things would have been encoded long ago if Unicode
>> and ISO 10646 had extended their character model to cover a broader
>> range of "structured character clusters".
>>
>> Two format characters (with combining class 0 for the purpose of
>> normalizations) would have been enough for most applications:
>> - U+xxx0 BEGIN EXTENDED CLUSTER (BEC)
>> - U+xxx1 END EXTENDED CLUSTER (EEC)
>> And then you would have encoded the standard diacritics after the
>> sequence enclosed by these characters, for example cartouches (using
>> an enclosing diacritic).
>>
>> A third format control would have been used as well to specify that
>> two clusters (simple letters or letters with simple diacritics, and
>> including extended clusters) would stack vertically instead of
>> horizontally. With this third one, the basic structure would be
>> encodable really as plain-text.
>>
>> Yes, this would not have worked with today's OpenType specifications,
>> but this would have been the place for extending those specifications
>> and not something blocking the encoding process. I am still convinced
>> that this should not be part of an "upper-layer standard", which is
>> not interoperable, and complicates the integration of those
>> pseudo-encoded texts.
>>
>> Once the structure is encoded as such, there is still the possibility
>> to create a linear graphical representation as a reasonably readable
>> fallback exhibiting the structure unambiguously, even if the text
>> renderer cannot produce the 2D layout (you just need to make those
>> format controls visible by themselves with a glyph, or by some other
>> means offered in the text renderer, including colors or various
>> style effects).

We don't need new special characters, new half-characters, or new
combining classes (ccc) as I proposed above. No!
We already have the interlinear annotation characters!
It is possible to use something like U+FFF9 INTERLINEAR ANNOTATION
ANCHOR, РКГ, U+FFFA INTERLINEAR ANNOTATION SEPARATOR, U+0483 COMBINING
CYRILLIC TITLO, U+FFFB INTERLINEAR ANNOTATION TERMINATOR for the
Cyrillic number 123 (РКГ under a titlo). This way, titlos with
supralinear letters (like SLOVO TITLO and TVERDO TITLO, see
http://ru.wikipedia.org/wiki/Титло) can also be implemented.
The only open question is the correct processing of annotation chunks
that start with a nonstarter. I mean that a chunk of a multiline
annotation that begins with a combining character, and so has no base
character of its own, should use the previous chunk as its base (in
the best implementation).
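
To make the sequence concrete, here is a rough Python sketch of how such
a string could be built and how a processor might detect an annotation
chunk that starts with a nonstarter. The helper names (annotate,
split_annotation, starts_with_nonstarter) are made up for illustration,
and the sketch assumes a single, non-nested annotated run; it says
nothing about how a renderer would actually draw the titlo.

```python
import unicodedata

ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR
SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR
TITLO = "\u0483"       # COMBINING CYRILLIC TITLO


def annotate(base, annotation):
    """Wrap base text and its annotation in the three format characters."""
    return f"{ANCHOR}{base}{SEPARATOR}{annotation}{TERMINATOR}"


def split_annotation(text):
    """Return (base, annotation_chunks) for one non-nested annotated run."""
    start = text.index(ANCHOR) + 1
    end = text.index(TERMINATOR, start)
    base, *chunks = text[start:end].split(SEPARATOR)
    return base, chunks


def starts_with_nonstarter(chunk):
    """True if the chunk begins with a character of nonzero combining class."""
    return bool(chunk) and unicodedata.combining(chunk[0]) != 0


# Cyrillic number 123: the letters РКГ annotated with a titlo.
number_123 = annotate("РКГ", TITLO)
base, chunks = split_annotation(number_123)

for chunk in chunks:
    if starts_with_nonstarter(chunk):
        # Hypothetical policy: attach the chunk to the previous chunk (here,
        # the base text) instead of treating it as a defective sequence.
        print(f"chunk U+{ord(chunk[0]):04X} should attach to base {base!r}")
```

The point of the check is only that the annotation chunk has no base
character of its own, so a processor has to decide what it combines
with; the policy shown in the comment is the one suggested above.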

