> Message of 24/07/10 09:02
> From: "William_J_G Overington" <wjgo_10...@btinternet.com>
> To: unicode@unicode.org
> Cc: wjgo_10...@btinternet.com
> Subject: Using Combining Double Breve and expressing characters perhaps as if
> struck out.
>
> I have been looking at the following thread, which is entitled "Making Fonts
> with Diacritical Marks for Phonetics".
>
> http://forum.high-logic.com/viewtopic.php?f=3&t=3169
>
> I am writing here to ask two questions please in relation to the Unicode
> aspects of the problem.
>
> I have looked at http://www.unicode.org/versions/Unicode5.2.0/ch02.pdf in
> section 2.11 Combining Characters (page 36 of the pdf) and at
> http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf in section 3.6
> Combination (page 24 of the pdf).
>
> In http://www.unicode.org/charts/PDF/U0300.pdf there is U+035D COMBINING
> DOUBLE BREVE and there is U+035E COMBINING DOUBLE MACRON.
>
> In http://www.unicode.org/charts/PDF/U0000.pdf there is U+006F LATIN SMALL
> LETTER O.
>
> How does one express two letters LATIN SMALL LETTER O with a combining double
> breve in a Unicode plain text document please?

First encode each base (unjoined) extended grapheme cluster separately (possibly with its own diacritics, extenders or prepended characters, including ZWJ and ZWNJ, as defined in the UAX on text segmentation). Then encode the double diacritic between them. So for your examples you get

    <006F, 035D, 006F> (double breve)

or

    <006F, 035E, 006F> (double macron).

Double diacritics have a high canonical combining class (234 for those placed above, 233 for those placed below), so in canonical order they sort after the ordinary diacritics of the first base letter. The base letter that follows has combining class 0 and blocks any further reordering, so the relative order and independence of the base grapheme clusters are preserved during normalization.

As a consequence, if another diacritic is to be added on top of the double diacritic, it can only be added at the end of this sequence. The bad thing is that it then appears just after the second base grapheme cluster, so it is interpreted as part of that second cluster and is subject to reordering and composition within it.

Currently you cannot add another diacritic on top of a double diacritic: we lack something to block this interpretation as part of the second cluster. To do that, we would need another base character with combining class 0 (blocking canonical reordering) that would have the same grouping semantics as the double diacritics. This character would be abstract (and invisible by itself) and could be something like:

    U+xyzt DOUBLE DIACRITIC HOLDER

For example, to add an acute accent above the double breve joining the two letters 'o', we would encode:

    <006F, 035D, 006F, xyzt, 0301>

instead of just

    <006F, 035D, 006F, 0301>

which is canonically equivalent to

    <006F, 035D, 00F3>

and which encodes the letter 'o' and the letter 'o' with an acute accent (centered on this second 'o'), joined by the double breve *above* the acute accent of the second 'o'.

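
As a rough check of the sequences discussed here, the following Python sketch uses the standard unicodedata module to build them, print the canonical combining classes involved, and normalize the sequence that ends with an acute accent. The helper name codepoints is made up for illustration, and the values reported depend on the Unicode data shipped with your Python.

```python
import unicodedata


def codepoints(text):
    """Return the code points of text as 4-digit hex strings (illustrative helper)."""
    return [f"{ord(ch):04X}" for ch in text]


O = "\u006F"              # LATIN SMALL LETTER O
DOUBLE_BREVE = "\u035D"   # COMBINING DOUBLE BREVE
DOUBLE_MACRON = "\u035E"  # COMBINING DOUBLE MACRON
ACUTE = "\u0301"          # COMBINING ACUTE ACCENT

# The double diacritic goes between the two base letters.
oo_breve = O + DOUBLE_BREVE + O
oo_macron = O + DOUBLE_MACRON + O

# Appending an acute accent puts it right after the second 'o'.
oo_breve_acute = oo_breve + ACUTE
nfc = unicodedata.normalize("NFC", oo_breve_acute)
nfd = unicodedata.normalize("NFD", oo_breve_acute)

print(codepoints(oo_breve))   # ['006F', '035D', '006F']
print(codepoints(oo_macron))  # ['006F', '035E', '006F']
print(codepoints(nfc))        # ['006F', '035D', '00F3']

# Canonical combining classes as reported by the local Unicode database.
for mark in (DOUBLE_BREVE, DOUBLE_MACRON, ACUTE):
    print(unicodedata.name(mark), unicodedata.combining(mark))
```

The NFC result shows the acute accent composing with the second 'o', which is the behaviour that prevents it from being read as sitting on top of the double breve.
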
My opinion is that such a double diacritic holder already exists: it's ZWJ, which could safely be used as the needed invisible base for additional diacritics occurring on top of (and centered on) a double diacritic. But others may have other preferences about the choice of this character.

I don't know whether ZWJ has been specified so that it could occur before a "defective" combining sequence consisting only of combining diacritics. In this case, it would mean that the semantics of the combining diacritics encoded after it must apply to the whole of the extended grapheme cluster encoded before it.

This use of ZWJ effectively allows the encoded sequence to be interpreted as if it were in TeX syntax:

    \acute{ \breve{oo} }

Without the ZWJ, it would be interpreted as:

    \breve{ o\acute{o} }

The double diacritics are just intended to be used between the base grapheme clusters to be joined. They could possibly be used to group more than two base graphemes, for example with three 'o' as:

    <006F, 035D, 006F, 035D, 006F>

interpreted in TeX syntax as:

    \breve{ooo}

But even in this case, you won't be able to encode in plain text, with the ZWJ trick, groupings that are expressed this way in TeX:

    \breve{ \breve{oo} x \breve{ o\acute{o} } }

because double diacritics encoded in Unicode can't be safely stacked together (for such applications you'll need a rich-text layer on top of Unicode, such as TeX here).

Philippe.