re: Using Combining Double Breve and expressing characters perhaps as if struck out.

Philippe Verdy Sat, 24 Jul 2010 01:20:14 -0700

> Message of 24/07/10 09:02
> From: "William_J_G Overington" <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Subject: Using Combining Double Breve and expressing characters perhaps as if 
> struck out.
>
>
> I have been looking at the following thread, which is entitled "Making Fonts 
> with Diacritical Marks for Phonetics".
>
> http://forum.high-logic.com/viewtopic.php?f=3&t=3169
>
> I am writing here to ask two questions please in relation to the Unicode 
> aspects of the problem.
>
> I have looked at http://www.unicode.org/versions/Unicode5.2.0/ch02.pdf in 
> section 2.11 Combining Characters (page 36 of the pdf) and at 
> http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf in section 3.6 
> Combination (page 24 of the pdf).
>
> In http://www.unicode.org/charts/PDF/U0300.pdf there is U+035D COMBINING 
> DOUBLE BREVE and there is U+035E COMBINING DOUBLE MACRON.
>
> In http://www.unicode.org/charts/PDF/U0000.pdf there is U+006F LATIN SMALL 
> LETTER O.
>
> How does one express two letters LATIN SMALL LETTER O with a combining double 
> breve in a Unicode plain text document please?


First encode each base (unjoined) extended grapheme cluster
separately (possibly with its own diacritics, extenders or
prependers, including ZWJ and ZWNJ, according to their definition in
UAX #29, the UAX defining text segmentation).

Then encode the double diacritic between them.

So for your examples you get <006F, 035D, 006F> (double breve) or
<006F, 035E, 006F> (double macron).
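
As a minimal sketch of the two sequences above, here they are built as
Python strings, with the standard unicodedata module used to list what was
actually encoded. The helper name describe() is only illustrative.

```python
import unicodedata

# Base letter, then the double diacritic, then the second base letter.
double_breve = "\u006F\u035D\u006F"   # o + COMBINING DOUBLE BREVE + o
double_macron = "\u006F\u035E\u006F"  # o + COMBINING DOUBLE MACRON + o


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(double_breve) + describe(double_macron):
    print(line)
```

Whether the mark is drawn spanning both letters depends on the font and the
rendering engine, not on the encoding.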

Double diacritics have a high canonical combining class (234 for the
double diacritics above, such as U+035D and U+035E), but the base
letter that follows them has combining class zero, so it blocks the
reordering for canonical equivalences: the relative order and
independence of the encoded base grapheme clusters are preserved
during normalizations.
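
The combining classes involved can be read from the Unicode database rather
than assumed. The sketch below uses Python's unicodedata module. On the
Python builds I know of it reports class 234 for U+035D and class 0 for the
letters. Either way, the second 'o' is a starter, and the sequence comes out
of all four normalization forms unchanged.

```python
import unicodedata

seq = "o\u035Do"  # <006F, 035D, 006F>

# Canonical combining class of each code point, in encoding order.
classes = [unicodedata.combining(ch) for ch in seq]
print(classes)

# The sequence should survive every normalization form untouched.
stable = all(
    unicodedata.normalize(form, seq) == seq
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)
print(stable)
```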

As a consequence, if another diacritic is to be added on top of the
double diacritic, it can only be added at the end of this sequence.
The bad thing is that it then appears just after the encoding of the
second base grapheme cluster, so it is subject to reordering and
will be interpreted as being itself part of the second grapheme
cluster.

Currently you cannot add another diacritic on top of a double
diacritic: we lack something to block that interpretation as part of
the second cluster.

To do that, we would need another base character with combining
class 0 (blocking canonical reorderings) that would have the
same grouping semantics as other double diacritics. This character
would be abstract (and invisible by itself) and could be something
like:

  U+xyzt DOUBLE DIACRITIC HOLDER

For example to add an acute accent above the double breve joining the
two letters 'o', we would encode:

  <006F, 035D, 006F, xyzt, 0301>

instead of just <006F, 035D, 006F, 0301> which is canonically
equivalent to <006F, 035D, 00F3> and which encodes the letter 'o' and
the letter 'o' with an acute accent (centered on this second o) joined
with the double breve *above* the acute accent of the second 'o'.
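
The canonical equivalence claimed here can be checked with a short sketch
using Python's unicodedata module. The variable names are mine.

```python
import unicodedata

plain = "o\u035Do\u0301"     # <006F, 035D, 006F, 0301>
composed = "o\u035D\u00F3"   # <006F, 035D, 00F3>

# NFC attaches the acute to the second 'o' and composes it to U+00F3.
nfc = unicodedata.normalize("NFC", plain)
print([f"{ord(ch):04X}" for ch in nfc])

# Both spellings decompose to the same NFD form, so they are canonically
# equivalent.
same = unicodedata.normalize("NFD", plain) == unicodedata.normalize("NFD", composed)
print(same)
```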

My opinion is that such a double diacritic holder already exists: it's
ZWJ, which could be safely used as the needed invisible base for
additional diacritics occurring on top of (and centered on) a double
diacritic. But others may have other preferences about the choice of
this character.

I don't know whether ZWJ has been specified so that it can occur
before a "defective" combining sequence containing only combining
diacritics. In that case, the semantics of the combining diacritics
encoded after it would have to apply to the whole extended grapheme
cluster encoded before it.

This use of ZWJ effectively allows the interpretation of the encoded
sequence as if it was in TeX syntax:

  \acute{ \breve{oo} }

Without the ZWJ, it would be interpreted as:

  \breve{ o\acute{o} }
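
What can be verified in plain text is only the normalization side of this
proposal. The sketch below, using Python's unicodedata module, checks that
ZWJ (U+200D) has combining class 0 and that the acute encoded after it is
not composed with the second 'o'. It does not show that any renderer
actually draws the acute centered over the double breve. That
interpretation remains my suggestion, not specified behaviour.

```python
import unicodedata

ZWJ = "\u200D"
with_zwj = "o\u035Do" + ZWJ + "\u0301"  # <006F, 035D, 006F, 200D, 0301>

# ZWJ is a starter (combining class 0), so it separates the acute from
# the second 'o' for the purposes of canonical reordering and composition.
zwj_class = unicodedata.combining(ZWJ)

# The sequence should be unchanged by all four normalization forms.
unchanged = all(
    unicodedata.normalize(form, with_zwj) == with_zwj
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)

# In particular, NFC must not produce U+00F3 (o with acute).
no_o_acute = "\u00F3" not in unicodedata.normalize("NFC", with_zwj)
print(zwj_class, unchanged, no_o_acute)
```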

The double diacritics are just intended to be used between the base
grapheme clusters to be joined. They could possibly be used to group
more than 2 base grapheme clusters, for example with 3 'o' as:

  <006F, 035D, 006F, 035D, 006F>

interpreted in TeX syntax as: \breve{ooo}
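
A small sketch of this "diacritic between each pair of bases" rule in
Python. The function name join_with_double() is hypothetical. Grouping more
than two bases this way is only the reading suggested above, and I would not
expect current fonts to draw one continuous breve over three letters.

```python
DOUBLE_BREVE = "\u035D"


def join_with_double(bases, mark=DOUBLE_BREVE):
    """Place the double diacritic between each pair of adjacent bases."""
    return mark.join(bases)


three_o = join_with_double(["o", "o", "o"])
print([f"{ord(ch):04X}" for ch in three_o])
```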

But even in this case, you won't be able to encode in plain text,
with the ZWJ trick, groupings that are expressed this way in TeX:

  \breve{ \breve{oo} x \breve{ o\acute{o} } }

Because double diacritics encoded in Unicode can't be safely stacked
together (for such application you'll need a rich-text layer on top of
Unicode, such as TeX here).

Philippe.
