2017-12-09 15:28 GMT+01:00 Richard Wordingham via Unicode < [email protected]>:
> Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0
> implies that it might be considered desirable to have a word boundary
> in 'aquaφοβία' or a grapheme cluster break in a coding such as <006C,
> U+0901 DEVANAGARI SIGN CANDRABINDU> for el candrabindu (l̐), which
> should be <006C, U+0310 COMBINING CANDRABINDU> in accordance with the
> principle of script separation. Why are such breaks desirable?

I don't understand why one would encode a DEVANAGARI SIGN in the middle of a Greek word, or why doing so would imply a word boundary in Greek.

> There are some who
> claim that the Laotian consonant place holder is the letter 'x' rather
> than the multiplication sign, U+00D7, which does have
> Indic_syllabic_category=Consonant_Placeholder. (I trust no-one is
> suggesting that there should be grapheme cluster boundary between
> U+00D7 with script=common and a non-spacing Lao vowel any more than
> there would be with a Lao consonant.)

Here again, the multiplication sign has nothing to do with an Indic consonant. Maybe it has been used like this in some texts, but this looks more like a tweak. If one needs a consonant placeholder, the better approach is to propose encoding an "empty" letter (as in Hangul or Arabic), possibly with variant forms (e.g. a circle, a dotted circle, a cross, or a horizontal joiner on the hanging baseline for Devanagari and similar scripts).

The usual base for displaying a combining diacritic is a space (preferably NBSP, not SPACE) or the dotted circle symbol, not a mathematical symbol that is also used within math formulas next to variable names made of common letters or even words.

The multiplication sign used in UAX #29 was chosen because it normally does not occur within words, and it serves only to define the breaking rules (it indicates that NO break is allowed at that position, i.e. the opposite of what you describe). It is notational only and is clearly not meant to combine with what follows: if you encode the multiplication sign followed by an Indic diacritic, we expect to see a separate multiplication sign (with break opportunities on both sides), then a dotted circle glyph, as used for defective grapheme clusters, holding the diacritic.

So for me Indic_syllabic_category=Consonant_Placeholder is wrong for the multiplication sign: for such use of the cross, an Indic (or generic) consonant placeholder should instead be encoded. The property could then be added to that new character and removed from the multiplication sign.
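For anyone who wants to check the basic properties of the characters discussed here, this is a minimal sketch using Python's standard `unicodedata` module. It only reports the name, General_Category and canonical combining class; that module does not expose Script or Indic_Syllabic_Category, so those would have to be read from the UCD data files. The helper `describe` and the choice of U+0EB4 as a sample non-spacing Lao vowel are my own, for illustration only.

```python
import unicodedata

# Code points mentioned in this thread, plus U+0EB4 as one example of a
# non-spacing Lao vowel (chosen for illustration; the thread names none).
CODE_POINTS = [0x006C, 0x0901, 0x0310, 0x00D7, 0x00A0, 0x25CC, 0x0EB4]


def describe(cp):
    """Return (name, General_Category, canonical combining class) for cp."""
    ch = chr(cp)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


if __name__ == "__main__":
    for cp in CODE_POINTS:
        name, category, ccc = describe(cp)
        print(f"U+{cp:04X}  {category}  ccc={ccc:3d}  {name}")
```

Here U+00D7 is reported with General_Category Sm (a math symbol), while both candrabindu characters and the Lao vowel sign are Mn (non-spacing marks).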

