On Mon, 14 Jan 2019 06:24:46 +0000 James Kass via Unicode <[email protected]> wrote:
> Unicode doesn't enforce any spelling or punctuation rules. Unicode
> doesn't tell human beings how to pronounce strings of text or how to
> interpret them.

These are not statements that are both honest and true. Unicode lays down rules and recommendations which others may then enforce.

In Indic scripts where LETTER A is not also a consonant, Unicode forbids writing <LETTER A, SIGN AA> where LETTER AA would do the same job, and most renderers enforce that rule. Similarly, in phonetically ordered LTR scripts, one can't write a dependent vowel as the first character even if it is the leftmost character.

There is a subtler rule about not spelling negative numbers with a hyphen-minus - if one does, one may suddenly find a line break just after what is being used as a negative sign.

In scripts where Sanskrit grv and gvr may be rendered identically, Unicode tells us what the two code sequences are, and therefore indirectly what the range of pronunciations is for a given spelling.

Now, sometimes the enforcers overstep the mark. For example, the USE tells us that when we write Northern Thai /pʰiaʔ/ 'sound of a smack' which visually is <gSIGN_E, gMEDIAL_RA (/ʰ/), gLOW_PA (/p/), gSAKOT_LOW_YA, gSIGN_A (/ʔ/)>, with <gSIGN_E,...gSAKOT_LOW_YA> denoting /ia/, we should write it ᨻ᩠ᨿᩕᩮᩡ <LOW PA, SAKOT, LOW YA, MEDIAL RA, SIGN E, SIGN A>. So much for phonetic order!

Enforcement can be more subtle. TUS says that Farsi should use U+06CC ARABIC LETTER FARSI YEH instead of U+064A ARABIC LETTER YEH although they are identical in initial and medial positions. In this case, the enforcer will be the spell-checker.

Richard.
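
As a rough illustration of the Farsi yeh point, here is a minimal Python sketch of the kind of check a spell-checker might apply. The function names `flag_arabic_yeh` and `code_points` are hypothetical, invented for this example, and the check simply assumes the whole input is Persian text; a real spell-checker would need to know the language of each run. It also shows that Unicode normalization does not merge the two letters, so the distinction survives in stored text.

```python
import unicodedata

ARABIC_YEH = "\u064A"  # ARABIC LETTER YEH
FARSI_YEH = "\u06CC"   # ARABIC LETTER FARSI YEH


def code_points(text):
    """List each character as 'U+XXXX NAME' using the Unicode character database."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def flag_arabic_yeh(text):
    """Hypothetical check: return (index, suggestion) for each U+064A.

    Assumes the whole of `text` is Persian; that assumption is ours.
    """
    return [(i, FARSI_YEH) for i, ch in enumerate(text) if ch == ARABIC_YEH]


if __name__ == "__main__":
    # The two letters are distinct code points with distinct names.
    print(code_points(ARABIC_YEH + FARSI_YEH))

    # Even compatibility normalization leaves them distinct.
    print(unicodedata.normalize("NFKC", ARABIC_YEH) == FARSI_YEH)

    # Alef, yeh, reh, alef, noon, spelled with U+064A in second position.
    print(flag_arabic_yeh("\u0627\u064A\u0631\u0627\u0646"))
```

The sketch only reports where U+064A occurs and which character might replace it; whether to apply the substitution is left to the caller, in keeping with the idea that the enforcement here is advisory.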

