Re: What is the time frame for USE shapers to provide support for CV+C ?

梁海 Liang Hai via Unicode Sun, 23 Jun 2019 08:36:52 -0700

> (1) When can we anticipate that the USE spec will be updated to provide 
> support for subjoined consonants below vowels (as required for TAI THAM) ?


• The exact scope is actually about allowing conjoined consonant forms (either 
encoded with a stacker, or encoded atomically?) after vowel signs in an encoded 
cluster.

> ** A good use case is the Tai Tham word U+1A27 U+1A6A U+1A60 U+1A37 , 
> transcribed to Central Thai script as จูบ, (to kiss). Currently, people are 
> writing this as U+1A27 U+1A60 U+1A37 U+1A6A ("จบู") which violates the 
> "phonetic ordering" but is the current workaround because USE is still broken 
> for TAI THAM.

• I agree with Richard that this is really not a good use case. This word 
(as long as it is written with the vowel sign Uu either under or after the 
conjoined consonant sign B) should really be encoded as <High Ca, stacker, Ba, 
sign Uu>, according to our best understanding today.
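
To make the two candidate orders concrete, here is a minimal Python sketch that lists the code points of each sequence by character name. The variable and function names (SUGGESTED, QUOTED, describe) are only illustrative, and the sketch says nothing about how a shaper actually treats either order.

```python
import unicodedata

# Order suggested above: <High Ca, stacker, Ba, sign Uu>.
SUGGESTED = "\u1A27\u1A60\u1A37\u1A6A"
# Order the quoted message presents as the "phonetic" one.
QUOTED = "\u1A27\u1A6A\u1A60\u1A37"

def describe(text):
    """Return one 'U+XXXX NAME' string per code point, in encoding order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

if __name__ == "__main__":
    for label, seq in (("suggested", SUGGESTED), ("quoted", QUOTED)):
        print(label)
        for line in describe(seq):
            print("  " + line)
```

Both sequences contain the same four characters; only the position of the vowel sign relative to <SAKOT, BA> differs. The suggested order is the same sequence that the quoted message calls the current workaround.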

• The “phonetic ordering” principle of Unicode is a frequently misinterpreted 
one. Note that when there are multiple ways of interpreting the phonetic order 
of a written structure, we try to stick to the more graphically apparent order, 
in order to have a stable encoding order.

> An example of the contrast is shown in the attached files luynam.png, with 
> first orthographic syllable <LA, SIGN U, SAKOT, LOW YA>, and yukya.png, with 
> the first orthographic syllable <HIGH HA, SAKOT, LOW YA, SIGN U>.

• Right. I was always wondering to what extent this distinction happens as an 
orthographically conscious choice.

• Generally I feel that when at least one of the interacting signs (usually a 
consonant one and a vowel one) has inline advance, it should be safe to take a 
graphic order approach. The “6th preliminary recommendation” doesn’t take the 
luynam vs yukya case into consideration, mostly because we weren’t sure what 
good attestations there are.

> * Create new SAKOT class SAKOT (Sk) based on UISC = Invisible_Stacker
> * Reduced HALANT class Now only HALANT (H) based on UISC = Virama

• This feels like an undesirable Tham-specific relaxation. Note the artificial 
distinction between UISC Invisible_Stacker and Virama has nothing to do with 
whether graphically writing a consonant sign after a vowel sign is attested for 
a script. (কা)

• At least we need to look into USE-applicable (existing and future) scripts 
encoded with a Virama and see if any of them does need the relaxation.

> * Updated Standard cluster mode [< R | CS >] < B | GB > [VS] (CMAbv)* 
> (CMBlw)* (< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)) [MPre] [MAbv] [MBlw] 
> [MPst] (VPre)* (VAbv)* (VBlw)* (VPst)* (VMPre)* (VMAbv)* (VMBlw)* (VMPst)* 
> (Sk B)* (FAbv)* (FBlw)* (FPst)* [FM]


• I’m still trying to think about the possibility of only relaxing the cluster 
when either or both of <vowel sign, consonant sign> have post-base advance…

• The artificial distinction made between < H | Sk > B, SUB, and CM really 
needs to be resolved together with the relaxation.

> * Updated Halant-terminated cluster [< R | CS >] < B | GB > [VS] (CMAbv)* 
> (CMBlw)* (< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)) < H | Sk >


• So, the intention of allowing Sk at the end is only about allowing the glyph 
of Sk to be positioned on the preceding character(s), right?

> * New Sakot-terminated cluster [< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* 
> (< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)) [MPre] [MAbv] [MBlw] [MPst] 
> (VPre)* (VAbv)* (VBlw)* (VPst)* (VMPre)* (VMAbv)* (VMBlw)* (VMPst)* (Sk B 
> [VS] (CMAbv)* (CMBlw)) Sk


• The “(Sk B [VS] (CMAbv)* (CMBlw)) Sk” part doesn’t seem to align with the 
updated Standard cluster’s “(Sk B)*”?
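
A rough Python sketch of the mismatch, restricted to the tail of each quoted pattern and written as regular expressions over space-separated class names. It assumes that the quoted "(Sk B [VS] (CMAbv)* (CMBlw))" group is meant to repeat and that CMBlw is meant to carry a "*" (the asterisks look lost in the quoted text). The names tok, STANDARD_TAIL, SAKOT_TAIL and matches are illustrative and do not come from the USE spec.

```python
import re

def tok(name):
    """Regex fragment matching one class token followed by a space."""
    return rf"(?:{name} )"

# Tail of the updated Standard cluster as quoted: (Sk B)*
STANDARD_TAIL = rf"(?:{tok('Sk')}{tok('B')})*"

# Tail of the Sakot-terminated cluster, under the assumptions stated above:
# (Sk B [VS] (CMAbv)* (CMBlw)*)* Sk
SAKOT_TAIL = (
    rf"(?:{tok('Sk')}{tok('B')}{tok('VS')}?{tok('CMAbv')}*{tok('CMBlw')}*)*"
    rf"{tok('Sk')}"
)

def matches(pattern, classes):
    """True if the whole sequence of class names matches the pattern."""
    text = "".join(c + " " for c in classes)
    return re.fullmatch(pattern, text) is not None

if __name__ == "__main__":
    print(matches(STANDARD_TAIL, ["Sk", "B", "VS"]))      # VS after Sk B
    print(matches(SAKOT_TAIL, ["Sk", "B", "VS", "Sk"]))   # same, then final Sk
```

Under these assumptions, a variation selector or consonant modifier after <Sk, B> is accepted only when the cluster goes on to end in Sk, which is the inconsistency in question.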

> I trust you'll be reclassifying U+1A55 TAI THAM CONSONANT SIGN MEDIAL RA and 
> U+1A56 TAI THAM CONSONANT SIGN MEDIAL LA into the category SUB so that we can 
> write about bananas forever (ᨠᩖ᩠ᩅ᩠᩶ᨿᨲᩕ᩠ᩃᩬᨯ): <HIGH KA, MEDIAL LA, SAKOT, WA, 
> TONE-2, SAKOT, LOW YA> /kluai/ 'banana' <HIGH TA, MEDIAL RA, SAKOT, LA, SIGN 
> OA BELOW, DA> /tʰalɔːt/ 'for ever' The issues here are that WA in a medial 
> rôle is indistinguishable from a coda ('sakot') consonant and that MEDIAL RA 
> can act as a consonant aspirator.

• The issues here are:

        • Medial consonant sign characters of Tham are not encoded based on a 
clear phono-orthographical distinction.

        • Tham allows syllable chaining that does not rely on a preceding 
inline coda letter.

• Consonant sign Medial Ra acting as a consonant aspirator is not relevant to 
its appearance before a non-medial consonant sign here.

Best,
梁海 Liang Hai
https://lianghai.github.io
