RE: What is the time frame for USE shapers to provide support for CV+C ?

Andrew Glass via Unicode Mon, 13 May 2019 18:00:41 -0700

Here is the essence of the initial changes needed to support CV+C. Open to 
feedback.



  *   Create new SAKOT class
SAKOT (Sk) based on UISC = Invisible_Stacker
  *   Reduced HALANT class
Now only HALANT (H) based on UISC = Virama
  *   Updated Standard cluster model

[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS] 
(CMAbv)* (CMBlw)*)* [MPre] [MAbv] [MBlw] [MPst] (VPre)* (VAbv)* (VBlw)* (VPst)* 
(VMPre)* (VMAbv)* (VMBlw)* (VMPst)* (Sk B)* (FAbv)* (FBlw)* (FPst)* [FM]


The only required component of a standard cluster is a BASE or BASE_OTHER. A 
cluster may optionally begin with a REPH or CONS_WITH_STACKER. A BASE or 
BASE_OTHER may be followed immediately by a VARIATION_SELECTOR and/or multiple 
CONS_MOD characters in the order CONS_MOD_ABOVE CONS_MOD_BELOW. Multiple 
sequences of a HALANT BASE or SAKOT BASE with optional VARIATION_SELECTOR or 
optional CONS_MOD can occur. The sequence can continue with zero or one 
CONS_MED for each cardinal position (Pre, Above, Below, Post); zero to many 
VOWEL characters in each cardinal position; zero to many VOWEL_MODs in each 
cardinal position; zero to many sequences of SAKOT BASE; zero to many 
CONS_FINALs in each of Above, Below, and Post; and lastly, an optional 
FINAL_MOD.
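
As a way of checking the Standard cluster pattern above, here is a minimal Python sketch that transcribes it into a regular expression over class tags. The helper names (tok, alt, opt, star, each, is_standard_cluster) are illustrative and not part of USE. The class tags assigned to the Tai Tham word cited later in this thread (B, VBlw, Sk, B for U+1A27 U+1A6A U+1A60 U+1A37) are an assumption. The variant without the (Sk B)* slot is only this pattern minus that slot, not a statement of any published spec.

```python
import re

def tok(name):
    # One class tag followed by a single space, so tags never match partially.
    return f"(?:{re.escape(name)} )"

def alt(*names):
    return "(?:" + "|".join(tok(n) for n in names) + ")"

def opt(pattern):
    return f"(?:{pattern})?"

def star(pattern):
    return f"(?:{pattern})*"

def each(wrap, *names):
    # Apply opt or star to each tag in order, e.g. [MPre] [MAbv] [MBlw] [MPst].
    return "".join(wrap(tok(n)) for n in names)

CM = each(star, "CMAbv", "CMBlw")
HEAD = opt(alt("R", "CS")) + alt("B", "GB") + opt(tok("VS")) + CM
# (< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)*)*
STACK = star("(?:" + alt("H", "Sk") + tok("B") + "|" + tok("SUB") + ")"
             + opt(tok("VS")) + CM)
MEDIALS = each(opt, "MPre", "MAbv", "MBlw", "MPst")
VOWELS = each(star, "VPre", "VAbv", "VBlw", "VPst")
VOWEL_MODS = each(star, "VMPre", "VMAbv", "VMBlw", "VMPst")
FINAL_STACK = star(tok("Sk") + tok("B"))  # the (Sk B)* slot after the vowels
FINALS = each(star, "FAbv", "FBlw", "FPst") + opt(tok("FM"))

BEFORE_FINAL_STACK = HEAD + STACK + MEDIALS + VOWELS + VOWEL_MODS
STANDARD = re.compile(BEFORE_FINAL_STACK + FINAL_STACK + FINALS)
# Same pattern with the (Sk B)* slot left out, for comparison only.
NO_FINAL_STACK = re.compile(BEFORE_FINAL_STACK + FINALS)

def is_standard_cluster(classes, pattern=STANDARD):
    return pattern.fullmatch("".join(c + " " for c in classes)) is not None

phonetic = ["B", "VBlw", "Sk", "B"]    # assumed tags for U+1A27 U+1A6A U+1A60 U+1A37
workaround = ["B", "Sk", "B", "VBlw"]  # assumed tags for U+1A27 U+1A60 U+1A37 U+1A6A
print(is_standard_cluster(phonetic))                  # True
print(is_standard_cluster(workaround))                # True
print(is_standard_cluster(phonetic, NO_FINAL_STACK))  # False
```

Under these assumptions the phonetic order forms a single cluster only because of the (Sk B)* slot, while the workaround order matches either way.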



  *   Updated Halant-terminated cluster
[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS] 
(CMAbv)* (CMBlw)*)* < H | Sk >



This is similar to the Standard cluster but terminates in a final HALANT or 
SAKOT after a BASE, BASE_OTHER, or CONS_MOD. When such a HALANT or SAKOT 
follows a BASE, BASE_OTHER, or CONS_MOD it will form a cluster. When any 
character other than a BASE or BASE_OTHER follows the HALANT or SAKOT there 
will be a cluster break between the HALANT or SAKOT and the following 
character. Multiple sequences of a HALANT BASE or SAKOT BASE with optional 
VARIATION_SELECTOR or optional CONS_MOD can occur. A CONS_SUB is equivalent to 
the sequence HALANT BASE.
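
To illustrate the cluster break described above, here is a minimal Python sketch of the Halant-terminated pattern as a regular expression over class tags. The helper names and the function cluster_prefix_len are illustrative, not part of USE. The function reports how many leading tags one Halant-terminated cluster would consume, or 0 if the pattern does not apply.

```python
import re

def tok(name):
    # One class tag followed by a single space, so tags never match partially.
    return f"(?:{re.escape(name)} )"

def alt(*names):
    return "(?:" + "|".join(tok(n) for n in names) + ")"

def opt(pattern):
    return f"(?:{pattern})?"

def star(pattern):
    return f"(?:{pattern})*"

CM = star(tok("CMAbv")) + star(tok("CMBlw"))
HEAD = opt(alt("R", "CS")) + alt("B", "GB") + opt(tok("VS")) + CM
# (< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)*)*
STACK = star("(?:" + alt("H", "Sk") + tok("B") + "|" + tok("SUB") + ")"
             + opt(tok("VS")) + CM)
HALANT_TERMINATED = re.compile(HEAD + STACK + alt("H", "Sk"))

def cluster_prefix_len(classes):
    # Number of leading tags consumed by one Halant-terminated cluster.
    m = HALANT_TERMINATED.match("".join(c + " " for c in classes))
    return m.group(0).count(" ") if m else 0

print(cluster_prefix_len(["B", "H"]))          # 2
print(cluster_prefix_len(["B", "H", "VAbv"]))  # 2: break between H and VAbv
print(cluster_prefix_len(["B", "VAbv", "Sk"])) # 0: the final Sk follows a vowel
```

A full shaper would presumably try this pattern alongside the Standard cluster pattern and prefer the longer match, so a sequence such as B H B would not be split here in practice.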



  *   New Sakot-terminated cluster

[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS] 
(CMAbv)* (CMBlw)*)*

    [MPre] [MAbv] [MBlw] [MPst]

    (VPre)* (VAbv)* (VBlw)* (VPst)*

    (VMPre)* (VMAbv)* (VMBlw)* (VMPst)*

    (Sk B [VS] (CMAbv)* (CMBlw)*)* Sk



This is similar to the Standard cluster but terminates in a final SAKOT after a 
VOWEL or VOWEL_MOD. When such a SAKOT follows a VOWEL or VOWEL_MOD it will form 
a cluster. When any character other than a BASE or BASE_OTHER follows this 
SAKOT there will be a cluster break between the SAKOT and the following 
character. Multiple sequences of a SAKOT BASE with optional VARIATION_SELECTOR 
or optional CONS_MOD can occur. A CONS_SUB is equivalent to the sequence 
HALANT BASE.

This would allow a consonant to follow a vowel when joined with a Sakot. It 
would support multiple final consonants. It would not support polysyllabic 
chaining of CV+CV+CV etc.
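
The three claims above can be checked against the Sakot-terminated pattern with a minimal Python sketch. It transcribes that pattern into a regular expression over class tags. The helper names and is_sakot_terminated are illustrative, not part of USE, and the tag sequences in the examples are made up for the test.

```python
import re

def tok(name):
    # One class tag followed by a single space, so tags never match partially.
    return f"(?:{re.escape(name)} )"

def alt(*names):
    return "(?:" + "|".join(tok(n) for n in names) + ")"

def opt(pattern):
    return f"(?:{pattern})?"

def star(pattern):
    return f"(?:{pattern})*"

def each(wrap, *names):
    # Apply opt or star to each tag in order.
    return "".join(wrap(tok(n)) for n in names)

CM = each(star, "CMAbv", "CMBlw")
HEAD = opt(alt("R", "CS")) + alt("B", "GB") + opt(tok("VS")) + CM
STACK = star("(?:" + alt("H", "Sk") + tok("B") + "|" + tok("SUB") + ")"
             + opt(tok("VS")) + CM)
MEDIALS = each(opt, "MPre", "MAbv", "MBlw", "MPst")
VOWELS = each(star, "VPre", "VAbv", "VBlw", "VPst")
VOWEL_MODS = each(star, "VMPre", "VMAbv", "VMBlw", "VMPst")
# (Sk B [VS] (CMAbv)* (CMBlw)*)* Sk
FINAL_STACK = star(tok("Sk") + tok("B") + opt(tok("VS")) + CM)

SAKOT_TERMINATED = re.compile(
    HEAD + STACK + MEDIALS + VOWELS + VOWEL_MODS + FINAL_STACK + tok("Sk"))

def is_sakot_terminated(classes):
    text = "".join(c + " " for c in classes)
    return SAKOT_TERMINATED.fullmatch(text) is not None

# A consonant may follow a vowel via Sakot, and the cluster may end on Sakot.
print(is_sakot_terminated(["B", "VBlw", "Sk"]))                     # True
# Multiple final consonants are accepted.
print(is_sakot_terminated(["B", "VBlw", "Sk", "B", "Sk", "B", "Sk"]))  # True
# CV+CV chaining is rejected: no vowel slot follows the final-consonant stack.
print(is_sakot_terminated(["B", "VBlw", "Sk", "B", "VBlw", "Sk"]))  # False
```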

Cheers,

Andrew


From: Behdad Esfahbod <[email protected]>
Sent: 10 May 2019 11:32
To: Ed Trager <[email protected]>
Cc: Andrew Glass <[email protected]>; Unicode Mailing List 
<[email protected]>
Subject: Re: What is the time frame for USE shapers to provide support for CV+C 
?

I'm open to doing that if there's consensus on how it should be done.

On Thu, May 9, 2019 at 8:55 AM Ed Trager <[email protected]> wrote:
Hi, Andrew and Behdad,

Prompted by a conversation I had with Liang Hai yesterday, I am just curious to 
get some idea about the following:

(1) When can we anticipate that the USE spec will be updated to provide support 
for subjoined consonants below vowels (as required for TAI THAM) ?

(2) Once the USE spec is updated, how much lag time can we expect until 
Microsoft actually releases an implementation with said support for CV+C ?

(3a) And the related question —for Behdad and the HarfBuzz development group— 
is when can we expect to see CV+C support (at least for TAI THAM) in HarfBuzz ?

(3b) Would the HarfBuzz team consider providing CV+C support for TAI THAM even 
before the USE spec gets updated, so that we could test things out ? * **

---------------------------------------
* PLEASE AND THANKYOU?

** A good use case is the Tai Tham word U+1A27 U+1A6A U+1A60 U+1A37 , 
transcribed to Central Thai script as จูบ, (to kiss). Currently, people are 
writing this as U+1A27 U+1A60 U+1A37 U+1A6A ("จบู") which violates the 
"phonetic ordering" but is the current workaround because USE is still broken 
for TAI THAM.

REFERENCE DOCUMENT:
http://www.unicode.org/L2/L2018/18332-tai-tham-ad-hoc-report.pdf




--
behdad
http://behdad.org/
