Here is the essence of the initial changes needed to support CV+C. Open to
feedback.
* Create new SAKOT class
SAKOT (Sk) based on UISC = Invisible_Stacker
* Reduced HALANT class
Now only HALANT (H) based on UISC = Virama
* Updated Standard cluster mode
[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS]
(CMAbv)* (CMBlw)*)* [MPre] [MAbv] [MBlw] [MPst] (VPre)* (VAbv)* (VBlw)* (VPst)*
(VMPre)* (VMAbv)* (VMBlw)* (VMPst)* (Sk B)* (FAbv)* (FBlw)* (FPst)* [FM]
The only required component of a standard cluster is a BASE or BASE_OTHER. A
cluster may optionally begin with a REPH or CONS_WITH_STACKER. A BASE or
BASE_OTHER may be followed immediately by a VARIATION_SELECTOR and/or multiple
CONS_MOD characters in the order CONS_MOD_ABOVE CONS_MOD_BELOW. Multiple
sequences of a HALANT BASE or SAKOT BASE with optional VARIATION_SELECTOR or
optional CONS_MOD can occur. The sequence can continue with zero or one
CONS_MED for each cardinal position (Pre, Above, Below, Post); zero to many
VOWEL characters in each cardinal position; zero to many VOWEL_MODs in each
cardinal position; zero to many sequences of SAKOT BASE; zero to many
CONS_FINALs in each of Above, Below, and Post; and lastly, an optional
FINAL_MOD.
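
To try the Standard cluster pattern against sample class sequences, here is a minimal sketch in Python. It is not taken from any shaper. The helper names (compile_cluster, matches) are hypothetical, and it assumes the input has already been reduced to the class abbreviations used above.

```python
import re

# Map the bracket notation used in this message onto regex syntax:
# [x] is optional, <a | b> is alternation, (x)* is zero or more.
_SYNTAX = {"[": "(?:", "]": ")?", "<": "(?:", ">": ")",
           "(": "(?:", ")*": ")*", "|": "|"}

def compile_cluster(grammar):
    """Translate a cluster pattern into a compiled regex (hypothetical helper)."""
    parts = re.findall(r"\)\*|[\[\]<>(|]|\w+", grammar)
    # Each class name becomes "NAME " so matching always aligns on whole tokens.
    return re.compile("".join(_SYNTAX.get(p, p + " ") for p in parts))

def matches(rx, classes):
    """True if the whole list of class abbreviations forms one cluster."""
    return rx.fullmatch("".join(c + " " for c in classes)) is not None

STANDARD = compile_cluster(
    "[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* "
    "(< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)*)* "
    "[MPre] [MAbv] [MBlw] [MPst] (VPre)* (VAbv)* (VBlw)* (VPst)* "
    "(VMPre)* (VMAbv)* (VMBlw)* (VMPst)* (Sk B)* (FAbv)* (FBlw)* (FPst)* [FM]"
)

print(matches(STANDARD, ["B", "VBlw", "Sk", "B"]))  # CV+C: True
print(matches(STANDARD, ["B", "VBlw", "H", "B"]))   # HALANT after a vowel: False
```

Only SAKOT is accepted after the vowel slots in this pattern, so a HALANT in the same position does not match.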
* Updated Halant-terminated cluster
[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS]
(CMAbv)* (CMBlw)*)* < H | Sk >
This is similar to the Standard cluster but terminates in a final HALANT or
SAKOT after a BASE, BASE_OTHER, or CONS_MOD. When such a HALANT or SAKOT
follows a BASE, BASE_OTHER, or CONS_MOD it will form a cluster. When any
character other than a BASE or BASE_OTHER follows the HALANT or SAKOT there
will be a cluster break between the HALANT or SAKOT and the following
character. Multiple sequences of a HALANT BASE or SAKOT BASE with optional
VARIATION_SELECTOR or optional CONS_MOD can occur. A CONS_SUB is equivalent to
the sequence HALANT BASE.
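
The cluster break described above can be illustrated with a rough Python sketch. It assumes the input is already a list of class abbreviations. The helper names are hypothetical, and it tests only this one pattern, whereas a real shaper would also try the Standard cluster.

```python
import re

# [x] is optional, <a | b> is alternation, (x)* is zero or more.
_SYNTAX = {"[": "(?:", "]": ")?", "<": "(?:", ">": ")",
           "(": "(?:", ")*": ")*", "|": "|"}

def compile_cluster(grammar):
    """Translate a cluster pattern into a compiled regex (hypothetical helper)."""
    parts = re.findall(r"\)\*|[\[\]<>(|]|\w+", grammar)
    # Each class name becomes "NAME " so matching always aligns on whole tokens.
    return re.compile("".join(_SYNTAX.get(p, p + " ") for p in parts))

HALANT_TERMINATED = compile_cluster(
    "[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* "
    "(< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)*)* < H | Sk >"
)

def is_halant_terminated(classes):
    """True if the whole sequence is one Halant-terminated cluster."""
    text = "".join(c + " " for c in classes)
    return HALANT_TERMINATED.fullmatch(text) is not None

def cluster_prefix_len(classes):
    """Number of leading items consumed before the cluster break (0 if none)."""
    m = HALANT_TERMINATED.match("".join(c + " " for c in classes))
    return m.group().count(" ") if m else 0

print(is_halant_terminated(["B", "H", "B", "Sk"]))  # True
print(is_halant_terminated(["B", "VBlw", "H"]))     # False: vowel before HALANT
print(cluster_prefix_len(["B", "H", "VAbv"]))       # 2: break falls after the HALANT
```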
* New Sakot-terminated cluster
[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* (< < H | Sk > B | SUB > [VS]
(CMAbv)* (CMBlw)*)*
[MPre] [MAbv] [MBlw] [MPst]
(VPre)* (VAbv)* (VBlw)* (VPst)*
(VMPre)* (VMAbv)* (VMBlw)* (VMPst)*
(Sk B [VS] (CMAbv)* (CMBlw)*)* Sk
This is similar to the Standard cluster but terminates in a final SAKOT after a
VOWEL or VOWEL_MOD. When such a SAKOT follows a VOWEL or VOWEL_MOD it will form
a cluster. When any character other than a BASE or BASE_OTHER follows this
SAKOT there will be a cluster break between the SAKOT and the following
character. Multiple sequences of a SAKOT BASE with optional VARIATION_SELECTOR
or optional CONS_MOD can occur. A CONS_SUB is equivalent to the sequence
HALANT BASE.
This would allow a consonant to follow a vowel when joined with a Sakot. It
would support multiple final consonants. It would not support polysyllabic
chaining of CV+CV+CV etc.
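
As a quick check of these three claims, here is a hedged Python sketch that runs the Tai Tham word cited in the quoted message below (U+1A27 U+1A6A U+1A60 U+1A37) through the Standard and Sakot-terminated patterns. The class assignments in TAI_THAM are my own assumption for these four code points, and the helper names are hypothetical. No shaper works exactly this way.

```python
import re

# [x] is optional, <a | b> is alternation, (x)* is zero or more.
_SYNTAX = {"[": "(?:", "]": ")?", "<": "(?:", ">": ")",
           "(": "(?:", ")*": ")*", "|": "|"}

def compile_cluster(grammar):
    """Translate a cluster pattern into a compiled regex (hypothetical helper)."""
    parts = re.findall(r"\)\*|[\[\]<>(|]|\w+", grammar)
    # Each class name becomes "NAME " so matching always aligns on whole tokens.
    return re.compile("".join(_SYNTAX.get(p, p + " ") for p in parts))

HEAD = ("[< R | CS >] < B | GB > [VS] (CMAbv)* (CMBlw)* "
        "(< < H | Sk > B | SUB > [VS] (CMAbv)* (CMBlw)*)* ")
MID = ("[MPre] [MAbv] [MBlw] [MPst] (VPre)* (VAbv)* (VBlw)* (VPst)* "
       "(VMPre)* (VMAbv)* (VMBlw)* (VMPst)* ")
STANDARD = compile_cluster(HEAD + MID + "(Sk B)* (FAbv)* (FBlw)* (FPst)* [FM]")
SAKOT_TERMINATED = compile_cluster(HEAD + MID + "(Sk B [VS] (CMAbv)* (CMBlw)*)* Sk")

# Assumed classes for the code points in the example word (not from a data file).
TAI_THAM = {0x1A27: "B", 0x1A37: "B", 0x1A6A: "VBlw", 0x1A60: "Sk"}

def is_single_cluster(text):
    """True if the text matches either pattern as one whole cluster."""
    classes = "".join(TAI_THAM[ord(ch)] + " " for ch in text)
    return any(rx.fullmatch(classes) for rx in (STANDARD, SAKOT_TERMINATED))

phonetic = "\u1A27\u1A6A\u1A60\u1A37"      # C V Sk C, the phonetic order
workaround = "\u1A27\u1A60\u1A37\u1A6A"    # C Sk C V, the current workaround
two_finals = phonetic + "\u1A60\u1A37"     # C V Sk C Sk C
chained = phonetic + "\u1A6A"              # C V Sk C V

print(is_single_cluster(phonetic))    # True
print(is_single_cluster(two_finals))  # True
print(is_single_cluster(chained))     # False
```

The chained case fails because no vowel slot follows the (Sk B)* group in either pattern.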
Cheers,
Andrew
From: Behdad Esfahbod <[email protected]>
Sent: 10 May 2019 11:32
To: Ed Trager <[email protected]>
Cc: Andrew Glass <[email protected]>; Unicode Mailing List
<[email protected]>
Subject: Re: What is the time frame for USE shapers to provide support for CV+C?
I'm open to doing that if there's consensus on how it should be done.
On Thu, May 9, 2019 at 8:55 AM Ed Trager
<[email protected]> wrote:
Hi, Andrew and Behdad,
Prompted by a conversation I had with Liang Hai yesterday, I am just curious to
get some idea about the following:
(1) When can we anticipate that the USE spec will be updated to provide support
for subjoined consonants below vowels (as required for TAI THAM) ?
(2) Once the USE spec is updated, how much lag time can we expect until
Microsoft actually releases an implementation with said support for CV+C ?
(3a) And the related question —for Behdad and the HarfBuzz development group—
is when can we expect to see CV+C support (at least for TAI THAM) in HarfBuzz ?
(3b) Would the HarfBuzz team consider providing CV+C support for TAI THAM even
before the USE spec gets updated, so that we could test things out ? * **
---------------------------------------
* PLEASE AND THANK YOU?
** A good use case is the Tai Tham word U+1A27 U+1A6A U+1A60 U+1A37 ,
transcribed to Central Thai script as จูบ, (to kiss). Currently, people are
writing this as U+1A27 U+1A60 U+1A37 U+1A6A ("จบู") which violates the
"phonetic ordering" but is the current workaround because USE is still broken
for TAI THAM.
REFERENCE DOCUMENT:
http://www.unicode.org/L2/L2018/18332-tai-tham-ad-hoc-report.pdf
--
behdad
http://behdad.org/