Re: Choosing the Set of Renderable Strings

Richard Wordingham via Unicode Wed, 16 May 2018 14:42:07 -0700

On Wed, 16 May 2018 05:23:08 -0800
James Kass via Unicode wrote:


> Note that although the proposal gave canonical combining class
> zero to both the tone marks and the vowel signs, the on-line Unicode
> data gives canonical combining class 230 to the tone marks.

There were several changes from ccc=0 to non-zero that were sneaked in
between the UTC agreeing to proceed with the proposal and Unicode 5.2
being published.  That may have been a test of vigilance; we failed.  I
have seen no benefit from the changes - U+1A60 TAI THAM SIGN SAKOT is
not a virama (it should not appear in valid text), and having the tone
marks and the invisible stacker have distinct non-zero classes has
caused lots of irritation.

We should probably have risked Tai Tham being excluded from the BMP and
gone for the Tibetan model; normalisation would not then damage Tai Tham
text.
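
For anyone who wants to see the effect for themselves, here is a minimal
sketch using Python's standard unicodedata module.  It takes the first
of the NA strings quoted further down, <NA, MAI KANG, TONE-1, SAKOT, WA,
SIGN AA>, and normalises it.  The helper names (ccc, show) are just
illustrative, and the exact classes reported depend on the Unicode
version bundled with the Python in use.

```python
import unicodedata

SAKOT = "\u1A60"   # TAI THAM SIGN SAKOT (the invisible stacker)
TONE_1 = "\u1A75"  # TAI THAM SIGN TONE-1


def ccc(ch):
    """Canonical combining class as reported by the bundled Unicode data."""
    return unicodedata.combining(ch)


def show(s):
    """Render a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# <NA, MAI KANG, TONE-1, SAKOT, WA, SIGN AA> as typed.
typed = "\u1A36\u1A74\u1A75\u1A60\u1A45\u1A63"

# Canonical ordering sorts adjacent marks with non-zero classes by class,
# so SAKOT (lower class) is moved in front of TONE-1 (higher class).
normalised = unicodedata.normalize("NFC", typed)

print("ccc(SAKOT)  =", ccc(SAKOT))
print("ccc(TONE-1) =", ccc(TONE_1))
print("typed      :", show(typed))
print("normalised :", show(normalised))
```

With ccc=0 on both marks, as in the original proposal, the typed order
would have survived normalisation unchanged.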

> > **The placement may be different to that of MAI KANG
> > in /bɔː waː/ ᨷᩴ᩠᩵ᩅᩣ <BA, MAI KANG, TONE-1, SAKOT, WA,
> > SIGN AA> or ᨷᩴ᩠ᩅ᩵ᩣ <BA, MAI KANG, SAKOT, WA, TONE-1,
> > SIGN AA> - I don't know whether the first or the second
> > tone mark is dropped.  

> FWIW, neither is dropped in the display here, although they don't
> display identically.  The first string shows TONE-1 positioned to the
> right of MAI KANG, the second string superimposes them.  (Windows 7
> running LibreOffice in order to enable the USE from HarfBuzz.)

The full uncontracted writing is <BA, MAI KANG, TONE-1, WA, TONE-1, SIGN
AA>.  Both syllables have TONE-1, but I have not seen two identical
tone marks from different phonetic syllables in the same stack.  The
person typing the contraction drops a tone mark, not the rendering
system.

> Substituting U+1A36 TAI THAM LETTER NA for BA in the above strings,
> ᨶᩴ᩠᩵ᩅᩣ  ᨶᩴ᩠ᩅ᩵ᩣ, and trying to get the ligature are in the attached
> *.PNG file. Here's the four strings for the PNG:
> 
> \u1A36\u1A74\u1A75\u1A60\u1A45\u1A63
> \u1A36\u1A74\u1A60\u1A45\u1A75\u1A63
> \u1A36\u1A75\u1A63\u1A74
> \u1A36\u1A63\u1A74\u1A75

A lot of fonts have trouble ligating NA and AA when there is material
between them.  (Hint: Classify all non-spacing subscript consonants as
marks, and spacing subscript consonants as bases, and set the ligating
lookup to ignore marks.)

Your example appears to be using the font called 'A Tai Tham KH New'.
While the only way to type Pali _bho_ 'O' after other text in this font
or 'A Tai Tham KH' is to enter the correct sequence <LOW PHA, SIGN E,
SIGN AA>, the former font cannot render Pali _mano_ 'mind' (also used in
Northern Thai and probably also Tai Khuen) if one types the correct
sequence <MA, NA, SIGN E, SIGN AA>.  One has to type <MA, NA, SIGN AA,
SIGN E>!  The *older* font 'A Tai Tham KH' (at Version 2.0) does render the
correct spelling properly.  As an example of correct rendering, I
include the Pali for 'O mind!', _bho mano_, encoded  <LOW PHA, SIGN E,
SIGN AA, MA, NA, SIGN AA, SIGN E>, as rendered by the Lamphun font.
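
To reproduce the test strings for checking a font, the sequences above
can be built from their character names.  This is only a sketch: the
helper names (spell, canonically_equivalent) are made up for
illustration, and it assumes a Python whose Unicode data includes the
Tai Tham block.  It also checks that the two spellings of _mano_ are
not canonically equivalent, so normalisation will not turn one into the
other and the font has to cope with the order as typed.

```python
import unicodedata


def spell(*names):
    """Build a string from Tai Tham character names (block prefix omitted)."""
    return "".join(unicodedata.lookup("TAI THAM " + n) for n in names)


def canonically_equivalent(a, b):
    """True if the two strings have the same NFD form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Pali _bho_: <LOW PHA, SIGN E, SIGN AA>
BHO = spell("LETTER LOW PHA", "VOWEL SIGN E", "VOWEL SIGN AA")

# Pali _mano_, correct sequence: <MA, NA, SIGN E, SIGN AA>
MANO_CORRECT = spell("LETTER MA", "LETTER NA",
                     "VOWEL SIGN E", "VOWEL SIGN AA")

# What 'A Tai Tham KH New' needs instead: <MA, NA, SIGN AA, SIGN E>
MANO_WORKAROUND = spell("LETTER MA", "LETTER NA",
                        "VOWEL SIGN AA", "VOWEL SIGN E")

for label, s in [("bho", BHO),
                 ("mano (correct)", MANO_CORRECT),
                 ("mano (workaround)", MANO_WORKAROUND)]:
    print(label, " ".join(f"U+{ord(c):04X}" for c in s))

print("equivalent:", canonically_equivalent(MANO_CORRECT, MANO_WORKAROUND))
```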

Richard.
