I have two questions, but I begin with some preliminaries in case I am labouring under any misapprehensions.
Firstly, I assume that any legible text in the Tai Tham script with a well-defined pronunciation in one of the main languages using the script (Pali, Tai Khün, Tai Lue, Northern Thai and Lao) either: 1) contains an unencoded character; 2) has a unique (up to canonical equivalence) correct encoding; 3) has a glyph with multiple encodings; or 4) reveals a deficiency in the specification of the encoding of the script. Glyphs with multiple encodings most commonly occur in styles that do not distinguish U+1A62 TAI THAM VOWEL SIGN MAI SAT and U+1A76 TAI THAM SIGN TONE-2. These can generally be resolved on the basis of the pronunciation.

Secondly, what is the definition of the encoding? Is it just the Unicode Standard, or is it the sequence of approved proposals plus the Unicode Standard (with the latest approval taking precedence)? I presume the proposals are relevant, as otherwise there might not be a defined coding difference between the second syllable of /ɲaʔɲuʔ/ 'shaky' and the word /hui/ 'to sprinkle'. The proposals lead to the encoding <U+1A49 TAI THAM LETTER HIGH HA, U+1A60 TAI THAM SIGN SAKOT, U+1A3F TAI THAM LETTER LOW YA, U+1A69 TAI THAM VOWEL SIGN U> for the former and <U+1A49, U+1A69, U+1A60, U+1A3F> for the latter. The visual difference lies in the positioning of the vowel; there is no visual justification for claiming that the dependent consonant is subjoined to the vowel in either case.

Similarly, there is nothing in TUS itself to specify whether /kuː/ (Lao /kʰuː/) 'pair' is spelt <U+1A23 TAI THAM LETTER LOW KA, U+1A6A TAI THAM VOWEL SIGN UU, U+1A76 TAI THAM SIGN TONE-2> or <U+1A23, U+1A76, U+1A6A>. Unlike their Thai counterparts, these two sequences are not canonically equivalent.

Before LANNA VOWEL SIGN AM and LANNA VOWEL SIGN TALL AM were rejected, the basic syllable structure for encoding was <pre-vowel_consonant_stack, vowels_before, vowels_below, vowels_above, tones_etc, vowels_after, post-vowel_consonant_stack>.
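To make the consequence of that slot order concrete, here is a minimal sketch. It assumes the order is taken as normative, and the slot labels are assigned by hand to the sequences quoted above rather than looked up from any character property; the names Slot and in_slot_order are hypothetical.

```python
from enum import IntEnum


class Slot(IntEnum):
    """Slots of the syllable structure, in the order quoted above."""
    PRE_VOWEL_STACK = 0
    VOWELS_BEFORE = 1
    VOWELS_BELOW = 2
    VOWELS_ABOVE = 3
    TONES_ETC = 4
    VOWELS_AFTER = 5
    POST_VOWEL_STACK = 6


def in_slot_order(slots):
    """Return True if the slot labels never step backwards."""
    return all(a <= b for a, b in zip(slots, slots[1:]))


# Hand-assigned labels: an assumption of this sketch, not a property lookup.

# /ku:/ 'pair' as <U+1A23, U+1A6A, U+1A76>: base, vowel below, tone.
KUU_VOWEL_FIRST = [Slot.PRE_VOWEL_STACK, Slot.VOWELS_BELOW, Slot.TONES_ETC]
# /ku:/ 'pair' as <U+1A23, U+1A76, U+1A6A>: base, tone, vowel below.
KUU_TONE_FIRST = [Slot.PRE_VOWEL_STACK, Slot.TONES_ETC, Slot.VOWELS_BELOW]

# Second syllable of 'shaky', <U+1A49, U+1A60, U+1A3F, U+1A69>:
# SAKOT + LOW YA counted as part of the pre-vowel stack.
NYU = [Slot.PRE_VOWEL_STACK, Slot.PRE_VOWEL_STACK,
       Slot.PRE_VOWEL_STACK, Slot.VOWELS_BELOW]
# 'to sprinkle', <U+1A49, U+1A69, U+1A60, U+1A3F>:
# SAKOT + LOW YA counted as the post-vowel stack.
HUI = [Slot.PRE_VOWEL_STACK, Slot.VOWELS_BELOW,
       Slot.POST_VOWEL_STACK, Slot.POST_VOWEL_STACK]

if __name__ == "__main__":
    for name, slots in [("kuu, vowel first", KUU_VOWEL_FIRST),
                        ("kuu, tone first", KUU_TONE_FIRST),
                        ("nyu", NYU), ("hui", HUI)]:
        print(name, in_slot_order(slots))
```

Under these labels only the vowel-first spelling of /kuː/ follows the slot order, while both the 'shaky' and the 'to sprinkle' sequences follow it and differ only in which stack the SAKOT pair is assigned to.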
Apart from the first element of the pre-vowel consonant stack, each element of the consonant stacks was either a pair of SAKOT and consonant letter or a consonant sign.

The script has made use of three consonant letters to indicate vowels: U+1A3F TAI THAM LETTER LOW YA, U+1A45 TAI THAM LETTER WA and U+1A4B TAI THAM LETTER A. The subscript form of LETTER A has for most purposes evolved into a vowel symbol, U+1A6C TAI THAM VOWEL SIGN OA BELOW, and presents no known issues. The combinations <U+1A60, U+1A3F> and <U+1A60, U+1A45> represent vowels, generally /e/ and /o/ in Tai Khün and Tai Lue and /iːa/ and /uːa/ in Northern Thai and Lao. These may reasonably be regarded as matres lectionis. The question then arose of how to order them with respect to any other vowels or tone marks. Thai practice suggested that the mater lectionis should come last, treating the syllable as a pair of chained syllables, but because of Tai Khün feedback they were included in the pre-vowel consonant stack. For interaction with other vowel symbols, this decision is reflected in the 2007 proposal http://www.unicode.org/L2/L2007/07007r-n3207r-lanna.pdf .

I have been indexing the Lanna script spellings in the 'Northern Thai Dictionary of Palm-Leaf Manuscripts', and I have encountered puzzles with some very Siamese spellings.

Q1. Should I treat the mater lectionis as part of the initial stack or as starting a chained syllable when an unexpected written vowel appears to precede it? Specifically:

Q1a. Should I encode a certain writing of /kuaʔ/ 'a wooden or woven-bamboo tray' as <U+1A20 TAI THAM LETTER HIGH KA, U+1A60, U+1A45, U+1A62 TAI THAM VOWEL SIGN MAI SAT, U+1A61 TAI THAM VOWEL SIGN A> or as <U+1A20 TAI THAM LETTER HIGH KA, U+1A62 TAI THAM VOWEL SIGN MAI SAT, U+1A60, U+1A45, U+1A61 TAI THAM VOWEL SIGN A>? The usual spelling of this word would be <U+1A20 TAI THAM LETTER HIGH KA, U+1A60, U+1A45, U+1A6B TAI THAM VOWEL SIGN O, U+1A61 TAI THAM VOWEL SIGN A>.

Q1b. Should I encode a certain writing of /luːa/ 'firewood' other than as <U+1A49 TAI THAM LETTER HIGH HA, U+1A56 TAI THAM CONSONANT SIGN MEDIAL LA, U+1A62 TAI THAM VOWEL SIGN MAI SAT, U+1A60, U+1A45>, and if so, how? The usual writing of the word would be encoded as <U+1A49, U+1A56, U+1A60, U+1A45, U+1A6B TAI THAM VOWEL SIGN O>.

Q1c. I see three reasonable encodings of the writing of /sawiːan/ 'a large woven basket for holding unhusked rice'. The choice between (ii) and (iii) depends on the answer to Q2. The three choices are:

(i) <U+1A48 TAI THAM LETTER HIGH SA, U+1A60, U+1A45, U+1A7B TAI THAM SIGN MAI SAM, U+1A66 TAI THAM VOWEL SIGN II, U+1A60, U+1A3F, U+1A41 TAI THAM LETTER RA>

(ii) <U+1A48 TAI THAM LETTER HIGH SA, U+1A60, U+1A45, U+1A7B TAI THAM SIGN MAI SAM, U+1A60, U+1A3F, U+1A66 TAI THAM VOWEL SIGN II, U+1A41 TAI THAM LETTER RA>

(iii) <U+1A48 TAI THAM LETTER HIGH SA, U+1A60, U+1A45, U+1A60, U+1A3F, U+1A7B TAI THAM SIGN MAI SAM, U+1A66 TAI THAM VOWEL SIGN II, U+1A41 TAI THAM LETTER RA>

Which encoding should I choose?

Q2. Where should I put the MAI SAM in the encoding of the fuller usual writing of /sawiːan/? Should I write

(i) <U+1A48 TAI THAM LETTER HIGH SA, U+1A60, U+1A45, U+1A7B TAI THAM SIGN MAI SAM, U+1A60, U+1A3F, U+1A41 TAI THAM LETTER RA>

or

(ii) <U+1A48 TAI THAM LETTER HIGH SA, U+1A60, U+1A45, U+1A60, U+1A3F, U+1A7B TAI THAM SIGN MAI SAM, U+1A41 TAI THAM LETTER RA>?

The TUS does not specify where the MAI SAM representing the typically anaptyctic vowel /a/ should go. (In this case, /swiːan/ *is* a possible Northern Thai word.) The previously cited 2007 proposal says, "it is stored following the subjoined form to indicate the consonant being at the start of a new syllable". However, this moves a mark which is positioned like a vowel or tone mark into the consonant cluster's sequence of code points. The Maefahluang dictionary (p. 719 of Revision 1) actually writes the mai sam after the RA. Should this be regarded as a typographical error?
I have not been able to discern a pattern in the positioning in that dictionary of mai sam used to indicate a hidden syllable boundary.

Richard.
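P.S. As an aid to comparing the sequences discussed above, here is a small sketch using Python's standard unicodedata module. It checks the canonical-equivalence remark about /kuː/ and reports whether each candidate in Q1c and Q2 is left unchanged by NFD. That second check may be worth making because, in the Unicode Character Database, SAKOT and MAI SAM both have non-zero canonical combining classes, so a MAI SAM immediately followed by SAKOT can be reordered by normalization. The Thai sequence is my own analogue (KHO KHWAI, SARA UU, MAI EK) and the helper names are hypothetical; results depend on the Unicode data shipped with the Python build.

```python
import unicodedata


def nfd(s):
    """Canonical decomposition, including canonical reordering of marks."""
    return unicodedata.normalize("NFD", s)


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return nfd(a) == nfd(b)


def nfd_stable(s):
    """True if normalization would leave the code point order untouched."""
    return nfd(s) == s


# Vowel-first versus tone-first spellings.
PAIRS = {
    "tai tham kuu": ("\u1A23\u1A6A\u1A76", "\u1A23\u1A76\u1A6A"),
    # Assumed Thai analogue, not taken from the text above.
    "thai khuu": ("\u0E04\u0E39\u0E48", "\u0E04\u0E48\u0E39"),
}

# Candidate encodings from Q1c and Q2, keyed as in the questions.
Q1C = {
    "i": "\u1A48\u1A60\u1A45\u1A7B\u1A66\u1A60\u1A3F\u1A41",
    "ii": "\u1A48\u1A60\u1A45\u1A7B\u1A60\u1A3F\u1A66\u1A41",
    "iii": "\u1A48\u1A60\u1A45\u1A60\u1A3F\u1A7B\u1A66\u1A41",
}
Q2 = {
    "i": "\u1A48\u1A60\u1A45\u1A7B\u1A60\u1A3F\u1A41",
    "ii": "\u1A48\u1A60\u1A45\u1A60\u1A3F\u1A7B\u1A41",
}

if __name__ == "__main__":
    for label, (a, b) in PAIRS.items():
        print(label, "canonically equivalent:", canonically_equivalent(a, b))
    for question, candidates in (("Q1c", Q1C), ("Q2", Q2)):
        for key, seq in candidates.items():
            # Show the combining class of each code point next to the verdict.
            classes = [unicodedata.combining(c) for c in seq]
            print(question, key, "NFD-stable:", nfd_stable(seq), classes)
```

A candidate that is not NFD-stable is not thereby wrong, but any process that normalizes the text will silently change its code point order.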

