Attached is a humble attempt at Lao syllabication rules, in the hope of integrating Lao with TeX.
I am sending this to the tex-hyphen list, and CCing the xetex list as a lengthy discussion regarding this subject occurred there during the last couple of weeks. I will be happy to work with the group in tweaking this and running tests.

Thank you,

--
Brian Wilson, Director
Asia-Pacific International University Translation Center
_____________
I have a new blog!! http://tc4asia.org/wpblog

"He hath shewed thee, O man, what is good; and what doth the LORD require of thee, but to do justly, and to love mercy, and to walk humbly with thy God." Micah 6:8
The following is a brief sketch of the syllabification rules in Lao. My apologies for not using standard conventions. Feel free to edit.

On the most basic level of word-wrapping, syllables should never be split. Lao syllables consist of

1. Beginning Consonant (bC) [required]
2. Secondary Beginning Consonant (sbC) [for consonant clusters]
3. Vowel (V) [required]
4. Tone Mark (T) [The order of 3 and 4 can be reversed]
5. Final Consonant (fC)
6. Extra Final Consonant (efC)
7. galan (g)

############
############
Consonants and consonant clusters that can begin a syllable.

1. ກ 0E81
   (1) ກຣ 0E81 + 0EA3 [uncommon]
   (2) ກລ 0E81 + 0EA5 [uncommon]
   (3) ກວ 0E81 + 0EA7
   (4) ກຼ 0E81 + 0EBC [uncommon]
2. ຂ 0E82
   (1) ຂຣ 0E82 + 0EA3 [uncommon]
   (2) ຂລ 0E82 + 0EA5 [uncommon]
   (3) ຂວ 0E82 + 0EA7
   (4) ຂຼ 0E82 + 0EBC [uncommon]
3. ຄ 0E84
   (1) ຄຣ 0E84 + 0EA3 [uncommon]
   (2) ຄລ 0E84 + 0EA5 [uncommon]
   (3) ຄວ 0E84 + 0EA7
   (4) ຄຼ 0E84 + 0EBC [uncommon]
4. ງ 0E87
5. ຈ 0E88
6. ຊ 0E8A
7. ຍ 0E8D
8. ດ 0E94
   (1) ດຣ 0E94 + 0EA3 [uncommon]
9. ຕ 0E95
   (1) ຕຣ 0E95 + 0EA3 [uncommon]
10. ຖ 0E96
11. ທ 0E97
12. ນ 0E99
13. ບ 0E9A
   (1) ບຣ 0E9A + 0EA3 [uncommon]
   (2) ບລ 0E9A + 0EA5 [uncommon]
   (3) ບຼ 0E9A + 0EBC [uncommon]
14. ປ 0E9B
   (1) ປຣ 0E9B + 0EA3 [uncommon]
   (2) ປລ 0E9B + 0EA5 [uncommon]
   (3) ປຼ 0E9B + 0EBC [uncommon]
15. ຜ 0E9C
16. ຝ 0E9D
   (1) ຝຣ 0E9D + 0EA3
   (2) ຝຼ 0E9D + 0EBC
17. ພ 0E9E
18. ຟ 0E9F
19. ມ 0EA1
20. ຢ 0EA2
21. ຣ 0EA3
22. ລ 0EA5
23. ວ 0EA7
24. ສ 0EAA
   (1) ສຣ 0EAA + 0EA3 [uncommon]
   (2) ສລ 0EAA + 0EA5 [uncommon]
   (3) ສວ 0EAA + 0EA7
   (4) ສຼ 0EAA + 0EBC [uncommon]
25. ຫ 0EAB
   (1) ຫງ 0EAB + 0E87
   (2) ຫນ 0EAB + 0E99 [This is uncommon as it has its own character, see below]
   (3) ຫຍ 0EAB + 0E8D
   (4) ຫມ 0EAB + 0EA1 [This is uncommon as it has its own character, see below]
   (5) ຫຣ 0EAB + 0EA3 [uncommon]
   (6) ຫລ 0EAB + 0EA5
   (7) ຫວ 0EAB + 0EA7
   (8) ຫຼ 0EAB + 0EBC
26. ອ 0EAD
27. ຮ 0EAE [my mac is rendering this the same as 0EA3, shame on it]
28. ໜ 0EDC
29. ໝ 0EDD

############
############
Consonants that commonly end a syllable

1. ກ 0E81
2. ງ 0E87
3. ຍ 0E8D [This is a /y/ and acts as a semivowel in certain constructions that will be explained later]
4. ດ 0E94
5. ມ 0EA1
6. ນ 0E99
7. ບ 0E9A
8. ວ 0EA7 [This is a /w/ and acts as a semivowel in certain constructions that will be explained later]

############
############
Consonants that could conceivably end a syllable on rare occasions when transcribing certain foreign words.

1. ຂ 0E82
2. ຄ 0E84
3. ຈ 0E88
4. ຊ 0E8A
5. ດ 0E94
6. ຕ 0E95
7. ຖ 0E96
8. ທ 0E97
9. ປ 0E9B
10. ຜ 0E9C
11. ຝ 0E9D
12. ພ 0E9E
13. ຟ 0E9F
14. ມ 0EA1
15. ຣ 0EA3
16. ລ 0EA5
17. ສ 0EAA

############
############
Consonants that can never end a syllable [unless followed immediately by the silencer 0ECC]

1. ຫ 0EAB
2. ຢ 0EA2
3. ອ 0EAD
4. ຮ 0EAE
5. ຼ 0EBC
6. ໜ 0EDC
7. ໝ 0EDD

############
############
Extra final consonant

In order to type foreign words, Lao adds 0ECC to extra final consonants. Every consonant except the following three is theoretically possible, with some more common than others.

1. ຼ 0EBC
2. ໜ 0EDC
3. ໝ 0EDD

############
############
Vowels that are written before the beginning consonant [syllable breaks ALWAYS occur before these characters and NEVER occur after these characters]

1. ເ 0EC0
2. ແ 0EC1
3. ໂ 0EC2
4. ໃ 0EC3
5. ໄ 0EC4

############
############
Vowels that are written after the beginning consonant [syllable breaks NEVER occur before these characters. Some vowels in this section and the preceding section can be stacked. I can specify if necessary.]

1. ະ 0EB0
2. າ 0EB2
3. ຳ 0EB3 [can also be written as 0ECD followed by 0EB2]
4. ິ 0EB4
5. ີ 0EB5
6. ຶ 0EB6
7. ື 0EB7
8. ຸ 0EB8
9. ູ 0EB9
10. ໍ 0ECD

############
############
Vowels that are written between two consonants [syllable breaks NEVER occur before or after these characters]

1. ັ 0EB1 [The following character must be a consonant or the 0EBD semi-vowel]
2. ົ 0EBB [After an optional tone mark (T), the following character must be one of:
   1. a consonant, or
   2. the າ 0EB2 vowel when used in the /ow/ diphthong ( <0EC0> <bC> <(sbC)> <0EBB> <(T)> <0EB2> ), or
   3. the ວ 0EA7 semi-vowel when used in the /ua/ diphthong ( <bC> <(sbC)> <0EBB> <(T)> <0EA7> <(0EB0)> ). Note that the ວ may be followed by ະ 0EB0 for the shortened version of this diphthong.]

############
############
Vowels that can't take a final consonant

1. ະ 0EB0 [syllable break ALWAYS occurs after this character]
2. ໍ 0ECD [syllable break ALWAYS occurs after this character or the optional tone mark immediately following it.]

############
############
/ia/ vowel and, in old orthography, /y/, which can replace the final ຍ 0E8D - see above

1. ຽ 0EBD [can NEVER break before. If it is a final /y/, then can break after]

############
############
Tones. There are four tone marks that can sit on top of the initial consonant or on ິ ີ ຶ ື ໍ 0EB4 - 0EB5 - 0EB6 - 0EB7 - 0ECD (Note that 0EB5 and 0EB7 are also part of diphthongs, see below). Breaks can NEVER occur before these.

1. ່ 0EC8
2. ້ 0EC9
3. ໊ 0ECA
4. ໋ 0ECB

############
############
The silencer: a mark placed on a consonant rendering it silent. Only used to write foreign words. Usually placed on the last letter of a syllable, although it can occur in the middle of a syllable when placed on a ຣ 0EA3 or ລ 0EA5. A break can NEVER occur before the consonant upon which this character sits, as a consonant carrying this character (galan) cannot begin a syllable.

1. ໌ 0ECC

############
############
The following punctuation marks can never begin a new line. Also note that English and French punctuation symbols and rules apply. (Lao tends to add a space around punctuation as in French, but not always.) Quotes can be written with " " or << >>

1. ໆ 0EC6
2. 0EAF [Sorry, I can't find this on my unmarked mac keyboard]

############
############
Vowel Diphthongs. Here is where it gets hairy as three consonant semi-vowels are involved. [See my explanation at the beginning of this document. Parentheses refer to optional characters.]

1. <0EC0> <bC> <(sbC)> <0EB6 or 0EB7> <(T)> <0EAD> <(fC)> [eua vowel. Note that the beginning consonant is in the middle]

[Well, that wasn't so bad. I think that the other diphthongs are taken care of in previous rules and notes.]

############
############
Consonants used as vowels between consonants.

1. ວ 0EA7
2. ອ 0EAD

[If ວ|ອ is preceded by a consonant (note optional tone mark) and followed immediately by a consonant that is not followed by a vowel or tone mark, then consider C(T)ວ|ອC to be a syllable.]

############
############
Yeah. The end.
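For anyone who wants to run tests against these rules, here is a minimal sketch in Python of how the ALWAYS/NEVER break rules above might be written down as lookup tables. It is only an illustration, not a hyphenation pattern file: the function and table names are made up, treating 0EBC as a character that never starts a syllable is my assumption (it only appears above as a secondary beginning consonant), and the diphthong, semi-vowel and final-consonant rules are not handled at all.

```python
# Character classes taken from the lists above (code points only).
PRE_VOWELS = set("\u0EC0\u0EC1\u0EC2\u0EC3\u0EC4")    # written before bC
POST_VOWELS = set("\u0EB0\u0EB2\u0EB3\u0EB4\u0EB5\u0EB6\u0EB7\u0EB8\u0EB9\u0ECD")
MID_VOWELS = set("\u0EB1\u0EBB")                      # between two consonants
TONES = set("\u0EC8\u0EC9\u0ECA\u0ECB")
SILENCER = "\u0ECC"
NYO = "\u0EBD"                                        # /ia/ vowel or old final /y/
SUBJOINED = "\u0EBC"                                  # assumption: never starts a syllable

# Characters that a break may NEVER precede.
NO_BREAK_BEFORE = POST_VOWELS | MID_VOWELS | TONES | {SILENCER, NYO, SUBJOINED}


def forbidden_breaks(text):
    """Indices i where a break between text[i-1] and text[i] is ruled out."""
    forbidden = set()
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur in NO_BREAK_BEFORE:
            forbidden.add(i)
        # Never break after a pre-consonant vowel or a between-consonant vowel.
        if prev in PRE_VOWELS or prev in MID_VOWELS:
            forbidden.add(i)
        # Never break before the consonant that carries the silencer.
        if i + 1 < len(text) and text[i + 1] == SILENCER:
            forbidden.add(i)
    return forbidden


def forced_breaks(text):
    """Indices i where a break ALWAYS occurs: before a pre-consonant vowel."""
    return {i for i in range(1, len(text)) if text[i] in PRE_VOWELS}


def candidate_breaks(text):
    """Positions not ruled out by the NEVER rules encoded above."""
    return set(range(1, len(text))) - forbidden_breaks(text)


if __name__ == "__main__":
    word = "\u0E9E\u0EB2\u0EAA\u0EB2\u0EA5\u0EB2\u0EA7"  # 0E9E 0EB2 0EAA 0EB2 0EA5 0EB2 0EA7
    print(sorted(candidate_breaks(word)))  # prints [2, 4, 6]
```

Position 6 survives as a candidate even though the ວ there closes the last syllable, which shows why the final-consonant and semi-vowel rules above are still needed on top of the simple tables.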