Attached is a humble attempt at Lao syllabification rules, in the hope of
integrating Lao with TeX.

I am sending this to the tex-hyphen list, and CCing the xetex list as a
lengthy discussion regarding this subject occurred there during the last
couple of weeks.

I will be happy to work with the group in tweaking this and running tests.

Thank you,

-- 
Brian Wilson, Director
Asia-Pacific International University Translation Center
_____________

I have a new blog!! http://tc4asia.org/wpblog

"He hath shewed thee, O man, what is good; and what doth the LORD require of
thee , but to do justly, and to love mercy, and to walk humbly with thy
God."  Micah 6:8
The following is a brief sketch of the syllabification rules in Lao. My 
apologies for not using standard conventions.  Feel free to edit.

On the most basic level of word-wrapping, syllables should never be split.

Lao syllables consist of
        1. Beginning Consonant (bC) [required]
        2. Secondary Beginning Consonant (sbC) [for consonant clusters]
        3. Vowel (V) [required]
        4. Tone Mark (T) [The order of 3 and 4 can be reversed]
        5. Final Consonant (fC) 
        6. Extra Final Consonant (efC)
        7. galan (g)

##########
##########
Consonants and consonant clusters that can begin a syllable. 
        1. ກ 0E81
                (1) ກຣ 0E81 + 0EA3 [uncommon]
                (2) ກລ 0E81 + 0EA5 [uncommon]
                (3) ກວ 0E81 + 0EA7 
                (4) ກຼ 0E81 + 0EBC [uncommon]

        2. ຂ 0E82
                        (1) ຂຣ 0E82 + 0EA3 [uncommon]
                        (2) ຂລ 0E82 + 0EA5 [uncommon]
                        (3) ຂວ 0E82 + 0EA7 
                        (4) ຂຼ 0E82 + 0EBC [uncommon]
                        
        3. ຄ 0E84
                        (1) ຄຣ 0E84 + 0EA3 [uncommon]
                        (2) ຄລ 0E84 + 0EA5 [uncommon]
                        (3) ຄວ 0E84 + 0EA7 
                        (4) ຄຼ 0E84 + 0EBC [uncommon]
                
        4. ງ 0E87
        
        5. ຈ 0E88
        
        6. ຊ 0E8A
        
        7. ຍ 0E8D
        
        8. ດ 0E94
                (1) ດຣ 0E94 + 0EA3 [uncommon]
        
        9. ຕ 0E95
                (1) ຕຣ 0E95 + 0EA3 [uncommon]
                
        10. ຖ 0E96
        
        11. ທ 0E97
        
        12. ນ 0E99

        13. ບ 0E9A
                        (1) ບຣ 0E9A + 0EA3 [uncommon]
                        (2) ບລ 0E9A + 0EA5  [uncommon]
                        (3) ບຼ 0E9A + 0EBC  [uncommon]
                        
        14. ປ 0E9B
                        (1) ປຣ 0E9B + 0EA3 [uncommon]
                        (2) ປລ 0E9B + 0EA5  [uncommon]
                        (3) ປຼ 0E9B + 0EBC  [uncommon]
                        
        15. ຜ 0E9C
        
        16. ຝ 0E9D
                (1) ຝຣ 0E9D + 0EA3
                (2) ຝຼ 0E9D + 0EBC
                
        17. ພ 0E9E
        
        18. ຟ 0E9F
        
        19. ມ 0EA1
        
        20. ຢ 0EA2
        
        21. ຣ 0EA3
        
        22. ລ 0EA5
        
        23. ວ 0EA7
        
        24. ສ 0EAA
                (1) ສຣ 0EAA + 0EA3 [uncommon]
                (2) ສລ 0EAA + 0EA5 [uncommon]
                (3) ສວ 0EAA + 0EA7 
                (4) ສຼ 0EAA + 0EBC [uncommon]
        
        25. ຫ 0EAB
                (1) ຫງ 0EAB + 0E87
                (2) ຫນ 0EAB + 0E99 [This is uncommon as it has its own 
character, see below]
                (3) ຫຍ 0EAB + 0E8D
                (4) ຫມ 0EAB + 0EA1 [This is uncommon as it has its own 
character, see below]
                (5) ຫຣ 0EAB + 0EA3 [uncommon]
                (6) ຫລ 0EAB + 0EA5
                (7) ຫວ 0EAB + 0EA7
                (8)  ຫຼ 0EAB + 0EBC
        
        26. ອ 0EAD
        
        27. ຮ 0EAE [my mac is rendering this the same as 0EA3, shame on it]
        
        28. ໜ 0EDC
        
        29. ໝ 0EDD
        
############
############    
Consonants that commonly end a syllable
        1. ກ 0E81
        2. ງ 0E87
        3. ຍ 0E8D [This is a /y/ and acts as a semivowel in certain 
constructions that will be explained later]
        4. ດ 0E94
        5. ມ 0EA1
        6. ນ 0E99
        7. ບ 0E9A
        8. ວ 0EA7  [This is a /w/ and acts as a semivowel in certain 
constructions that will be explained later]
        
############
############    
Consonants that could conceivably end a syllable on rare occasions, when 
transcribing certain foreign words.

        1. ຂ 0E82
        2. ຄ 0E84
        3. ຈ 0E88
        4. ຊ 0E8A
        5. ດ 0E94
        6. ຕ 0E95
        7. ຖ 0E96
        8. ທ 0E97
        9. ປ 0E9B
        10. ຜ 0E9C
        11. ຝ 0E9D
        12. ພ 0E9E
        13. ຟ 0E9F
        14. ມ 0EA1
        15. ຣ 0EA3
        16. ລ 0EA5
        17. ສ 0EAA

############
############    
Consonants that can never end a syllable [unless followed immediately by the 
silencer 0ECC]
        1. ຫ 0EAB
        2. ຢ 0EA2
        3. ອ 0EAD
        4. ຮ 0EAE
        5.    ຼ 0EBC
        6. ໜ 0EDC
        7. ໝ 0EDD
############
############    
Extra final consonant
In order to type foreign words, Lao adds 0ECC to extra final consonants. 
Every consonant except
                1.    ຼ 0EBC
                2. ໜ 0EDC
                3. ໝ 0EDD
is theoretically possible, with some more common than others.

############
############    
Vowels that are written before the beginning consonant [syllable breaks ALWAYS 
occur before these characters and NEVER occur after these characters]
        1. ເ 0EC0
        2. ແ 0EC1
        3. ໄ 0EC2
        4. ໃ 0EC3
        5. ໂ 0EC4
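
As a rough illustration of the rule above, here is a minimal Python sketch 
that lists the positions where a break falls before a leading vowel. The 
function name and the use of string indices are my own assumptions for 
illustration, not part of any TeX or XeTeX interface.

```python
# Leading vowels listed above: U+0EC0 .. U+0EC4.
LEAD_VOWELS = {chr(cp) for cp in range(0x0EC0, 0x0EC5)}


def breaks_before_lead_vowels(text: str) -> list[int]:
    """Return every index i such that a syllable break falls just before text[i].

    Index 0 is skipped: there is nothing to break at the start of the text.
    """
    return [i for i, ch in enumerate(text) if i > 0 and ch in LEAD_VOWELS]


# Two syllables, each starting with a leading vowel: U+0EC0 ... then U+0EC2 ...
sample = "\u0EC0\u0E82\u0EBB\u0EB2\u0EC2\u0E9B"
print(breaks_before_lead_vowels(sample))
```

This only finds the breaks this one rule guarantees; it says nothing about 
syllables that start with a bare consonant.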

############
############    
Vowels that are written after the beginning consonant [syllable breaks NEVER 
occur before these characters. Some vowels in this section and the proceeding 
section can be stacked. I can specify if necessary.]
        1. ະ 0EB0
        2. າ 0EB2
        3. ຳ 0EB3 [can also be written as 0ECD followed by 0EB2]
        4.   ິ 0EB4
        5.   ີ  0EB5
        6.   ຶ 0EB6
        7.   ື 0EB7
        8.   ຸ 0EB8
        9.   ູ 0EB9
        10.   ໍ 0ECD
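
Because ຳ 0EB3 has two spellings, it may help to fold them into one before 
applying any other rule. A minimal Python sketch, assuming the two code points 
are adjacent (a tone mark typed between them is not handled here); the 
function name is hypothetical.

```python
def compose_am(text: str) -> str:
    """Rewrite the two-character spelling 0ECD + 0EB2 as the single 0EB3."""
    return text.replace("\u0ECD\u0EB2", "\u0EB3")


# A consonant followed by the two-character spelling becomes consonant + 0EB3.
print(compose_am("\u0E99\u0ECD\u0EB2") == "\u0E99\u0EB3")
```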

############
############    
Vowels that are written between two consonants [syllable breaks NEVER occur 
before or after these characters]

        1.    ັ 0EB1 [The following character must be a consonant or the 0EBD 
semi-vowel]
        2.    ົ 0EBB [After an optional tone mark (T), the following character 
must be one of:
                (a) a consonant;
                (b) the vowel າ 0EB2, when used in the /ow/ diphthong: 
<0EC0> <bC> <(sbC)> <0EBB> <(T)> <0EB2>;
                (c) the semi-vowel ວ 0EA7, when used in the /ua/ diphthong. The 
ວ may be followed by ະ 0EB0 for the shortened version of this diphthong: 
<bC> <(sbC)> <0EBB> <(T)> <0EA7> <(0EB0)>]
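
The constraint on what may follow 0EBB can be sketched as a small check. This 
is a loose Python approximation under my own assumptions: the consonant set is 
taken from the lists earlier in this document (with ຊ at its Unicode position 
0E8A), and the /ow/ case is accepted whenever 0EC0 appears within the three 
characters before 0EBB, which can be fooled by a preceding syllable.

```python
CONSONANTS = set(
    "\u0E81\u0E82\u0E84\u0E87\u0E88\u0E8A\u0E8D\u0E94\u0E95\u0E96\u0E97\u0E99"
    "\u0E9A\u0E9B\u0E9C\u0E9D\u0E9E\u0E9F\u0EA1\u0EA2\u0EA3\u0EA5\u0EA7\u0EAA"
    "\u0EAB\u0EAD\u0EAE\u0EDC\u0EDD"
)
TONES = set("\u0EC8\u0EC9\u0ECA\u0ECB")


def valid_after_0ebb(text: str, i: int) -> bool:
    """Check the character that follows the 0EBB at text[i]."""
    if text[i] != "\u0EBB":
        raise ValueError("text[i] is not 0EBB")
    j = i + 1
    if j < len(text) and text[j] in TONES:  # optional tone mark
        j += 1
    if j >= len(text):
        return False  # nothing follows, so the rule is not satisfied
    if text[j] == "\u0EB2":
        # /ow/ diphthong: only accepted when 0EC0 opens the syllable.
        return "\u0EC0" in text[max(0, i - 3):i]
    # Plain final consonant; this also covers 0EA7 in the /ua/ diphthong.
    return text[j] in CONSONANTS


print(valid_after_0ebb("\u0E84\u0EBB\u0E99", 1))
```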
        
############
############    
Vowels that can't take a final consonant
        
        1. ະ 0EB0 [syllable break ALWAYS occurs after this character]
        2.   ໍ 0ECD [syllable break ALWAYS occurs after this character or the 
optional tone mark immediately following it.]
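
These two rules give guaranteed break points, which can be sketched in Python 
as follows. The names are hypothetical, and one assumption is added: when 0ECD 
is followed by 0EB2 the pair is treated as the alternate spelling of ຳ 0EB3 
mentioned earlier, so no break is reported inside it.

```python
TONES = set("\u0EC8\u0EC9\u0ECA\u0ECB")
OPEN_A = "\u0EB0"  # always followed by a syllable break
OPEN_O = "\u0ECD"  # followed by a break, after an optional tone mark


def breaks_after_open_vowels(text: str) -> list[int]:
    """Return indices i where a break falls just before text[i]."""
    breaks = []
    n = len(text)
    for i, ch in enumerate(text):
        end = None
        if ch == OPEN_A:
            end = i + 1
        elif ch == OPEN_O:
            end = i + 1
            if end < n and text[end] in TONES:
                end += 1  # the break moves past the tone mark
            if end < n and text[end] == "\u0EB2":
                end = None  # 0ECD + 0EB2 spells 0EB3; do not split it
        if end is not None and end < n:  # no break at the very end
            breaks.append(end)
    return breaks


print(breaks_after_open_vowels("\u0E81\u0EB0\u0E9A\u0ECD\u0EC8\u0E81\u0EB2"))
```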
        
        
############
############    
/ia/ vowel, which in the old orthography can also stand for /y/, replacing the 
final ຍ 0E8D (see above)

1. ຽ 0EBD [a break can NEVER occur before it. If it is a final /y/, a break can 
occur after it]

############
############    
Tones. There are four tone marks, which sit on top of the initial consonant 
or on   ິ  ີ  ຶ  ື   ໍ   0EB4 - 0EB5 - 0EB6 - 0EB7 - 0ECD. (Note that 0EB5 and 
0EB7 are also part of diphthongs; see below.) Breaks can NEVER occur before 
these.

        1.   ່ 0EC8
        2.    ້  0EC9
        3.    ໊  0ECA
        4.    ໋  0ECB
        
############
############    
The silencer: a mark placed on a consonant to render it silent. It is only used 
to write foreign words. It is usually placed on the last letter of a syllable, 
although it can occur in the middle of a syllable when placed on a ຣ 0EA3 or 
ລ 0EA5. A break can NEVER occur before the consonant on which this character 
sits, because a consonant carrying it (galan) cannot begin a syllable.

        1.  ໌  0ECC
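
The "NEVER break" rules stated so far can be collected into one pass over the 
text. The Python sketch below is only an illustration under my own naming; it 
marks positions where a break is forbidden and does not decide where breaks 
are actually allowed.

```python
LEAD_VOWELS = set("\u0EC0\u0EC1\u0EC2\u0EC3\u0EC4")
NO_BREAK_BEFORE = set(
    "\u0EB0\u0EB2\u0EB3\u0EB4\u0EB5\u0EB6\u0EB7\u0EB8\u0EB9\u0ECD"  # vowels after bC
    "\u0EB1\u0EBB"              # vowels between two consonants
    "\u0EBD"                    # /ia/ vowel or final /y/
    "\u0EC8\u0EC9\u0ECA\u0ECB"  # tone marks
    "\u0ECC"                    # silencer
)
SILENCER = "\u0ECC"


def forbidden_breaks(text: str) -> set[int]:
    """Return indices i where a break just before text[i] is forbidden."""
    forbidden = set()
    for i in range(1, len(text)):
        if text[i] in NO_BREAK_BEFORE:
            forbidden.add(i)
        elif text[i - 1] in LEAD_VOWELS:
            forbidden.add(i)  # never break right after a leading vowel
        elif i + 1 < len(text) and text[i + 1] == SILENCER:
            forbidden.add(i)  # never break before a silenced consonant
    return forbidden


print(sorted(forbidden_breaks("\u0EC0\u0E81\u0EA1\u0ECC")))
```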
        
############
############    
The following punctuation marks can never begin a new line. Also note that 
English and French punctuation symbols and rules apply. (Lao tends to add a 
space around punctuation, as in French, but not always.) Quotes can be written 
with " " or << >>.
        1.   ໆ 0EC6
        2.      0EAF [Sorry, I can't find this on my unmarked mac keyboard]
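
For line breaking, this rule amounts to a one-line check. A small Python 
sketch with a hypothetical function name; it looks only at the character 
itself and ignores any space that may precede it.

```python
NO_LINE_START = {"\u0EC6", "\u0EAF"}  # the two marks listed above


def may_start_line(text: str, i: int) -> bool:
    """Return False if text[i] is a mark that can never begin a new line."""
    return text[i] not in NO_LINE_START


print(may_start_line("\u0E81\u0EB2\u0EC6", 2))
```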

############
############    
Vowel Diphthongs. Here is where it gets hairy as three consonant semi-vowels 
are involved. [See the list of syllable components at the beginning of this 
document. Parentheses mark optional characters.]
        1.  <0EC0>  <bC> <(sbC)>  <0EB6 or 0EB7>  <(T)> <0EAD> <(fC)> [eua 
vowel. Note that the beginning consonant is in the middle]
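
The pattern above translates fairly directly into a regular expression. This 
Python sketch rests on several assumptions of mine: the consonant set comes 
from the lists earlier in this document (with ຊ at its Unicode position 0E8A), 
the secondary consonant set is a loose superset of the clusters listed, and 
only the common final consonants are accepted. Because <(fC)> is optional, the 
pattern may also swallow the first consonant of the next syllable.

```python
import re

CONS = (
    "\u0E81\u0E82\u0E84\u0E87\u0E88\u0E8A\u0E8D\u0E94\u0E95\u0E96\u0E97\u0E99"
    "\u0E9A\u0E9B\u0E9C\u0E9D\u0E9E\u0E9F\u0EA1\u0EA2\u0EA3\u0EA5\u0EA7\u0EAA"
    "\u0EAB\u0EAD\u0EAE\u0EDC\u0EDD"
)
SECOND = "\u0EA3\u0EA5\u0EA7\u0EBC\u0E87\u0E99\u0E8D\u0EA1"  # loose sbC set
FINAL = "\u0E81\u0E87\u0E8D\u0E94\u0EA1\u0E99\u0E9A\u0EA7"   # common fC only

# <0EC0> <bC> <(sbC)> <0EB6 or 0EB7> <(T)> <0EAD> <(fC)>
EUA = re.compile(
    f"\u0EC0[{CONS}][{SECOND}]?[\u0EB6\u0EB7][\u0EC8-\u0ECB]?\u0EAD[{FINAL}]?"
)

# 0EC0 + bC + 0EB7 + 0EAD + fC is accepted as one syllable.
print(bool(EUA.fullmatch("\u0EC0\u0EA1\u0EB7\u0EAD\u0E87")))
```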

[Well, that wasn't so bad. I think that the other diphthongs are taken care of 
in previous rules and notes.]

############
############    
Consonants used as vowels between consonants.

        1. ວ 0EA7
        2. ອ 0EAD
        
[If ວ|ອ is preceded by a consonant (note optional tone mark) and followed 
immediately by a consonant that is not followed by a vowel or tone mark then 
consider C(T)ວ|ອC to be a syllable.]
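
The bracketed rule can be sketched as a function that returns the length of 
such a syllable, or 0 when the rule does not apply. This is a Python 
illustration with hypothetical names; "vowel" is taken here to mean the vowel 
signs written after or between consonants, so a leading vowel after the final 
consonant does not block the match. That reading is my assumption.

```python
CONSONANTS = set(
    "\u0E81\u0E82\u0E84\u0E87\u0E88\u0E8A\u0E8D\u0E94\u0E95\u0E96\u0E97\u0E99"
    "\u0E9A\u0E9B\u0E9C\u0E9D\u0E9E\u0E9F\u0EA1\u0EA2\u0EA3\u0EA5\u0EA7\u0EAA"
    "\u0EAB\u0EAD\u0EAE\u0EDC\u0EDD"
)
TONES = set("\u0EC8\u0EC9\u0ECA\u0ECB")
FOLLOWING_VOWELS = set(
    "\u0EB0\u0EB1\u0EB2\u0EB3\u0EB4\u0EB5\u0EB6\u0EB7\u0EB8\u0EB9\u0EBB\u0EBD\u0ECD"
)
VOWEL_CONSONANTS = {"\u0EA7", "\u0EAD"}


def cvc_syllable_length(text: str, i: int) -> int:
    """Length of a C(T)(0EA7|0EAD)C syllable starting at text[i], else 0."""
    n = len(text)
    j = i
    if j >= n or text[j] not in CONSONANTS:
        return 0
    j += 1
    if j < n and text[j] in TONES:  # optional tone mark
        j += 1
    if j >= n or text[j] not in VOWEL_CONSONANTS:
        return 0
    j += 1
    if j >= n or text[j] not in CONSONANTS:
        return 0
    j += 1
    # The final consonant must not carry its own vowel or tone mark.
    if j < n and (text[j] in FOLLOWING_VOWELS or text[j] in TONES):
        return 0
    return j - i


print(cvc_syllable_length("\u0E81\u0EC9\u0EAD\u0E99", 0))
```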

############
############    
Yeah. The end.
