UAX 15 hangul composition

Theo Veenker Tue, 03 Aug 2004 05:13:38 -0700

I don't know whether this has been asked or reported before, but is the
example code for Hangul composition in UAX 15 correct?

The code is:
    public static String composeHangul(String source) {
        int len = source.length();
        if (len == 0) return "";
        StringBuffer result = new StringBuffer();
        char last = source.charAt(0);            // copy first char
        result.append(last);

        for (int i = 1; i < len; ++i) {
            char ch = source.charAt(i);

            // 1. check to see if two current characters are L and V

            int LIndex = last - LBase;
            if (0 <= LIndex && LIndex < LCount) {
                int VIndex = ch - VBase;
                if (0 <= VIndex && VIndex < VCount) {

                    // make syllable of form LV

                    last = (char)(SBase + (LIndex * VCount + VIndex) * TCount);
                    result.setCharAt(result.length()-1, last); // reset last
                    continue; // discard ch
                }
            }

            // 2. check to see if two current characters are LV and T

            int SIndex = last - SBase;
            if (0 <= SIndex && SIndex < SCount && (SIndex % TCount) == 0) {
                int TIndex = ch - TBase;
                if (0 <= TIndex && TIndex <= TCount) {

                    // make syllable of form LVT

                    last += TIndex;
                    result.setCharAt(result.length()-1, last); // reset last
                    continue; // discard ch
                }
            }

            // if neither case was true, just add the character

            last = ch;
            result.append(ch);
        }
        return result.toString();
    }

Suppose I feed it 0xAC00 0x11C3. 0xAC00 is an LV syllable, so the code
falls through to step 2:

SIndex = 0xAC00 - 0xAC00 = 0
TIndex = 0x11C3 - 0x11A7 = 28

This makes the test "(0 <= TIndex && TIndex <= TCount)" true, and the
resulting output is 0xAC00 + 28 = 0xAC1C, which is not an LVT but an
LV syllable!
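
The arithmetic above can be reproduced with a few lines of Java. This is
only a sketch: the class name HangulBoundsTrace and the helpers
quotedTrailingTest and isLV are hypothetical names introduced for
illustration, and the constants are assumed to have their UAX 15 values
(SBase = 0xAC00, TBase = 0x11A7, TCount = 28, SCount = 11172).

```java
public class HangulBoundsTrace {
    // Constants assumed to match the values defined in UAX 15.
    static final int SBase = 0xAC00, TBase = 0x11A7;
    static final int TCount = 28, SCount = 11172;

    // The step-2 bounds test exactly as quoted (inclusive upper bound).
    static boolean quotedTrailingTest(char ch) {
        int TIndex = ch - TBase;
        return 0 <= TIndex && TIndex <= TCount;
    }

    // True when c is a precomposed syllable without a trailing consonant (LV form).
    static boolean isLV(char c) {
        int SIndex = c - SBase;
        return 0 <= SIndex && SIndex < SCount && SIndex % TCount == 0;
    }

    public static void main(String[] args) {
        char last = 0xAC00; // an LV syllable
        char ch = 0x11C3;   // one past the last trailing consonant counted by TCount

        System.out.println(quotedTrailingTest(ch));                  // true
        char composed = (char) (last + (ch - TBase));
        System.out.println(String.format("U+%04X", (int) composed)); // U+AC1C
        System.out.println(isLV(composed));                          // true
    }
}
```

The last line confirms the point: the "composed" result is itself an LV
syllable, so the input pair has been turned into a different syllable
rather than an LVT.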

I think the TIndex <= TCount should be TIndex < TCount. IMO the
example would be clearer if it used the Hangul_Syllable_Type
property.
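
A sketch of the routine with the exclusive upper bound applied might look
like the following. Two things here go beyond the suggestion above and
should be read as assumptions: the lower bound is also tightened to
0 < TIndex, on the reasoning that TBase (0x11A7) is one less than the
first trailing consonant U+11A8, so TIndex 0 stands for "no trailing
consonant"; and StringBuilder replaces StringBuffer. The class name
HangulCompose is hypothetical, and the constants are assumed to have
their UAX 15 values.

```java
public class HangulCompose {
    // Constants assumed to match the values defined in UAX 15.
    static final int SBase = 0xAC00, LBase = 0x1100, VBase = 0x1161, TBase = 0x11A7;
    static final int LCount = 19, VCount = 21, TCount = 28;
    static final int SCount = LCount * VCount * TCount; // 11172

    public static String composeHangul(String source) {
        int len = source.length();
        if (len == 0) return "";
        StringBuilder result = new StringBuilder();
        char last = source.charAt(0); // copy first char
        result.append(last);

        for (int i = 1; i < len; ++i) {
            char ch = source.charAt(i);

            // 1. L followed by V: replace last with the LV syllable
            int LIndex = last - LBase;
            if (0 <= LIndex && LIndex < LCount) {
                int VIndex = ch - VBase;
                if (0 <= VIndex && VIndex < VCount) {
                    last = (char) (SBase + (LIndex * VCount + VIndex) * TCount);
                    result.setCharAt(result.length() - 1, last);
                    continue; // discard ch
                }
            }

            // 2. LV followed by T: replace last with the LVT syllable
            int SIndex = last - SBase;
            if (0 <= SIndex && SIndex < SCount && (SIndex % TCount) == 0) {
                int TIndex = ch - TBase;
                // Exclusive bounds: TIndex must lie in 1..TCount-1
                if (0 < TIndex && TIndex < TCount) {
                    last += TIndex;
                    result.setCharAt(result.length() - 1, last);
                    continue; // discard ch
                }
            }

            // neither case applied: just add the character
            last = ch;
            result.append(ch);
        }
        return result.toString();
    }
}
```

With these bounds, composeHangul("\uAC00\u11C3") leaves the two characters
unchanged, while composeHangul("\u1100\u1161\u11A8") still yields the
single LVT syllable U+AC01.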


A somewhat related question. I know next to nothing about Hangul
[de]composition, so forgive me for asking silly questions. In the
UnicodeData.txt file there are many more than the 19 L, 21 V, and
28 T jamos. Are the other jamos not used to compose syllables, or
does the syllable block represent an incomplete set of compatibility
characters? Which is it?

Theo
