On Tue, 26 Aug 2003, Kent Karlsson wrote:

Kent,

Thank you for your work on Korean sorting and sorry for my late reply.
I'll be very brief because I have something urgent to take care of.

> Jungshik Shin wrote:
>
> You may wish to look at
> http://std.dkuug.dk/JTC1/SC22/WG20/docs/n1051-hangulsort.pdf
> which contains a much updated version of my paper on the subject.
> The table entries are also found in plain text form at
> http://std.dkuug.dk/JTC1/SC22/WG20/docs/n1051t-table-hangulctt6.txt

Wow, you've created all these entries. Thanks.

> > After a thread of emails exchanged, Mark Davis and I found
> > that both of us
> > are more or less in the same page as to how Hangul letters be
> > collated.
> > In summary,
> >
> > 1. Weights for T, V, and L should be assigned in such a way that
> >    T < V < L for all T, V, and L's
>
> That would be L < T < V; but that is complicated by the actual need for
> (the superficially contradictory) V < L < T < V, with the latter T and V
> after all scripts.

I'm not following you here. 'T < V < L' works well in Mark's and my
scheme for the most generic form of Korean syllables, 'L+V+T*', as far
as South Korean collation rules are concerned.

> The Vs at two radically different positions in the table
> is for different positions of the V in a syllable; V < L is for first V in
> a syllable, T < V is for non-first Vs in a syllable.

Aha, you're talking about your scheme.

> > 2. Expand precomposed (cluster) Jamos into sequences of component
> >    basic Jamos
>
> Needed for covering all combinations of Jamos. If limited to (a superset)
> of modern Jamo, this expansion can be avoided.

Absolutely.

> referenced above, which lists the weightings and contractions needed for
> avoiding this expansion in many (but not all) cases.
>
> > 3. Terminate every syllable with 'TERM' that has a lower weight than
> >    all T's (there's an alternative to this, but both favors this
> >    more than the alternative)
>
> This can be avoided if the weighting is done in a particular way.
> See my paper for details.

Indeed. However, I'm wondering whether avoiding TERM is worth the
trade-off: your scheme seems more complex than Mark's and mine, and it
also requires pre-handling. Could you give me some rationale for
preferring yours to the other? Is it because it's better suited to
tailoring for North Korean? I haven't given much thought to North Korean
collation rules recently (at the moment, I would have to look them up
again to refresh my memory).

Jungshik
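
As an illustration of points 1 and 3 summarized above (T < V < L, with
every syllable terminated by a TERM weight lower than all T's), here is a
minimal Python sketch. It is only a sketch under stated assumptions: the
numeric weights, the names jamo_weight and sort_key, and the restriction
to modern conjoining Jamos are all hypothetical and come from neither
scheme discussed in this thread. It also skips point 2 (expanding cluster
Jamos into basic Jamos).

```python
import unicodedata

# Hypothetical weight layout (an assumption for illustration only):
# TERM < every T < every V < every L.
TERM = 0x01
T_BASE = 0x10  # trailing consonants U+11A8..U+11C2
V_BASE = 0x40  # vowels U+1161..U+1175
L_BASE = 0x60  # leading consonants U+1100..U+1112


def jamo_weight(ch):
    """Return the illustrative weight of one modern conjoining Jamo."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x1112:
        return L_BASE + (cp - 0x1100)
    if 0x1161 <= cp <= 0x1175:
        return V_BASE + (cp - 0x1161)
    if 0x11A8 <= cp <= 0x11C2:
        return T_BASE + (cp - 0x11A8)
    raise ValueError("not a modern conjoining Jamo: U+%04X" % cp)


def sort_key(text):
    """Build a key for Hangul text of the form (L+ V+ T*)+."""
    key = []
    prev_is_l = False
    # NFD turns precomposed syllables into L, V, T Jamo sequences.
    for ch in unicodedata.normalize("NFD", text):
        w = jamo_weight(ch)
        is_l = w >= L_BASE
        # An L that follows a V or T starts a new syllable,
        # so the previous syllable is closed with TERM first.
        if is_l and key and not prev_is_l:
            key.append(TERM)
        key.append(w)
        prev_is_l = is_l
    if key:
        key.append(TERM)  # close the last syllable
    return tuple(key)


words = ["각", "가가", "간", "가"]
ordered = sorted(words, key=sort_key)
```

Without TERM, the third weight of "가가" would be an L and the third
weight of "각" would be a T, so T < L would wrongly put "각" first. With
TERM closing the first syllable, the comparison at that position is
TERM against T, and "가가" sorts before "각" as expected.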

