Doug Ewell writes:
> > I just have another question for Korean: many jamos are in fact
> > composed from other jamos: this is clearly visible both in their name
> > and in their composed glyph. What would be the linguistic impact of
> > decomposing them (not canonically!)? Do Korean really learn these
> > jamos without breaking them into their components? I think here about
> > SSANG (double) consonnants, or the initial Y or final E of some
> > vowels...
>
> This would be a good question for Jungshik or another native Korean. I
> have read that Korean children learn the syllables as whole units,
> rather than as an arrangement of jamos as I would see them, leading some
> to think of Hangul as a featural syllabary instead of an alphabet.

The interesting part of this question is that Unicode allows Hangul syllables of the form L+L+V as well as L+V, and the two can sometimes represent exactly the same abstract Korean grapheme cluster. For example, the leading consonant (L) <SSANGKIYEOK CHOSEONG> can naturally be decomposed as <KIYEOK CHOSEONG, KIYEOK CHOSEONG> (L+L), which Unicode would interpret as belonging to the same Korean syllable, and which would thus be rendered as a single (and probably identical) grapheme cluster. However, Unicode does not treat this decomposition as canonically equivalent, nor even as compatibility equivalent.

So this may leave room for additional folding operations in searches, which may be needed if a text was encoded with a legacy charset that lacked the precomposed (and currently non-decomposable) double consonants or double vowels. Such simpler charsets assumed the presence of a more complex layout engine to render syllables, in the same way that Unicode assumes a composition engine for LV or LVT syllables when they are not directly implemented as distinct glyphs in Hangul fonts. Mapping them to Unicode could force a difficult design choice on the converter: should it recognize double vowels and double consonants in the legacy 8-bit charset as candidates for composition into a single Unicode jamo instead of two?

Composing them into a single Unicode jamo would allow better interoperability with texts as they are generally encoded with KSC5601 or Unicode, but it would break things if the compatibility mapping was not reversible. On the other hand, nothing seems to forbid mapping to separate Unicode jamos (thus never mapping to the "ligatured" double vowels or double consonants that Unicode encodes as non-decomposable jamos), so as to preserve an exact bijective mapping to and from the legacy charset using only the more basic leading consonant, trailing consonant, or vowel jamos.
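To make the folding idea concrete, here is a minimal Python sketch. It first checks, with the standard unicodedata module, that the double leading consonant has no canonical or compatibility decomposition, and then applies a search-time folding. The table DOUBLE_CHOSEONG and the function fold_double_jamo are hypothetical names invented for this illustration; the mapping is an assumption of mine, not something defined by Unicode, and it covers only the five modern double leading consonants.

```python
import unicodedata

# Hypothetical folding table (NOT defined by Unicode): each double leading
# consonant is mapped to a sequence of two simple leading consonants.
DOUBLE_CHOSEONG = {
    "\u1101": "\u1100\u1100",  # SSANGKIYEOK -> KIYEOK, KIYEOK
    "\u1104": "\u1103\u1103",  # SSANGTIKEUT -> TIKEUT, TIKEUT
    "\u1108": "\u1107\u1107",  # SSANGPIEUP  -> PIEUP, PIEUP
    "\u110A": "\u1109\u1109",  # SSANGSIOS   -> SIOS, SIOS
    "\u110D": "\u110C\u110C",  # SSANGCIEUC  -> CIEUC, CIEUC
}


def fold_double_jamo(text: str) -> str:
    """Fold text for searching: decompose precomposed syllables into
    conjoining jamos (NFD), then split double leading consonants."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(DOUBLE_CHOSEONG.get(ch, ch) for ch in decomposed)


if __name__ == "__main__":
    ssang = "\u1101"  # HANGUL CHOSEONG SSANGKIYEOK
    # Neither normalization form changes the double consonant.
    print(unicodedata.normalize("NFD", ssang) == ssang)
    print(unicodedata.normalize("NFKD", ssang) == ssang)

    precomposed = "\uAE4C"           # syllable <SSANGKIYEOK, A>
    two_jamos = "\u1100\u1100\u1161"  # <KIYEOK, KIYEOK, A>
    # After folding, both spellings compare equal.
    print(fold_double_jamo(precomposed) == fold_double_jamo(two_jamos))
```

All three comparisons should hold. Note that a folding like this is only suitable for matching: it is one-way, so it does not by itself give the reversible mapping a charset converter would need.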
I think that you could even imagine an encoding where the distinction between leading and trailing consonants is not made, relying on the (unmarked) phonology of Korean to recognize syllables, exactly as is done in Latin (with hyphenation dictionaries), or using a _marked_ syllable break (mapped for example to ZWNJ in Unicode). Similar questions arise with Unicode text using "defective" Hangul syllables (for example just V+T or T), sometimes made less defective by marking the missing L or V jamos with explicit Lf or Vf fillers, as <Lf,V,T> or <Lf,Vf,T>, which cannot be composed today.

The interesting case is <Lf,Vf,T>, which will normally be rendered exactly as if it was a single <L> jamo, so an 8-bit charset may simply choose not to encode the difference between leading and final consonants if they are rendered the same, and if no filler is used in the 8-bit mapping. In that case, the 8-bit mapping will really have the effect of representing Hangul as a true alphabet, exactly like the Latin alphabet with simple vowels and consonants, with ligatures created on the fly to form the printed syllables, using the horizontal and vertical composition rules inherent to that script, which serve only to represent the syllables graphically.

In reality, the Hangul script really seems to be an alphabet that explicitly marks, in its printed form, the separation of effective syllables (as if we had to use a SHY between each syllable in the Latin script to print Latin text correctly). And neither the "johab" subset chosen by Unicode, nor even the "choseong", "jungseong" and "jongseong" subsets, correctly represent the inherent structure of the Hangul script. That's why I am wondering if Korean children are really learning the jamos the way they are shown (with ligatures) in Unicode, or if they simply learn to recognize the non-ligated forms in the two-dimensional syllable layout.
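As a rough illustration of such a position-independent view, the following Python sketch reduces conjoining jamos to letter names that ignore whether a jamo is leading, medial, or trailing, and drops the fillers. It relies only on the standard unicodedata module and on the Unicode character names (HANGUL CHOSEONG ..., HANGUL JUNGSEONG ..., HANGUL JONGSEONG ...). The function positionless_letters is a hypothetical helper written for this message, and using names as the "alphabet" is just a convenient assumption, not a proposed encoding.

```python
import unicodedata

# Words in Unicode character names that mark the position of a conjoining jamo.
POSITION_WORDS = ("CHOSEONG", "JUNGSEONG", "JONGSEONG")


def positionless_letters(text: str) -> list:
    """Return the letters of `text` with the leading/medial/trailing
    distinction removed and with the Lf/Vf fillers dropped.
    Non-Hangul characters are passed through unchanged."""
    letters = []
    for ch in unicodedata.normalize("NFD", text):
        parts = unicodedata.name(ch, "").split(" ")
        if len(parts) >= 3 and parts[0] == "HANGUL" and parts[1] in POSITION_WORDS:
            letter = " ".join(parts[2:])
            if letter != "FILLER":  # skip CHOSEONG FILLER / JUNGSEONG FILLER
                letters.append(letter)
        else:
            letters.append(ch)
    return letters


if __name__ == "__main__":
    # Syllable <KIYEOK, A, KIYEOK>: the leading and trailing consonants
    # come out as the same letter.
    print(positionless_letters("\uAC01"))
    # <Lf, Vf, T=KIYEOK> and a lone leading KIYEOK become indistinguishable.
    print(positionless_letters("\u115F\u1160\u11A8") == positionless_letters("\u1100"))
```

The syllable <KIYEOK, A, KIYEOK> comes out as the three letters KIYEOK, A, KIYEOK, and the defective sequence <Lf,Vf,T> collapses to the same single letter as a lone leading consonant, which is the loss of distinction described above.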
In that case, the script is much simpler to learn, as it has far fewer letters than what can be seen in Unicode. Isn't Unicode making an unnecessarily complex representation of Hangul jamos?