Re: Hanzi trad-simp folding and z-variants

Stephan Stiller Sat, 08 Jun 2013 22:37:20 -0700

So we both agree that Unihan is not designed to tell people how to convert between traditional and simplified characters.

Yep.

Though there is some confusion as to what other questions are being discussed here.

I think I misused the expression "folding" at some point. But the original query explicitly asked about "do[ing] traditional to simplified folding for indexing and query processing (/when the mapping is unambiguous/)" (emphasis added), so I wasn't really sure where parts of the discussion were going :-)
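For concreteness, here is a minimal sketch of what "folding when the mapping is unambiguous" could look like for indexing. The variant table is a tiny hand-written sample, not real Unihan data, and the names `SAMPLE_VARIANTS`, `build_fold_table` and `fold` are made up for illustration. Treating a character as ambiguous whenever it has more than one simplified candidate is an assumption of the sketch, not something Unihan prescribes.

```python
# Hypothetical sample: traditional character -> set of simplified candidates.
# A real table would have to come from a vetted conversion resource.
SAMPLE_VARIANTS = {
    "廣": {"广"},
    "學": {"学"},
    "乾": {"干", "乾"},  # two candidates here, so the sketch treats it as ambiguous
}


def build_fold_table(variants):
    """Keep only entries with exactly one candidate (the 'unambiguous' case)."""
    return {
        ord(trad): ord(next(iter(simps)))
        for trad, simps in variants.items()
        if len(simps) == 1
    }


def fold(text, table):
    """Fold a string for indexing; unmapped or ambiguous characters pass through."""
    return text.translate(table)


FOLD_TABLE = build_fold_table(SAMPLE_VARIANTS)
print(fold("廣東大學", FOLD_TABLE))
```

Applying the same `fold` to both the indexed text and the query keeps the two sides comparable, and characters left out of the table are simply matched as they are.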

Japanese has well established traditions for simplifying CJK ideographs which are not identical to Chinese; if one were to use a folding approach to deal with simplifications, then there should be differences for Chinese and Japanese.

I think the kyūjitai-shinjitai mappings are not in Unihan. (Compare the entries of 廣 (U+5EE3) and the characteristically Japanese character 広 (U+5E83).) I know that certain contexts retain older forms (KenL talks about this somewhere too). Btw, if you know about other mappings or good resources, I'll be curious to know.

"quite well documented" is a relative term

I highly respect the work in Cheung & Bauer, but it makes no attempt to tell us how easily understood the characters are. Many of them are ad-hoc coinages that are not understood by any of my informants; sometimes for, say, 6 ways of writing a syllable-morpheme, I can make my informants tell me that perhaps /one/ of them is passable. This problem isn't easily solved, but then the source isn't helpful in knowing which out of the approx 1000 characters are actually used nowadays. I won't give you a number, as I'd have to check more carefully to be quotable. The number of morphemes for which there truly seems to be no written representation is /very/ low, but often the characters in existence aren't exactly comprehensible to many native speakers either, and not all of them are unambiguous. This will give you an idea.

Zhuang Sawndip

Sounds exciting.

By best choice do you mean (a) the person producing the electronic form was unable to use the character they wished because it is not yet in Unicode, (b) even though it is in Unicode, the person did not know how to type it and so typed another character instead, or (c) a less than perfect, or ambiguous, 'spelling'? All of which are found both for Sinitic languages and non-Sinitic languages when written in CJK ideographs, be it printed publications, web pages or text messages between native speakers.

Nearly all of Cantonese is in Unicode and therefore typeable in theory (though some people will not be used to such writing, but I'm sure you know this), so it's not (a). I would say it's largely (c) (people will often make up their own plausible thing), even though (b) is a reason too.

Not standardized does not mean totally beyond analysis or processing, or even necessarily that confusing to a native speaker; they are not random, though admittedly more complex than a standardized locale.

Yes. And we both agree that standardization is desirable.

Stephan
