On Apr 19, 2004, at 8:40 PM, Ernest Cline wrote:
For example, if there is a value of kIRGKangXi of the form XXXX.YY0, there will always be the same value for kKangXi for that character, and vice versa.
This is not a safe assumption. There are 37 cases where the kIRGKangXi field ends in 0 but the kKangXi field is different. (There are 252 instances total where the two fields differ.)
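For anyone who wants to check this against their own copy of Unihan.txt, here is a minimal sketch of the comparison. It assumes the usual tab-separated layout (code point, field name, value, with comment lines starting with "#"), it only looks at characters that carry both fields, and it compares the values as plain strings. The function name compare_kangxi is made up for illustration, and the counts you get will depend on which version of the database you run it on.

```python
from collections import defaultdict


def compare_kangxi(lines):
    """Count characters whose kIRGKangXi and kKangXi values differ.

    Returns (total_differing, differing_where_kIRGKangXi_ends_in_0).
    Assumes tab-separated "U+XXXX<TAB>field<TAB>value" lines; only
    characters that have BOTH fields are compared.
    """
    fields = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue
        code_point, name, value = parts
        if name in ("kIRGKangXi", "kKangXi"):
            fields[code_point][name] = value.strip()

    total = ending_in_zero = 0
    for props in fields.values():
        irg = props.get("kIRGKangXi")
        kangxi = props.get("kKangXi")
        if irg is None or kangxi is None:
            continue
        if irg != kangxi:
            total += 1
            if irg.endswith("0"):
                ending_in_zero += 1
    return total, ending_in_zero
```

Calling it as compare_kangxi(open("Unihan.txt", encoding="utf-8")) should give the two counts for whatever release of the file you have on hand.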
I'm trying to pare Unihan.txt down to a less unwieldy size for my own use by eliminating properties that are of no interest to me. I would like to be certain that the four properties containing the actual values for those dictionaries can be eliminated safely, because the information can be reconstituted from the kIRG* properties if necessary; I'm not certain whether those properties are of interest to me.
I'm not sure why you feel a need to recreate the four-dictionary sorting algorithm in the first place, because it's really arbitrary and not all that useful in real life. In any event, it's (theoretically) based on the kIRGxxxx fields. The others are really needed only if you want to look the character up in the dictionary in question.
Also, even though the full Unihan database is 25+ MB in size, given the cheapness of disk space nowadays, it's not all *that* big, surely.
======== John H. Jenkins http://homepage.mac.com/jhjenkins/