Re: Unihan.txt and the four dictionary sorting algorithm

John Jenkins Tue, 20 Apr 2004 13:52:13 -0700

On Apr 19, 2004, at 8:40 PM, Ernest Cline wrote:

For example, if there is a value of kIRGKangXi of the form
XXXX.YY0 there will always be the same value for the
kKangXi for that character and vice versa.

This is not a safe assumption. There are 37 cases where the kIRGKangXi field ends in 0 but the kKangXi field is different. (There are 252 instances total where the two fields differ.)
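A check like this can be reproduced with a short script. The following is only a sketch, assuming Unihan.txt uses tab-separated lines of the form "U+XXXX<TAB>field<TAB>value" with "#" comment lines, and that each of the two fields holds a single value per character. The function names and the sample values are made up for illustration and are not taken from the real database.

```python
from collections import defaultdict


def parse_unihan(lines):
    """Collect {codepoint: {field: value}} from Unihan.txt-style lines."""
    records = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip anything that is not codepoint/field/value
        codepoint, field, value = parts
        records[codepoint][field] = value
    return records


def compare_kangxi(records):
    """Return (all differing entries, differing entries whose kIRGKangXi ends in 0)."""
    differing = []
    for codepoint, fields in sorted(records.items()):
        irg = fields.get("kIRGKangXi")
        kangxi = fields.get("kKangXi")
        if irg is None or kangxi is None:
            continue  # only compare characters that carry both fields
        if irg != kangxi:
            differing.append((codepoint, irg, kangxi))
    ends_in_zero = [entry for entry in differing if entry[1].endswith("0")]
    return differing, ends_in_zero


# Hypothetical sample data; the values are illustrative only.
SAMPLE = [
    "# sample excerpt\n",
    "U+4E00\tkIRGKangXi\t0075.010\n",
    "U+4E00\tkKangXi\t0075.010\n",
    "U+4E01\tkIRGKangXi\t0075.020\n",
    "U+4E01\tkKangXi\t0075.030\n",
    "U+4E02\tkIRGKangXi\t0075.041\n",
    "U+4E02\tkKangXi\t0075.040\n",
    "U+4E03\tkIRGKangXi\t0075.050\n",
]

differing, ends_in_zero = compare_kangxi(parse_unihan(SAMPLE))
print(len(differing), "characters differ;",
      len(ends_in_zero), "of those have kIRGKangXi ending in 0")
```

To run it against the real file, pass an open file object (for example open("Unihan.txt", encoding="utf-8")) to parse_unihan in place of SAMPLE.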

I'm trying to pare Unihan.txt down to a less unwieldy size
for my own use by eliminating properties that are of no
interest to me and would like to be certain that eliminating
the four properties containing the actual values for those
dictionaries can be done safely because the information
can be reconstituted if necessary from the kIRG*
properties since I'm not certain if those properties
are of interest to me.

I'm not sure why you feel a need to recreate the four-dictionary sorting algorithm in the first place, because it's really arbitrary and not all that useful in real life. In any event, it's (theoretically) based on the kIRGxxxx fields. The others are really needed only if you want to look the character up in the dictionary in question.

Also, even though the full Unihan database is 25+ MB in size, given the cheapness of disk space nowadays, it's not all *that* big, surely.
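For anyone who does still want a smaller file, dropping fields by name is simple enough. This is a rough sketch under the same assumption about the tab-separated layout. The set of field names below is my guess at which four dictionary properties are meant, and drop_fields is a hypothetical helper. Bear in mind that, as noted above, the dropped values cannot be reliably rebuilt from the kIRG* fields.

```python
# Assumed names of the four dictionary-position fields; adjust as needed.
DICTIONARY_FIELDS = {"kKangXi", "kDaeJaweon", "kHanYu", "kMorohashi"}


def drop_fields(lines, unwanted):
    """Yield every line except data lines whose field name is in `unwanted`."""
    for line in lines:
        if not line.startswith("#"):
            parts = line.split("\t", 2)
            if len(parts) == 3 and parts[1] in unwanted:
                continue  # skip the unwanted property
        yield line  # comments, blank lines and other fields pass through


# Hypothetical sample lines for illustration.
sample_lines = [
    "# sample excerpt\n",
    "U+4E00\tkKangXi\t0075.010\n",
    "U+4E00\tkIRGKangXi\t0075.010\n",
    "U+4E00\tkDefinition\tone\n",
]
kept = list(drop_fields(sample_lines, DICTIONARY_FIELDS))
```

Writing the result out is then a matter of passing the generator to writelines() on an output file.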

========
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/
