Hello,

I recently looked up the traditional/simplified relationships in the Unihan database (Unihan_Variants.txt).

I knew it had mistakes and I wanted to help correct some of them, but the first thing that stood out and surprised me was the large number of lines like:

U+346F  kSimplifiedVariant  U+3454
U+346F  kTraditionalVariant U+3454

which should be (if I didn't mix them up...):

U+3454  kTraditionalVariant  U+346F
U+346F  kSimplifiedVariant U+3454

My quickly written parsing program counted 1154 such pairs, where the head character was the same as on the line above. They always seem to appear in the order "kTraditionalVariant" then "kSimplifiedVariant", so they could perhaps be corrected automatically. This looks like a very evident mistake, and the correction should be easy. I can help with that; I am just waiting to find out whether this is the right place to report problems in Unihan. I also considered http://www.unicode.org/reporting.html ; would that be better?
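
A minimal sketch of such a check in Python follows. It is an illustration, not the program mentioned above: the function name find_suspect_pairs and the sample lines are hypothetical, and it assumes each data line has the form "code point, field name, one or more target code points" separated by whitespace, with "#" starting a comment line. It flags every (source, target) pair that is listed under both kSimplifiedVariant and kTraditionalVariant; these are only candidates for review, since some may turn out to be legitimate.

```python
from collections import defaultdict

VARIANT_FIELDS = ("kSimplifiedVariant", "kTraditionalVariant")


def find_suspect_pairs(lines):
    """Return sorted (source, target) pairs listed under both variant fields."""
    seen = defaultdict(set)  # (source, target) -> set of field names seen
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) < 3:
            continue  # not a "source field target..." line
        source, field, targets = parts[0], parts[1], parts[2:]
        if field not in VARIANT_FIELDS:
            continue
        for target in targets:
            # Assumption: drop any "<..." annotation attached to a code point.
            target = target.split("<", 1)[0]
            seen[(source, target)].add(field)
    return sorted(pair for pair, fields in seen.items() if len(fields) == 2)


# Hypothetical sample, taken from the lines quoted above.
sample = [
    "U+346F  kSimplifiedVariant  U+3454",
    "U+346F  kTraditionalVariant U+3454",
]
suspects = find_suspect_pairs(sample)
print(len(suspects), suspects)
```

To run it on the real file, an open file object can be passed instead of the sample list, for example open("Unihan_Variants.txt", encoding="utf-8").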

I have many other questions and comments on these simplified/traditional relationships, but I guess they can wait until this problem is resolved; otherwise this email would become too long.

Regards,

Koxinga



