> > This is the kind of mess that has discouraged anybody from doing a
> > systematic survey of simplifications for the Unihan database.
> 
> Part of this is because there is the orthogonal complexity of
> variant TC forms.  Before converting TC to SC, one should resolve
> all TC variants to the most "common" or "standard" TC form (good
> luck deciding what that means).  e.g., in the above case, resolve to
> U+9EBD.

I think that any mapping will fail.  As with so many things
concerning CJK characters, usage depends on constraints beyond a
character encoding: time, location, purpose, etc.  This is the very
reason why CCCII hasn't succeeded.  As a consequence, the available
fields are not enough to represent the interdependencies correctly.

Either increase the number of available keywords (e.g. kZVariant1,
kZVariant2) to be able to fine-tune the dependencies (something like
`character a in the meaning of b is a variant of character c'), or
add a remark to the description of the keywords that the fields
can't be exhaustive, for such and such reasons.
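
To make the first option a bit more concrete, here is a rough sketch
of what a context-qualified variant relation could look like as a
data structure.  Everything in it is hypothetical: the names
VariantRelation, RELATIONS and variants_of are invented for
illustration, the characters are the placeholders a, b, c from the
sentence above (plus an extra placeholder d), and nothing here
reflects actual Unihan fields or data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VariantRelation:
    """Hypothetical record: `source` is a variant of `target`,
    optionally only when `source` is used in a given meaning."""
    source: str
    target: str
    meaning: Optional[str] = None  # None means the relation is unconditional


# Placeholder data only, not taken from any real database.
RELATIONS = [
    # "character a in the meaning of b is a variant of character c"
    VariantRelation(source="a", target="c", meaning="b"),
    # an unconditional relation, for contrast
    VariantRelation(source="a", target="d"),
]


def variants_of(char: str, meaning: Optional[str] = None) -> List[str]:
    """Return the targets of all relations that apply to `char`.

    Unconditional relations always apply; qualified ones apply only
    when the requested meaning matches."""
    return [
        r.target
        for r in RELATIONS
        if r.source == char and (r.meaning is None or r.meaning == meaning)
    ]


print(variants_of("a", meaning="b"))  # qualified and unconditional relations
print(variants_of("a"))               # unconditional relation only
```

The point of the sketch is only that a flat field holding a list of
code points cannot carry the `in the meaning of b' part; some kind of
qualifier has to be attached to each relation.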


    Werner
