> > This is the kind of mess that has discouraged anybody from doing a
> > systematic survey of simplifications for the Unihan database.
>
> Part of this is because there is the orthogonal complexity of
> variant TC forms. Before converting TC to SC, one should resolve
> all TC variants to the most "common" or "standard" TC form (good
> luck deciding what that means). e.g., in the above case, resolve to
> U+9EBD.

I think that any mapping will fail. As with so many things concerning CJK characters, usage depends on constraints beyond the character encoding: time, location, purpose, etc. This is the very reason why CCCII hasn't succeeded.

As a consequence, the available fields are not enough to represent the interdependencies correctly. Either increase the number of available keywords (e.g. kZVariant1, kZVariant2) so that the dependencies can be fine-tuned (something like "character a in the meaning of b is a variant of character c"), or add a remark to the description of the keywords saying that the fields can't be exhaustive for such and such reasons.

Werner
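
To make the "character a in the meaning of b is a variant of character c" idea a bit more concrete, here is a minimal sketch of what such a conditioned relation could look like. Everything in it is hypothetical: the `VariantRelation` record, the `variants_of` helper, and the placeholder characters "A", "C", "D" and sense "B" are made up for illustration and do not correspond to any existing Unihan field or data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VariantRelation:
    """Hypothetical record: `source` is a variant of `target`,
    optionally only when `source` is used in the given `sense`."""
    source: str                  # "character a"
    target: str                  # "character c"
    sense: Optional[str] = None  # "meaning b"; None means unconditional


def variants_of(relations: List[VariantRelation], char: str,
                sense: Optional[str] = None) -> List[str]:
    """Return the targets for `char` whose relation is either
    unconditional or restricted to exactly the requested sense."""
    return [
        r.target
        for r in relations
        if r.source == char and (r.sense is None or r.sense == sense)
    ]


# Placeholder data, not taken from Unihan.
relations = [
    VariantRelation("A", "C", sense="B"),  # A is a variant of C only in sense B
    VariantRelation("A", "D"),             # A is always a variant of D
]

print(variants_of(relations, "A"))       # no sense given
print(variants_of(relations, "A", "B"))  # sense B requested
```

Without a sense, only the unconditional relation applies; with sense "B", the conditioned one is returned as well. A flat field with a single list of code points cannot express that difference, which is the limitation described above.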