I've developed an open-source, multi-platform desktop application called 
Unicode Plus <https://github.com/tonton-pixel/unicode-plus>, which is a set of 
utilities related to Unicode, Unihan and emoji.

The basic Unihan-related utilities are almost complete, and now I would like 
to add more useful information about the Unihan variants:

1. First option: "Linear Information"

- A linear list of all the variants *related* to one given Unihan character 
would be displayed, similar to what can be found in Apple's Character Viewer 
(or Palette), or in the "Unihan Variant Dictionary" application.

- Two sources of data could be merged:

        1. The information provided by the "Variants table for Unicode" data 
file UniVariants.txt 
<http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/ftp/CJKtable/UniVariants.Z> by 
Prof. Kōichi Yasuoka.
        
        2. The information extracted from the relevant Unihan database 
properties: kSemanticVariant, kSimplifiedVariant, kSpecializedSemanticVariant, 
kTraditionalVariant, kZVariant.

- Discarding self-variants, assuming that Z-variants are symmetrical, and 
possibly merging the different types of variant tags would result in 
independent sets of *related* Unihan characters. Accessing the information 
would then simply involve testing which set a given character belongs to, and 
omitting the character itself from the display.

- This kind of information is certainly user-friendly; however, it lacks 
structural information about the relationships between the different variants.
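The grouping described in the first option can be sketched as follows, in Python for brevity (it would translate directly to JavaScript). This is a minimal sketch, assuming the tab-separated Unihan_Variants.txt layout (code point, tag, space-separated values, some of which may carry a `<kSource` annotation). It keeps only the five variant tags, drops self-variants, and merges every remaining link into undirected sets with a union-find (disjoint-set) structure, which is one way to obtain the independent sets. The sample lines are illustrative rather than quoted from the database, and the names `parse_variant_lines`, `DisjointSet`, `build_groups` and `related` are hypothetical. UniVariants.txt is not parsed here; its pairs would first have to be converted to the same (source, tag, target) shape.

```python
from collections import defaultdict

# The five Unihan variant properties; any other field is ignored.
VARIANT_TAGS = {"kSemanticVariant", "kSimplifiedVariant",
                "kSpecializedSemanticVariant", "kTraditionalVariant", "kZVariant"}

# Illustrative lines in the Unihan_Variants.txt layout (not quoted from the database).
SAMPLE = """\
# comment lines are skipped
U+4E07\tkTraditionalVariant\tU+842C
U+842C\tkSimplifiedVariant\tU+4E07
U+540E\tkTraditionalVariant\tU+540E U+5F8C
U+5F8C\tkSimplifiedVariant\tU+540E
U+4E0E\tkTraditionalVariant\tU+8207
U+8207\tkSemanticVariant\tU+4E0E<kMatthews
"""


def parse_variant_lines(lines):
    """Yield (source, tag, target) triples of integer code points."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        code, tag, values = line.split("\t", 2)
        if tag not in VARIANT_TAGS:
            continue
        source = int(code[2:], 16)  # "U+4E07" -> 0x4E07
        for value in values.split():
            # Drop any "<kSource,..." annotation before converting.
            target = int(value.split("<", 1)[0][2:], 16)
            yield source, tag, target


class DisjointSet:
    """Union-find with path compression; every link is treated as undirected."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def build_groups(triples):
    """Merge all non-self variant links into independent sets of related characters."""
    ds = DisjointSet()
    for source, _tag, target in triples:
        if source != target:  # discard self-variants
            ds.union(source, target)
    groups = defaultdict(set)
    for cp in list(ds.parent):
        groups[ds.find(cp)].add(cp)
    return ds, groups


def related(cp, ds, groups):
    """Return the other members of cp's set, or [] if cp has no variants."""
    if cp not in ds.parent:
        return []
    return sorted(groups[ds.find(cp)] - {cp})


ds, groups = build_groups(parse_variant_lines(SAMPLE.splitlines()))
```

With these sample lines, `related(0x540E, ds, groups)` returns `[0x5F8C]`: the self-variant entry for U+540E is discarded and only its partner remains. Because the union step ignores direction, this treats every variant link (not only Z-variants) as symmetrical, which is exactly the structural information the linear view gives up.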

2. Second option: "Structured Information"

- This is probably more ambitious and challenging: ideally, the information 
could be displayed graphically as a diagram of characters joined by arrows 
indicating the type of variant. It would support one-to-one, one-to-many, and 
many-to-one relationships.
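For the second option, a possible sketch keeps the links directed and typed instead of merging them: each (source, target) pair maps to the set of variant tags linking it, and the connected component around a character is emitted in the Graphviz DOT language, where one-to-many and many-to-one relationships simply appear as several arrows. Graphviz is an assumption of this sketch, not something the post proposes; the triples are illustrative (check them against Unihan_Variants.txt) and the function names are hypothetical.

```python
from collections import defaultdict, deque

# Illustrative (source, tag, target) triples, as a Unihan parser might produce them.
TRIPLES = [
    (0x53D1, "kTraditionalVariant", 0x767C),  # 发 -> 發
    (0x53D1, "kTraditionalVariant", 0x9AEE),  # 发 -> 髮 (one-to-many)
    (0x767C, "kSimplifiedVariant", 0x53D1),   # 發 -> 发
    (0x9AEE, "kSimplifiedVariant", 0x53D1),   # 髮 -> 发 (many-to-one with the line above)
    (0x4E07, "kTraditionalVariant", 0x842C),  # 万 -> 萬, a separate component
]


def build_edges(triples):
    """Map each directed (source, target) pair to the set of variant tags linking them."""
    edges = defaultdict(set)
    for source, tag, target in triples:
        if source != target:  # self-variants carry no structural information
            edges[(source, target)].add(tag)
    return edges


def component_of(cp, edges):
    """Collect every character reachable from cp, following links in either direction."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, queue = {cp}, deque([cp])
    while queue:
        for nxt in neighbours[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def to_dot(cp, edges):
    """Render cp's component as a Graphviz digraph with one labelled arrow per directed link."""
    nodes = component_of(cp, edges)
    out = ["digraph variants {"]
    for node in sorted(nodes):
        out.append(f'  "U+{node:04X}" [label="{chr(node)}\\nU+{node:04X}"];')
    for (source, target), tags in sorted(edges.items()):
        if source in nodes:
            # "kTraditionalVariant" -> "Traditional", "kZVariant" -> "Z", etc.
            label = ", ".join(sorted(tag[1:-len("Variant")] for tag in tags))
            out.append(f'  "U+{source:04X}" -> "U+{target:04X}" [label="{label}"];')
    out.append("}")
    return "\n".join(out)
```

Writing `to_dot(0x53D1, build_edges(TRIPLES))` to a file and rendering it with Graphviz (e.g. `dot -Tsvg variants.dot -o variants.svg`) would draw 发 with "Traditional" arrows to 發 and 髮, and a "Simplified" arrow back from each. Collecting all tags for the same directed pair into one label keeps the diagram readable when several variant types link the same two characters.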


Any ideas, comments, or suggestions are most welcome.

-- Michel MARIANI
