Hello,

In the Unihan.txt database, some entries in the kMandarin field list
the same pronunciation more than once. For example:

U+4E21  kMandarin       1 LIANG3 2 LIANG3 3 LIANG4
U+4E4E  kMandarin       1 HU1 HU2 2 HU1
U+4E86  kMandarin       1 LIAO3 2 LE LIAO3

Is there a reason for these duplicates? If there is, the format of
this field should be documented better in the header. If the
duplicates are errors, I can supply a list of them.
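
A list of this kind could be produced with a short script. The following
is only a minimal sketch in Python: the function name duplicate_readings
is made up for illustration, and it assumes that the purely numeric
tokens are some kind of group marker rather than readings, which is
exactly the point that remains undocumented.

```python
from collections import Counter

def duplicate_readings(line):
    """Return the readings that occur more than once on one kMandarin line."""
    fields = line.split()
    # fields[0] is the code point and fields[1] the field name;
    # everything after that is the field value.
    # Assumption: purely numeric tokens (1, 2, 3) are markers, not readings.
    readings = [tok for tok in fields[2:] if not tok.isdigit()]
    counts = Counter(readings)
    return [reading for reading, n in counts.items() if n > 1]

sample = [
    "U+4E21  kMandarin       1 LIANG3 2 LIANG3 3 LIANG4",
    "U+4E4E  kMandarin       1 HU1 HU2 2 HU1",
    "U+4E86  kMandarin       1 LIAO3 2 LE LIAO3",
]

for line in sample:
    dups = duplicate_readings(line)
    if dups:
        # Print the code point followed by its repeated readings.
        print(line.split()[0], " ".join(dups))
```

Run over the three lines above, this reports LIANG3, HU1 and LIAO3
respectively.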

Also, what do the isolated numbers (1, 2, 3) between the readings mean?

----------------

Other entries certainly contain errors, for example:

U+5594  kMandarin       1 WO1 2 01
                                ^ this is zero.

U+4EC0  kMandarin       1 SHI2 2 SHEN2 3 SHI2 SHIU2SHEN2 SHI2
                                              ^^^^ ?? --> shi2 shen2 ??
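
Malformed tokens like these two can be found mechanically. The sketch
below is hedged in the same way as any guess about this field: it
assumes that a well-formed reading is a run of uppercase ASCII letters
followed by an optional tone digit 1 to 5, and that an isolated number
is a positive integer without a leading zero. Neither rule comes from
the Unihan.txt header, and the name suspicious_tokens is hypothetical.

```python
import re

# Assumed shape of a reading: uppercase letters plus an optional tone digit.
READING = re.compile(r"[A-Z]+[1-5]?")
# Assumed shape of an isolated number: no leading zero.
MARKER = re.compile(r"[1-9][0-9]*")

def suspicious_tokens(line):
    """Return the value tokens that match neither assumed shape."""
    tokens = line.split()[2:]  # skip the code point and the field name
    return [t for t in tokens
            if not (READING.fullmatch(t) or MARKER.fullmatch(t))]

print(suspicious_tokens("U+5594  kMandarin       1 WO1 2 01"))
print(suspicious_tokens(
    "U+4EC0  kMandarin       1 SHI2 2 SHEN2 3 SHI2 SHIU2SHEN2 SHI2"))
```

This flags "01" in the first line and "SHIU2SHEN2" in the second. It
checks only the shape of each token, so a syllable that is well formed
but not valid Mandarin (a standalone SHIU2, say) would pass unnoticed.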

Regards,
  Pierpaolo Bernardi
