On the errors in kMandarin:
Apart from the kMandarin errors of the kind that Andrew West has noted,
there is another corruption, namely the loss of ü, and this happened
between 3.0b1 and 3.0b2, when the ü became the two bytes C393.
As to Han/Yi, U+6C49 YI4 HAN4 is found not only in 3.0b1 and
At 08:44 AM 12/20/2002 -0700, you wrote:
That's because the file was converted to UTF-8. Previously it had not
been in any single encoding, which was creating problems
Well, OK, but shouldn't you have created by now some sort of program that
checks the file whenever you make a change - a sort
I have tried to follow the discussion about the errors in field kMandarin
of file Unihan.txt but, after a while, I lost my way with all those
dictionary references...
Could someone kindly make a short summary of the situation? Here are my
biggest ???'s:
- Are the errors really there?
- Any
On Thu, 19 Dec 2002 04:58:08 -0800 (PST), Marco Cimarosti wrote:
I have tried to follow the discussion about the errors in field kMandarin
of file Unihan.txt but, after a while, I lost my way with all those
dictionary references...
Could someone kindly make a short summary of the
On Tuesday, December 3, 2002, at 03:17 AM, Andrew C. West wrote:
BTW, is it possible for Unicode to provide a Unihan.xml version of the
Unihan database? The first thing I do is convert the Unihan.txt file into
XML format for ease of processing.
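For readers wondering what such a conversion might look like, here is a minimal sketch. It assumes each data line of Unihan.txt has three tab-separated columns (code point, field name, value) and that comment lines start with "#". The element and attribute names (unihan, char, field, cp, name) are made up for illustration; they are not an official schema, nor the one used in the message above.

```python
import xml.etree.ElementTree as ET


def unihan_to_xml(lines):
    """Group Unihan.txt records by code point into a small XML tree.

    Assumed input: "U+XXXX<TAB>kFieldName<TAB>value" per data line.
    The XML layout is hypothetical, chosen only for this example.
    """
    root = ET.Element("unihan")
    chars = {}  # code point string -> its <char> element
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # not a well-formed record; ignored in this sketch
        cp, field, value = parts
        elem = chars.get(cp)
        if elem is None:
            elem = ET.SubElement(root, "char", cp=cp)
            chars[cp] = elem
        ET.SubElement(elem, "field", name=field).text = value
    return root


sample = [
    "# sample data",
    "U+543E\tkMandarin\tWU2 YA5",
]
tree = unihan_to_xml(sample)
xml_text = ET.tostring(tree, encoding="unicode")
```

Grouping all fields of one ideograph under a single element is only one possible design; a flat list of records would work just as well.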
As a rule, we tend to stick to older formats
John H. Jenkins wrote:
Certainly in the Unicode 4.0 time-frame we can improve things. I can't
make any guarantees, however.
Thanks for the response. I've got an old 3.1 version of the Unihan database at
home, and I was going to complain that the Radical.Stroke index values given for
U+20003
Whilst writing a CJK pinyin lookup utility over the weekend I noticed that for
some CJK ideographs in the Unihan database that have multiple Mandarin readings,
the secondary reading(s) have been wrongly associated with adjacent or nearby
ideographs. For example:
U+543E kMandarin WU2 YA5
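A lookup of this kind could be sketched as follows. This is only an illustration, not the utility mentioned above. It assumes whitespace- or tab-separated records of the form shown (code point, field name, readings) and assumes that a well-formed reading is uppercase letters (possibly including Ü) followed by a tone digit from 1 to 5. The function names parse_kmandarin and suspicious_readings are hypothetical. Note that a syntax check like this can flag corrupted characters, but it cannot tell whether a reading has been attached to the wrong ideograph; that needs a dictionary to compare against.

```python
import re

# Assumed shape of a reading: uppercase letters (incl. U-umlaut) + tone 1-5.
READING_RE = re.compile(r"^[A-Z\u00dc]+[1-5]$")


def parse_kmandarin(lines):
    """Return {code point (int): [readings]} for kMandarin records."""
    readings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3 or parts[1] != "kMandarin":
            continue
        if not parts[0].startswith("U+"):
            continue
        readings[int(parts[0][2:], 16)] = parts[2].split()
    return readings


def suspicious_readings(readings):
    """Return only the readings that do not match the assumed syntax."""
    bad = {}
    for cp, values in readings.items():
        odd = [v for v in values if not READING_RE.match(v)]
        if odd:
            bad[cp] = odd
    return bad


sample = [
    "# sample data",
    "U+543E\tkMandarin\tWU2 YA5",
    "U+5415\tkMandarin\tL\u00d34",  # made-up record with a damaged letter
]
table = parse_kmandarin(sample)
flagged = suspicious_readings(table)
```

Running such a check after every change to the file would catch encoding damage early, though the second sample record here is invented purely to exercise the check.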
Is it possible to regenerate the Unihan database with the correct
secondary Mandarin readings?
Certainly in the Unicode 4.0 time-frame we can improve things. I can't
make any guarantees, however.
==
John H. Jenkins
[EMAIL PROTECTED]
http://www.tejat.net/