Re: Status of Unihan Mandarin readings?

2002-12-20 Thread Raymond Mercier
On the errors in kMandarin: Apart from the kMandarin errors of the kind that Andrew West has noted, there is another corruption, namely the loss of ü, and this happened between 3.0b1 and 3.0b2, when the ü became the two bytes C393. As to Han/Yi, U+6C49 YI4 HAN4 is found not only in 3.0b1 and
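For reference, ü is U+00FC. In UTF-8 it is the two bytes C3 BC, and in Latin-1 it is the single byte FC. The pair C3 93 reported here is itself well-formed UTF-8, but it decodes to U+00D3 (Ó), so a strict UTF-8 decoder would not flag the damage. The small Python illustration below shows this.

```python
# Correct encodings of U+00FC LATIN SMALL LETTER U WITH DIAERESIS.
u_umlaut = "\u00fc"  # ü
print(u_umlaut.encode("utf-8").hex(" ").upper())  # C3 BC
print(u_umlaut.encode("latin-1").hex().upper())   # FC

# The byte pair reported in the message, read back as UTF-8.
decoded = bytes([0xC3, 0x93]).decode("utf-8")
print(f"U+{ord(decoded):04X}", decoded)  # U+00D3 Ó
```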

Re: Status of Unihan Mandarin readings?

2002-12-20 Thread Raymond Mercier
At 08:44 AM 12/20/2002 -0700, you wrote: "That's because the file was converted to UTF-8. Previously it had not been in any single encoding, which was creating problems." Well, OK, but shouldn't you by now have created some sort of program that checks the file whenever you make a change - a sort
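Below is a minimal Python sketch of the kind of check Raymond suggests. It makes two assumptions. First, it assumes the tab-separated Unihan.txt layout of codepoint, field name and value. Second, it assumes a hypothetical reading syntax of uppercase letters (Ü allowed) followed by a tone digit from 1 to 5. The pattern and the name `check_kmandarin` are illustrative only and are not a Consortium tool. A syntax check like this would catch a damaged ü. It would not catch a well-formed reading attached to the wrong character.

```python
import re

# Hypothetical syntax for one kMandarin reading: uppercase Latin letters
# (U+00DC Ü allowed) followed by a tone digit 1-5, e.g. "WU2" or "YA5".
READING = re.compile(r"[A-Z\u00dc]+[1-5]")

def check_kmandarin(lines):
    """Yield (line_number, line) for kMandarin entries that fail READING.

    Assumes tab-separated Unihan.txt lines: codepoint, field, value.
    """
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3 or parts[1] != "kMandarin":
            continue  # only kMandarin fields are checked here
        if not all(READING.fullmatch(r) for r in parts[2].split(" ")):
            yield number, line

# Constructed sample lines; the last one mimics a ü damaged into Ó.
sample = [
    "# comment",
    "U+543E\tkMandarin\tWU2 YA5",
    "U+5973\tkMandarin\tN\u00dc3",
    "U+5973\tkMandarin\tN\u00d33",
]
problems = list(check_kmandarin(sample))
for number, line in problems:
    print(f"line {number}: {line!r}")
```

To run it over a real file, open the file with `open(path, encoding="utf-8")`. That call also raises `UnicodeDecodeError` on any bytes that are not valid UTF-8.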

Status of Unihan Mandarin readings?

2002-12-19 Thread Marco Cimarosti
I have tried to follow the discussion about the errors in field kMandarin of file Unihan.txt but, after a while, I lost my way with all those dictionary references... Could someone kindly make a short summary of the situation? Here are my biggest ???'s: - Are the errors really there? - Any

Re: Status of Unihan Mandarin readings?

2002-12-19 Thread Andrew C. West
On Thu, 19 Dec 2002 04:58:08 -0800 (PST), Marco Cimarosti wrote: I have tried to follow the discussion about the errors in field kMandarin of file Unihan.txt but, after a while, I lost my way with all those dictionary references... Could someone kindly make a short summary of the

Re: Unihan Mandarin Readings

2002-12-07 Thread John H. Jenkins
On Tuesday, December 3, 2002, at 03:17 AM, Andrew C. West wrote: "BTW, is it possible for Unicode to provide a Unihan.xml version of the Unihan database? The first thing I do is convert the Unihan.txt file into XML format for ease of processing." As a rule, we tend to stick to older formats
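Andrew's first step, converting Unihan.txt to XML, can be sketched with Python's standard `xml.etree.ElementTree`. The `<unihan>`/`<char cp="...">` structure below is invented for illustration and is not an official Unihan XML schema. The sketch assumes the tab-separated codepoint/field/value layout and nests each field under its codepoint. The sample lines reuse the two kMandarin values quoted in this thread.

```python
import xml.etree.ElementTree as ET

def unihan_to_xml(lines):
    """Nest each tab-separated field line under a <char> element for its codepoint.

    Element and attribute names are illustrative, not an official schema.
    """
    root = ET.Element("unihan")
    chars = {}  # codepoint string -> its <char> element
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        if codepoint not in chars:
            chars[codepoint] = ET.SubElement(root, "char", cp=codepoint)
        ET.SubElement(chars[codepoint], field).text = value
    return root

sample = [
    "U+543E\tkMandarin\tWU2 YA5",
    "U+6C49\tkMandarin\tYI4 HAN4",
]
root = unihan_to_xml(sample)
print(ET.tostring(root, encoding="unicode"))
```

The sketch keys on a dictionary instead of relying on the file's sort order, so it still groups fields correctly if the input lines are shuffled.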

Re: Unihan Mandarin Readings

2002-12-03 Thread Andrew C. West
John H. Jenkins wrote: "Certainly in the Unicode 4.0 time-frame we can improve things. I can't make any guarantees, however." Thanks for the response. I've got an old 3.1 version of the Unihan database at home, and I was going to complain that the Radical.Stroke index values given for U+20003
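Radical.Stroke values, such as those in the kRSUnicode field, are written as a radical number, then a period, then a residual stroke count. An apostrophe after the radical number marks a simplified form of the radical. The snippet does not say what was wrong with the values for U+20003. The Python sketch below therefore only shows how such values could be split for checking. The name `parse_radical_stroke` and the sample strings are hypothetical.

```python
import re

# Assumed notation: radical number, optional apostrophe(s) marking a
# simplified radical form, a period, then the residual stroke count.
RS = re.compile(r"(\d+)('*)\.(\d+)")

def parse_radical_stroke(value):
    """Split one Radical.Stroke value into (radical, simplified, residual_strokes)."""
    match = RS.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed Radical.Stroke value: {value!r}")
    radical, marks, strokes = match.groups()
    return int(radical), bool(marks), int(strokes)

print(parse_radical_stroke("30.4"))    # (30, False, 4)
print(parse_radical_stroke("120'.3"))  # (120, True, 3)
```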

Unihan Mandarin Readings

2002-12-02 Thread Andrew C. West
Whilst writing a CJK pinyin lookup utility over the weekend I noticed that for some CJK ideographs in the Unihan database that have multiple Mandarin readings, the secondary reading(s) have been wrongly associated with adjacent or nearby ideographs. For example: U+543E kMandarin WU2 YA5
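A pinyin lookup of the sort Andrew describes needs two maps: one from each character to its readings, and one from each reading to its characters. The minimal Python sketch below makes two assumptions. It assumes the tab-separated Unihan.txt layout. It also assumes that the first listed reading is the primary one. The name `build_indexes` is hypothetical. Because the sketch trusts the file, it would faithfully reproduce any secondary reading that has been attached to the wrong character.

```python
from collections import defaultdict

def build_indexes(lines):
    """Build char -> readings and reading -> chars maps from kMandarin lines."""
    readings_by_char = {}
    chars_by_reading = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        if field != "kMandarin":
            continue
        char = chr(int(codepoint[2:], 16))  # "U+543E" -> "吾"
        readings = value.split()            # first entry assumed primary
        readings_by_char[char] = readings
        for reading in readings:
            chars_by_reading[reading].append(char)
    return readings_by_char, chars_by_reading

sample = ["U+543E\tkMandarin\tWU2 YA5", "U+6C49\tkMandarin\tYI4 HAN4"]
readings_by_char, chars_by_reading = build_indexes(sample)
print(readings_by_char["吾"][0])  # WU2
print(chars_by_reading["YA5"])    # ['吾']
```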

Re: Unihan Mandarin Readings

2002-12-02 Thread John H. Jenkins
"Is it possible to regenerate the Unihan database with the correct secondary Mandarin readings?" Certainly in the Unicode 4.0 time-frame we can improve things. I can't make any guarantees, however. == John H. Jenkins http://www.tejat.net/