In specific cases you may use one character conversion mapping instead of
two, but you should be very careful about that. See
http://www.unicode.org/unicode/reports/tr22/, especially section 1.2.1, "Best-Fit
Mappings".
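To make the best-fit point concrete, here is a toy Python sketch. The code page, the table names and the single best-fit entry are all hypothetical. A table used in both directions is simply inverted, so every entry round-trips. A best-fit entry exists only in the from-Unicode direction, which is why it needs a table of its own and why text converted through it cannot be recovered.

```python
# Hypothetical toy code page, for illustration only; not a real standard.
TO_UNICODE = {0x41: "A", 0x42: "B"}  # byte -> character

# Round-trip part of the from-Unicode mapping: the exact inverse of TO_UNICODE.
ROUND_TRIP = {ch: byte for byte, ch in TO_UNICODE.items()}

# Hypothetical best-fit entry, one-way only:
# U+FF21 FULLWIDTH LATIN CAPITAL LETTER A -> byte 0x41 ('A').
BEST_FIT = {"\uff21": 0x41}


def decode(data: bytes) -> str:
    out = []
    for i, b in enumerate(data):
        if b not in TO_UNICODE:
            raise UnicodeDecodeError("toy", data, i, i + 1, "no mapping")
        out.append(TO_UNICODE[b])
    return "".join(out)


def encode(text: str, use_best_fit: bool = False) -> bytes:
    out = bytearray()
    for i, ch in enumerate(text):
        if ch in ROUND_TRIP:
            out.append(ROUND_TRIP[ch])
        elif use_best_fit and ch in BEST_FIT:
            # Lossy: decoding this byte gives 'A', not the fullwidth letter.
            out.append(BEST_FIT[ch])
        else:
            raise UnicodeEncodeError("toy", text, i, i + 1, "no mapping")
    return bytes(out)
```

Here encode("\uff21B") raises, while encode("\uff21B", use_best_fit=True) returns b"AB", which decodes back to "AB" rather than to the original text.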
Mark
----- Original Message -----
From: "Lars Marius Garshol" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Monday, January 08, 2001 06:53
Subject: Re: GBK, HZ and EUC-TW
>
> * Tom Emerson
> |
> | Ken Lunde's "CJKV Information Processing" has a good description of
> | the evolution and interrelationships between the GB standards.
>
> Actually, I disagree with that. It has a description, but IMHO it
> leaves much to be desired. I can't understand why people keep
> praising this book. You can get the information you need from it, but
> in my experience doing so involves a lot of flipping back and forth,
> several rereadings and some guesswork at the end.
>
> | As far as mapping tables go, the best ones you'll find are the
> | Microsoft or ICU mapping tables. I personally have not seen an
> | official mapping table from GB 13000. As others have noted,
> | Microsoft has extended the "pure" GBK with the Euro sign, and
> | perhaps other code points.
>
> Hmmm. Does this mean that it is best to support the Microsoft
> extensions, or that it is best not to do so? I guess we will be
> forced to support them sooner or later, and that we might as well do
> it now to save everyone some bother.
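If the Microsoft extensions are supported, keeping them in a separate, switchable table makes the choice explicit. The sketch below is hypothetical (the function name, flag and table name are illustrative). It only shows the single byte 0x80, which Microsoft's code page 936 maps to U+20AC EURO SIGN, and it hands every two-byte sequence to Python's built-in "gbk" codec.

```python
# Hypothetical extension table: code page 936 maps the single byte 0x80
# to U+20AC EURO SIGN. Any other extensions are not shown here.
MS_EXTENSIONS = {0x80: "\u20ac"}


def decode_gbk(data: bytes, ms_extensions: bool = True) -> str:
    """Decode GBK bytes, optionally accepting the Microsoft extensions."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # ASCII range: one byte, one character.
            out.append(chr(b))
            i += 1
        elif b in MS_EXTENSIONS:
            # Single-byte extension: reject it unless extensions are enabled.
            if not ms_extensions:
                raise UnicodeDecodeError("gbk", data, i, i + 1, "not in base GBK")
            out.append(MS_EXTENSIONS[b])
            i += 1
        else:
            # Lead byte of a two-byte sequence: let the gbk codec map the pair.
            # It raises UnicodeDecodeError for invalid or truncated pairs.
            out.append(data[i:i + 2].decode("gbk"))
            i += 2
    return "".join(out)
```

With ms_extensions=False the same decoder behaves as "pure" GBK for that byte, so both policies can be offered without maintaining two full tables.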
>
> | GB 2312:80 is a proper subset of GBK, so you can map EUC-CN encoded
> | text to Unicode using a GBK mapping table. Be aware, though, that
> | going the other direction can be problematic: GBK contains code
> | points that do not exist within GB 2312:80, so a GBK table can
> | produce bytes that are not valid EUC-CN.
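A minimal Python sketch of that asymmetry, assuming Python's built-in "gbk" and "gb2312" codecs stand in for the GBK and EUC-CN mapping tables (the function names are hypothetical). Decoding uses the GBK table directly. Encoding also uses the GBK table, then checks that the resulting bytes are valid GB 2312 before handing them out as EUC-CN.

```python
def euc_cn_to_unicode(data: bytes) -> str:
    # GB 2312 is a subset of GBK, so the GBK table decodes EUC-CN as well.
    return data.decode("gbk")


def unicode_to_euc_cn(text: str) -> bytes:
    # Encode with the same GBK table; this raises UnicodeEncodeError
    # (a ValueError subclass) for characters outside GBK altogether.
    data = text.encode("gbk")
    # Reject GBK-only code points, whose bytes are not valid EUC-CN.
    try:
        data.decode("gb2312")
    except UnicodeDecodeError as exc:
        raise ValueError("text contains characters outside GB 2312") from exc
    return data
```

For example, unicode_to_euc_cn("們") fails, because 們 (U+5011) is in GBK but not in GB 2312. Validating the GBK output, rather than encoding with the "gb2312" codec directly, keeps both directions on one table; the two codecs are separate tables and are not guaranteed to agree on every mapping.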
>
> I was thinking of having a single X->Unicode converter for both GBK
> and EUC-CN. I am still uncertain as to whether that really is a good
> idea, though.
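One way to get a single X->Unicode converter is to route every GB label to the GBK decoder and make GB 2312 validation optional. This is a hypothetical Python sketch (the function name, the accepted labels and the strict flag are assumptions). The trade-off it exposes is that, without validation, GBK bytes in data labelled EUC-CN are silently accepted.

```python
def gb_to_unicode(data: bytes, label: str, strict: bool = True) -> str:
    """Decode GBK or EUC-CN (GB 2312) bytes with one GBK-based converter."""
    label = label.lower()
    if label not in ("gbk", "euc-cn", "gb2312"):
        raise LookupError(f"unsupported label: {label}")
    if strict and label != "gbk":
        # Validate against GB 2312 first, so GBK-only bytes in data
        # labelled EUC-CN raise UnicodeDecodeError instead of passing.
        data.decode("gb2312")
    # The actual conversion always goes through the single GBK table.
    return data.decode("gbk")
```

With strict=False the converter is lenient toward mislabelled GBK data; with the default strict=True it reports such data while still using only one table for the conversion itself.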
>
> --Lars M.
>