On Fri, 5 Jan 2001, Lars Marius Garshol wrote:

> * Thomas Chan
> | One way to find GBK pages is to look for "GB2312" pages (aka EUC-CN)
> | with codepoints outside the EUC ranges.  e.g., pages discussing ZHU
> 
> Can I take this to mean that it is common practice to use GBK in pages
> and to label them as GB2312?

If they aren't unlabeled, or mislabeled as ISO 8859-1 or CP1252...
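
The check quoted above (looking for bytes outside the EUC ranges in a
page that claims to be GB2312) could be sketched roughly as follows in
Python.  The function name and the four result labels are made up for
illustration.  The byte ranges are assumptions worth checking against
your own tables: EUC-CN with both bytes in 0xA1-0xFE, and GBK with lead
bytes 0x81-0xFE and trail bytes 0x40-0x7E or 0x80-0xFE.

```python
def classify_gb_bytes(data: bytes) -> str:
    """Return 'ascii', 'euc-cn', 'gbk', or 'invalid' for a byte string.

    Hypothetical helper: it only looks at byte ranges, not at whether
    each code point is actually assigned in the character set.
    """
    result = "ascii"
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            # Plain ASCII byte, single-byte character.
            i += 1
            continue
        # Anything else must start a two-byte sequence.
        if not (0x81 <= lead <= 0xFE) or i + 1 >= n:
            return "invalid"
        trail = data[i + 1]
        if 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE:
            # Inside the EUC-CN (GB2312) range.
            if result == "ascii":
                result = "euc-cn"
        elif 0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFE:
            # Valid GBK sequence that lies outside the EUC range.
            result = "gbk"
        else:
            return "invalid"
        i += 2
    return result
```

A page labeled GB2312 for which this returns 'gbk' is a candidate GBK
test page.  A result of 'euc-cn' only says the bytes fit the EUC ranges,
not that the page was produced as GB2312.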


> Given that the one is a subset of the other, it sounds as though my
> application really should use the GBK converter both for GBK pages and
> for GB2312 pages.

I'd compare the tables first; e.g., current versions of CP936 have a Euro
sign that snuck in there and isn't part of GBK.  You might want to keep
the converters separate anyway for other reasons.
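
One way to do that comparison, sketched in Python under the assumption
that the converters you care about are available as codecs (the
"gb2312" and "gbk" names below are Python's built-in ones, and the
function name is made up).  It walks the EUC double-byte range and
reports every sequence that the first codec decodes but the second
decodes differently or not at all.  Python's own tables may not match
the ones in your converter, so treat the output as a starting point.

```python
def compare_double_byte_tables(codec_a: str, codec_b: str):
    """List (bytes, a_result, b_result) where the two codecs disagree.

    Only sequences with both bytes in 0xA1-0xFE that codec_a can decode
    are considered; b_result is None when codec_b rejects the bytes.
    """
    diffs = []
    for lead in range(0xA1, 0xFF):
        for trail in range(0xA1, 0xFF):
            raw = bytes([lead, trail])
            try:
                a = raw.decode(codec_a)
            except UnicodeDecodeError:
                continue  # not defined in codec_a, nothing to compare
            try:
                b = raw.decode(codec_b)
            except UnicodeDecodeError:
                b = None
            if a != b:
                diffs.append((raw, a, b))
    return diffs


if __name__ == "__main__":
    for raw, a, b in compare_double_byte_tables("gb2312", "gbk"):
        print(raw.hex(), repr(a), repr(b))
```

If the list comes back empty for your tables, using the larger
converter for both labels is at least safe for the double-byte range.
Single-byte additions such as the Euro would need a separate check.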
 

> I now have four test pages for GBK, a tiny one for HZ and none at all
> for EUC-TW.  Unless someone knows of something I suppose I will have
> to make test pages myself with some conversion tool.

You can get HZ encoded pages from http://www.cnd.org/HZ/Classics/ .  You
might have to hunt around for ones that include rows of English text for
testing purposes.
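
If you end up generating HZ test data yourself, a small sketch using
Python's built-in "hz" codec might look like this.  The function name
and the sample text are made up, and I am assuming the codec's output
matches what your converter expects.  Mixing an English line with a
Chinese line exercises the switch between ASCII mode and GB mode.

```python
def make_hz_test_page(text: str) -> bytes:
    """Encode text as HZ; the result should be 7-bit clean."""
    encoded = text.encode("hz")
    # HZ is a 7-bit encoding, so no byte should have the high bit set.
    if any(b >= 0x80 for b in encoded):
        raise ValueError("unexpected 8-bit byte in HZ output")
    return encoded


if __name__ == "__main__":
    sample = "A row of English text\n\u4e2d\u6587\n"
    page = make_hz_test_page(sample)
    print(page)
    # Round trip as a sanity check on the generated test data.
    print(page.decode("hz") == sample)
```

This only works for characters in GB2312, since HZ cannot carry the
GBK extensions.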

I don't know where to get EUC-TW encoded data.


Thomas Chan
[EMAIL PROTECTED]

