On Wed, 27 Mar 2002, Anton Tagunov wrote:

Hi, Anton,

> Very glad to hear you on this list :-)

Me, too :-)

> >> When you say gb2312 and ksc5601, EUC-based encoding is assumed.
>
> JS> Please, don't help spread this misuse.

Well, that was not meant to be applied to GB2312 :-). Below is a more
extensive excerpt from where I wrote that sentence:

JS> Please, don't help spread this misuse. It might be all right
JS> for the (ignorant) public to say KS C 5601 in place of EUC-KR, but Perl
JS> programmers should learn the difference between KS C 5601/KS X 1001 (coded
JS> character set) and encoding/MIME charset/character set encoding scheme/
JS> character coding.

JS> As I wrote before, GB 2312 has been so widely (mis)used that there's
JS> no way to replace it with EUC-CN. Korean situation is much better
JS> although not as good as Japanese case.

It could have been misunderstood.....

> Jungshik, one little point on GB2312.. Maybe I misunderstand
> something, but

No, you're absolutely right about IANA. See below.

> IANA registry (http://www.iana.org/assignments/character-sets)
> has
>
> Name: GB2312 (preferred MIME name)
> MIBenum: 2025
> Source: Chinese for People's Republic of China (PRC) mixed one byte,
>         two byte set:
>           20-7E = one byte ASCII
>           A1-FE = two byte PRC Kanji
>         See GB 2312-80
>         PCL Symbol Set Id: 18C
> Alias: csGB2312
>
> do not know when was that put in, but it looks EUC-CN. Is it?
> And if yes, then GB2312 is a perfectly valid charset, isn't it?

Yes, it's EUC-CN. I was about to add that although EUC-CN is a better
name than GB2312, the former has never been registered with IANA, while
the latter was, as the 'preferred MIME name'. You got there first :-).
It's unfortunate that PRC decided to do it this way, but that's what we
got and I think we have to respect their decision.

> And thank you for explaining how it happened that Korean
> misuse the name of a CCS for charset :-)

You're welcome :-) Actually, I told you only half the story :-).
The other half happened before the widespread use of the Internet in Korea
(i.e. the late 1980s and early 1990s), when people typically referred to
what's now called EUC-KR as 'KS C 5601 Wansung' (= US-ASCII in GL and
KS C 5601 in GR). It was not technically correct, but didn't do much harm
because there was no need to exchange data over the Internet.

EUC (Extended Unix Code; it's not Extended Unix Character) for Korean
was first specified in KS C 5861-1992 (now KS X 2901), but the name
EUC-KR appeared first in RFC 1557, where ISO-2022-KR was defined. It would
have been better if RFC 1557 had been more explicit in its description of
EUC-KR, so that the IANA entry for EUC-KR was patterned after that for
EUC-JP (GB2312 -> EUC-CN), with all the code sets and their octet ranges.
Perhaps they thought just referring to KS C 5861-1992 was sufficient.

----------
Name: EUC-KR  (preferred MIME name)                     [RFC1557,Choi]
MIBenum: 38
Source: RFC-1557 (see also KS_C_5861-1992)
Alias: csEUCKR
----------
Name: Extended_UNIX_Code_Packed_Format_for_Japanese
MIBenum: 18
Source: Standardized by OSF, UNIX International, and UNIX Systems
        Laboratories Pacific.  Uses ISO 2022 rules to select
               code set 0: US-ASCII (a single 7-bit byte set)
               code set 1: JIS X0208-1990 (a double 8-bit byte set)
                           restricted to A0-FF in both bytes
               code set 2: Half Width Katakana (a single 7-bit byte set)
                           requiring SS2 as the character prefix
               code set 3: JIS X0212-1990 (a double 7-bit byte set)
                           restricted to A0-FF in both bytes
                           requiring SS3 as the character prefix
Alias: csEUCPkdFmtJapanese
Alias: EUC-JP  (preferred MIME name)

Jungshik Shin
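
A small illustration of the coded character set vs. encoding distinction
discussed above. This is only a sketch: it uses Python's standard codecs
('euc_kr', 'iso2022_kr', 'gb2312', 'euc_jp') as a stand-in, which the
original message does not mention, and the helper row_cell() is a
hypothetical name introduced just for this example. It shows the same
KS C 5601 character serialized by two different encodings, the same octet
pair meaning different characters under EUC-KR and GB2312 (EUC-CN), and
the SS2 prefix used for EUC-JP code set 2.

```python
# Sketch only: Python's codecs stand in for "some conversion library".

def row_cell(pair: bytes) -> tuple:
    """Map a two-octet GR sequence (A1-FE in both octets) to (row, cell)."""
    if len(pair) != 2 or not all(0xA1 <= b <= 0xFE for b in pair):
        raise ValueError("not a two-octet GR sequence")
    return (pair[0] - 0xA0, pair[1] - 0xA0)

ga = "\uac00"  # HANGUL SYLLABLE GA, a character in KS C 5601 / KS X 1001

# One coded character set, two encodings:
euc_kr = ga.encode("euc_kr")      # KS C 5601 invoked in GR (8-bit octets)
iso_kr = ga.encode("iso2022_kr")  # same CCS, 7-bit with escape/shift codes

# The same octets read under a different charset label give another character:
as_gb2312 = euc_kr.decode("gb2312")

# EUC-JP code set 2 (half-width katakana) carries the SS2 (0x8E) prefix:
hw_katakana_a = "\uff71".encode("euc_jp")
```

The point of the sketch is that 'KS C 5601' alone does not tell you which
octets end up on the wire; 'EUC-KR' and 'ISO-2022-KR' do.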