routine doc [PATCH] to Supported.pod, Unicode.pm

2002-04-10 Thread Anton Tagunov
Hello, Dan! - several typos - excludes GBK from that section because it is discussed in Microsoft-related Just routine. /Anton/ P.S. You've done a cozy text alignment-arrangement, nice to look at! :-) --- ext/Encode/lib/Encode/Supported.pod.orig Wed Apr 10 01:13:28 2002

Re: [Encode] 1.31 in a few hours

2002-04-08 Thread Anton Tagunov
Hello, Dan! Anton> ($name = lc $name) =~ tr/- //d; DK> I'll think about it but the priority is low. 250% fine :-) Anton> jisx0208-raw vs jis0208-raw? DK> Well, we don't have to be *that* pedantic, methinks. T'was to allow find_encoding('JIS X 0208-raw') if +($name
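
The normalization proposed in this message (lowercase the name, then delete hyphens and spaces, as `tr/- //d` does) can be sketched as follows. This is a Python illustration only, not Encode's code; the helper name `normalize_name` is hypothetical.

```python
def normalize_name(name: str) -> str:
    """Lowercase an encoding name and delete '-' and ' ' characters.

    Mirrors the Perl idiom ($name = lc $name) =~ tr/- //d quoted above.
    The function name is an assumption made for this sketch.
    """
    # str.maketrans with a third argument lists characters to delete.
    return name.lower().translate(str.maketrans("", "", "- "))


if __name__ == "__main__":
    for spelling in ("JIS X 0208-raw", "jisx0208-raw", "KOI8-R"):
        print(spelling, "->", normalize_name(spelling))
```

With such a rule, differently spaced or hyphenated spellings of one name collapse to the same lookup key, which is what would let a call like find_encoding('JIS X 0208-raw') succeed.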

Re: [Encode 1.30] BOM32LE was incorrect - fixed

2002-04-08 Thread Anton Tagunov
Hello, Dan! >> --- ext/Encode-1.30/lib/Encode/Unicode.pm.orig Mon Apr 8 14:06:28 2002 >> +++ ext/Encode-1.30/lib/Encode/Unicode.pm Mon Apr 8 17:00:47 2002 DK> Thanks. Applied. Always welcome! :) -- Other items in my '[PATCH]s and questions [Encode] 1.30' mail were: - a consmeti

one of the [PATCH]'s revised ( [Encode] 1.30 )

2002-04-08 Thread Anton Tagunov
Hello, Dan! AT> 1) [PATCH] AT>Justification: http://www.unicode.org/unicode/faq/utf_bom.html#25 AT> --- ext/Encode-1.30/lib/Encode/Unicode.pm.orig Mon Apr 8 14:06:28 2002 AT> +++ ext/Encode-1.30/lib/Encode/Unicode.pm Mon Apr 8 14:49:24 2002 Patch has been revised again a bit: also fi

[PATCH]s and questions [Encode] 1.30

2002-04-08 Thread Anton Tagunov
Hello, Dan! 1) [PATCH] Justification: http://www.unicode.org/unicode/faq/utf_bom.html#25 --- ext/Encode-1.30/lib/Encode/Unicode.pm.orig Mon Apr 8 14:06:28 2002 +++ ext/Encode-1.30/lib/Encode/Unicode.pm Mon Apr 8 14:49:24 2002 @@ -12,7 +12,7 @@ sub FBCHAR(){ 0xFFFd } sub BOM_BE(){ 0

A modest patch [Encode] 1.26

2002-04-07 Thread Anton Tagunov
Hello, Dan! Very modest: typos, C<>, wording, uhc, x-windows-949, Windows-31J /Anton/ --- E:\anth\tmp\perl\b2\ext\Encode\lib\Encode\Supported.pod.orig Sun Apr 7 20:39:07 2002 +++ E:\anth\tmp\perl\b2\ext\Encode\lib\Encode\Supported.pod Mon Apr 8 03:22:03 +2002 @@ -454,7 +454,7 @@

Re[2]: - charset + character set + coded character set + CCS (?) (was: [Encode] Encode::Supported revised)

2002-04-06 Thread Anton Tagunov
Hello, Jungshik! http://tagunov.tripod.com/survey2.html is largely an answer, so, if you allow, I will comment with links into this page :) JS> On the other hand, no one with *sufficient understanding* JS> of the issue uses 'character set' to mean encoding. ISO> coded character set; code ISO>

"Chracter set terminology survey" ready

2002-04-05 Thread Anton Tagunov
Hello Dan! Hello Jungshik! Hello other developers and experts! I have finally completed my survey named "CHARACTER SET" TERMINOLOGY SURVEY, CLASSIFICATION OF CJK AND NON-CJK CHARACTER SET STANDARDS. VERSION 0.95 It is available at http://tagunov.tripod.com/survey2.html The main purpose of this s

Add Windows-31J => cp932 alias? Windows-31J ever used?

2002-04-05 Thread Anton Tagunov
Hello, gentlemen! Have just stumbled over Name: Windows-31J MIBenum: 2024 Source: Windows Japanese. A further extension of Shift_JIS to include NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119). The CC

Re[2]: [PATCH 1/2 + 0.1] Supported.pod

2002-04-05 Thread Anton Tagunov
Hello Dan! DK>I am now working on the new revision of Supported.pod AFTER this patch DK> is applied. I will post the whole thing tonight. I'm very glad! :-)) Could we have a couple more patching cycles then? I already see two typos in my text :-( - Anton

Re: [PATCH 1/2 + 0.1] Supported.pod

2002-04-05 Thread Anton Tagunov
Hello! Have just read Jungshik's mail and have patched Supported.pod a bit more: added (x-)windows-949 aliases stuff. --- ext/Encode/lib/Encode/Supported.orig.pod Fri Apr 5 01:00:36 2002 +++ ext/Encode/lib/Encode/Supported.pod Fri Apr 5 15:18:25 2002 @@ -63,7 +63,7 @@ ascii US-as

windows-949 vs cp949 and misc

2002-04-05 Thread Anton Tagunov
Hello, Jungshik! JS> One thing I don't agree with him is what designation JS> to use for CP949. I think it'd better be 'windows-949' To me that's no problem. Currently I have written Proper name: C. Proper names: C, C. Proper name: C. How do you advise rewriting this? JS> because that's J

[PATCH 1/2] Supported.pod

2002-04-05 Thread Anton Tagunov
Hello, experts! Have split my patch to Supported.pod into two levels. This is the general utility patch that does not have my arguable changes [level 1/2]. - fixes some typos - rewords section on UTF-16 - adds 'charset (MIME context)' to glossary - adds a reference to Ken's CJKV book Dan?

Re: [PATCH] Supported.pod: cleanup/UTF-16/CJK.inf + an invasion to the Glossary

2002-04-04 Thread Anton Tagunov
Hello, Dan and Jungshik! Speaking of the patch.. AT> +=item Jungshik Shin's Hangul FAQ AT> + AT> +L AT> + AT> +And especially it's subject 8 AT> + AT> +L AT> + AT> +has a comprehensive overview of the C (Korean) standards. AT> +Tha author cla

[PATCH] Supported.pod: cleanup/UTF-16/CJK.inf + an invasion to the Glossary

2002-04-04 Thread Anton Tagunov
Hello, Dan! Hello, Jungshik and Autrijus! Here's a new patch for Supported.pod. It does - minor cleanup - rewrites section on UTF-16 (I hope you and Jungshik will like it :-) - adds a link to Jungshik's reference on Korean character set standards - tries to add a link to Ken Lunde's offline book

qr/^UCS2-le$/i => '"UCS-2"' -- what is it?

2002-04-04 Thread Anton Tagunov
Hello, Dan! A two-pence question (very quick and probably foolish :-) What is the marked alias about? define_alias( qr/^UCS-2LE$/i=> '"UTF-16LE"', qr/^UCS2-le$/i=> '"UCS-2"', ); define_alias( qr/^UTF-16BE$/i

Re[2]: [PATCH] Re: [Encode] Encode::Supported revised

2002-04-04 Thread Anton Tagunov
Hello Dan! Double glad to hear from you ;-) Anton> This patch .. DK> Spahsseebah. ??? :-)) DK> Will be reflected in the next revision. :-) DK> __ DK>/ | DK> /---+ AH I guess this is some Ideograph :-) DK> P.S. I tried to compose UTF-8 version thereof, Dan, you're so mysterious tod

Is this a bug?

2002-04-04 Thread Anton Tagunov
Hello, Dan! 1) This is a small reminder: --- ext/Encode/t/Aliases.t.orig Sat Mar 30 01:05:57 2002 +++ ext/Encode/t/Aliases.t Thu Apr 4 15:56:21 2002 @@ -29,6 +29,7 @@ 'arabic' => 'iso-8859-6', 'greek'=> 'iso-8859-7', 'hebrew' => 'iso-8859-8', +

- charset + character set + coded character set + CCS (?) (was: [Encode] Encode::Supported revised)

2002-04-04 Thread Anton Tagunov
Hello Jungshik! Our comments go in the same direction, but will you let me strengthen your statements a bit? >> =head1 Encoding vs. Charset JS> Whether you like it or not, 'charset' is overloaded by MIME to mean JS> 'encoding' (Character set Encoding Scheme=CES as defined in RFC 2130). Indeed it

[PATCH] Re: [Encode] Encode::Supported revised

2002-04-04 Thread Anton Tagunov
Hello, Dan! 1) This my second portion of comments on the renewed Supported.pod. This part is 100% orthogonal to the first part 2) This patch - changes status of KOI8-U on Jungshik's comment (sorry, I have never tested that myself :-( - upgrades GB2312 to the "first class citizen" (why not?)

Re: [Encode] Encode::Supported revised

2002-04-03 Thread Anton Tagunov
Hello, Dan! None of the Encode team knows Hebrew enough (ISO-8859-8, cp1255 and - MacHebrew are supported because and just because there were mappings + MacHebrew are supported just because there were mappings available at L). Contribution welcome. ? - Anton

Re[2]: [Encode] Encode::Supported revised

2002-04-03 Thread Anton Tagunov
DK>> "Encoding vs Charset" AT> Hmm.. I seem to have a "special opinion" on this! AT> Though I'm still rewriting this I'm making half-cooked variant AT> available: AT> http://tagunov.tripod.com/survey2.html AT> (under construction) AT> (http://tagunov.tripod.com/survye.html A typo: http://tagun

Re: [Encode] Encode::Supported revised

2002-04-03 Thread Anton Tagunov
Hello, Dan! DK>Encode is near completion. I am still building djgpp environment for DK> possible fixes needed but anything else is over. My congratulations! :-) DK>(*9) Nicknamed Latin0; Euro sign as well as French and Finnish DK> letters that are missing from 8859-1 are added.

Re[2]: Are GB 18030 and CNS 11643-1992 the best spellings?

2002-03-30 Thread Anton Tagunov
Hello, Jungshik! 1) GB 18030, CNS 11643-1992 JS> I guess the official designation is GB 18030-2000. JS> I believe they're CNS 11643-1992 or CNS 11643-1986. Ken's online has the later one CNS 11643-1992 (the reason I asked you, Jungshik, was I was not sure if Ken's online has t

Are GB 18030 and CNS 11643-1992 the best spellings?

2002-03-28 Thread Anton Tagunov
Hello, Autrijus! Hello Jungshik! Hello, developers and experts! Writing a bit of an article, putting in there all I have learnt about CJK encodings on the Internet and at [EMAIL PROTECTED] Has already taken me a week :-) Could you help me with this: Is GB 18030 the best spelling for this encodi

Re[2]: let's cook it!

2002-03-27 Thread Anton Tagunov
Hello Jungshik! JS> MS should have registered CP949/950 as Windows-949/950 JS> instead of labeling them misleadingly as ks_c_5601-1987 and big5, In case JS> of gb2312, gbk should be registered and used. I don't know about big5, JS> but in Korean case, apparently they tried to pretend that they

Re: Encode::CJKguide (> 500 lines Long!)

2002-03-27 Thread Anton Tagunov
Hello, Dan! 1) That's been a great job! Especially the way you have explained the 0x21-0x7E and 0xA1-0xFE ranges via the tables, I like it! :-) 2.1) And.. maybe just strip off the Unicode part and we'll get a good guide on CJK? It's a great thing to have the CJK explanation bundled, isn't it?

Re: [GB2312] Please, don't help spread this misuse

2002-03-26 Thread Anton Tagunov
Hello Jungshik! Very glad to hear you on this list :-) >> When you say gb2312 and ksc5601, EUC-based encoding is assumed. JS> Please, don't help spread this misuse. Jungshik, one little point on GB2312.. Maybe I misunderstand something, but IANA registry (http://www.iana.org/assignments/cha

Re: [Encode] Encoding vs. Charset

2002-03-26 Thread Anton Tagunov
Hello Dan! DK> ... I have found that most of Chinese (Continental; seems like DK> Taiwanese are much more technically correct) and Korean mails and web DK> pages confuse "charset" and "encodings". I'm fixing a small article on that right now (maybe you have already read the first edition, but I

Re: roman8 -> hp-roman8 ?

2002-03-26 Thread Anton Tagunov
Hello, Dan! DK> Encode hackers, DK>I blindly generated roman8.ucm out of roman8.enc, not knowing what it DK> is. I am now convinced this is hp-roman8 but I am not 100% sure yet. DK> http://www.iana.org/assignments/character-sets >> Name: hp-roman8 [HP-PCL5

Re: Encode-0.98 available

2002-03-26 Thread Anton Tagunov
DK> ! JP/JP.pm DK>Now Encode::JP is more strict on the difference between ISO-2022-JP DK>and ISO-2022-JP-1. See JP/JP.pm for details. I hope this move DK>makes Anton happier :) It has :-))) DK>FYI the previous version implements DK>ISO-2022-JP as ISO-2022-JP-1 since it had

maito: for comments on the Supported.pod Re[2]: URL in L<>, non-ascii text in pod

2002-03-24 Thread Anton Tagunov
Hello, Dan! Anton> Dan, what should we do about the L<...> in Anton> Anton> Please feel free to send your comments, disagreements and Anton> additions to L<...>. DK>0.98 or above does use L<> for URIs; Anton> Dan, I wanted to ask what email should we put in Supported.pod, Anton> requesting

Re: URL in L<>, non-ascii text in pod

2002-03-24 Thread Anton Tagunov
Hello Dan! Anton> Ooops, here goes the patch :-) DK>I will apply and fix some. Thanks, I see it there :-) Dan, what should we do about the L<...> in Please feel free to send your comments, disagreements and additions to L<...>. ? I meant this to be replaced by some mail address. Probably [

A technical note for review: Using CJK coded character sets in raw encoding: a common source of confusion

2002-03-24 Thread Anton Tagunov
Hello, my friends! Now you'll see my dirty face and bad intentions finally! I will dare to ask you to have a glance at a small technical note I have written (inspired by long-long.. talks on this list and by trying to understand the mysteries of CJK encodings) If you have enough patience to read my

Re[2]: gb2312 (whatever it is) refuses to encode space, \n, latin letters?

2002-03-24 Thread Anton Tagunov
Hello Autrijus! Anton> On Sun, Mar 24, 2002 at 10:32:40AM +0300, Anton Tagunov wrote: Anton> Maybe I'm overusing your kindness :-) Autrijus> nope. :-) Anton> perl15452 -MEncode -we "print Encode::encode('gb2312',' ')" Anton> perl154

Fwd: Better names for JIS X 0201/0208/0212? (was: ISO-8859-1 vs ISO 8859-1 (typo + UTF8 case too :)

2002-03-24 Thread Anton Tagunov
Hello, experts! I'm sorry if you receive this message for a second time - I am not sure if it reached [EMAIL PROTECTED] at my previous attempt. --- I certainly think that the names like 'JIS 0201' are embarrassing. Here's rfc1345 &charset JIS_X0201 &alias X0201 ...8-bit, JIS-Rom

Re: [PATCH][Supported.pod] Encoding classification updata

2002-03-23 Thread Anton Tagunov
Ooops, here goes the patch :-) --- ext/Encode/lib/Encode/Supported.pod.orig Sat Mar 23 01:51:30 2002 +++ ext/Encode/lib/Encode/Supported.pod Sun Mar 24 10:12:25 2002 @@ -202,55 +202,87 @@ =head1 Encoding Classification (by Anton Tagunov) -Encodings +This section tries to classify the

gb2312 (whatever it is) refuses to encode space, \n, latin letters?

2002-03-23 Thread Anton Tagunov
Hello, Autrijus! Maybe I'm overusing your kindness :-) but still (just a single word of reply - 'bugreport' and I will bugreport this :-) perl15452 -MEncode -we "print Encode::encode('gb2312',' ')" perl15452 -MEncode -we 'print Encode::encode('gb2312',"\n")' perl15452 -MEncode -we 'print Encode::

[PATCH][Details.pod] reword your letter, put a 'partially obsolet, under rework' note

2002-03-23 Thread Anton Tagunov
Hello, Dan! Please excuse me if I've been too bold here.. Just thought it would be nice to retell your mail and give a note to everyone.. I do not insist in any way to be it this, just as a variant.. BTW, if we do not want much feedback from outside our mail lists we may remove the 'Feedback ..

[PATCH][Supported.pod] Encoding classification updata

2002-03-23 Thread Anton Tagunov
Hello, Dan! Hello, Autrijus! I'm sorry that you have just proposed patches to the Supported.pod and mine are coming just in the line.. I'm afraid it will be necessary to patch two copies and merge manually.. :-( If my changes are accepted, of course ;-) Thank you, Dan! Thank you, Autrijus! As I

[PATCH] again! (was: Encode alias implementation fixed!)

2002-03-23 Thread Anton Tagunov
Hello Dan! I think DK> if (ref($alias) eq 'Regexp' && $k =~ $alias) DK> { DK> $DEBUG and warn $k; DK> delete $Alias{$k}; This line is ok ^ DK> } DK> elsif (ref($alia

Introduce new alias for GB 18030

2002-03-23 Thread Anton Tagunov
Hello Dan! Maybe we would be better off with having something like define_alias( qr/^GB(?:\s|-)?(.+)/i => '"gb$1"' ); this will make things like GB 18030 GB 12345 work. My best regards, Anton
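
To make the effect of the proposed alias pattern concrete, here is a minimal sketch that applies the same regular expression outside of Encode. It is written in Python for illustration; `resolve_gb_alias` is a hypothetical helper, and the real `define_alias` machinery in Encode may resolve names differently.

```python
import re

# Same pattern as in the proposal: "GB", an optional space or hyphen,
# then the rest of the name, matched case-insensitively.
GB_ALIAS = re.compile(r"^GB(?:\s|-)?(.+)", re.IGNORECASE)


def resolve_gb_alias(name: str) -> str:
    """Map spellings such as 'GB 18030' to 'gb18030'.

    Names that do not start with 'GB' are returned unchanged.
    The helper name is an assumption made for this sketch.
    """
    match = GB_ALIAS.match(name)
    return "gb" + match.group(1) if match else name


if __name__ == "__main__":
    for spelling in ("GB 18030", "GB-12345", "gb2312", "big5"):
        print(spelling, "->", resolve_gb_alias(spelling))
```

The separator is optional in the pattern, so an already compact spelling such as gb2312 maps to itself.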

Recognise canonical names even if they are given in the wrong case? koi8-r vs KOI8-R

2002-03-23 Thread Anton Tagunov
Hello, Dan! Please kindly excuse me for asking naive questions - I'm a freshman here :-) The documentation says that only aliases are case insensitive, while the canonical names are not. Maybe it would be a good idea to capitalize/lowercase the name of encoding somewhere internally to make a

Re[2]: Fwd: GB 1833 name abiguous

2002-03-23 Thread Anton Tagunov
Hello Autrijus! Hello, developers! I'm sorry not to know any of the CJK languages.. The only thing I have to help me understand encodings currently are the IANA registry and RFC 1345.. And some pages on the internet.. So please excuse me if I will be talking nonsense! :-) AUTRIJUS> - Alias /

Re[2]: Do Chinese use Katakana? What is 1988.enc?

2002-03-23 Thread Anton Tagunov
Hello, Jean-Michel, Andreas, Dan, Autrijus, Mark! (In the order of message arrival :-) Special hello to Larry! Anton> Do Chinese use Katakana? Jean-Michel> Not as far as I know. Hiragana / Katakana is Jean-Michel> exclusively a Japanese Jean-Michel> thing ain't it? Andreas> In Chin

The correct way to spell an ISO standard (ISO 8859-1) Re[2]: Fwd: [PATCH][docs] Encode.pm

2002-03-23 Thread Anton Tagunov
Hello Markus! I'm very glad to hear your clarification :-) Still there's something that bothers me: Details.pod =head2 Encoding Names Encoding names are case insensitive. White space in names is ignored. In addition an encoding may have aliases. Each encoding has one "canonical" name. The "ca

Fwd: Do Chinese use Katakana? What is 1988.enc (was: ISO-8859-1 vs ISO 8859-1 )

2002-03-20 Thread Anton Tagunov
This is a forwarded message From: Anton Tagunov <[EMAIL PROTECTED]> To: Nick Ing-Simmons <[EMAIL PROTECTED]> Date: Tuesday, March 19, 2002, 7:35:43 PM Subject: Do Chinese use Katakana? What is 1988.enc (was: ISO-8859-1 vs ISO 8859-1 ) ===8<==Original message text==

Fwd: [PATCH][docs] Encode.pm

2002-03-20 Thread Anton Tagunov
This is a forwarded message From: Anton Tagunov <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] <[EMAIL PROTECTED]> Date: Tuesday, March 19, 2002, 9:48:06 PM Subject: [PATCH][docs] Encode.pm ===8<==Original message text=== Hello, developers! With my upgrad

Fwd: GB 1833 name abiguous

2002-03-20 Thread Anton Tagunov
This is a forwarded message From: Anton Tagunov <[EMAIL PROTECTED]> To: Nick Ing-Simmons <[EMAIL PROTECTED]> Date: Tuesday, March 19, 2002, 7:12:26 PM Subject: ISO-8859-1 vs ISO 8859-1 (typo + UTF8 case too :) ===8<==Original message text=== Hello N

Re[2]: [tagunov@motor.ru: Better names for JIS X 0201/0208/0212? (was: ISO-8859-1 vs ISO 8859-1 (typo + UTF8 case too :)]

2002-03-19 Thread Anton Tagunov
given encoding. It switches the DK> "current" encoding by escape sequence. Since escape sequence is used, DK> "raw" encodings are directly applied, while EUC turns Most significant DK> bit (MSB) on. DK>As a transfer encoding, ISO-2022 is great because in the
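
The relationship described in this fragment, where ISO-2022 carries the "raw" 7-bit codes after an escape sequence while EUC sets the most significant bit of each byte, can be checked with a small sketch. It uses Python's built-in `euc_jp` and `iso2022_jp` codecs as stand-ins for the encodings under discussion; the helper `euc_to_raw` is hypothetical and only meaningful for two-byte JIS X 0208 characters.

```python
def euc_to_raw(euc_bytes: bytes) -> bytes:
    """Clear the most significant bit of every byte.

    For a two-byte JIS X 0208 character this turns the EUC form
    into the raw 7-bit form. Hypothetical helper for this sketch.
    """
    return bytes(b & 0x7F for b in euc_bytes)


text = "\u3042"                      # HIRAGANA LETTER A
euc = text.encode("euc_jp")          # MSB set on both bytes
raw = euc_to_raw(euc)                # same code with MSB cleared
iso2022 = text.encode("iso2022_jp")  # escape sequence + raw bytes

if __name__ == "__main__":
    print("EUC-JP     :", euc.hex())
    print("raw        :", raw.hex())
    print("ISO-2022-JP:", iso2022.hex())
```

The raw two-byte code appears inside the ISO-2022-JP output between the escape sequences, and every byte of that output stays below 0x80.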