[PATCH 1/2] Supported.pod

2002-04-05 Thread Anton Tagunov
Hello, experts! I have split my patch to Supported.pod into two levels. This is the general utility patch that does not have my arguable changes [level 1/2]. - fixes some typos - rewords section on UTF-16 - adds 'charset (MIME context)' to glossary - adds a reference to Ken's CJKV book Dan?

Re: [PATCH] Supported.pod: cleanup/UTF-16/CJK.inf + an invasion to the Glossary

2002-04-05 Thread Jungshik Shin
On Fri, 5 Apr 2002, Anton Tagunov wrote: Hi Anton, > Speaking of the patch.. > > > AT> +=item Jungshik Shin's Hangul FAQ > AT> +L . > AT> +L > AT> +has a comprehensive overview of the C (Korean) standards. > AT> +Tha author claims howeve

windows-949 vs cp949 and misc

2002-04-05 Thread Anton Tagunov
Hello, Jungshik! JS> One thing I don't agree with him is what designation JS> to use for CP949. I think it'd better be 'windows-949' To me that's no problem. Currently I have written Proper name: C. Proper names: C, C. Proper name: C. How do you advise to rewrite this? JS> because that's J

Re: [PATCH 1/2 + 0.1] Supported.pod

2002-04-05 Thread Anton Tagunov
Hello! Have just read Jungshik's mail and have patched Supported.pod a bit more: added (x-)windows-949 aliases stuff. --- ext/Encode/lib/Encode/Supported.orig.pod Fri Apr 5 01:00:36 2002 +++ ext/Encode/lib/Encode/Supported.pod Fri Apr 5 15:18:25 2002 @@ -63,7 +63,7 @@ ascii US-as

Re: [PATCH 1/2 + 0.1] Supported.pod

2002-04-05 Thread Dan Kogai
Anton, I am now working on the new revision of Supported.pod AFTER this patch is applied. I will post the whole thing tonight. Dan

[Encode] In what character encoding legacy scripts are written?

2002-04-05 Thread Dan Kogai
jhi and porters, With Encode done, I am now focusing on other code and documentation that are related. Naturally there are many but before just sending patches, I would like to call for attention. Many documents in the core state that legacy encoding defaults to ISO-8859-1. Though

Re: [Encode] In what character encoding legacy scripts are written?

2002-04-05 Thread Dan Kogai
On Friday, April 5, 2002, at 08:49 , Dan Kogai wrote: > > I REPEAT. until perl 6, PERL KNEW NOTHING ABOUT ENCODING. =~ s/6/5.6/ Dan

[Encode] Endian consistency and missing raw encodings for TK

2002-04-05 Thread Dan Kogai
On Friday, April 5, 2002, at 08:40 , Nick Ing-Simmons wrote: > It is _really_ sad, Tk only really _needs_ one encoding which it expects > it to be called ucs-2be or iso10646-1 > > We don't support either name. > Instead we claim UCS-2 without specifying an endian :-( As a matter of fact, I don

Re: [Encode] Endian consistency and missing raw encodings for TK

2002-04-05 Thread Jarkko Hietaniemi
On Fri, Apr 05, 2002 at 10:24:04PM +0900, Dan Kogai wrote: > On Friday, April 5, 2002, at 08:40 , Nick Ing-Simmons wrote: > > It is _really_ sad, Tk only really _needs_ one encoding which it expects > > it to be called ucs-2be or iso10646-1 > > > > We don't support either name. > > Instead we claim

[Encode] Farsi is Okay. The problem is in Indics!

2002-04-05 Thread Dan Kogai
On Friday, April 5, 2002, at 11:18 , Jarkko Hietaniemi wrote: > Since it seems that we won't make it for Monday the 8th (MakeMaker is > still unfinished, and UTF-8 keys are still a bit dodgy, and so on), I > guess small updates on Encode (docs certainly, and obvious bugs) are > still okay-- and ev

Re: [Encode] Farsi is Okay. The problem is in Indics!

2002-04-05 Thread Jarkko Hietaniemi
On Sat, Apr 06, 2002 at 12:27:20AM +0900, Dan Kogai wrote: > On Friday, April 5, 2002, at 11:18 , Jarkko Hietaniemi wrote: > > Since it seems that we won't make it for Monday the 8th (MakeMaker is > > still unfinished, and UTF-8 keys are still a bit dodgy, and so on), I > > guess small updates on

Re: [Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Jungshik Shin
On Fri, 5 Apr 2002, Jarkko Hietaniemi wrote: > > P.S. Does utf8 support surrogates? Surrogate pair is definitely the > > No. Surrogates are solely for UTF-16. There's no need for surrogates > in UTF-8 -- if we wanted to encode U+D800 using UTF-8, we *could* -- > BUT we should not. Encoding

Re: [Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Jarkko Hietaniemi
On Fri, Apr 05, 2002 at 10:35:29AM -0500, Jungshik Shin wrote: > On Fri, 5 Apr 2002, Jarkko Hietaniemi wrote: > > > > P.S. Does utf8 support surrogates? Surrogate pair is definitely the > > > > No. Surrogates are solely for UTF-16. There's no need for surrogates > > in UTF-8 -- if we wanted

Re: [Encode] Farsi is Okay. The problem is in Indics!

2002-04-05 Thread Mark Leisher
Dan> Here I am talking about Devanagari and its variants. See this. Something that might be of interest as an example of the complexities of Indic encoding: http://crl.nmsu.edu/~mleisher/nai2ucs.pl. The script converts Naidunia web pages to UTF-16, but is more useful in demonstrating the g

Re: [Encode] Farsi is Okay. The problem is in Indics!

2002-04-05 Thread Mark Leisher
Jarkko> No, I'm not mistaken, I know that Farsi and Indics are different. Jarkko> While Googling for the Farsi encodings I just got worried by the Jarkko> frequent mentions of the bidi complications. But Roozbeh would Jarkko> know for certain, instead of us non-Farsi trying to so

Re: [Encode] Farsi is Okay. The problem is in Indics!

2002-04-05 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: >http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/DEVANAGA.TXT >> ## >> >> # Section 1: Map the following byte pairs as indicated: >> # (ZWNJ means ZERO WIDTH NON-JOINER, ZWJ means ZERO WIDTH JOINER) >> # (Also see note about 0xF0 in commen

Re[2]: [PATCH 1/2 + 0.1] Supported.pod

2002-04-05 Thread Anton Tagunov
Hello Dan! DK> I am now working on the new revision of Supported.pod AFTER this patch DK> is applied. I will post the whole thing tonight. I'm very glad! :-)) Could we have a couple more patching cycles then? I already see two typos in my text :-( - Anton

Add Windows-31J => cp932 alias? Windows-31J ever used?

2002-04-05 Thread Anton Tagunov
Hello, gentlemen! Have just stumbled over Name: Windows-31J MIBenum: 2024 Source: Windows Japanese. A further extension of Shift_JIS to include NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119). The CC

[Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Dan Kogai
On Friday, April 5, 2002, at 11:10 , Jarkko Hietaniemi wrote: > Change 15745 by jhi@alpha on 2002/04/05 13:07:21 > > Integrate perlio; > > Not only did UCS-2 have dodgy name it was buggy. > > Affected files ... > > ... //depot/perl/ext/Encode/lib/Encode/10646_1.pm#4 integrate >

Re: [Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Jarkko Hietaniemi
> P.S. Does utf8 support surrogates? Surrogate pair is definitely the No. Surrogates are solely for UTF-16. There's no need for surrogates in UTF-8 -- if we wanted to encode U+D800 using UTF-8, we *could* -- BUT we should not. Encoding U+D800 as UTF-8 should not be attempted, the whole surr

Re: [Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Dan Kogai
On Saturday, April 6, 2002, at 12:18 , Jarkko Hietaniemi wrote: >> P.S. Does utf8 support surrogates? Surrogate pair is definitely the > > No. Surrogates are solely for UTF-16. There's no need for surrogates > in UTF-8 -- if we wanted to encode U+D800 using UTF-8, we *could* -- > BUT we should

A FIX. [Re: qr/^UCS2-le$/i => '"UCS-2"' -- what is it?]

2002-04-05 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: >On Friday, April 5, 2002, at 11:33 , Dan Kogai wrote: >> - qr/^UCS2-le$/i=> '"UCS-2"', ); >> + qr/^UCS-2LE$/i=> '"UTF-16LE"'); > ^^^aaaggh! > >Forget the last one. This one is correct. Do w

A TEST. [Re: qr/^UCS2-le$/i => '"UCS-2"' -- what is it?]

2002-04-05 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: >On Friday, April 5, 2002, at 11:39 , Dan Kogai wrote: >> Forget the last one. This one is correct. >> >> Dan-the-Encode-Maintainer > >.And this is a two-pence patch that protects the source from Dan. > >Dan the Encode Maintainer > >P.S. I hate U(TF|CS)

Re: [Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Jarkko Hietaniemi
On Sat, Apr 06, 2002 at 01:08:11AM +0900, Dan Kogai wrote: > On Saturday, April 6, 2002, at 12:18 , Jarkko Hietaniemi wrote: > >> P.S. Does utf8 support surrogates? Surrogate pair is definitely the > > > > No. Surrogates are solely for UTF-16. There's no need for surrogates > > in UTF-8 -- if

Re: [Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: > >P.S. Does utf8 support surrogates? Surrogate pair is definitely the >ugliest SOB of Unicode but without it, we can't print >\x{8000}-\x{10ff} to the stream UTF-8 does not _need_ to support surrogates - it can do full range without them. What

Re: [Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Nick Ing-Simmons
Jarkko Hietaniemi <[EMAIL PROTECTED]> writes: >Well, there seems to be > > Perl_utf16_to_utf8(pTHX_ U8* p, U8* d, I32 bytelen, I32 *newlen) > >in utf8.c that seems to be doing surrogate arithmetics, but I think >that's not much used (if at all), and I cannot see utf8_to_utf16. >(There's also > >

Re: [Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Dan Kogai
On Saturday, April 6, 2002, at 01:16 , Jarkko Hietaniemi wrote: >> Yes. I know that. My question is whether we support CONVERSION. >> Internals have nothing to do with that. When we say UCS-2, >> \x{1}-\x{10} must be discarded or croak for error. When we say > > I suggest croak. > >> U

Re: [Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Dan Kogai
On Saturday, April 6, 2002, at 01:29 , Nick Ing-Simmons wrote: >> Perl_utf16_to_utf8_reversed(pTHX_ U8* p, U8* d, I32 bytelen, I32 >> *newlen) >> > > Should be a good starting point for the XS version ;-) Okay. But for now, good old mammalian implementation first. Your code contribution is v

Re: [Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Brian Stell
Dan Kogai wrote: > ... > Okay, here is my strategy. > > decode("\x{8C00}-\0x{8}") encode("\x{1}-\x{10}") The Unicode consortium does discuss this: http://www.unicode.org/versions/corrigendum1.html Corrigendum #1: UTF-8 Shortest Form The conformance clau

Re: what now? (background)

2002-04-05 Thread Dan Kogai
On Saturday, April 6, 2002, at 03:08 , Jarkko Hietaniemi wrote: > After integrating Nick's tweak (#15745) the Aliases.t started failing. That will be fixed with the next version of Encode which implements all of UCS-2(BE|BL) and UTF-(16|32)(BE|LE)? Yes, I have carefully put ? in the last one.

"Character set terminology survey" ready

2002-04-05 Thread Anton Tagunov
Hello Dan! Hello Jungshik! Hello other developers and experts! I have finally completed my survey named "CHARACTER SET" TERMINOLOGY SURVEY, CLASSIFICATION OF CJK AND NON-CJK CHARACTER SET STANDARDS. VERSION 0.95 It is available at http://tagunov.tripod.com/survey2.html The main purpose of this s

Re: Change 15689: What started as a small nit (the charnames test, nit found

2002-04-05 Thread Philip Newton
On Tue, 2 Apr 2002 13:45:06 -0800, [EMAIL PROTECTED] (Jarkko Hietaniemi) wrote: > Change 15689 by jhi@alpha on 2002/04/02 20:35:13 > > What started as a small nit (the charnames test, nit found > by Hugo), ballooned a bit... the goal is Larry's wish that > illegal Unicode (such

Re: Change 15689: What started as a small nit (the charnames test, nit found

2002-04-05 Thread Jarkko Hietaniemi
> On the other hand, it might make sense to have one flag that allows both > 0xyyFFFE and 0xyy simultaneously (for yy = (0 .. 0x10)) -- perhaps > modify the UNICODE_ALLOW_ flag or simply extend it to allow both > yy and yyFFFE.. I think this is the best choice: it makes no sense to con

Re: Change 15689: What started as a small nit (the charnames test, nit found

2002-04-05 Thread Philip Newton
On 5 Apr 02, at 23:03, Jarkko Hietaniemi wrote: > > On the other hand, it might make sense to have one flag that allows both > > 0xyyFFFE and 0xyy simultaneously (for yy = (0 .. 0x10)) -- perhaps > > modify the UNICODE_ALLOW_ flag or simply extend it to allow both > > yy and yyFFFE.. >