Re: Always setting UTF-8 flag - am I bad?

2004-08-05 Thread Jungshik Shin
On Thu, 5 Aug 2004, Nick Ing-Simmons wrote: > >Alright, I failed to say that this is an XS module, so I convert with > >WideCharToMultiByte, a Windows routine(*), put the result in an SV, and > >then say SvUTF8_on. > > The possible danger here is if the "multi byte" encoding for > user's environme

Re: Unicode filenames on Windows with Perl >= 5.8.2

2004-06-22 Thread Jungshik Shin
Jan Dubois wrote: On Mon, 21 Jun 2004, Steve Hay wrote: I must confess that 2 doesn't really bother me since the "9x" type systems are now a thing of the past (XP onwards are all "NT" type systems, even XP Home Edition). While I also wish that Win 9x would just cease to exist, I don't thin

Re: AL32UTF8

2004-05-01 Thread Jungshik Shin
Tim Bunce wrote: On Fri, Apr 30, 2004 at 10:58:19PM +0700, Martin Hosken wrote: IIRC AL32UTF8 was introduced at the behest of Oracle (a voting member of Unicode) because they were storing higher plane codes using the surrogate pair technique of UTF-16 mapped into UTF-8 (i.e. resulting in 2 UTF

Re: Info required - "Wide API calls" in Win32 Perl >= 5.8.2

2004-02-19 Thread Jungshik Shin
Jan Dubois wrote: On Thu, 19 Feb 2004 22:03:14 +0200, Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote: But even just for core Perl, things become more complicated as long as you want to support Windows 95/98/Me. Those platforms (I'm using the term loosely) do not support the wide APIs. So you can

Re: Status of -C

2004-01-11 Thread Jungshik Shin
Paul Hoffman wrote: Er, never mind. I found that I was doing something quite silly with the -C. All is OK, and it is now causing STDIN to be UTF8ish. Would you mind sharing your experience? That way, others will be able to avoid repeating your mistake. Jungshik

Re: perlunicode comment - when Unicode does not happen

2003-12-28 Thread Jungshik Shin
On Sun, 28 Dec 2003, Nick Ing-Simmons wrote: > Jungshik Shin <[EMAIL PROTECTED]> writes: > > > > Then, he should switch to en_GB.UTF-8. > > I probably will. Good ! > >Besides, he implied that > >he still uses ISO-8859-1 for files whose names can be c

Re: perlunicode comment - when Unicode does not happen

2003-12-25 Thread Jungshik Shin
On Thu, 25 Dec 2003, Jarkko Hietaniemi wrote: > >> Whoa! It's the other way round here. Nick is using a locale that > >> suits him for other reasons (e.g. getting time and data formats in > >> proper British ways), but why should he be constrained not to use for his > >> filenames whatever he

Re: perlunicode comment - when Unicode does not happen

2003-12-25 Thread Jungshik Shin
On Thu, 25 Dec 2003, Jarkko Hietaniemi wrote: > >> What I wish is that the whole current locale system would curl up and > >> die. > > > > As you'd agree, it's only 'encoding' part that has to die. > > Oh no, there are plenty of parts in it that I wish would die :-) Wishing it to die is diffe

Re: perlunicode comment - when Unicode does not happen

2003-12-25 Thread Jungshik Shin
On Thu, 25 Dec 2003, Jungshik Shin wrote: > locale definition. The fact that it is on Unix is just an artifact of > Unix file system and we want to leave it behind us if possible. Of course, Of course, it's rather a whole lot of different things that bind locale and encoding on Unix,

Re: perlunicode comment - when Unicode does not happen

2003-12-25 Thread Jungshik Shin
On Thu, 25 Dec 2003, Jarkko Hietaniemi wrote: > What I wish is that the whole current locale system would curl up and > die. As you'd agree, it's only 'encoding' part that has to die. Everybody should switch to UTF-8 on Unix and end-users should never worry about 'encoding'. In an ideal world,

Re: perlunicode comment - when Unicode does not happen

2003-12-25 Thread Jungshik Shin
On Thu, 25 Dec 2003, Jarkko Hietaniemi wrote: > > locale. Why does Perl have to be held responsible for your intentional > > act that is bound to break things? > > Whoa! It's the other way round here. Nick is using a locale that suits > him for other reasons (e.g. getting time and data formats

Re: perlunicode comment - when Unicode does not happen

2003-12-25 Thread Jungshik Shin
On Tue, 23 Dec 2003, Nick Ing-Simmons wrote: > Ed Batutis <[EMAIL PROTECTED]> writes: > >> I don't think we understand common practice (or that such practices > >> are even established yet) well enough to specify that yet. Common practice is that file names on 'local disks' are assumed to be in

Re: perlunicode comment - when Unicode does not happen

2003-12-24 Thread Jungshik Shin
On Tue, 23 Dec 2003, Jarkko Hietaniemi wrote: > > I don't see how introducing a new LC_* would help here. Whether > > Limit the mess of CTYPE controlling Yet Another Feature. I don't think it's yet another feature. It's one of features that's commonly assigned to it. Well, I guess you'd ask ho

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jungshik Shin
On Tue, 23 Dec 2003, Nick Ing-Simmons wrote: > Jungshik Shin <[EMAIL PROTECTED]> writes: > >On Mon, 22 Dec 2003, Jarkko Hietaniemi wrote: > > > >> (AFAIK) W2K and later _are able_ to use UTF-16LE encoded Unicode for > >> filenames, > >> but becau

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jungshik Shin
On Tue, 23 Dec 2003, Jarkko Hietaniemi wrote: > > It works because it relies > > on iconv(3) to convert between the current locale codeset and UTF-16 > > (used internally by Mozilla) if/wherever possible. 'wc*to*mb/mb*to*wc' > > is used only where iconv(3) is not available. Anyway, yes, that

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jungshik Shin
On Tue, 23 Dec 2003, Jarkko Hietaniemi wrote: > >> (AFAIK) W2K and later _are able_ to use UTF-16LE encoded Unicode for > >> filenames, > >> but because of backward compatibility reasons using 8-bit codepages is > >> much > >> more likely. > > > > No. _Both_ NTFS (only supported by Win 2k/XP) an

Re: perlunicode comment - when Unicode does not happen

2003-12-22 Thread Jungshik Shin
On Mon, 22 Dec 2003, Jarkko Hietaniemi wrote: > (AFAIK) W2K and later _are able_ to use UTF-16LE encoded Unicode for > filenames, > but because of backward compatibility reasons using 8-bit codepages is > much > more likely. No. _Both_ NTFS (only supported by Win 2k/XP) and VFAT (supported by W

Re: perlunicode comment - when Unicode does not happen

2003-12-22 Thread Jungshik Shin
On Mon, 22 Dec 2003, Ed Batutis wrote: > "Jarkko Hietaniemi" <[EMAIL PROTECTED]> wrote in message > news:[EMAIL PROTECTED] > > > You do know that ... > Yes. > > If wctomb or mbtowc are to be used, then Perl's Unicode must be converted > either to the locale's wide char or to its multibyte. This is

Re: Bidirectional (bidi) Support?

2003-10-25 Thread Jungshik Shin
On Fri, 24 Oct 2003, Chris Whiting wrote: > >"Bob Hallissy" <[EMAIL PROTECTED]> wrote in message >> I presume your algorithm depends on the Arabic presentation forms available >> as separately encoded >characters in Unicode. If this is the case, > The algorithm, and all that I have seen, conver

Re: Mixing Unicode and Byte output on a Unicode enabled Perl 5.8.0

2003-10-09 Thread Jungshik Shin
On Thu, 9 Oct 2003, Guido Flohr wrote: > BTW, Windows editors also insert that BOM at the beginning when writing > XML files encoded in UTF-8. In other words: If you edit a UTF-8 XML > file with Windows Notepad, it will be corrupted. MSIE and Mozilla (!) > still treat it as well-formed XML but a

Re: Mixing Unicode and Byte output on a Unicode enabled Perl 5.8.0

2003-10-09 Thread Jungshik Shin
On Thu, 9 Oct 2003, Frank Smith wrote: > I am trying to use the £ (pound sterling) symbol in a script that > produces both TEXT and HTML the html handles the Unicode fine, all the > browsers seem to work. However, once the text file arrives on the Windowz > box the Unicode £ screws Excel. > Can y

Re: Inverse of /\p{script}/

2003-08-29 Thread Jungshik Shin
On Fri, 29 Aug 2003, Nick Ing-Simmons wrote: > What I am hoping to do for Tk804 is put some kind of callback to perl > hook in so that when Tk wants a font for a particular character it > can call to perl and perl will give it strong push in a particular direction. > Thus for someone expecting Jap

Re: Quick question: viscii vs. iscii? NEVERMIND

2003-06-03 Thread Jungshik Shin
On Mon, 2 Jun 2003, David Graff wrote: > Does 5.8 have any conversion functionality for ISCII? If not, is > anyone working on this (and is there a notion when it may be ready)? Encode doesn't support ISCII yet (there may be a separate module for ISCII, though). I'm planning to work on it (se

Encode::_utf8_on and output

2003-05-30 Thread Jungshik Shin
On Sat, 18 Jan 2003, Jarkko Hietaniemi wrote: > Now Perl-5.8.1-to-be has been changed to > > (1) not to do any implicit UTF-8-ification of any filehandles unless > explicitly asked to do so (either by the -C command line switch > or by setting the env var PERL_UTF8_LOCALE to a true value,

Re: How to name CJK ideographs

2002-10-25 Thread Jungshik Shin
On Sat, 26 Oct 2002, Dan Kogai wrote: > On Saturday, Oct 26, 2002, at 03:55 Asia/Tokyo, Jungshik Shin wrote: > > Another possibility is 'meaning-pronunciation' index. I believe > > this is one of a few ways to refer to CJK characters (say, over the > >

Re: Unicode. Perl does the right thing?

2002-10-25 Thread Jungshik Shin
On Fri, 25 Oct 2002, Autrijus Tang wrote: > On Fri, Oct 25, 2002 at 02:53:43PM +0900, Dan Kogai wrote: > > use charnames ":zh"; > > print "\N{sheng1}"; > > 17 characters from the Big5 range have the 'sheng1' pronunciation; > no doubt many more in the Unihan range. > > use charnames ":zh"; >

RFC 2231 (was Re: Encode::MIME::Header...)

2002-10-08 Thread Jungshik Shin
On Mon, 7 Oct 2002, Dan Kogai wrote: > As I said, Encode::MIME::Header has those restrictions; > > * the Encode API > * RFC 2047 I'm not sure if Encode::MIME::Header is the best place to implement RFC 2231 because RFC 2231 encoding/decoding involves two parameters, 'MIME charset' and 'langua

Re: README.cjk?

2002-05-06 Thread Jungshik Shin
On Tue, 7 May 2002, Dan Kogai wrote: Hi Dan, > pumpking is calling for the (hopefully) the last chance to update > README.cjk. > > On Tuesday, May 7, 2002, at 02:48 , Jarkko Hietaniemi wrote: > > Do I have the latest versions of the README.{cn,jp,ko,tw}? > > I do think so but I am calling fo

Re: http://bleedperl.dan.co.jp:8080/

2002-04-27 Thread Jungshik Shin
On Sat, 27 Apr 2002, Dan Kogai wrote: > I have set up an experimental mod_bleedperl server which URI is shown in > the subject. > To demonstrate the power of Perl 5.8, I have written a small cgi/pl (.pl > runs on Apache::Registry) called piconv.pl, a web version of piconv(1). > > http://bleedperl

Re: README.jp, README.tw, README.cn, README.kr

2002-04-15 Thread Jungshik Shin
Hi, Attached is README.ko (per Jarkko's suggestion, I used 'ko' instead of 'kr') in EUC-KR encoding. North Korea has its own 94 x 94 coded character set (KPS 9566-97: ISO-IR 202), but a few web pages set up for/by North Korean companies (and possibly government?) of which URLs I happened to know

piconv and EUC :-)

2002-04-10 Thread Jungshik Shin
On Sun, 31 Mar 2002, Dan Kogai wrote: Hi Dan, > >piconv -- iconv(1), reinvented in perl > > > >piconv is perl version of iconv, a character encoding > >converter widely available for various unixen today. This > >script was primarily a technology demonstrator f

Re: [Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Jungshik Shin
patibility Encoding Scheme for UTF-16 (CESU-8) (http://www.unicode.org/unicode/reports/tr26). Does Encode need to support this monster? I hope not. Jungshik Shin

Re: [PATCH] Supported.pod: cleanup/UTF-16/CJK.inf + an invasion to the Glossary

2002-04-05 Thread Jungshik Shin
On Fri, 5 Apr 2002, Anton Tagunov wrote: Hi Anton, > Speaking of the patch.. > > > AT> +=item Jungshik Shin's Hangul FAQ > AT> +L . > AT> +L > AT> +has a comprehensive overview of the C (Korean) standards. > AT> +The author claims howeve

Re: - charset + character set + coded character set + CCS (?) (was:[Encode] Encode::Supported revised)

2002-04-04 Thread Jungshik Shin
On Thu, 4 Apr 2002, Anton Tagunov wrote: Hi Anton !! AT> Our comments go in the same direction, but will you AT> let me strengthen your statements a bit? Thank you ! JS> On the other hand, no one with *sufficient understanding* JS> of the issue uses 'character set' to mean encoding. AT> [E

Re: [PATCH] Re: [Encode] Encode::Supported revised

2002-04-04 Thread Jungshik Shin
h I interpret as having support for it. >UTF-16 > - KOI8-U(http://www.faqs.org/rfcs/rfc2319.html) > > -are IANA-registered (C even as a preferred MIME name) > +=for comment > +waiting for comments from Jungshik Shin to soften this - Anton > + > +is a IANA-regist

Re: [Encode] Encode::Supported revised

2002-04-04 Thread Jungshik Shin
On Thu, 4 Apr 2002, Dan Kogai wrote: Konnichiha ! (hope I got this one right). > On Thursday, April 4, 2002, at 03:06 , Jungshik Shin wrote: > >> o The MIME name as defined in IETF RFCs. > >>UCS-2 ucs2, iso-10646-1[IANA, e

Re: [Encode] Encode::Supported revised

2002-04-03 Thread Jungshik Shin
On Wed, 3 Apr 2002, Dan Kogai wrote: Dan, Thank you for your write-up. Below are some comments. > o The MIME name as defined in IETF RFCs. >UCS-2 ucs2, iso-10646-1[IANA, et al] >UCS-2le >UTF-8 utf8 [

Re: Are GB 18030 and CNS 11643-1992 the best spellings?

2002-03-28 Thread Jungshik Shin
On Fri, 29 Mar 2002, Anton Tagunov wrote: Hi Anton, > Writing a bit of an article, putting in there all I have learnt > about CJK encodings on the Internet and at [EMAIL PROTECTED] > Has already taken me a week :-) I strongly recommend you get CJKV Information Processing by Ken Lunde. It has

Re: [Encode/ISO-2022] KR is done. CN to go.

2002-03-27 Thread Jungshik Shin
I am now hopeful that 1.00 will be shipped in next 24 hours. > Coincidentally, it is 09:00 JST, meaning 00:00 Zulu. I'd rather say UTC instead of Zulu :-) > On Thursday, March 28, 2002, at 08:01 , Jungshik Shin wrote: > > Yeah, that's a common mistake made by (Japanese)

Re: Encoding vs Charset

2002-03-27 Thread Jungshik Shin
On Wed, 27 Mar 2002, Dan Kogai wrote: > On Wednesday, March 27, 2002, at 11:22 , Jungshik Shin wrote: > > IMHO, you're also misusing the term 'charset' here. MIME charset > > can be used synonymously with 'encodings' (or > > character set en

Re: let's cook it!

2002-03-27 Thread Jungshik Shin
On Wed, 27 Mar 2002, Nick Ing-Simmons wrote: > Autrijus Tang <[EMAIL PROTECTED]> writes: > >On Tue, Mar 26, 2002 at 06:28:07PM -0500, Jungshik Shin wrote: > >> Microsoft products use 'ks_c_5601-1987' as an encoding name/MIME > >> charset/character set

Re: Encode: CJK-Guide

2002-03-26 Thread Jungshik Shin
On Wed, 27 Mar 2002, Autrijus Tang wrote: > On Tue, Mar 26, 2002 at 11:40:56PM -0500, Jungshik Shin wrote: > > for this? In case of Johab, the easiest way to add support for it is to > > just generate the mapping table for it, but I feel uncomfortable bloating > > the cod

Re: Encode: CJK-Guide

2002-03-26 Thread Jungshik Shin
support those in Encode, > but for space considerations one has to install an additional module, > Encode::HanExtra. I found that Big5-HKSCS is included in 'plain Encode' and GBK, GB18030, EUC-TW, and Big5plus are in HanExtra. Jungshik Shin

Re: GB2312 and EUC-CN : IANA registry

2002-03-26 Thread Jungshik Shin
) restricted to A0-FF in both bytes code set 2: Half Width Katakana (a single 7-bit byte set) requiring SS2 as the character prefix code set 3: JIS X0212-1990 (a double 7-bit byte set) restricted to A0-FF in both bytes requiring SS3 as the character prefix Alias: csEUCPkdFmtJapanese Alias: EUC-JP (preferred MIME name) Jungshik Shin

Re: Encoding vs Charset

2002-03-26 Thread Jungshik Shin
On Tue, 26 Mar 2002, Jungshik Shin wrote: > > really means euc-cn and charset="ks_c_5601-1987" really means euc-kr. > > Sadly this misconception is embedded in popular browsers. > M$ OE, M$ Frontpage keep producing html docs. However, > it also has to be noted th

Re: Encoding vs Charset

2002-03-26 Thread Jungshik Shin
ust stuck. As for Taiwan, the reason there's no confusion between coded character set and encoding is not because they're technically correct but because in their case EUC-TW has never been used widely, while the popular encoding Big5 has a much more complex relationship with CNS 11xxx than EUC-KR has with KS X 1001 or EUC-CN has with GB 2312. (Big5 vs CNS 11xxx is similar to Shift_JIS vs JIS X 0208.) Jungshik Shin

Re: Encode: CJK-Guide

2002-03-26 Thread Jungshik Shin
ecessary because Hangul precomposed syllable mapping (to Unicode) is algorithmic while Hanjas and symbols can be mapped to KS X 1001 algorithmically and then mapped to Unicode using KS X 1001 mapping table. BTW, how about Big5-HKSCS(Hongkong), GBK, and GB18030(PRC)? Jungshik Shin

Re: Encode::CJKguide

2002-03-26 Thread Jungshik Shin
on because some characters appear to need to be unified in my eyes. (perhaps, the source separation rule kept them distinct.) Jungshik Shin

Re: Encode: CJK-Guide

2002-03-26 Thread Jungshik Shin
English word with multiple meanings by just looking at its computer representation without context/grammatical/linguistic/lexical analysis, can you? How do you know what 'fly' means without context? Jungshik Shin

Re: let's cook it!

2002-03-26 Thread Jungshik Shin
arset for it although it has no place in Korean nat'l standard. Mozilla has to accept 'ks_c_5601-1987' as an alias to 'X-Windows-949' because MS IE, OE and frontpage are so widely used. Jungshik Shin