Re: Unicode::Collate string replacements and case sensitivity

2011-05-05 Thread SADAHIRO Tomoyuki
parameter (ignore_level2) will allow it. (However the behavior of ignore_level2 is quite different from so-called caseLevel in UCA etc.) Regards, SADAHIRO Tomoyuki
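The effect of the ignore_level2 parameter mentioned in this message can be sketched as follows. This is a minimal illustration, not taken from the thread: it assumes a Unicode::Collate version recent enough to support ignore_level2, and the sample strings are made up for the example.

```perl
use strict;
use warnings;
use Unicode::Collate;

# level => 3 keeps case (tertiary) differences significant, while
# ignore_level2 => 1 asks the collator to disregard accent (secondary)
# differences. The sample words below are illustrative only.
my $collator = Unicode::Collate->new(level => 3, ignore_level2 => 1);

# "r\x{E9}sum\x{E9}" is "resume" written with two e-acute characters.
my $accent_ignored = $collator->eq("resume", "r\x{E9}sum\x{E9}") ? 1 : 0;
my $case_kept      = $collator->eq("resume", "Resume")           ? 0 : 1;

print "accents ignored: $accent_ignored, case distinguished: $case_kept\n";
```

As the message notes, this is not the same thing as the caseLevel setting in UCA; the sketch only shows accent-insensitive, case-sensitive comparison.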

Re: Unicode::Collate string replacements and whitespace

2010-09-22 Thread SADAHIRO Tomoyuki
{ + } elsif ($to_be_pushed) { push @subWt, [ \...@wt ]; } } Regards, SADAHIRO Tomoyuki dear all, most probably I'm missing something quite obvious and very simple, but I am no expert with Perl and Unicode yet. I'm making some string replacements with Unicode::Collate

[ANN] Unicode::Collate 0.54 released

2010-07-26 Thread SADAHIRO Tomoyuki
for Unicode 5.2.0. Thank you, SADAHIRO Tomoyuki

Re: Unicode::Collate, useful but useless

2007-04-15 Thread SADAHIRO Tomoyuki
U.C.D. 5.0.0. Regards, SADAHIRO Tomoyuki

Re: range operator vs. unicode

2006-06-08 Thread SADAHIRO Tomoyuki
[CD]) isn't treated as the sixth letter following epsilon. P.S. 11 is represented by iota-alpha, not by kappa, in the Greek numeral system. cf. http://en.wikipedia.org/wiki/Greek_numerals Regards, SADAHIRO Tomoyuki

Re: iso-2022-jp encoding on EBCDIC

2005-12-20 Thread SADAHIRO Tomoyuki
CPAN: http://rt.cpan.org/NoAuth/Bugs.html?Dist=Encode Regards, SADAHIRO Tomoyuki

Re: case folding failure on EBCDIC

2005-10-17 Thread SADAHIRO Tomoyuki
. regards, SADAHIRO Tomoyuki

Re: case folding failure on EBCDIC

2005-10-10 Thread SADAHIRO Tomoyuki
{ff}] CUR = 2 LEN = 4 where PV stands for string and \303\277 is U+00FF in UTF-8. In UTF-EBCDIC, the output should be different. regards, SADAHIRO Tomoyuki

Re: utf8::upgrade,utf8::encode and utf8::is_utf8 on EBCDIC platform

2005-09-01 Thread SADAHIRO Tomoyuki
in UTF-EBCDIC too. If you want to convert an integer to a character according to Unicode scalar values, you can use pack('U'), but not chr(). For example, pack('U', 0xFF) should correspond to U+00FF (y with diaeresis), everywhere (both on ASCII and on EBCDIC). Regards, SADAHIRO Tomoyuki Hi
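The pack('U') advice above can be tried with a short script. This is a sketch only; the variable names are illustrative, and it has been reasoned through for an ASCII platform rather than verified on EBCDIC.

```perl
use strict;
use warnings;

# pack('U', N) builds the character whose Unicode scalar value is N,
# independent of the platform's native character set.
my $y_diaeresis = pack('U', 0xFF);

# unpack('U', ...) recovers the Unicode scalar value again.
my $code_point = unpack('U', $y_diaeresis);

printf "U+%04X, length %d\n", $code_point, length $y_diaeresis;
```

On an ASCII platform chr(0xFF) happens to give the same character, which is why the difference only shows up on EBCDIC, where chr() takes a native code point.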

Re: Transliteration operator(tr//)on EBCDIC platform

2005-08-09 Thread SADAHIRO Tomoyuki
in the case of EBCDIC. Sastry, would you please run the following codelet on your EBCDIC? ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ s/[\x89-\x91]/X/g; is($a, ); Does that work similarly to yours? ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x89-\x91/X/; is($a, ); Regards, SADAHIRO

[PATCH] bytes.pm doesn't check undefined subroutine calling

2005-05-26 Thread SADAHIRO Tomoyuki
subroutine is defined or not. So goto() causes infinitely recursive AUTOLOAD calls. The following patch changes utf8.pm and utf8.t. Regards, SADAHIRO Tomoyuki diff -ur perl~/lib/bytes.pm perl/lib/bytes.pm --- perl~/lib/bytes.pm Wed Sep 03 18:39:15 2003 +++ perl/lib/bytes.pm Thu May 26 23

Re: How to use Unicode::Collate in multilinguage apps?

2004-03-30 Thread SADAHIRO Tomoyuki
::Collate::Locale::_locale() does the same thing as canonical_name(), but that function is internal and not public. Regards, SADAHIRO Tomoyuki

Re: How to use Unicode::Collate in multilinguage apps?

2004-03-27 Thread SADAHIRO Tomoyuki
. It may be enhanced sooner or later... [prerelease] This will be released *after* Perl 5.8.4 (or its RC) will be out. http://homepage1.nifty.com/nomenclator/perl/Unicode-Collate-0.40.tar.gz regards, SADAHIRO Tomoyuki

Re: removing accents

2004-01-02 Thread SADAHIRO Tomoyuki
TO) with '=' (EQUALS SIGN) since a mathematic negation slash is encoded by U+0338 COMBINING LONG SOLIDUS OVERLAY which is to be removed. sub remove_accent { use Unicode::Normalize; my $s = NFD(shift); $s =~ s/\pM//g; return $s; } Regards, SADAHIRO Tomoyuki
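The remove_accent routine quoted in this message can be laid out as a runnable script. The function body is the one from the message; the sample inputs and the variable names holding the results are added here for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Decompose to NFD so that accents become separate combining marks,
# then delete every character with the Mark (M) general category.
sub remove_accent {
    my $s = NFD(shift);
    $s =~ s/\pM//g;
    return $s;
}

# Illustrative inputs: "cafe" with e-acute, and U+2260 NOT EQUAL TO,
# which decomposes to '=' followed by U+0338.
my $plain_word = remove_accent("caf\x{E9}");
my $plain_sign = remove_accent("\x{2260}");

# U+00D8 (O with stroke) has no canonical decomposition, so it is unchanged.
my $o_stroke = remove_accent("\x{D8}");
```

This also shows the side effect the message warns about: NOT EQUAL TO turns into a plain EQUALS SIGN, while letters such as Ø that have no decomposition are left alone.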

Re: removing accents

2003-12-27 Thread SADAHIRO Tomoyuki
removal is provisional and its definition has not been specified yet, I suppose it to have mapping of Ø to O, etc. Regards, SADAHIRO Tomoyuki

Re: perlunicode comment - when Unicode does not happen

2003-12-22 Thread SADAHIRO Tomoyuki
know whether Perl supports multibyte file/path names or not.) in a Japanese Perlers' mail list. http://www.freeml.com/message/[EMAIL PROTECTED]/0004467 (in Japanese) Here is a brief summary (in Japanese). http://homepage1.nifty.com/nomenclator/perl/shiftjis.htm#file Regards, SADAHIRO Tomoyuki

Re: perlunicode comment - when Unicode does not happen

2003-12-22 Thread SADAHIRO Tomoyuki
On Tue, 23 Dec 2003 09:47:32 +0900 SADAHIRO Tomoyuki [EMAIL PROTECTED] wrote: I had talked on this problem (well, I don't know whether Perl supports multibyte file/path names or not.) in a Japanese Perlers' mail list. http://www.freeml.com/message/[EMAIL PROTECTED]/0004467 (in Japanese

Re: Unicode::Collate question

2003-12-02 Thread SADAHIRO Tomoyuki
(Perl native) http://search.cpan.org/~pne/Lingua-Klingon-Collate-1.01/ Lingua::JA::Sort::JIS, Japanese, UTF-8 http://search.cpan.org/~sadahiro/Lingua-JA-Sort-JIS-0.04/ ShiftJIS::Collate, Japanese, Shift-JIS http://search.cpan.org/~sadahiro/ShiftJIS-Collate-1.02/ Regards, SADAHIRO

Re: Unicode::Collate question

2003-11-29 Thread SADAHIRO Tomoyuki
in advance for any insight or pointers you can contribute. Regards, -- Eric Cholet Regards, SADAHIRO Tomoyuki

terminator weight for Hangul

2003-10-12 Thread SADAHIRO Tomoyuki
] # CHOSEONG NIEUN 1113 ; [.18A2.0020.0002][.18A1.0020.0002] # CHOSEONG NIEUN-KIYEOK Regards, SADAHIRO Tomoyuki

roundtrip conversion for Mac OS CJK encodings

2003-09-28 Thread SADAHIRO Tomoyuki
-MacKorean.html Regards, SADAHIRO Tomoyuki

Re: big trubble perl encoding DBD/DBI

2003-09-26 Thread SADAHIRO Tomoyuki
On Wed, 24 Sep 2003 12:12:37 +0200 udo [EMAIL PROTECTED] wrote: hello you, excuse me, please help! precondition: * linux redhat 9.0 * perl, v5.8.0 * dbms sybase 11.9.x (support iso-8859-1 ) * string contains german special chars "・・・

Re: UCM file and combining character sequences

2003-09-22 Thread SADAHIRO Tomoyuki
above + circumflex, o + dot above + grave, and o + dot above + macron. SADAHIRO Tomoyuki

Hangul decomposition and composition

2003-09-12 Thread SADAHIRO Tomoyuki
-unicode/2003-04/msg00028.html [3] http://www.unicode.org/Public/UNIDATA/HangulSyllableType.txt [4] http://std.dkuug.dk/JTC1/SC22/WG20/docs/N954.PDF (full); http://std.dkuug.dk/JTC1/SC22/WG20/docs/N953.PDF (summary) regards, SADAHIRO Tomoyuki

[ANN] Unicode::Collate 0.28 released

2003-09-07 Thread SADAHIRO Tomoyuki
is always neglected. - Fixed: according to S2.1 in UTS #10, a blocked combining character should not be contracted. One test in test.t was wrong and has been removed. - Added contract.t. - (normalization => prenormalized) can be used. Regards, SADAHIRO Tomoyuki

Re: [perl #22111] perl::Encode doesn't handle UTF-8 NFD strings

2003-08-24 Thread SADAHIRO Tomoyuki
.nifty.com/nomenclator/perl/Encode-UnicodeNormalization.html Regards, SADAHIRO Tomoyuki

Re: [perl #22814] Non-deterministic problem with unicode regexps

2003-06-30 Thread SADAHIRO Tomoyuki
Hello. Well, Unicode::Normalize 0.23 is released. http://search.cpan.org/author/SADAHIRO/Unicode-Normalize-0.23/ Documentations are also revised at some points. Thank you, SADAHIRO Tomoyuki Call ID Numbers (via RT) [EMAIL PROTECTED] writes: This is the closest I've been able to come

Re: utf8_heavy noise

2003-06-21 Thread SADAHIRO Tomoyuki
) { #End of Patch SADAHIRO Tomoyuki

Re: utf8_heavy noise

2003-06-21 Thread SADAHIRO Tomoyuki
. :-) 1946..194F; Nd # [10] LIMBU DIGIT ZERO..LIMBU DIGIT NINE 104A0..104A9 ; Nd # [10] OSMANYA DIGIT ZERO..OSMANYA DIGIT NINE SADAHIRO Tomoyuki

Re: [FYI] Intersection and Removal of Character Class

2003-06-15 Thread SADAHIRO Tomoyuki
On Sat, 14 Jun 2003 22:05:24 +0900 SADAHIRO Tomoyuki [EMAIL PROTECTED] wrote: I write a module that parses a character class including grouping, intersection, union, and removal (subtraction), according to Unicode Regular Expression (e.g. [A B], [A-Z - XYZ]) and converts it into a regular

[FYI] Intersection and Removal of Character Class

2003-06-14 Thread SADAHIRO Tomoyuki
/unicode/reports/tr18/ Thank you, SADAHIRO Tomoyuki

[ANNOUNCE] Unicode::Normalize 0.21 and ::Collate 0.24 released

2003-04-05 Thread SADAHIRO Tomoyuki
true, Unicode code points (less than 256) should be translated into native code points for EBCDIC; (3) else, any further work should be abandoned. If a test fails on EBCDIC, the code might be broken, or the test itself might be broken. Thank you, SADAHIRO Tomoyuki

Re: [Patch] Encode.pm : euro sign missing in cp936.ucm

2003-03-27 Thread SADAHIRO Tomoyuki
On Thu, 27 Mar 2003 10:02:28 +0900 Dan Kogai [EMAIL PROTECTED] wrote: SADAHIRO-san and cp9?? experts, On Thursday, Mar 27, 2003, at 00:44 Asia/Tokyo, SADAHIRO Tomoyuki wrote: +U20AC \x80 |0 # EURO SIGN Is this right? Yes, U20AC is indeed missing from cp936.ucm but see this; (snip

Re: [Patch] Encode.pm : euro sign missing in cp936.ucm

2003-03-27 Thread SADAHIRO Tomoyuki
://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT Regards, SADAHIRO Tomoyuki

[Patch] Encode.pm : euro sign missing in cp936.ucm

2003-03-26 Thread SADAHIRO Tomoyuki
|0 # DEGREE FAHRENHEIT End of patch sigh, I've made such a patch long before. cf. http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2001-09/msg01568.html Regards, SADAHIRO Tomoyuki

Re: Warning messages for ill-formed data

2003-03-25 Thread SADAHIRO Tomoyuki
\xB1 |0 U00A8 \xC6\xD8 |0 U00AF \xA1\xC2 |0 @@ -171,7 +164,6 @@ U00F9 \x88\x7B |0 U00FA \x88\x79 |0 U00FC \x88\xA2 |0 -U00FF \xFF |0 # LATIN SMALL LETTER Y WITH DIAERESIS U0100 \x88\x56 |0 U0101 \x88\x67 |0 U0112 \x88\x5A |0 Regards, SADAHIRO Tomoyuki I often encounter lower-ascii codes

Re: Warning messages for ill-formed data

2003-03-25 Thread SADAHIRO Tomoyuki
/bp581/lib/Encode.pm line 156. The message is not 'big5-eten \x88\x71 does not map to Unicode..', of course (big5-eten.ucm does not define \x88\x71 as a double-byte char), that may be what is expected, though. Regards, SADAHIRO Tomoyuki On Tue, 25 Mar 2003 21:53:13 +0900 SADAHIRO Tomoyuki [EMAIL

Re: Warning messages for ill-formed data

2003-03-21 Thread SADAHIRO Tomoyuki
due to the appearance of UDCs. SADAHIRO Tomoyuki On Fri, 21 Mar 2003 10:52:07 -0500 Mark Lewellen [EMAIL PROTECTED] wrote: Hi- I'm looking for recommendations on how to warn about and record problems with ill-formed data. Specifically, I'm reading in Big5 data from multiple files

Re: Warning messages for ill-formed data

2003-03-21 Thread SADAHIRO Tomoyuki
SADAHIRO Tomoyuki [EMAIL PROTECTED] said: P.S. Another problem. How can it be determined whether that user-defined character (UDC hereafter) is single-byte or double-byte? The file big5-eten.ucm does not specify how to determine the character length in bytes for an unmapped UDC

Re: Converting UTF-EBCDIC to UTF-8

2003-03-18 Thread SADAHIRO Tomoyuki
}) ? ok : not ok, 28\n; ### TESTS END And many thanks for discovering STRLEN-UV type mismatchings, too. SADAHIRO Tomoyuki Thank you for your report. I was careless about the trap on a non-ASCII platform like that (a eq \x61) is not true. So the failed tests are fixed, and some tests

Re: Converting UTF-EBCDIC to UTF-8

2003-03-17 Thread SADAHIRO Tomoyuki
-0.20.tar.gz http://homepage1.nifty.com/nomenclator/perl/Unicode-Transform.html SADAHIRO Tomoyuki I have run the Unicode-Transform module on using perl 5.8.0 on z/OS where perl's internal unicode format is UTF-EBCDIC, not UTF-8. The test results are as follows: /defects/brian/unicode

[ANN] Unicode::Normalize 0.20 released

2003-03-02 Thread SADAHIRO Tomoyuki
and will be also distributed from CPAN soon. SADAHIRO Tomoyuki

Re: Need some help in understanding Unicode in Perl...

2003-02-20 Thread SADAHIRO Tomoyuki
) specifications: RFC 2045, 6.7 http://www.ietf.org/rfc/rfc2045.txt?number=2045 perl module: MIME::QuotedPrint http://search.cpan.org/author/GAAS/MIME-Base64-2.16/ SADAHIRO Tomoyuki

Re: [PATCH] viscii.ucm

2003-02-16 Thread SADAHIRO Tomoyuki
://www.vietstd.org/document/unicode.html SADAHIRO Tomoyuki

Re: [PATCH] viscii.ucm

2003-02-16 Thread SADAHIRO Tomoyuki
in the version 2.4 nor in CVS ( http://oss.software.ibm.com/cvs/icu/charset/data/ ). SADAHIRO Tomoyuki

[PATCH] viscii.ucm

2003-02-15 Thread SADAHIRO Tomoyuki
WITH DOT BELOW regards, SADAHIRO Tomoyuki

Re: [PATCH] viscii.ucm

2003-02-15 Thread SADAHIRO Tomoyuki
A WITH CIRCUMFLEX AND HOOK ABOVE U1EAD \xA7 |0 # LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW U1EBD \xA8 |0 # LATIN SMALL LETTER E WITH TILDE U1EB9 \xA9 |0 # LATIN SMALL LETTER E WITH DOT BELOW regards, SADAHIRO Tomoyuki

Re: Handling MacArabic in perl 5.8.0

2003-01-31 Thread SADAHIRO Tomoyuki
/perl/Lingua-FA-MacFarsi-0.02.tar.gz http://homepage1.nifty.com/nomenclator/perl/Lingua-FA-MacFarsi.html Lingua::HE::MacHebrew http://homepage1.nifty.com/nomenclator/perl/Lingua-HE-MacHebrew-0.02.tar.gz http://homepage1.nifty.com/nomenclator/perl/Lingua-HE-MacHebrew.html SADAHIRO Tomoyuki

Re: Handling MacArabic in perl 5.8.0

2003-01-29 Thread SADAHIRO Tomoyuki
.nifty.com/nomenclator/perl/Lingua-AR-MacArabic-0.01.tar.gz HTML-ized POD http://homepage1.nifty.com/nomenclator/perl/Lingua-AR-MacArabic.html SADAHIRO Tomoyuki

Re: Handling MacArabic in perl 5.8.0

2003-01-26 Thread SADAHIRO Tomoyuki
if something wrong. (at least, the version here doesn't support embedding or nesting of direction.) SADAHIRO Tomoyuki Lingua-AR-MacArabic-0.00.tar.gz Description: Binary data

Re: beginniner's 5.6.1 latin1-utf8 question

2003-01-10 Thread SADAHIRO Tomoyuki
'. It is 'dead'. -- Jack Cohen SADAHIRO Tomoyuki

Re: Encode functionality for Perl 5.6.1

2002-09-21 Thread SADAHIRO Tomoyuki
SADAHIRO Tomoyuki

Re: Unicode::Normalize surprise with dotless i

2002-09-05 Thread SADAHIRO Tomoyuki
transliteration of Japanese, called ROMAJI, as long i. (Long i is usually represented by ii or i-macron, though.) If i-circumflex might be dotless-i with circumflex, but not i with circumflex, i-circumflex should be a long sound of dotless i, but not long i. That is also surprising. :) Regards, SADAHIRO

Re: Unicode::Collate 0.23 Released

2002-09-05 Thread SADAHIRO Tomoyuki
Cruces, NM 88003 Regards, SADAHIRO Tomoyuki

[Announce] Unicode::Collate 0.20 - UCA version 9

2002-07-25 Thread SADAHIRO Tomoyuki
method ->change() to change some tailoring parameters of the collator Regards, SADAHIRO Tomoyuki

Re: Questions about Unicode Support in 5.6.1

2002-07-09 Thread SADAHIRO Tomoyuki
Unicode intro that will be part of the Perl 5.8.0 release, I have a copy online for easy access: http://www.iki.fi/jhi/perluniintro.pod Regards, SADAHIRO Tomoyuki

Re: Detecting 'narrowest' character set

2002-06-27 Thread SADAHIRO Tomoyuki
return ISO_8859_2; } elsif ($string !~ /\p{^InShift_JIS}/) { return Shift_JIS; } # Trial more ? Well, then add something. # There is room to tune up in the order of trials. return Unicode; # abandoned } __END__ Regards. SADAHIRO Tomoyuki
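The trial-by-trial approach in this fragment relies on user-defined character properties. A minimal sketch of the same idea follows; the property names InAsciiOnly and InLatin1Range and the function narrowest are hypothetical stand-ins, since the original definitions (such as InShift_JIS) are not shown in the excerpt.

```perl
use strict;
use warnings;

# Hypothetical user-defined properties: each sub returns lines of
# "start<TAB>end" hexadecimal code point ranges, one range per line.
sub InAsciiOnly   { return "0000\t007F\n" }
sub InLatin1Range { return "0000\t00FF\n" }

# Try the narrowest repertoire first. A string fits a repertoire when
# it contains no character outside the corresponding property.
sub narrowest {
    my ($string) = @_;
    return 'US-ASCII'   if $string !~ /\P{InAsciiOnly}/;
    return 'ISO-8859-1' if $string !~ /\P{InLatin1Range}/;
    return 'Unicode';    # no narrower repertoire matched
}

print narrowest("caf\x{E9}"), "\n";
```

More trials can be added between the Latin-1 check and the final fallback, and their order can be tuned, as the comments in the message suggest.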

Another Unicode s/// buglet?

2002-06-26 Thread SADAHIRO Tomoyuki
of string) at bleedperl.pl not ok ok ok Regards, SADAHIRO Tomoyuki

[FYI] JIS X 0213 - Unicode 3.2.0

2002-03-31 Thread SADAHIRO Tomoyuki
Hello. I've prepared a Shift_JISX0213 to Unicode 3.2.0 mapping table... http://homepage1.nifty.com/nomenclator/unicode/sjis0213.zip But I think it'd be too early to implement anything on it. Regards, SADAHIRO Tomoyuki

Re: Long name rocks! But how about *.ecm?

2002-03-25 Thread SADAHIRO Tomoyuki
-unicode/2002-03/msg00076.html Dan the Encode Maintainer Regards, SADAHIRO Tomoyuki

Re: perlunicode.pod mention of utf8::upgrade questionable

2002-03-21 Thread SADAHIRO Tomoyuki
). Some functions are slower when working on UTF-8 encoded strings than on byte encoded strings. All functions that need to hop over Another possibility is, of course, that the demonstrated behaviour is a vanilla bug and gets fixed before 5.8.0. :-/ -- andreas Sincerely SADAHIRO

Re: perlunicode.pod mention of utf8::upgrade questionable

2002-03-21 Thread SADAHIRO Tomoyuki
On Thu, 21 Mar 2002 22:12:33 +0900 SADAHIRO Tomoyuki [EMAIL PROTECTED] wrote: Nevertheless, we shouldn't distinguish Unicode-ness of hash keys; otherwise we'd be upset more... :-) #!perl use charnames qw(:full); my $alpha = "\N{GREEK SMALL LETTER ALPHA}"; # \x{3B1} = \xCE\xB1 UTF8

Re: Unicode::Normalize 0.15 update

2002-03-19 Thread SADAHIRO Tomoyuki
/sufficient/efficient/; $NFC_string = NFC($string); enjoy. SADAHIRO Tomoyuki Regards, SADAHIRO Tomoyuki

Re: jisx0212 support in Encode::JP is close

2002-03-19 Thread SADAHIRO Tomoyuki
\xA2\xA4\xA4\xA4\xA6\xA4\xA8\xA4\xAA ], - [euc-jp-0212, han_kana, \x8E\xB1\x8E\xB2\x8E\xB3\x8E\xB4\x8E\xB5 ], - [euc-jp-0212, macron, - \x8F\xAA\xA7\x8F\xAA\xB7\x8F\xAA\xC5\x8F\xAA\xD7\x8F\xAA\xE9 ], ); plan test = $n*@encodings + $n*@encodings*@greek #End of Patch sincerely, SADAHIRO

Re: The last pieces of Chinese puzzle

2002-03-05 Thread SADAHIRO Tomoyuki
'; my $class = ref($obj) '::' $subclass; # carp Loading $file; bless $obj,$class; -132,7 +131,6 require Encode::Tcl::Table; require Encode::Tcl::Escape; require Encode::Tcl::Extended; -require Encode::Tcl::HanZi; 1; __END__ #End of Patch Regards, SADAHIRO Tomoyuki

Re: Combining characters in front of base characters after normalization

2002-02-28 Thread SADAHIRO Tomoyuki
://www.cl.cam.ac.uk/~mgk25/ Regards, SADAHIRO Tomoyuki

UnicodeData.txt has an incorrect compat mapping!

2002-02-03 Thread SADAHIRO Tomoyuki
;96FBN; +F951;CJK COMPATIBILITY IDEOGRAPH-F951;Lo;0;L;964BN; F952;CJK COMPATIBILITY IDEOGRAPH-F952;Lo;0;L;52D2N; F953;CJK COMPATIBILITY IDEOGRAPH-F953;Lo;0;L;808BN; F954;CJK COMPATIBILITY IDEOGRAPH-F954;Lo;0;L;51DCN; End of patch Regards, SADAHIRO Tomoyuki

Re: perl-unicode-cgi problem

2001-09-24 Thread SADAHIRO Tomoyuki
. $str = pack 'U*', unpack 'U0U*', $str; If you have some interest in Perl 5.7 (or later), try the utf8::decode() function. There are some other ways to flag a string as UTF8, but one needs XS code and another doesn't work on later versions Regards, SADAHIRO Tomoyuki
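For the utf8::decode() route suggested in this message, a minimal sketch might look like the following. The sample bytes are an illustrative choice (the UTF-8 encoding of U+03B1), not data from the thread.

```perl
use strict;
use warnings;

# Two raw bytes as they might arrive from a CGI request:
# the UTF-8 encoding of U+03B1 GREEK SMALL LETTER ALPHA.
my $str = "\xCE\xB1";
my $bytes_before = length $str;

# utf8::decode() converts the byte string in place into a character
# string and returns true if the bytes were well-formed UTF-8.
my $ok = utf8::decode($str);

printf "decoded=%d length %d -> %d, U+%04X\n",
    $ok ? 1 : 0, $bytes_before, length $str, ord $str;
```

utf8::decode() is always available and does not require `use utf8`, which only declares that the source file itself is written in UTF-8.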

Re: please test Text::Unicode::Normalize + Sort::UCA

2001-08-16 Thread SADAHIRO Tomoyuki
-defined algorithms. - maybe unfamiliar (not only to western people) in comparison with LATIN, GREEK, HAN, etc. scripts. would it be better to gather those functions into one module? - Regards, SADAHIRO Tomoyuki

Re: Sort::UCA 0.04 - Unicode Collation Algorithm

2001-08-15 Thread SADAHIRO Tomoyuki
On Mon, 13 Aug 2001 10:07:42 -0500 Jarkko Hietaniemi [EMAIL PROTECTED] wrote: On Mon, Aug 13, 2001 at 10:35:32PM +0900, SADAHIRO Tomoyuki wrote: Hello, everyone. Sort::UCA 0.04 has been uploaded on CPAN. snip To Do: conformance tests of Unicode 3.1.1 Beta (at present it's DRAFT

Re: Unicode Normalization Forms

2001-08-10 Thread SADAHIRO Tomoyuki
On Thu, 09 Aug 2001 22:30:16 +0200 Bjoern Hoehrmann [EMAIL PROTECTED] wrote: * SADAHIRO Tomoyuki wrote: How about the following interface? | $normalized_string = normalize($raw_string) | | You can use this function only if the normalization form | you require is specified in the Cuse

Re: Unicode Normalization Forms

2001-08-09 Thread SADAHIRO Tomoyuki
no Text::Unicode tree on CPAN but there is a Unicode:: tree and it fits quite well there. The normalize function increases readability and looks nicer. Can we expect that Perl will normalize everything it outputs by itself, or will we have to use this module? Regards, SADAHIRO Tomoyuki

Unicode Normalization Forms

2001-08-08 Thread SADAHIRO Tomoyuki
::Util.pm (available via CPAN) It also runs on Perl 5.6, even if unicode/*.* are for unicode 3.0.1. But NormalizationTest of unicode 3.1 requires those for unicode 3.1.0 in the distribution of Perl 5.7.2. Regards, SADAHIRO Tomoyuki

Re: UTF-8 in web pages

2001-08-07 Thread SADAHIRO Tomoyuki
into Word, but all my other programs receive question marks). How about Outlook Express? Another choice is copying into a TEXTAREA field of a FORM in the browser followed by writing it in a file via CGI on your local machine. Regards, SADAHIRO Tomoyuki E-mail: [EMAIL PROTECTED]

Re: Unicode Collation Algorithm

2001-08-04 Thread SADAHIRO Tomoyuki
$result = $uca->cmp($a, $b); # returns 1, 0, or -1. SEE ALSO http://www.unicode.org/unicode/reports/tr10/ But this is an alpha version. Any feature (including the module name) may be changed. Please comment on it. regards, SADAHIRO Tomoyuki
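The cmp call described in this message survives in today's Unicode::Collate, the module this work later became. A minimal sketch using that module follows; it assumes the default collation table shipped with Unicode::Collate, and the sample words are illustrative.

```perl
use strict;
use warnings;
use Unicode::Collate;

# A collator with default settings (default collation element table).
my $uca = Unicode::Collate->new;

# cmp() returns -1, 0, or 1, like Perl's built-in cmp operator.
my $result = $uca->cmp("apple", "banana");

# sort() orders a list by the same comparison; with default settings
# lowercase sorts before uppercase when the letters are otherwise equal.
my @sorted = $uca->sort(qw(banana Apple cherry apple));

print "$result: @sorted\n";
```

Because cmp() has the same return convention as the built-in operator, it can also be passed to Perl's own sort, as in `sort { $uca->cmp($a, $b) } @list`.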

[proposal] utility module for Hangul Syllables

2001-08-03 Thread SADAHIRO Tomoyuki
passing a character outside Hangul syllable in shouldn't be carped or croaked, since it supposes the return value would be *always* checked. regards, SADAHIRO Tomoyuki E-mail: [EMAIL PROTECTED]

Unicode Collation Algorithm

2001-08-03 Thread SADAHIRO Tomoyuki
, SADAHIRO Tomoyuki

Re: [proposal] utility module for Hangul Syllables

2001-08-03 Thread SADAHIRO Tomoyuki
) parseHangulName($name, Short | Medium | Long) * Short allows it to accept names like GA. Medium, like HANGUL SYLLABLE GA. (or Default?) Long, like HANGUL SYLLABLE KIYEOK-A. The mode may be used in getHangulName. BTW, are there any formats other than the above three? Regards, SADAHIRO Tomoyuki

[PATCH] Encode.pm to use escape-sequence encoding

2001-06-29 Thread SADAHIRO Tomoyuki
, SADAHIRO Tomoyuki E-mail: [EMAIL PROTECTED] URL: http://homepage1.nifty.com/nomenclator/perl/

Encode::Tcl for multibyte doesnot work

2001-06-24 Thread SADAHIRO Tomoyuki
' for 'shiftjis' and 'cp932' work. (I found mapping of shiftjis.enc differs from that of Unicode's EASTASIA/JIS/SHIFTJIS.txt by 3 codepoints.) Regards, SADAHIRO Tomoyuki E-mail: [EMAIL PROTECTED] URL: http://homepage1.nifty.com/nomenclator/