Re: Warning messages for ill-formed data

2003-03-21 Thread SADAHIRO Tomoyuki
> SADAHIRO Tomoyuki <[EMAIL PROTECTED]> said:
>
> > P.S. Another problem. How can it be determined whether that
> > user-defined character (UDC hereafter) is single-byte or double-byte?
> >
> > The file big5-eten.ucm does not contain how to determine the character
> > length in bytes for an unmap

Re: Warning messages for ill-formed data

2003-03-21 Thread David Graff
SADAHIRO Tomoyuki <[EMAIL PROTECTED]> said:
> P.S. Another problem. How can it be determined whether that
> user-defined character (UDC hereafter) is single-byte or double-byte?
>
> The file big5-eten.ucm does not contain how to determine the character
> length in bytes for an unmapped UDC.

As I

Re: Warning messages for ill-formed data

2003-03-21 Thread SADAHIRO Tomoyuki
An example, but it's still raw.

use Encode;
open( IN_FH, $inputFile ) or die;
while( $line = <IN_FH> ) {
    eval { $line = decode('big5', $line, Encode::FB_CROAK ) };
    if ($@) {
        warn "ill-formed line at line $. in $inputFile.\n";
        printf ERRORLOG "File %s (line %d): %s", $inputFile, $.

Re: Converting UTF-EBCDIC to UTF-8

2003-03-21 Thread SADAHIRO Tomoyuki
(test draft snipped)
> I have replaced test cases 25 - 28 with the ones listed above. And the
> results were as follows:
>
> /defects/brian/unicode/Unicode-Transform-0.20:>make test
> PERL_DL_NONLAZY=1 /defects/brian/nonthreaded/perl-5.8.0/perl
> "-I/defects/brian/nonthreaded/perl-5.8.0/lib"

Warning messages for ill-formed data

2003-03-21 Thread Mark Lewellen
Hi- I'm looking for recommendations on how to warn about and record problems with ill-formed data. Specifically, I'm reading in Big5 data from multiple files and converting it to Perl's utf8, and some of the Big5 double-byte combinations are illegal (they appear to be user-defined special symbo

Re: Converting UTF-EBCDIC to UTF-8

2003-03-21 Thread Brian DePradine
>### TESTS START
>$utf8_fe7f_upgraded = ord("A") != 0x41
>? pack('U*', 213, 190, 215) # EBCDIC "\xef\xb9\xbf"
>: pack('U*', 239, 185, 191); # ASCII "\xef\xb9\xbf"
>
>$utf8_fe7f_bytes = pack('C*', 239, 185, 191);
>
>print "\x{fe7f}" eq utf8_to_unicode($utf8_fe7f_upgraded)
>? "ok" : "not ok", " 25\