SADAHIRO Tomoyuki <[EMAIL PROTECTED]> said:
> P.S. Another problem. How can it be determined whether that
> user-defined character (UDC hereafter) is single-byte or double-byte?
>
> The file big5-eten.ucm does not specify how to determine the character
> length in bytes for an unmapped UDC.
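One way to size an unmapped UDC without consulting big5-eten.ucm is to rely on Big5's byte structure instead of the mapping table: a lead byte in 0x81-0xFE followed by a trail byte in 0x40-0x7E or 0xA1-0xFE forms a double-byte character, and anything else is taken as a single byte. This is a sketch that assumes that extended lead-byte range (it is not read from the .ucm file), and `big5_chunks` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split a Big5 byte string into per-character chunks
# using only the byte-range rule, so unmapped UDCs are still sized correctly.
sub big5_chunks {
    my ($bytes) = @_;
    my @chunks;
    # Double-byte: lead 0x81-0xFE followed by trail 0x40-0x7E or 0xA1-0xFE.
    # Any other byte (including a lead byte with no valid trail) is one chunk.
    while ($bytes =~ /\G([\x81-\xFE][\x40-\x7E\xA1-\xFE]|[\x00-\xFF])/g) {
        push @chunks, $1;
    }
    return @chunks;
}

# 0xFA40 lies in a user-defined area; it comes out as one 2-byte chunk.
my @c = big5_chunks("A\xFA\x40B");
printf "%d byte(s): %s\n", length($_), unpack('H*', $_) for @c;
# 1 byte(s): 41
# 2 byte(s): fa40
# 1 byte(s): 42
```

Because the rule looks only at byte ranges, whether a given two-byte code is mapped by the table does not affect how many bytes it occupies.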
As I
An example, but it's still raw.
use Encode;
open( IN_FH, $inputFile ) or die "Cannot open $inputFile: $!";
while( $line = <IN_FH> ) {
    # FB_CROAK makes decode() die on the first malformed or unmapped sequence.
    eval { $line = decode('big5', $line, Encode::FB_CROAK ) };
    if ($@) {
        warn "ill-formed line at line $. in $inputFile.\n";
        printf ERRORLOG "File %s (line %d): %s", $inputFile, $.
(test draft snipped)
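A slightly fuller sketch of the same loop body: each line is decoded inside its own eval, and on failure the file name, line number and raw bytes (in hex) are written to a log handle so the unmapped codes can be identified later. `decode_big5_line` is a hypothetical helper, and `$log` stands in for the ERRORLOG handle. It decodes a copy of the line because Encode's documentation warns that the input may be modified in place when CHECK is set (unless LEAVE_SRC is used), so after a croak the original variable may no longer hold the bytes you want to log.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode one Big5 line, or log it and return undef.
sub decode_big5_line {
    my ($bytes, $file, $lineno, $log) = @_;
    # Work on a copy: decode() may modify its input when CHECK is set.
    my $copy  = $bytes;
    my $chars = eval { Encode::decode('big5', $copy, Encode::FB_CROAK) };
    return $chars if defined $chars;
    # Record the raw bytes in hex so the offending codes can be looked up.
    chomp(my $raw = $bytes);
    printf {$log} "File %s (line %d): %s\n", $file, $lineno, unpack('H*', $raw);
    return;
}

# Example: log to an in-memory buffer; Big5 0xA440 is U+4E00.
open(my $log, '>', \my $buf) or die "Cannot open log buffer: $!";
my $ok = decode_big5_line("\xA4\x40\n", 'sample.txt', 1, $log);
```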
> I have replaced test cases 25-28 with the ones listed above, and the
> results were as follows:
>
> /defects/brian/unicode/Unicode-Transform-0.20:>make test
> PERL_DL_NONLAZY=1 /defects/brian/nonthreaded/perl-5.8.0/perl
> "-I/defects/brian/nonthreaded/perl-5.8.0/lib"
Hi-
I'm looking for recommendations on how to warn about and record problems
with ill-formed data. Specifically, I'm reading in Big5 data from multiple
files and converting it to Perl's utf8, and some of the Big5 double-byte
combinations are illegal (they appear to be user-defined special symbo
>### TESTS START
>$utf8_fe7f_upgraded = ord("A") != 0x41
>? pack('U*', 213, 190, 215) # EBCDIC "\xef\xb9\xbf"
>: pack('U*', 239, 185, 191); # ASCII "\xef\xb9\xbf"
>
>$utf8_fe7f_bytes = pack('C*', 239, 185, 191);
>
>print "\x{fe7f}" eq utf8_to_unicode($utf8_fe7f_upgraded)
>? "ok" : "not ok", " 25\
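The expected bytes in these tests can be checked by hand: U+FE7F needs the three-byte UTF-8 form 1110xxxx 10xxxxxx 10xxxxxx, and filling its 16 bits into those slots gives EF B9 BF, the same values packed on the ASCII branch above. A small sketch of that arithmetic, cross-checked against Encode, assuming an ASCII (not EBCDIC) platform:

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+FE7F needs three UTF-8 bytes: 1110xxxx 10xxxxxx 10xxxxxx.
my $cp = 0xFE7F;
my @bytes = (
    0xE0 | ($cp >> 12),           # top 4 bits   -> 0xEF
    0x80 | (($cp >> 6) & 0x3F),   # middle 6     -> 0xB9
    0x80 | ($cp & 0x3F),          # low 6 bits   -> 0xBF
);
printf "%02X %02X %02X\n", @bytes;                      # EF B9 BF

# Cross-check against Encode's own UTF-8 encoder.
print unpack('H*', encode('UTF-8', chr $cp)), "\n";     # efb9bf
```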