Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

2004-10-25 Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes:
>On Oct 25, 2004, at 03:01, Nick Ing-Simmons wrote:
>> But as Dan said at the start \xF6 on its own (say as 1023 octet
>> in a 0..1023 1024-octet buffer is not a fail.
>> Changing that will make :encoding() layer have problems as buffer
>> boundaries can occur
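The buffer-boundary point can be illustrated with a small Python sketch. It is not Perl's Encode or PerlIO code, and the helper name `split_at_char_boundary` is hypothetical. It inspects the end of a read buffer and holds back a trailing multi-byte sequence that has started but not finished, so a lone `\xF6` in the last position is carried into the next read instead of being reported as malformed.

```python
from typing import Tuple


def split_at_char_boundary(buf: bytes) -> Tuple[bytes, bytes]:
    """Split buf into (complete, tail) for chunked UTF-8 decoding.

    tail holds a trailing sequence whose lead byte promises more bytes
    than the buffer contains; the caller prepends it to the next read.
    Only lengths are checked, so `complete` still needs a real decoder.
    """
    # The longest possible partial sequence is 3 bytes (a 4-byte
    # sequence missing its final byte), so look back at most 3 bytes.
    for back in range(1, min(3, len(buf)) + 1):
        byte = buf[-back]
        if byte & 0xC0 == 0x80:      # 10xxxxxx: continuation, keep looking
            continue
        if byte & 0xE0 == 0xC0:      # 110xxxxx: 2-byte lead
            need = 2
        elif byte & 0xF0 == 0xE0:    # 1110xxxx: 3-byte lead
            need = 3
        elif byte & 0xF8 == 0xF0:    # 11110xxx: 4-byte lead (includes F5..F7,
            need = 4                 # which the decoder will reject later)
        else:                        # ASCII or impossible lead byte
            need = 1
        if back < need:
            return buf[:-back], buf[-back:]
        return buf, b""
    return buf, b""
```

A non-empty tail that is still pending at end of input is a genuine truncation and should then be treated as malformed.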

Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

2004-10-24 Dan Kogai
On Oct 25, 2004, at 03:01, Nick Ing-Simmons wrote:
> But as Dan said at the start \xF6 on its own (say as 1023 octet in a 0..1023 1024-octet buffer is not a fail. Changing that will make :encoding() layer have problems as buffer boundaries can occur in the middle of characters.

Right. Encode-2.07 in
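Python's standard library exposes the same idea through an incremental decoder: with `final=False` an incomplete trailing sequence is buffered rather than rejected, and only the last call with `final=True` turns leftovers into an error. The sketch below uses `codecs.getincrementaldecoder`; the wrapper `decode_stream` is a hypothetical name, and this shows Python's behaviour, not what Encode 2.07 does.

```python
import codecs
from typing import Iterable


def decode_stream(chunks: Iterable[bytes], errors: str = "strict") -> str:
    """Decode UTF-8 arriving in arbitrary chunks.

    Partial sequences at chunk boundaries are carried over; only bytes
    still pending after the final call are treated as malformed.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors)
    parts = [decoder.decode(chunk, final=False) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush: leftovers are errors now
    return "".join(parts)


# "ö" is C3 B6; the read boundary falls between those two bytes.
print(decode_stream([b"Bj\xc3", b"\xb6rn"]))
```

With the default `errors="strict"`, a stream that ends on `b"\xc3"` raises `UnicodeDecodeError` at the final flush; with `errors="replace"` it ends in U+FFFD instead.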

Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

2004-10-24 Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes:
>On Oct 23, 2004, at 01:04, Bjoern Hoehrmann wrote:
>> C12a in Unicode 4.0.1 notes
>>
>> [...]
>> For example, in UTF-8 every code unit of the form 110xxxxx must be
>> followed by a code unit of the form 10xxxxxx. A sequence such as
>> 110xxxxx 0xxxxxxx is
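C12a's example is stated in terms of bit patterns, which map directly onto masks. Below is a minimal Python sketch of the two-byte case. The function names are hypothetical, and the overlong leads C0 and C1, which also match 110xxxxx, are not rejected here.

```python
def is_continuation(byte: int) -> bool:
    """10xxxxxx: a UTF-8 continuation byte."""
    return byte & 0b1100_0000 == 0b1000_0000


def is_two_byte_lead(byte: int) -> bool:
    """110xxxxx: the lead byte of a two-byte sequence."""
    return byte & 0b1110_0000 == 0b1100_0000


def two_byte_sequence_ok(data: bytes, i: int) -> bool:
    """C12a example: a 110xxxxx byte at data[i] must be followed by a
    10xxxxxx byte. False if data[i] is not such a lead, if the next byte
    is missing, or if it is not a continuation byte (e.g. 0xxxxxxx)."""
    return (
        is_two_byte_lead(data[i])
        and i + 1 < len(data)
        and is_continuation(data[i + 1])
    )
```

Whether a missing second byte is an error depends on whether more input can still arrive, which is the buffer-boundary question raised elsewhere in this thread.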

Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

2004-10-22 Dan Kogai
On Oct 23, 2004, at 01:04, Bjoern Hoehrmann wrote:
> C12a in Unicode 4.0.1 notes
>
> [...]
> For example, in UTF-8 every code unit of the form 110xxxxx must be followed by a code unit of the form 10xxxxxx. A sequence such as 110xxxxx 0xxxxxxx is ill-formed and must never be generated. When faced with
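The remedy C12a goes on to describe, treating the first code unit as an illegally terminated sequence and, for example, substituting U+FFFD, can be sketched as a decoding loop that emits one U+FFFD and resumes at the very next byte. This is a Python illustration under simplifying assumptions: `decode_with_replacement` and `sequence_length` are hypothetical names, each candidate sequence is validated by Python's strict codec, and this simple policy can emit more than one U+FFFD for a truncated sequence.

```python
REPLACEMENT = "\ufffd"


def sequence_length(lead: int) -> int:
    """Length announced by a UTF-8 lead byte, or 0 if the byte can never
    start a well-formed sequence (10xxxxxx, C0, C1, F5..FF)."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def decode_with_replacement(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        candidate = data[i:i + n]
        if n and len(candidate) == n:
            try:
                # Strict decode of one sequence also rejects overlongs,
                # surrogates and values above U+10FFFF.
                out.append(candidate.decode("utf-8"))
                i += n
                continue
            except UnicodeDecodeError:
                pass
        # Ill-formed: treat data[i] alone as an illegally terminated
        # sequence, emit one U+FFFD, and resume at the next byte.
        out.append(REPLACEMENT)
        i += 1
    return "".join(out)


print(decode_with_replacement(b"Bj\xf6rn"))
```

Resuming at the next byte, rather than skipping the whole announced length, means that in `110xxxxx 0xxxxxxx` the second byte is still decoded as the ASCII character it is.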

Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

2004-10-22 Bjoern Hoehrmann
* Dan Kogai wrote:
>> perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"
>> perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rnx))"
>Though unicode.org does not assign any character on U+180000 (yet),
>"\xF6\x80\x80\x80" is a valid UTF-8 character from perl's point of
>view. Perl only
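Working the arithmetic for the sequence in question shows where it lands. The Python sketch below (hypothetical helper `lax_four_byte_value`) assembles the value from the 11110xxx / 10xxxxxx bit fields without any range checks, roughly the permissive view described above. The result, 0x180000, lies beyond U+10FFFF, the largest code point Unicode defines, which is why a strict UTF-8 decoder such as Python's rejects the same bytes.

```python
def lax_four_byte_value(seq: bytes) -> int:
    """Assemble the scalar from 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx,
    checking only the bit patterns, not the resulting range."""
    if len(seq) != 4 or seq[0] & 0xF8 != 0xF0:
        raise ValueError("not a 4-byte lead")
    value = seq[0] & 0x07                     # low 3 bits of the lead byte
    for byte in seq[1:]:
        if byte & 0xC0 != 0x80:
            raise ValueError("not a continuation byte")
        value = (value << 6) | (byte & 0x3F)  # append 6 payload bits
    return value


value = lax_four_byte_value(b"\xf6\x80\x80\x80")
print(hex(value), value > 0x10FFFF)  # 0x180000 True

try:
    b"\xf6\x80\x80\x80".decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict UTF-8 decoder:", exc.reason)
```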

Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

2004-10-22 Dan Kogai
On Oct 22, 2004, at 20:42, Bjoern Hoehrmann wrote:
> No, you misread the bug report, I expect that
>   perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"
>   perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rnx))"
> behave the same in that the malformed sequence \xF6 gets replaced by U+FFFD as docume
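For comparison, Python's built-in UTF-8 codec with `errors="replace"` behaves the way the report asks: the lone `\xF6` becomes U+FFFD in both strings, and the trailing `x` makes no difference to how that byte is handled. This only illustrates the expected behaviour; it says nothing about Encode's own code paths.

```python
samples = [b"Bj\xf6rn", b"Bj\xf6rnx"]

# errors="replace" substitutes U+FFFD for the ill-formed \xF6 byte and
# carries on decoding the bytes after it.
decoded = [raw.decode("utf-8", errors="replace") for raw in samples]
for text in decoded:
    print(ascii(text))  # 'Bj\ufffdrn' then 'Bj\ufffdrnx'
```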