Dan Kogai <[EMAIL PROTECTED]> writes:
>On Oct 25, 2004, at 03:01, Nick Ing-Simmons wrote:
>> But as Dan said at the start, \xF6 on its own (say, as octet 1023
>> in a 0..1023 1024-octet buffer) is not a failure.
>> Changing that will make :encoding() layer have problems as buffer
>> boundaries can occur
On Oct 25, 2004, at 03:01, Nick Ing-Simmons wrote:
> But as Dan said at the start, \xF6 on its own (say, as octet 1023
> in a 0..1023 1024-octet buffer) is not a failure.
> Changing that will make the :encoding() layer have problems, as buffer
> boundaries can occur in the middle of characters.
Right. Encode-2.07 in
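
The buffer-boundary concern can be illustrated with a small sketch. It uses Python's incremental UTF-8 decoder as a stand-in, not Perl's Encode or the :encoding() layer, and the helper name decode_chunks is made up for illustration. Python's decoder is strict and rejects \xF6 outright, so the sketch instead splits the two-octet encoding of U+00F6 (\xC3\xB6) across two buffers.

```python
import codecs

def decode_chunks(chunks):
    """Decode an iterable of byte buffers as one UTF-8 stream."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    # A lead octet at the end of a buffer is held back, not reported as an error.
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the decoder; only now is a dangling lead octet malformed.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

octets = "Bj\u00f6rn".encode("utf-8")     # b'Bj\xc3\xb6rn'
buffers = [octets[:3], octets[3:]]        # boundary falls between \xc3 and \xb6

whole = decode_chunks(buffers)            # character is reassembled across buffers
truncated = decode_chunks([octets[:3]])   # the stream really ends after \xc3
```

Under these assumptions, a lead octet that ends a buffer is only treated as malformed once the stream is known to have ended, which is the distinction the quoted text draws.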
Dan Kogai <[EMAIL PROTECTED]> writes:
>On Oct 23, 2004, at 01:04, Bjoern Hoehrmann wrote:
>> C12a in Unicode 4.0.1 notes
>>
>> [...]
>> For example, in UTF-8 every code unit of the form 110xxxxx must be
>> followed by a code unit of the form 10xxxxxx. A sequence such as
>> 110xxxxx 0xxxxxxx is
On Oct 23, 2004, at 01:04, Bjoern Hoehrmann wrote:
C12a in Unicode 4.0.1 notes
[...]
For example, in UTF-8 every code unit of the form 110xxxxx must be
followed by a code unit of the form 10xxxxxx. A sequence such as
110xxxxx 0xxxxxxx is ill-formed and must never be generated. When
faced with
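
The bit patterns in the quoted C12a note can be written as a pair of mask checks. This is only a sketch in Python with hypothetical function names. It tests the two bit patterns and nothing else, so it does not reject overlong lead octets such as \xC0 and \xC1.

```python
def is_two_octet_lead(octet):
    """True if the octet has the form 110xxxxx."""
    return octet & 0b11100000 == 0b11000000

def is_continuation(octet):
    """True if the octet has the form 10xxxxxx."""
    return octet & 0b11000000 == 0b10000000

def pair_matches_pattern(lead, following):
    """True if lead is 110xxxxx and following is 10xxxxxx."""
    return is_two_octet_lead(lead) and is_continuation(following)

ok = pair_matches_pattern(0xC3, 0xB6)    # \xC3\xB6, the encoding of U+00F6
bad = pair_matches_pattern(0xC3, 0x72)   # \xC3 followed by ASCII 'r' (0xxxxxxx)
```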
* Dan Kogai wrote:
>> perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"
>> perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rnx))"
>Though unicode.org does not assign any character to U+180000 (yet),
>"\xF6\x80\x80\x80" is a valid UTF-8 character from perl's point of
>view. Perl only
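
The value that "\xF6\x80\x80\x80" stands for can be worked out by hand with the four-octet bit layout 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. The sketch below, in Python with a hypothetical function name, does only that arithmetic and deliberately performs no validity checks.

```python
def value_of_four_octets(seq):
    """Combine the payload bits of a four-octet UTF-8-style sequence."""
    b0, b1, b2, b3 = seq
    return ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)

value = value_of_four_octets(b"\xF6\x80\x80\x80")
beyond_unicode = value > 0x10FFFF   # Unicode code points end at U+10FFFF
```

Here value is 0x180000, which lies above U+10FFFF. That is why a decoder that follows the Unicode definition of UTF-8 treats \xF6 as malformed, while a more permissive internal encoding can accept it.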
On Oct 22, 2004, at 20:42, Bjoern Hoehrmann wrote:
No, you misread the bug report, I expect that
perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"
perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rnx))"
behave the same in that the malformed sequence \xF6 gets replaced by
U+FFFD as docume
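
The behaviour asked for here, both inputs decoding the same way with \xF6 replaced by U+FFFD, can be illustrated with Python's strict UTF-8 decoder. This is a stand-in for comparison only and says nothing about what any particular Encode release does. The name decode_with_replacement is hypothetical.

```python
def decode_with_replacement(octets):
    """Decode UTF-8, substituting U+FFFD for each malformed sequence."""
    return octets.decode("utf-8", errors="replace")

first = decode_with_replacement(b"Bj\xf6rn")
second = decode_with_replacement(b"Bj\xf6rnx")

# In both cases only the \xF6 octet is replaced; the ASCII around it survives.
same_prefix = second.startswith(first)
```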