Re: real UTF-8 vs. utf8n_to_uvuni()

2004-12-06 Thread Gisle Aas
Tim Bunce <[EMAIL PROTECTED]> writes:
> On Sun, Dec 05, 2004 at 11:58:54AM +0900, Dan Kogai wrote:
> > % perl -Mblib -MEncode -le '$a="\x{}"; print encode("UTF-8", $a, 1)'
> > "\x{}" does not map to utf8 at [...]
>
> Shouldn't that (and similar messages) say "... does not map to UTF-8"?

Re: Make Encode.pm support the real UTF-8

2004-12-06 Thread Nick Ing-Simmons
Bjoern Hoehrmann <[EMAIL PROTECTED]> writes:
>
>>> Now that we have this problem, introducing more places where one needs
>>> to carefully check the documentation what is considered UTF-8 does not
>>> seem like the best option, having decode_utf8() and decode(utf8=>...)
>>> mean something differe

Re: real UTF-8 vs. utf8n_to_uvuni()

2004-12-06 Thread Tim Bunce
On Sun, Dec 05, 2004 at 11:58:54AM +0900, Dan Kogai wrote:
> % perl -Mblib -MEncode -le '$a="\x{}"; print encode("UTF-8", $a, 1)'
> "\x{}" does not map to utf8 at [...]

Shouldn't that (and similar messages) say "... does not map to UTF-8"?

Tim.

Re: real UTF-8 vs. utf8n_to_uvuni()

2004-12-06 Thread Nicholas Clark
On Sun, Dec 05, 2004 at 11:58:54AM +0900, Dan Kogai wrote:
> Since Gisle's patch makes use of utf8n_to_uvuni(), it seems to be a
> problem of perl core. So I have checked utf8.c which defines that.
> Seems like it does not make use of PERL_UNICODE_MAX.
>
> The patch against utf8.c fixes that.
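
To illustrate the kind of upper-bound check under discussion, here is a minimal sketch in C. It is not the patch from this thread and not code from utf8.c; the helper name and the constant name are hypothetical. It assumes the strict UTF-8 definition, where code points above 0x10FFFF and the surrogate range 0xD800..0xDFFF are not encodable.

```c
#include <stdint.h>

/* Assumption: 0x10FFFF is the highest Unicode code point. The constant
 * name is illustrative and is not taken from perl's headers. */
#define SKETCH_UNICODE_MAX 0x10FFFFu

/* Hypothetical helper: returns 1 if cp may be encoded as strict UTF-8,
 * 0 otherwise. */
static int is_strict_utf8_codepoint(uint32_t cp)
{
    if (cp > SKETCH_UNICODE_MAX)
        return 0;               /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* UTF-16 surrogates are not valid scalars */
    return 1;
}
```

A decoder that wants strict behaviour could call such a check on every decoded value and reject the input when it returns 0.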

UTF8_ALLOW_ANYUV should not allow overlong sequences [PATCH]

2004-12-06 Thread Gisle Aas
Perl uses the UTF8_ALLOW_ANYUV mask in functions that should not be restricted to only the valid Unicode code points. For some reason this mask currently includes the UTF8_ALLOW_LONG flag. This seems totally wrong as there can't be a good reason to allow overlong sequences just because we don't wan
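
For readers unfamiliar with the term, an overlong sequence encodes a code point in more bytes than necessary (for example, 0xC0 0x80 for U+0000). The following is a minimal sketch in C of how overlong forms can be detected. It is not perl's utf8n_to_uvuni() and does not use perl's flags; the function name and its interface are hypothetical, and it handles sequences of at most 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of perl's utf8.c.
 * Decodes one UTF-8 sequence of up to 4 bytes from s (len bytes available).
 * Returns the number of bytes consumed, or 0 on malformed or truncated
 * input. On success stores the decoded value in *cp and sets *overlong to 1
 * if a shorter encoding of the same value exists. */
static size_t decode_utf8(const unsigned char *s, size_t len,
                          uint32_t *cp, int *overlong)
{
    /* Smallest value that legitimately needs a sequence of each length. */
    static const uint32_t min_for_len[5] = { 0, 0x0, 0x80, 0x800, 0x10000 };
    size_t need, i;
    uint32_t v;

    if (len == 0)
        return 0;

    if (s[0] < 0x80)                { need = 1; v = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; v = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation or invalid lead byte */

    if (len < need)
        return 0;                   /* truncated sequence */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        v = (v << 6) | (s[i] & 0x3F);
    }

    *cp = v;
    *overlong = v < min_for_len[need];
    return need;
}
```

A strict decoder would treat a set *overlong as an error regardless of whether values outside the Unicode range are otherwise permitted, which is the separation the message above argues for.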

Re: real UTF-8 vs. utf8n_to_uvuni()

2004-12-06 Thread Gisle Aas
Dan Kogai <[EMAIL PROTECTED]> writes:
> Since Gisle's patch makes use of utf8n_to_uvuni(), it seems to be a
> problem of perl core. So I have checked utf8.c which defines that.
> Seems like it does not make use of PERL_UNICODE_MAX.
>
> The patch against utf8.c fixes that.

Seems like a good idea t