Re: [pugs] regexp "bug"?

2005-04-15 Thread Nicholas Clark
On Fri, Apr 15, 2005 at 09:34:58AM -0700, Larry Wall wrote: > It doesn't have to be the default, though. But there has to be > some way of allowing illegal characters to be talked about, or > you can't write programs that talk about them. It's like saying Thoughtcrime acceptable. Doubleplusgood

Re: [pugs] regexp "bug"?

2005-04-15 Thread Larry Wall
On Fri, Apr 15, 2005 at 05:12:54PM +, [EMAIL PROTECTED] wrote:
: Isn't that what the difference between byte-level and codepoint-level
: access to strings is all about. If you want to work with values that
: are illegal codepoints then you should be working at the byte-level
: not the codepoi

Re: [pugs] regexp "bug"?

2005-04-15 Thread mark . a . biggar
Isn't that what the difference between byte-level and codepoint-level access to strings is all about. If you want to work with values that are illegal codepoints then you should be working at the byte-level not the codepoint-level, at least by default. -- Mark Biggar [EMAIL PROTECTED] [EMAIL
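
[Editor's illustration] The byte-level versus codepoint-level distinction raised above can be sketched as follows. The thread concerns Perl 6 under Pugs; this sketch uses Python only as a stand-in, and the variable names are hypothetical. It assumes UTF-8 as the byte-level encoding, which the thread does not specify for this message.

```python
# Illustration only: Python stand-in, not the Pugs / Perl 6 behaviour.
s = "A" + chr(65535)              # 'A' followed by code point U+FFFF

# Codepoint-level view: one integer per code point.
codepoints = [ord(c) for c in s]

# Byte-level view: the same string serialised as UTF-8 (an assumption).
raw = s.encode("utf-8")

print(codepoints)                 # the code point values
print(len(s), len(raw))           # number of code points vs. number of bytes
```

At the codepoint level the string has two elements; at the byte level the same data occupies four bytes, one for 'A' and three for U+FFFF.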

Re: [pugs] regexp "bug"?

2005-04-15 Thread Larry Wall
On Fri, Apr 15, 2005 at 12:56:14AM -0700, Mark A. Biggar wrote:
: Yes, the value 0xFFFF can be stored as either 3 byte UTF-8 string or a 2
: byte UCS-2 value, but the Unicode standard specifically says that the
: values 0xFFFF, 0xFFFE and 0xFEFF are NOT valid codepoints and should
: never appear

Re: [pugs] regexp "bug"?

2005-04-15 Thread hv
"Mark A. Biggar" <[EMAIL PROTECTED]> wrote:
:BÁRTHÁZI András wrote:
:
:> Hi,
:>
:> This code:
:>
:> my $a='A';
:> $a ~~ s:perl5:g/A/{chr(65535)}/;
:> say $a.bytes;
:>
:> Outputs "0". Why?
:>
:> Bye,
:> Andras
:>
:
:\uFFFF is not a legal unicode codepoint. chr(65535) should raise an
:except

Re: [pugs] regexp "bug"?

2005-04-15 Thread Mark A. Biggar
BÁRTHÁZI András wrote: Hi, >> This code: >> >> my $a='A'; >> $a ~~ s:perl5:g/A/{chr(65535)}/; >> say $a.bytes; >> >> Outputs "0". Why? > > > \uFFFF is not a legal unicode codepoint. chr(65535) should raise an exception of some type. So the above code does seem to show a possible bug. But

Re: [pugs] regexp "bug"?

2005-04-15 Thread Mark A. Biggar
BÁRTHÁZI András wrote: Hi, This code: my $a='A'; $a ~~ s:perl5:g/A/{chr(65535)}/; say $a.bytes; Outputs "0". Why? Bye, Andras \uFFFF is not a legal unicode codepoint. chr(65535) should raise an exception of some type. So the above code does seem to show a possible bug. But as that chr(65535) is

Re: [pugs] regexp "bug"?

2005-04-15 Thread BÁRTHÁZI András
Hi, my $a='A'; $a ~~ s:perl5:g/A/{chr(65535)}/; say $a.bytes; Outputs "0". Why? \uFFFF is not a legal unicode codepoint. chr(65535) should raise an exception of some type. So the above code does seem to show a possible bug. But as that chr(65535) is an undefined char, who knows what the code is a

Re: [pugs] regexp "bug"?

2005-04-15 Thread BÁRTHÁZI András
Hi, Yes, the value 0xFFFF can be stored as either 3 byte UTF-8 string or a 2 byte UCS-2 value, but the Unicode standard specifically says that the values 0xFFFF, 0xFFFE and 0xFEFF are NOT valid codepoints and should never appear in a Unicode string. 0xFFFF is reserved for out-of-band signaling
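
[Editor's illustration] The storage sizes and the "not a valid codepoint" claim quoted above can be checked with a short sketch. This is Python rather than the Perl 6 used in the thread, and `is_noncharacter` is a hypothetical helper written for this note, not a library function. It assumes the current Unicode definition of noncharacters (U+FDD0..U+FDEF plus the last two code points of every plane) and uses UTF-16BE as a stand-in for UCS-2, since the two agree for code points below U+10000. Under that definition U+FFFE and U+FFFF are noncharacters, whereas U+FEFF is an assigned character (the byte order mark), so the quoted list may be slightly broader than the standard.

```python
def is_noncharacter(cp: int) -> bool:
    """Hypothetical helper: True if cp is one of the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    # U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

ch = chr(0xFFFF)                         # same value as chr(65535) in the thread

utf8_len = len(ch.encode("utf-8"))       # bytes needed in UTF-8
ucs2_len = len(ch.encode("utf-16-be"))   # bytes needed in UCS-2 (via UTF-16BE)

for cp in (0xFFFF, 0xFFFE, 0xFEFF):
    print(hex(cp), is_noncharacter(cp))
print(utf8_len, ucs2_len)
```

The sizes match the quoted message: three bytes in UTF-8 and two in UCS-2. Whether an implementation should reject such a value or merely flag it is the design question the thread is debating; the sketch only classifies it.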

Re: [pugs] regexp "bug"?

2005-04-15 Thread BÁRTHÁZI András
Hi, >> This code: >> >> my $a='A'; >> $a ~~ s:perl5:g/A/{chr(65535)}/; >> say $a.bytes; >> >> Outputs "0". Why? > > > \uFFFF is not a legal unicode codepoint. chr(65535) should raise an exception of some type. So the above code does seem to show a possible bug. But as that chr(65535) is an undefin

[pugs] regexp "bug"?

2005-04-14 Thread BÁRTHÁZI András
Hi,

This code:

my $a='A';
$a ~~ s:perl5:g/A/{chr(65535)}/;
say $a.bytes;

Outputs "0". Why?

Bye,
Andras