Re: Bit ops on strings

2004-05-26 Thread Nicholas Clark
On Tue, May 25, 2004 at 07:48:32PM -0700, Jeff Clites wrote: On May 25, 2004, at 12:26 PM, Dan Sugalski wrote: Yup. UTF8 is Just another variable-width encoding. Do anything with it and we convert it to a fixed-width encoding, in this case UTF32. This has the unfortunate side-effect of

Re: Bit ops on strings

2004-05-26 Thread Jeff Clites
On May 26, 2004, at 2:02 AM, Nicholas Clark wrote: On Tue, May 25, 2004 at 07:48:32PM -0700, Jeff Clites wrote: On May 25, 2004, at 12:26 PM, Dan Sugalski wrote: Yup. UTF8 is Just another variable-width encoding. Do anything with it and we convert it to a fixed-width encoding, in this case

Re: Bit ops on strings

2004-05-25 Thread Nicholas Clark
On Sun, May 02, 2004 at 11:37:31AM -0700, Jeff Clites wrote: Two more things to keep in mind: On May 1, 2004, at 4:54 PM, Aaron Sherman wrote: If Perl defaults to UTF-8 People need to realize also that although UTF-8 is a pretty good interchange format, it's a really bad in-memory

Re: Bit ops on strings

2004-05-25 Thread Dan Sugalski
At 12:30 PM +0100 5/25/04, Nicholas Clark wrote: I may be misremembering what I've read here but I thought that Dan said that for variable length encodings (such as shift-JIS) parrot would store the byte(s) in memory in constant size 16 or 32 bit integers, rather than the (external) variable

Re: Bit ops on strings

2004-05-25 Thread Nicholas Clark
On Tue, May 25, 2004 at 03:26:45PM -0400, Dan Sugalski wrote: Yup. UTF8 is Just another variable-width encoding. Do anything with it and we convert it to a fixed-width encoding, in this case UTF32. Does this mean that we won't be verifying the validity of UTF8 on input? And instead pitching
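The conversion discussed above (variable-width UTF-8 turned into a fixed-width form) can be sketched roughly as follows. This is an illustration in Python rather than Parrot's own code; the function name `utf8_to_fixed_width` is hypothetical, and using a strict decoder is an assumption about how validation might be folded into the conversion.

```python
def utf8_to_fixed_width(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into one integer code point per character.

    Hypothetical helper: a strict decode rejects malformed input, so the
    conversion step doubles as a validity check.
    """
    return [ord(ch) for ch in data.decode("utf-8")]


text = "na\u00efve \u20ac"           # 7 characters, two of them non-ASCII
encoded = text.encode("utf-8")       # variable width: 1 to 3 bytes per character here
codepoints = utf8_to_fixed_width(encoded)
fixed = text.encode("utf-32-le")     # fixed width: always 4 bytes per character
```

In this sketch the fixed-width form costs more memory (28 bytes against 10) but lets the n-th character be reached by index, and malformed UTF-8 is rejected at conversion time rather than carried along.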

Re: Bit ops on strings

2004-05-25 Thread Dan Sugalski
At 8:30 PM +0100 5/25/04, Nicholas Clark wrote: On Tue, May 25, 2004 at 03:26:45PM -0400, Dan Sugalski wrote: Yup. UTF8 is Just another variable-width encoding. Do anything with it and we convert it to a fixed-width encoding, in this case UTF32. Does this mean that we won't be verifying the

Re: Bit ops on strings

2004-05-25 Thread Jeff Clites
On May 25, 2004, at 12:26 PM, Dan Sugalski wrote: At 12:30 PM +0100 5/25/04, Nicholas Clark wrote: I may be misremembering what I've read here but I thought that Dan said that for variable length encodings (such as shift-JIS) parrot would store the byte(s) in memory in constant size 16 or 32 bit

Re: Bit ops on strings

2004-05-02 Thread Jarkko Hietaniemi
I am very confused. THIS IS WHAT WE ALL SEEM TO BE SAYING. BITOPS ONLY ON EIGHT-BIT DATA. AM I WRONG? No, it's not, and could you please not get emotional about this? It's I apologize for using UPPERCASE. My only excuse is that it was not personally aimed at you: I have been griping

Re: Bit ops on strings

2004-05-02 Thread Jeff Clites
Two more things to keep in mind: On May 1, 2004, at 4:54 PM, Aaron Sherman wrote: If Perl defaults to UTF-8 People need to realize also that although UTF-8 is a pretty good interchange format, it's a really bad in-memory representation. This is for at least 2 related reasons: (1) To get to the

Re: Bit ops on strings

2004-05-01 Thread Jarkko Hietaniemi
The bitshift operations on S-register contents are valid, so long as the thing hanging off the register supports it. Binary data ought allow this. Most 8-bit string encodings will have to support it whether it's a good idea or not, since you can do it now. If Jarkko tells me you can do

RE: Bit ops on strings

2004-05-01 Thread Bryan C. Warnock
On Fri, 2004-04-30 at 13:53, Dan Sugalski wrote: Parrot, at the very low levels, makes no distinction between strings and buffers--as far as it's concerned they're the same thing, and either can hang off an S register. (Ultimately, when *I* talk of strings I mean A thing I can hang off an S

RE: Bit ops on strings

2004-05-01 Thread Bryan C. Warnock
On Fri, 2004-04-30 at 15:34, Dan Sugalski wrote: If you want, you could think of the S-register strings as mini-PMCs. The encoding and charset stuff (we'll ignore language semantics for the moment) are essentially small vtables that hang off the string, and whatever we do with it mostly

Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 04:57, Jarkko Hietaniemi wrote: If Jarkko tells me you can do bitwise operations with unicode text now in Perl 5, well... we'll support it there, too, though we shan't like it at all. We can and I don't like it at all [...] None of it anything I want to

Re: Bit ops on strings

2004-05-01 Thread Jarkko Hietaniemi
So it seems to me that the obvious way to go is to have all bit-s operations first convert to raw bytes (possibly throwing an exception) and then proceed to do their work. If these conversions croak if there are code points beyond \x{ff}, I'm fine with it. But trying to mix \x{100} or
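The "convert to raw bytes first, croak beyond \x{ff}" idea above might look something like the sketch below. It is written in Python for illustration only; `bitwise_not_string` is a hypothetical name, and raising `ValueError` is an assumption standing in for whatever exception Parrot would actually throw.

```python
def bitwise_not_string(s: str) -> str:
    """Bitwise-NOT a string whose code points all fit in eight bits.

    Hypothetical sketch: the string is first converted to raw bytes, and the
    conversion fails if any code point lies above 0xFF.
    """
    try:
        # Latin-1 maps code points 0x00-0xFF one-to-one onto byte values.
        raw = s.encode("latin-1")
    except UnicodeEncodeError as exc:
        raise ValueError("bit op on a string with a code point above 0xFF") from exc
    return bytes(b ^ 0xFF for b in raw).decode("latin-1")
```

Note that the failure here comes from the conversion step, not from the bit operation itself, which matches the suggestion that this is an encoding-conversion error and not a special bit-op error.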

Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 11:26, Jarkko Hietaniemi wrote: As for codepoints outside of \x00-\xff, I vote exception. I don't think there's any other logical choice, but I think it's just an encoding conversion exception, not a special bit-op exception (that's arm-waving, I have not looked at Parrot's

Re: Bit ops on strings

2004-05-01 Thread Jeff Clites
On May 1, 2004, at 8:26 AM, Jarkko Hietaniemi wrote: So it seems to me that the obvious way to go is to have all bit-s operations first convert to raw bytes (possibly throwing an exception) and then proceed to do their work. If these conversions croak if there are code points beyond \x{ff}, I'm

Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 14:18, Jeff Clites wrote: On May 1, 2004, at 8:26 AM, Jarkko Hietaniemi wrote: Just FYI, the way I implemented bitwise-not so far, was to bitwise-not code points 0x{00}-0x{FF} as uint8-sized things, 0x{100}-0x{} as uint16-sized things, and 0x{} as uint32-sized

Re: Bit ops on strings

2004-05-01 Thread Jarkko Hietaniemi
How are you defining valid UTF-8? Is there a codepoint in UTF-8 between \x00 and \xff that isn't valid? Is there a reason to ever do Like, half of them? \x80 .. \xff are all invalid as UTF-8. bitwise operations on anything other than 8-bit codepoints? I am very confused. THIS IS WHAT WE
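One reading of the claim above is that it concerns bytes, not code points: a lone byte in the range 0x80 to 0xFF is never a complete UTF-8 sequence. A small Python check of that reading (the helper name `is_valid_utf8` is hypothetical):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Every possible single byte that is valid UTF-8 on its own.
valid_single_bytes = [b for b in range(256) if is_valid_utf8(bytes([b]))]

# The code point U+00E9 is perfectly valid; it just needs two bytes in UTF-8.
e_acute = "\u00e9".encode("utf-8")
```

Under this reading the code points between \x80 and \xff are fine as characters; it is only their single-byte representation that UTF-8 rejects.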

Re: Bit ops on strings

2004-05-01 Thread Jeff Clites
On May 1, 2004, at 12:00 PM, Aaron Sherman wrote: On Sat, 2004-05-01 at 14:18, Jeff Clites wrote: Exactly. And also realize that if you bitwise-not (or shift or something similar) the bytes of a UTF-8 serialization of something, the result isn't going to be valid UTF-8, so you'd be hard-pressed
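The point above about operating on the bytes of a UTF-8 serialization can be illustrated with a short Python sketch. The helper names are hypothetical, and the example uses plain ASCII text, where the effect is easiest to see.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def invert_bytes(data: bytes) -> bytes:
    """Bitwise-NOT every byte of the buffer."""
    return bytes(b ^ 0xFF for b in data)


original = "abc".encode("utf-8")
inverted = invert_bytes(original)    # every byte now has its high bit set
```

Here the inverted bytes are all bare continuation bytes, so they no longer decode. Some inputs do happen to produce bytes that decode (the two bytes of U+00E9 invert to the ASCII text `<V`), but the result is unrelated text, which supports the view that such operations only make sense on binary data.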

Re: Bit ops on strings

2004-05-01 Thread Andrew E Switala
It's been said that what the masses think of as binary data is outside the concept of a string, and this lurker just doesn't see that. A binary string is a string over a character set of size two, just like an ASCII string is a string over a character set of size 128. [Like character strings,

Re: Bit ops on strings

2004-05-01 Thread Aaron Sherman
On Sat, 2004-05-01 at 15:09, Jarkko Hietaniemi wrote: How are you defining valid UTF-8? Is there a codepoint in UTF-8 between \x00 and \xff that isn't valid? Is there a reason to ever do Like, half of them? \x80 .. \xff are all invalid as UTF-8. Heh, damn Ken Thompson and his placemat! I

Re: Bit ops on strings

2004-04-30 Thread Bryan C. Warnock
On Thu, 2004-04-29 at 13:04, Dan Sugalski wrote: I think left and right shift of strings should work the same way that shifts on ints works--that is, it doesn't grow, bits just fall off the end. You can decide whether to sign-extend or 0-extend, either one's OK. Have we[1] finished working
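The shift semantics described above (the string does not grow, and bits fall off the end) could be sketched as below. This is a Python illustration, not the proposed ops themselves: the function names are hypothetical, and treating the buffer as one big-endian bit sequence with zero-extension is an assumption chosen for the example.

```python
def shift_left_fixed(data: bytes, count: int) -> bytes:
    """Shift the whole buffer left by count bits without growing it."""
    if count < 0:
        raise ValueError("shift count must be non-negative")
    nbits = len(data) * 8
    value = int.from_bytes(data, "big")
    # Mask to the original width so bits shifted past the front are dropped.
    value = (value << count) & ((1 << nbits) - 1)
    return value.to_bytes(len(data), "big")


def shift_right_fixed(data: bytes, count: int) -> bytes:
    """Shift the whole buffer right by count bits, filling with zeros."""
    if count < 0:
        raise ValueError("shift count must be non-negative")
    value = int.from_bytes(data, "big") >> count
    return value.to_bytes(len(data), "big")
```

For example, `shift_left_fixed(b"\x81\x01", 1)` gives `b"\x02\x02"`: the top bit of the first byte is lost and the length stays at two bytes. A sign-extending right shift would differ only in how the vacated high bits are filled.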

Re: Bit ops on strings

2004-04-30 Thread Dan Sugalski
At 9:07 AM -0400 4/30/04, Bryan C. Warnock wrote: On Thu, 2004-04-29 at 13:04, Dan Sugalski wrote: I think left and right shift of strings should work the same way that shifts on ints works--that is, it doesn't grow, bits just fall off the end. You can decide whether to sign-extend or 0-extend,

RE: Bit ops on strings

2004-04-30 Thread Butler, Gerald
If I may interject for a moment: -Original Message- From: Bryan C. Warnock [mailto:[EMAIL PROTECTED] Sent: Friday, April 30, 2004 9:08 AM To: Dan Sugalski Cc: Perl6 Internals List Subject: Re: Bit ops on strings On Thu, 2004-04-29 at 13:04, Dan Sugalski wrote: I think left and right

Re: Bit ops on strings

2004-04-30 Thread Aaron Sherman
On Fri, 2004-04-30 at 10:42, Dan Sugalski wrote: Bitstring operations ought only be valid on binary data, though, unless someone can give me a good reason why we ought to allow bitshifting on Unicode. (And then give me a reasoned argument *how*, too) 100% agree. If you want to play games

RE: Bit ops on strings

2004-04-30 Thread Aaron Sherman
On Fri, 2004-04-30 at 09:47, Butler, Gerald wrote: If I may interject for a moment: Let me start by saying that I have not drunk the Unicode Kool-Aid. I'm not at all certain that the overhead required to do all of what Parrot wants to do is warranted, BUT that's beside the point. Parrot is

RE: Bit ops on strings

2004-04-30 Thread Aaron Sherman
On Fri, 2004-04-30 at 12:18, Butler, Gerald wrote: Now, we have people talking about doing LSL/LSR on Strings. That is 100% inconsistent with that definition of a String. Not at all, and keep in mind that I didn't propose this out of the blue. bands, bxors and bors are existing string ops and

Re: Bit ops on strings

2004-04-30 Thread Andre Pang
On 30/04/2004, at 11:47 PM, Butler, Gerald wrote: 1. String - low-level, abstract, base class (or in Perl6 terms role -- I think) which represents a logically contiguous series of Parrot Int 2. BinaryString - inherits from String, represents a logically contiguous series of

Re: Bit ops on strings

2004-04-30 Thread Dan Sugalski
At 2:57 AM +1000 5/1/04, Andre Pang wrote: Of course Parrot should have a function to reinterpret something of a string type as raw binary data and vice versa, but don't mix binary data with strings: they are completely different types, and raw binary data should never be able to be put into a

RE: Bit ops on strings

2004-04-30 Thread Butler, Gerald
-Original Message- From: Aaron Sherman [mailto:[EMAIL PROTECTED] Sent: Friday, April 30, 2004 11:58 AM To: Butler, Gerald Cc: Perl6 Internals List Subject: RE: Bit ops on strings On Fri, 2004-04-30 at 09:47, Butler, Gerald wrote: If I may interject for a moment: Let me start

RE: Bit ops on strings

2004-04-30 Thread Dan Sugalski
At 12:18 PM -0400 4/30/04, Butler, Gerald wrote: A string is what Dan described in his various postings on strings. Nuff said. Gerald Butler responds: Yes, I know a String is what Dan described. He described a thingy made up of 32-bit Values where each value represented a Code-Point. Now, we

RE: Bit ops on strings

2004-04-30 Thread Dan Sugalski
At 2:58 PM -0400 4/30/04, Bryan C. Warnock wrote: On Fri, 2004-04-30 at 13:53, Dan Sugalski wrote: Parrot, at the very low levels, makes no distinction between strings and buffers--as far as it's concerned they're the same thing, and either can hang off an S register. (Ultimately, when *I* talk

RE: Bit ops on strings

2004-04-30 Thread Dan Sugalski
At 4:15 PM -0400 4/30/04, Bryan C. Warnock wrote: On Fri, 2004-04-30 at 15:34, Dan Sugalski wrote: If you want, you could think of the S-register strings as mini-PMCs. The encoding and charset stuff (we'll ignore language semantics for the moment) are essentially small vtables that hang off the

Re: Bit ops on strings

2004-04-30 Thread Leopold Toetsch
Dan Sugalski [EMAIL PROTECTED] wrote: If you want, you could think of the S-register strings as mini-PMCs. The encoding and charset stuff (we'll ignore language semantics for the moment) are essentially small vtables that hang off the string, I think it's the cleanest way of implementing all

Re: Bit ops on strings

2004-04-30 Thread Jeff Clites
On Apr 30, 2004, at 10:22 AM, Dan Sugalski wrote: At 2:57 AM +1000 5/1/04, Andre Pang wrote: Of course Parrot should have a function to reinterpret something of a string type as raw binary data and vice versa, but don't mix binary data with strings: they are completely different types, and raw

Re: Bit ops on strings

2004-04-30 Thread Dan Sugalski
At 7:07 PM -0700 4/30/04, Jeff Clites wrote: On Apr 30, 2004, at 10:22 AM, Dan Sugalski wrote: At 2:57 AM +1000 5/1/04, Andre Pang wrote: Of course Parrot should have a function to reinterpret something of a string type as raw binary data and vice versa, but don't mix binary data with strings:

Bit ops on strings

2004-04-29 Thread Aaron Sherman
bit.ops defines some ops on strings, and not others. I was wondering if anyone thinks the following would be useful (I'm offering to write them, as it won't be much work): lsls(inout STR, in INT) lsrs(inout STR, in INT) and, of course, their appropriate permutations. For those

Re: Bit ops on strings

2004-04-29 Thread Dan Sugalski
At 11:49 AM -0400 4/29/04, Aaron Sherman wrote: bit.ops defines some ops on strings, and not others. I was wondering if anyone thinks the following would be useful (I'm offering to write them, as it won't be much work): lsls(inout STR, in INT) lsrs(inout STR, in INT) and, of