On Tue, May 25, 2004 at 07:48:32PM -0700, Jeff Clites wrote:
On May 25, 2004, at 12:26 PM, Dan Sugalski wrote:
Yup. UTF8 is Just another variable-width encoding. Do anything with it
and we convert it to a fixed-width encoding, in this case UTF32.
This has the unfortunate side-effect of
On May 26, 2004, at 2:02 AM, Nicholas Clark wrote:
On Tue, May 25, 2004 at 07:48:32PM -0700, Jeff Clites wrote:
On May 25, 2004, at 12:26 PM, Dan Sugalski wrote:
Yup. UTF8 is Just another variable-width encoding. Do anything with it
and we convert it to a fixed-width encoding, in this case
On Sun, May 02, 2004 at 11:37:31AM -0700, Jeff Clites wrote:
Two more things to keep in mind:
On May 1, 2004, at 4:54 PM, Aaron Sherman wrote:
If Perl defaults to UTF-8
People need to realize also that although UTF-8 is a pretty good
interchange format, it's a really bad in-memory
At 12:30 PM +0100 5/25/04, Nicholas Clark wrote:
I may be misremembering what I've read here but I thought that Dan said
that for variable length encodings (such as shift-JIS) parrot would store
the byte(s) in memory in constant size 16 or 32 bit integers, rather than
the (external) variable
On Tue, May 25, 2004 at 03:26:45PM -0400, Dan Sugalski wrote:
Yup. UTF8 is Just another variable-width encoding. Do anything with
it and we convert it to a fixed-width encoding, in this case UTF32.
Does this mean that we won't be verifying the validity of UTF8 on input?
And instead pitching
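A minimal sketch of the "convert to a fixed-width encoding" step, written in C as an illustration rather than Parrot's actual code. It decodes UTF-8 into UTF-32 code points and rejects malformed input (stray continuation bytes, truncated or overlong sequences, surrogates, values above U+10FFFF), which is one way the validation question could be answered. The function name utf8_to_utf32 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode UTF-8 into UTF-32 code points.
 * Returns the number of code points written, or -1 on invalid UTF-8.
 * `out` must have room for at least `len` code points. */
long utf8_to_utf32(const unsigned char *s, size_t len, uint32_t *out)
{
    size_t i = 0;
    long n = 0;
    while (i < len) {
        unsigned char b = s[i];
        uint32_t cp, min;
        size_t extra;
        if (b < 0x80)                { cp = b;        extra = 0; min = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; min = 0x10000; }
        else return -1;                      /* stray continuation or 0xF8..0xFF */
        if (len - i - 1 < extra) return -1;  /* sequence cut off at end of input */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80) return -1;  /* must be 10xxxxxx */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* reject overlong forms, values past U+10FFFF, and surrogates */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;
        out[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

Decoding "A", "é", "€" yields the three fixed-width values 0x41, 0xE9, 0x20AC; a lone 0x80 byte or the overlong pair 0xC0 0x80 is refused.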
At 8:30 PM +0100 5/25/04, Nicholas Clark wrote:
On Tue, May 25, 2004 at 03:26:45PM -0400, Dan Sugalski wrote:
Yup. UTF8 is Just another variable-width encoding. Do anything with
it and we convert it to a fixed-width encoding, in this case UTF32.
Does this mean that we won't be verifying the
On May 25, 2004, at 12:26 PM, Dan Sugalski wrote:
At 12:30 PM +0100 5/25/04, Nicholas Clark wrote:
I may be misremembering what I've read here but I thought that Dan said
that for variable length encodings (such as shift-JIS) parrot would store
the byte(s) in memory in constant size 16 or 32 bit
I am very confused. THIS IS WHAT WE ALL SEEM TO BE SAYING. BITOPS ONLY
ON EIGHT-BIT DATA. AM I WRONG?
No, it's not, and could you please not get emotional about this? It's
I apologize for using UPPERCASE. My only excuse is that it was not
personally aimed at you: I have been griping
Two more things to keep in mind:
On May 1, 2004, at 4:54 PM, Aaron Sherman wrote:
If Perl defaults to UTF-8
People need to realize also that although UTF-8 is a pretty good
interchange format, it's a really bad in-memory representation. This is
for at least 2 related reasons: (1) To get to the
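The snippet is cut off, so Aaron's exact reasons are lost. One commonly cited problem, sketched here in C with hypothetical function names, is that locating the n-th character in UTF-8 means scanning every byte before it, while a fixed-width UTF-32 buffer allows direct indexing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of the n-th code point in
 * (assumed valid) UTF-8. Every earlier byte is examined: O(n). */
size_t utf8_offset_of(const unsigned char *s, size_t len, size_t n)
{
    size_t i = 0;
    while (i < len && n > 0) {
        i++;
        /* skip the continuation bytes (10xxxxxx) of this character */
        while (i < len && (s[i] & 0xC0) == 0x80)
            i++;
        n--;
    }
    return i;
}

/* In a fixed-width UTF-32 buffer the same lookup is one index: O(1). */
uint32_t utf32_at(const uint32_t *s, size_t n)
{
    return s[n];
}
```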
The bitshift operations on S-register contents are valid, so long as
the thing hanging off the register supports it. Binary data ought to
allow this. Most 8-bit string encodings will have to support it
whether it's a good idea or not, since you can do it now. If Jarkko
tells me you can do
On Fri, 2004-04-30 at 13:53, Dan Sugalski wrote:
Parrot, at the very low levels, makes no distinction between strings
and buffers--as far as it's concerned they're the same thing, and
either can hang off an S register. (Ultimately, when *I* talk of
strings I mean a thing I can hang off an S
On Fri, 2004-04-30 at 15:34, Dan Sugalski wrote:
If you want, you could think of the S-register strings as mini-PMCs.
The encoding and charset stuff (we'll ignore language semantics for
the moment) are essentially small vtables that hang off the string,
and whatever we do with it mostly
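A minimal C sketch of the "mini-PMC" idea, with every name hypothetical (this is not Parrot's actual string header): the string carries pointers to an encoding vtable and a charset vtable, and generic code such as count_digits works only through those function pointers, never touching the raw bytes directly.

```c
#include <stddef.h>
#include <stdint.h>

struct string_s;

/* Hypothetical encoding vtable: how bytes map to code points. */
typedef struct {
    const char *name;
    size_t (*codepoints)(const struct string_s *s);
    uint32_t (*get_codepoint)(const struct string_s *s, size_t idx);
} encoding_vtable;

/* Hypothetical charset vtable: what the code points mean. */
typedef struct {
    const char *name;
    int (*is_digit)(uint32_t cp);
} charset_vtable;

/* The string header: a buffer plus the two small vtables. */
typedef struct string_s {
    unsigned char *buf;
    size_t bytelen;
    const encoding_vtable *encoding;
    const charset_vtable *charset;
} string_s;

/* A fixed-width 8-bit encoding: one byte per code point. */
static size_t fixed8_codepoints(const string_s *s) { return s->bytelen; }
static uint32_t fixed8_get(const string_s *s, size_t i) { return s->buf[i]; }
static const encoding_vtable fixed8 = { "fixed_8", fixed8_codepoints, fixed8_get };

static int ascii_is_digit(uint32_t cp) { return cp >= '0' && cp <= '9'; }
static const charset_vtable ascii = { "ascii", ascii_is_digit };

/* Generic code dispatches through the vtables only. */
size_t count_digits(const string_s *s)
{
    size_t n = 0, len = s->encoding->codepoints(s);
    for (size_t i = 0; i < len; i++)
        if (s->charset->is_digit(s->encoding->get_codepoint(s, i)))
            n++;
    return n;
}
```

Swapping in a different encoding vtable (say, one that decodes UTF-8) would leave count_digits unchanged, which is the point of keeping the semantics out of the string body.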
On Sat, 2004-05-01 at 04:57, Jarkko Hietaniemi wrote:
If Jarkko
tells me you can do bitwise operations with unicode text now in Perl
5, well... we'll support it there, too, though we shan't like it at
all.
We can and I don't like it at all [...]
None of it anything I want to
So it seems to me that the obvious way to go is to have all bit-s
operations first convert to raw bytes (possibly throwing an exception)
and then proceed to do their work.
If these conversions croak if there are code points beyond \x{ff}, I'm
fine with it. But trying to mix \x{100} or
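A minimal C sketch of the conversion Jarkko describes, assuming the string is already held as an array of code points. Every code point must fit in a byte; since C has no exceptions, a return value of -1 stands in for the croak. The name codepoints_to_bytes is hypothetical, and the input is checked before anything is written so a failed conversion leaves the output untouched.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: narrow code points to raw bytes before a bit op.
 * Returns 0 on success, -1 if any code point is above 0xFF. */
int codepoints_to_bytes(const uint32_t *cp, size_t len, unsigned char *out)
{
    /* first pass: refuse anything that has no single-byte form */
    for (size_t i = 0; i < len; i++)
        if (cp[i] > 0xFF)
            return -1;
    /* second pass: the narrowing cannot lose information now */
    for (size_t i = 0; i < len; i++)
        out[i] = (unsigned char)cp[i];
    return 0;
}
```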
On Sat, 2004-05-01 at 11:26, Jarkko Hietaniemi wrote:
As for codepoints outside of \x00-\xff, I vote exception. I don't think
there's any other logical choice, but I think it's just an encoding
conversion exception, not a special bit-op exception (that's arm-waving,
I have not looked at Parrot's
On May 1, 2004, at 8:26 AM, Jarkko Hietaniemi wrote:
So it seems to me that the obvious way to go is to have all bit-s
operations first convert to raw bytes (possibly throwing an exception)
and then proceed to do their work.
If these conversions croak if there are code points beyond \x{ff}, I'm
On Sat, 2004-05-01 at 14:18, Jeff Clites wrote:
On May 1, 2004, at 8:26 AM, Jarkko Hietaniemi wrote:
Just FYI, the way I implemented bitwise-not so far, was to bitwise-not
code points 0x{00}-0x{FF} as uint8-sized things, 0x{100}-0x{FFFF} as
uint16-sized things, and 0x{10000}-0x{10FFFF} as uint32-sized
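A C sketch of the width-class bitwise-not Jeff describes, assuming the classes are 0x00-0xFF, 0x100-0xFFFF, and everything from 0x10000 up. codepoint_not is a hypothetical name, not Jeff's code.

```c
#include <stdint.h>

/* Hypothetical helper: invert a code point within its width class. */
uint32_t codepoint_not(uint32_t cp)
{
    if (cp <= 0xFF)
        return (uint8_t)~cp;    /* keep only the low 8 inverted bits */
    if (cp <= 0xFFFF)
        return (uint16_t)~cp;   /* keep only the low 16 inverted bits */
    return ~cp;                 /* full 32-bit inversion */
}
```

Two consequences are easy to check with it: codepoint_not(0x10000) yields 0xFFFEFFFF, far beyond U+10FFFF, and the operation is not its own inverse, since codepoint_not(0xFF00) is 0xFF, which then falls into the 8-bit class and inverts to 0x00 rather than back to 0xFF00.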
How are you defining valid UTF-8? Is there a codepoint in UTF-8
between \x00 and \xff that isn't valid?
Like, half of them? \x80 .. \xff are all invalid as UTF-8.
Is there a reason to ever do bitwise operations on anything other than
8-bit codepoints?
I am very confused. THIS IS WHAT WE
On May 1, 2004, at 12:00 PM, Aaron Sherman wrote:
On Sat, 2004-05-01 at 14:18, Jeff Clites wrote:
Exactly. And also realize that if you bitwise-not (or shift or
something similar) the bytes of a UTF-8 serialization of something, the
result isn't going to be valid UTF-8, so you'd be hard-pressed
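A small C check of Jeff's point, using hypothetical helper names: inverting the bytes of ASCII letters produces bytes of the form 10xxxxxx, which are UTF-8 continuation bytes and cannot start a character.

```c
#include <stddef.h>

/* Hypothetical helper: is this a UTF-8 continuation byte (10xxxxxx)? */
int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: bitwise-not every byte of a buffer in place. */
void bytes_not(unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        s[i] = (unsigned char)~s[i];
}
```

The outcome is not always invalid, though: inverting the two bytes of "é" (0xC3 0xA9) gives 0x3C 0x56, the valid ASCII text "<V", which is well-formed but has nothing to do with the input.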
It's been said that what the masses think of as binary data is outside
the concept of a string, and this lurker just doesn't see that. A binary
string is a string over a character set of size two, just like an ASCII
string is a string over a character set of size 128. [Like character
strings,
On Sat, 2004-05-01 at 15:09, Jarkko Hietaniemi wrote:
How are you defining valid UTF-8? Is there a codepoint in UTF-8
between \x00 and \xff that isn't valid? Is there a reason to ever do
Like, half of them? \x80 .. \xff are all invalid as UTF-8.
Heh, damn Ken Thompson and his placemat!
I
On Thu, 2004-04-29 at 13:04, Dan Sugalski wrote:
I think left and right shift of strings should work the same way that
shifts on ints work--that is, it doesn't grow, bits just fall off
the end. You can decide whether to sign-extend or 0-extend, either
one's OK.
Have we[1] finished working
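A minimal C sketch of a left shift with the semantics Dan proposes, assuming byte 0 is the most significant end of the bit string (the thread does not settle the bit order): the length never changes, bits shifted past the front are lost, and zero bits enter at the back. bytes_shl is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shift a byte string left by `bits` in place. */
void bytes_shl(unsigned char *s, size_t len, size_t bits)
{
    size_t bytes = bits / 8;
    unsigned shift = bits % 8;
    if (bytes >= len) {          /* everything falls off */
        memset(s, 0, len);
        return;
    }
    /* walk forward: each source index is >= the destination index,
       so sources are read before they are overwritten */
    for (size_t i = 0; i < len - bytes; i++) {
        unsigned v = (unsigned)s[i + bytes] << shift;
        if (shift && i + bytes + 1 < len)
            v |= s[i + bytes + 1] >> (8 - shift);
        s[i] = (unsigned char)v;  /* high bits of v fall off here */
    }
    memset(s + len - bytes, 0, bytes);  /* zero-fill the vacated tail */
}
```

For example, shifting the two bytes 0x81 0x01 left by one bit gives 0x02 0x02; the top bit of 0x81 simply falls off.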
At 9:07 AM -0400 4/30/04, Bryan C. Warnock wrote:
On Thu, 2004-04-29 at 13:04, Dan Sugalski wrote:
I think left and right shift of strings should work the same way that
shifts on ints work--that is, it doesn't grow, bits just fall off
the end. You can decide whether to sign-extend or 0-extend,
If I may interject for a moment:
-----Original Message-----
From: Bryan C. Warnock [mailto:[EMAIL PROTECTED]
Sent: Friday, April 30, 2004 9:08 AM
To: Dan Sugalski
Cc: Perl6 Internals List
Subject: Re: Bit ops on strings
On Thu, 2004-04-29 at 13:04, Dan Sugalski wrote:
I think left and right
On Fri, 2004-04-30 at 10:42, Dan Sugalski wrote:
Bitstring operations ought only be valid on binary data, though,
unless someone can give me a good reason why we ought to allow
bitshifting on Unicode. (And then give me a reasoned argument *how*,
too)
100% agree. If you want to play games
On Fri, 2004-04-30 at 09:47, Butler, Gerald wrote:
If I may interject for a moment:
Let me start by saying that I have not drunk the Unicode Kool-Aid. I'm
not at all certain that the overhead required to do all of what Parrot
wants to do is warranted, BUT that's beside the point.
Parrot is
On Fri, 2004-04-30 at 12:18, Butler, Gerald wrote:
Now, we have people talking about doing LSL/LSR on Strings. That is
100% inconsistent with that definition of a String.
Not at all, and keep in mind that I didn't propose this out of the blue.
bands, bxors and bors are existing string ops and
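For reference, Perl 5's documented rule for string bitwise ops of unequal length is that & truncates to the shorter operand while | and ^ behave as if the shorter one were padded with zero bytes. This C sketch follows that rule; whether Parrot's bands, bors, and bxors behave the same way is an assumption, and bytes_bitop is a hypothetical name.

```c
#include <stddef.h>

typedef enum { BIT_AND, BIT_OR, BIT_XOR } bitop;

/* Hypothetical helper: byte-wise AND/OR/XOR of two strings.
 * `out` needs room for the longer length; returns the result length. */
size_t bytes_bitop(bitop op, const unsigned char *a, size_t alen,
                   const unsigned char *b, size_t blen, unsigned char *out)
{
    /* & stops at the shorter operand; | and ^ run to the longer one */
    size_t n = (op == BIT_AND) ? (alen < blen ? alen : blen)
                               : (alen > blen ? alen : blen);
    for (size_t i = 0; i < n; i++) {
        unsigned char x = i < alen ? a[i] : 0;  /* zero padding */
        unsigned char y = i < blen ? b[i] : 0;
        out[i] = op == BIT_AND ? (x & y) : op == BIT_OR ? (x | y) : (x ^ y);
    }
    return n;
}
```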
On 30/04/2004, at 11:47 PM, Butler, Gerald wrote:
1. String - low-level, abstract, base class (or in Perl6 terms role --
I think) which represents a logically contiguous series of Parrot Int
2. BinaryString - inherits from String, represents a logically
contiguous series of
At 2:57 AM +1000 5/1/04, Andre Pang wrote:
Of course Parrot should have a function to reinterpret something of
a string type as raw binary data and vice versa, but don't mix
binary data with strings: they are completely different types, and
raw binary data should never be able to be put into a
-----Original Message-----
From: Aaron Sherman [mailto:[EMAIL PROTECTED]
Sent: Friday, April 30, 2004 11:58 AM
To: Butler, Gerald
Cc: Perl6 Internals List
Subject: RE: Bit ops on strings
On Fri, 2004-04-30 at 09:47, Butler, Gerald wrote:
If I may interject for a moment:
Let me start
At 12:18 PM -0400 4/30/04, Butler, Gerald wrote:
A string is what Dan described in his various postings on strings. Nuff
said.
Gerald Butler responds:
Yes, I know a String is what Dan described. He described a thingy
made up of 32-bit Values where each value represented a Code-Point. Now, we
At 2:58 PM -0400 4/30/04, Bryan C. Warnock wrote:
On Fri, 2004-04-30 at 13:53, Dan Sugalski wrote:
Parrot, at the very low levels, makes no distinction between strings
and buffers--as far as it's concerned they're the same thing, and
either can hang off an S register. (Ultimately, when *I* talk
At 4:15 PM -0400 4/30/04, Bryan C. Warnock wrote:
On Fri, 2004-04-30 at 15:34, Dan Sugalski wrote:
If you want, you could think of the S-register strings as mini-PMCs.
The encoding and charset stuff (we'll ignore language semantics for
the moment) are essentially small vtables that hang off the
Dan Sugalski [EMAIL PROTECTED] wrote:
If you want, you could think of the S-register strings as mini-PMCs.
The encoding and charset stuff (we'll ignore language semantics for
the moment) are essentially small vtables that hang off the string,
I think it's the cleanest way of implementing all
On Apr 30, 2004, at 10:22 AM, Dan Sugalski wrote:
At 2:57 AM +1000 5/1/04, Andre Pang wrote:
Of course Parrot should have a function to reinterpret something of a
string type as raw binary data and vice versa, but don't mix binary
data with strings: they are completely different types, and raw
At 7:07 PM -0700 4/30/04, Jeff Clites wrote:
On Apr 30, 2004, at 10:22 AM, Dan Sugalski wrote:
At 2:57 AM +1000 5/1/04, Andre Pang wrote:
Of course Parrot should have a function to reinterpret something
of a string type as raw binary data and vice versa, but don't mix
binary data with strings:
bit.ops defines some ops on strings, and not others. I was wondering if
anyone thinks the following would be useful (I'm offering to write them,
as it won't be much work):
lsls(inout STR, in INT)
lsrs(inout STR, in INT)
and, of course, their appropriate permutations.
For those
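A C sketch of what an lsrs-style right shift might do, assuming byte 0 is the most significant end, the length never changes, and the caller picks zero-fill or sign-extension (the choice Dan leaves open). bytes_shr and its parameters are hypothetical; this is not the op Aaron wrote.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shift a byte string right by `bits` in place.
 * Vacated high bits copy the original top bit if sign_extend is
 * nonzero, otherwise they are zero. Bits past the end are lost. */
void bytes_shr(unsigned char *s, size_t len, size_t bits, int sign_extend)
{
    if (len == 0)
        return;
    unsigned char fill = (sign_extend && (s[0] & 0x80)) ? 0xFF : 0x00;
    size_t bytes = bits / 8;
    unsigned shift = bits % 8;
    if (bytes >= len) {
        memset(s, fill, len);
        return;
    }
    /* walk backward so sources are read before they are overwritten */
    for (size_t i = len; i-- > bytes; ) {
        size_t src = i - bytes;
        unsigned v = s[src] >> shift;
        unsigned char hi = src > 0 ? s[src - 1] : fill;
        if (shift)
            v |= (unsigned)hi << (8 - shift);
        s[i] = (unsigned char)v;  /* keep the low 8 bits */
    }
    memset(s, fill, bytes);
}
```

With zero-fill, 0x81 0x01 shifted right by one becomes 0x40 0x80; with sign-extension the same input becomes 0xC0 0x80.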
At 11:49 AM -0400 4/29/04, Aaron Sherman wrote:
bit.ops defines some ops on strings, and not others. I was wondering if
anyone thinks the following would be useful (I'm offering to write them,
as it won't be much work):
lsls(inout STR, in INT)
lsrs(inout STR, in INT)
and, of