[perl #54704] [TODO] remove link to non-existing document parrot strings on site

2008-05-23 Thread Will Coleda via RT
Fixed, thanks!

[perl #54704] [TODO] remove link to non-existing document parrot strings on site

2008-05-23 Thread via RT
# New Ticket Created by Klaas-Jan Stol # Please include the string: [perl #54704] # in the subject line of all future correspondence about this issue. # http://rt.perl.org/rt3/Ticket/Display.html?id=54704 > on http://www.parrotcode.org/docs/ there is still a link to Parrot strings. T

Re: Parrot strings: are strings like "\x{FF10}" false?

2002-01-26 Thread Simon Cozens
On Sat, Jan 26, 2002 at 10:42:17AM +, David Chan wrote: > Which Parrot strings are supposed to be false in a boolean context? Thinking more deeply about this, I guess it depends entirely on the language, although we can provide string_bool as a sensible default. > For instance, is "

Re: Parrot strings: are strings like "\x{FF10}" false?

2002-01-26 Thread Alex Gough
On Sat, 26 Jan 2002, David Chan wrote: > Hi, > > Which Parrot strings are supposed to be false in a boolean context? > For instance, is "\x{FF10}" (FULLWIDTH DIGIT ZERO) false? > > docs/strings.pod says[1] a string is false if it "consists of one > digit chara

Parrot strings: are strings like "\x{FF10}" false?

2002-01-26 Thread David Chan
Hi, Which Parrot strings are supposed to be false in a boolean context? For instance, is "\x{FF10}" (FULLWIDTH DIGIT ZERO) false? docs/strings.pod says[1] a string is false if it "consists of one digit character whose numeric value (as decided by its character type) is zero"
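The rule quoted from docs/strings.pod above can be tried out with a small sketch. This is not Parrot's string_bool; it is a hypothetical Python helper (the name is_single_zero_digit is made up here) that checks only the one quoted condition, using Python's bundled Unicode database to decide the digit value. Any other falseness rules in docs/strings.pod are not shown in the snippet and are not modelled.

```python
import unicodedata


def is_single_zero_digit(s: str) -> bool:
    """Hypothetical helper: True if s is exactly one digit character
    whose numeric value, per the Unicode database, is zero."""
    if len(s) != 1:
        return False
    # unicodedata.digit returns the digit value, or the default (None)
    # when the character is not a digit at all.
    return unicodedata.digit(s, None) == 0


if __name__ == "__main__":
    for candidate in ["0", "\uFF10", "1", "00", ""]:
        print(repr(candidate), is_single_zero_digit(candidate))
```

Under this reading, "\x{FF10}" (FULLWIDTH DIGIT ZERO) would satisfy the quoted condition just as ASCII "0" does; whether a given language wants that is the question the thread goes on to discuss.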

Re: on parrot strings

2002-01-23 Thread Simon Cozens
On Wed, Jan 23, 2002 at 06:06:00PM +0200, Kai Henningsen wrote: > People do get confused sometimes. I'm confused as to why this is still on p6i. Followups to alt.usage.german? Thanks. -- DESPAIR: It's Always Darkest Just Before it Gets Pitch Black

Re: on parrot strings

2002-01-23 Thread Kai Henningsen
[EMAIL PROTECTED] (Russ Allbery) wrote on 22.01.02 in <[EMAIL PROTECTED]>: > Kai Henningsen <[EMAIL PROTECTED]> writes: > > > A case that (in a slightly different context) recently came up on > > alt.usage.german (I don't remember if this particular point was made, > > but it belongs): > > > "b

Re: on parrot strings

2002-01-21 Thread Bryan C. Warnock
On Monday 21 January 2002 17:11, Russ Allbery wrote: > No, pretty much all of the time. There are differences between proper > nouns and common nouns, but those are differences routinely quashed as a > typesetting decision; if you write both proper nouns and common nouns in > all caps as part of

Re: on parrot strings

2002-01-21 Thread Russ Allbery
Bryan C Warnock <[EMAIL PROTECTED]> writes: > On Monday 21 January 2002 16:43, Russ Allbery wrote: >> Changing the capitalization of C does not change the word. > Er, most of the time. No, pretty much all of the time. There are differences between proper nouns and common nouns, but those are

RE: on parrot strings

2002-01-21 Thread Stephen Howard
AIL PROTECTED]] Sent: Monday, January 21, 2002 04:10 PM Cc: [EMAIL PROTECTED] Subject: RE: on parrot strings > > But e` and e are different letters man. And re`sume` and resume are > > different words come to that. If the user wants something that'll > > match 'em both the

Re: on parrot strings

2002-01-21 Thread Bryan C. Warnock
On Monday 21 January 2002 16:43, Russ Allbery wrote: > Changing the capitalization of C does not change the word. Er, most of the time. -- Bryan C. Warnock [EMAIL PROTECTED]

Re: on parrot strings

2002-01-21 Thread Russ Allbery
Hong Zhang <[EMAIL PROTECTED]> writes: > I disagree. The difference between 'e' and 'e`' is similar to 'c' > and 'C'. No, it's not. In many languages, an accented character is a completely different letter. It's alphabetized separately, it's pronounced differently, and there are many words that

RE: on parrot strings

2002-01-21 Thread Hong Zhang
> > But e` and e are different letters man. And re`sume` and resume are > > different words come to that. If the user wants something that'll > > match 'em both then the pattern should surely be: > > > >/r[ee`]sum[ee`]/ > > I disagree. The difference between 'e' and 'e`' is similar to 'c

RE: on parrot strings

2002-01-21 Thread Garrett Goebel
From: Hong Zhang [mailto:[EMAIL PROTECTED]] > > > But e` and e are different letters man. And re`sume` and resume are > > different words come to that. If the user wants something that'll > > match 'em both then the pattern should surely be: > > > >/r[ee`]sum[ee`]/ > > I disagree. The diffe

RE: on parrot strings

2002-01-21 Thread Hong Zhang
> Yes, that's somewhat problematic. Making up "a byte CEF" would be > Wrong, though, because there is, by definition, no CCS to map, and > we would be dangerously close to conflating in CES, too... > ACR-CCS-CEF-CES. Read the character model. Understand the character > model. Embrace the chara

RE: on parrot strings

2002-01-21 Thread Hong Zhang
> But e` and e are different letters man. And re`sume` and resume are > different words come to that. If the user wants something that'll > match 'em both then the pattern should surely be: > >/r[ee`]sum[ee`]/ I disagree. The difference between 'e' and 'e`' is similar to 'c' and 'C'. The Uni

Re: on parrot strings

2002-01-21 Thread Jarkko Hietaniemi
On Mon, Jan 21, 2002 at 05:09:06PM +, Dave Mitchell wrote: > Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote: > > > In the good ol'days, one could usefully use regexes on 8-bit binary data, > > > eg > > > > > > open G, 'myfile.gif' or die; > > > read G, $buf, 8192 or die; > > > if ($buf =~ /^GIF8

Re: on parrot strings

2002-01-21 Thread Dave Mitchell
Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote: > > In the good ol'days, one could usefully use regexes on 8-bit binary data, > > eg > > > > open G, 'myfile.gif' or die; > > read G, $buf, 8192 or die; > > if ($buf =~ /^GIF89a\x08\x02/) { > > . > > > > where it was clear to everyone that we

Re: on parrot strings

2002-01-21 Thread Jarkko Hietaniemi
On Mon, Jan 21, 2002 at 04:37:46PM +, Dave Mitchell wrote: > Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote: > > There is no string type built out of native eight-bit bytes. > > In the good ol'days, one could usefully use regexes on 8-bit binary data, > eg > > open G, 'myfile.gif' or die; > rea

Re: on parrot strings

2002-01-21 Thread Dave Mitchell
Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote: > There is no string type built out of native eight-bit bytes. In the good ol'days, one could usefully use regexes on 8-bit binary data, eg open G, 'myfile.gif' or die; read G, $buf, 8192 or die; if ($buf =~ /^GIF89a\x08\x02/) { . where it wa
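For comparison, here is a minimal sketch of the same idea, matching a pattern against raw 8-bit data, written in Python rather than the Perl quoted above. It assumes the data is already in memory as a bytes object (the quoted code reads up to 8192 bytes from myfile.gif); the function name looks_like_gif is hypothetical. The point it illustrates is that the pattern and the subject are both explicitly byte strings, so no character encoding is involved.

```python
import re

# Bytes pattern: the GIF89a signature followed by the two bytes 0x08 0x02,
# mirroring the /^GIF89a\x08\x02/ regex in the quoted message.
GIF_HEADER = re.compile(rb"GIF89a\x08\x02")


def looks_like_gif(buf: bytes) -> bool:
    """Hypothetical helper: True if buf starts with the quoted byte sequence."""
    # re.match anchors at the start of the buffer, like ^ in the Perl regex.
    return GIF_HEADER.match(buf) is not None


if __name__ == "__main__":
    sample = b"GIF89a\x08\x02" + bytes(16)
    print(looks_like_gif(sample))
```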

Re: on parrot strings

2002-01-19 Thread Simon Cozens
On Sat, Jan 19, 2002 at 07:08:25PM +0200, Jarkko Hietaniemi wrote: > http://www.aw.com/catalog/academic/product/1,4096,0201700522,00.html > The book looks really promising-- unfortunately it's not yet published. Isn't this, uhm, http://www.concentric.net/~rtgillam/pubs/unibook/index.html ? -- S

Re: on parrot strings

2002-01-19 Thread Graham Barr
I believe IBM use inversion lists in their ICU library for sets of Unicode characters. Graham. On Sat, Jan 19, 2002 at 07:08:25PM +0200, Jarkko Hietaniemi wrote: > Honour where honour is due: I've got some questions about inversion > lists. Where I saw them mentioned by that name were some draft

Re: on parrot strings

2002-01-19 Thread Jarkko Hietaniemi
Honour where honour is due: I've got some questions about inversion lists. Where I saw them mentioned by that name were some drafts of this: http://www.aw.com/catalog/academic/product/1,4096,0201700522,00.html The book looks really promising-- unfortunately it's not yet published. -- $jhi++;

Re: on parrot strings

2002-01-18 Thread Piers Cawley
Hong Zhang <[EMAIL PROTECTED]> writes: >> > preprocessing. Another example, if I want to search for /resume/e, >> > (equivalent matching), the regex engine can normalize the case, fully >> > decompose input string, strip off any combining character, and do 8-bit >> >> Hmmm. The above sounds co

Parrot strings

2002-01-18 Thread Melvin Smith
Anyone have any objection to adding a couple of calls to terminate and/or return null terminated strings from Parrot strings for places where an API expects a standard C string? I'm not sure of the preferred way to handle this. It would be nice to at least try to terminate the current s

Re: on parrot strings

2002-01-18 Thread Jarkko Hietaniemi
On Fri, Jan 18, 2002 at 11:40:17PM +, Nicholas Clark wrote: > On Fri, Jan 18, 2002 at 05:24:00PM +0200, Jarkko Hietaniemi wrote: > > > > As for character encodings, we're forcing everything to UTF-32 in > > > regular expressions. No exceptions. If you use a string in a regex, > > > it'll be

Re: on parrot strings

2002-01-18 Thread Nicholas Clark
On Fri, Jan 18, 2002 at 05:24:00PM +0200, Jarkko Hietaniemi wrote: > > As for character encodings, we're forcing everything to UTF-32 in > > regular expressions. No exceptions. If you use a string in a regex, > > it'll be transcoded. I honestly can't think of a better way to > > guarantee effi

Re: on parrot strings

2002-01-18 Thread Jarkko Hietaniemi
> > > We *do* want to have (with some notation) > > > [[:digit:]\p{FunkyLooking}aeiou except 7], right? > > > > Of course. But that is all resolvable in regex compile time. > > No expression tree needed. > > My point was that if inversion lists are insufficient for describing > all the characte

Re: on parrot strings

2002-01-18 Thread Steve Fink
On Sat, Jan 19, 2002 at 12:28:15AM +0200, Jarkko Hietaniemi wrote: > On Fri, Jan 18, 2002 at 02:22:49PM -0800, Steve Fink wrote: > > On Sat, Jan 19, 2002 at 12:11:06AM +0200, Jarkko Hietaniemi wrote: > > > Complement of an inversion list is neat: insert 0 at the beginning > > > (and append max+1),

Re: on parrot strings

2002-01-18 Thread Jarkko Hietaniemi
On Fri, Jan 18, 2002 at 02:22:49PM -0800, Steve Fink wrote: > On Sat, Jan 19, 2002 at 12:11:06AM +0200, Jarkko Hietaniemi wrote: > > Complement of an inversion list is neat: insert 0 at the beginning > > (and append max+1), unless there already is one, in which case delete > > the 0 (and shift the

Re: on parrot strings

2002-01-18 Thread Steve Fink
On Sat, Jan 19, 2002 at 12:11:06AM +0200, Jarkko Hietaniemi wrote: > Complement of an inversion list is neat: insert 0 at the beginning > (and append max+1), unless there already is one, in which case delete > the 0 (and shift the list and delete the max+1). Again, O(N). > (One could of course h
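The complement step quoted above can be sketched as follows. This is an illustration in Python, not code from the thread: it assumes an inversion list is a sorted Python list of code points at which membership toggles, that every range is closed explicitly (so a range running to the end of the code space ends with max+1), and that max is 0x10FFFF. The names MAX_CP and complement are made up for this example.

```python
MAX_CP = 0x10FFFF  # assumed upper bound of the code point range


def complement(inv: list[int]) -> list[int]:
    """Return the complement of an inversion list (hypothetical helper).

    Toggle the leading 0 and the trailing max+1: insert each if absent,
    delete it if present, as described in the quoted message.
    """
    result = list(inv)
    if result and result[0] == 0:
        result.pop(0)               # set started at 0: complement does not
    else:
        result.insert(0, 0)         # set did not start at 0: complement does
    if result and result[-1] == MAX_CP + 1:
        result.pop()                # set ran to the end: complement does not
    else:
        result.append(MAX_CP + 1)   # complement runs to the end
    return result


if __name__ == "__main__":
    digits = [0x30, 0x3A]           # ASCII '0'..'9'
    print([hex(cp) for cp in complement(digits)])
```

Each step copies or shifts the list at most once, which matches the O(N) cost mentioned in the quoted text; applying complement twice gives back the original list.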

Re: on parrot strings

2002-01-18 Thread Jarkko Hietaniemi
On Fri, Jan 18, 2002 at 01:40:26PM -0800, Steve Fink wrote: > On Fri, Jan 18, 2002 at 10:08:40PM +0200, Jarkko Hietaniemi wrote: > > ints, or 176 bytes. Searching for membership in an inversion list is > > O(N log N) (binary search). "Encoding the whole range" is a non-issue > > bordering on a jo

Re: on parrot strings

2002-01-18 Thread Jarkko Hietaniemi
On Fri, Jan 18, 2002 at 01:40:26PM -0800, Steve Fink wrote: > On Fri, Jan 18, 2002 at 10:08:40PM +0200, Jarkko Hietaniemi wrote: > > ints, or 176 bytes. Searching for membership in an inversion list is > > O(N log N) (binary search). "Encoding the whole range" is a non-issue > > bordering on a jo

Re: on parrot strings

2002-01-18 Thread Steve Fink
On Fri, Jan 18, 2002 at 10:08:40PM +0200, Jarkko Hietaniemi wrote: > ints, or 176 bytes. Searching for membership in an inversion list is > O(N log N) (binary search). "Encoding the whole range" is a non-issue > bordering on a joke: two ints, or 8 bytes. [Clarification from a noncombatant] You m
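As a rough illustration of the membership test being discussed, the sketch below does a binary search over an inversion list in Python. The representation is an assumption, not taken from the thread: a sorted list of code points where membership toggles, with even positions opening a range and odd positions closing it. The name contains is hypothetical. A binary search over N boundaries takes on the order of log N comparisons.

```python
from bisect import bisect_right


def contains(inv: list[int], cp: int) -> bool:
    """Hypothetical helper: is code point cp in the set described by inv?"""
    # Count how many boundaries are <= cp. An odd count means cp lies
    # inside a range that has been opened but not yet closed.
    return bisect_right(inv, cp) % 2 == 1


if __name__ == "__main__":
    digits = [0x30, 0x3A]              # ASCII '0'..'9'
    whole_range = [0, 0x10FFFF + 1]    # "the whole range": just two ints
    print(contains(digits, ord("7")), contains(digits, ord("a")))
    print(contains(whole_range, 0x1F600))
```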

Re: on parrot strings

2002-01-18 Thread Jarkko Hietaniemi
On Fri, Jan 18, 2002 at 12:20:53PM -0800, Hong Zhang wrote: > > > My proposal is we should use mix method. The Unicode standard class, > > > such as \p{IsLu}, can be handled by a standard splitbin table. Please > > > see Java java.lang.Character or Python unicodedata_db.h. I did > > > measurement

RE: on parrot strings

2002-01-18 Thread Hong Zhang
> > My proposal is we should use mix method. The Unicode standard class, > > such as \p{IsLu}, can be handled by a standard splitbin table. Please > > see Java java.lang.Character or Python unicodedata_db.h. I did > > measurement on it, to handle all unicode category, simple casing, > > and decim

RE: on parrot strings

2002-01-18 Thread Hong Zhang
> > preprocessing. Another example, if I want to search for /resume/e, > > (equivalent matching), the regex engine can normalize the case, fully > > decompose input string, strip off any combining character, and do 8-bit > > Hmmm. The above sounds complicated not quite what I had in mind > for

Re: on parrot strings

2002-01-18 Thread Jarkko Hietaniemi
On Fri, Jan 18, 2002 at 11:44:00AM -0800, Hong Zhang wrote: > > (1) There are 5.125 bytes in Unicode, not four. > > (2) I think the above would suffer from the same problem as one common > > suggestion, two-level bitmaps (though I think the above would suffer > > less, being of finer granu

Re: on parrot strings

2002-01-18 Thread Jarkko Hietaniemi
> I don't think UTF-32 will save you much. The unicode case map is variable > length, combining character, canonical equivalence, and many other thing > will require variable length mapping. For example, if I only want to This is true. > parse /[0-9]+/, why you want to convert everything to UTF-

RE: on parrot strings

2002-01-18 Thread Hong Zhang
> (1) There are 5.125 bytes in Unicode, not four. > (2) I think the above would suffer from the same problem as one common > suggestion, two-level bitmaps (though I think the above would suffer > less, being of finer granularity): the problem is that a lot of > space is wasted, since t

Re: on parrot strings

2002-01-18 Thread Jarkko Hietaniemi
> Since I seem to be the main regex hacker for Parrot, I'll respond to > this as best I can. > > Currently, we are using bitmaps for character classes. Well, sort of. > A Bitmap in Parrot is defined like this: > > typedef struct bitmap_t { > char* bmp; >

Re: on parrot strings

2002-01-18 Thread Jarkko Hietaniemi
On Fri, Jan 18, 2002 at 04:51:07AM -0500, Bryan C. Warnock wrote: > Thanks, Jarkko. > > On Thursday 17 January 2002 23:21, Jarkko Hietaniemi wrote: > > The most important message is that give up on 8-bit bytes, already. > > Time to move on, chop chop. > > Do you think/feel/wish/demand that the t

Re: on parrot strings

2002-01-18 Thread Bryan C. Warnock
Thanks, Jarkko. On Thursday 17 January 2002 23:21, Jarkko Hietaniemi wrote: > The most important message is that give up on 8-bit bytes, already. > Time to move on, chop chop. Do you think/feel/wish/demand that the textual (string) APIs should differ from the binary (byte) APIs? (Both from an

RE: on parrot strings

2002-01-18 Thread Brent Dax
Jarkko Hietaniemi: About the implementation of character classes: since the Unicode code point range is big, a single big bitmap won't work any more: firstly, it would be big. Secondly, for most cases, it would be wastefully sparse. A balanced binary tree of (begin, end) points of ranges is sug

on parrot strings

2002-01-17 Thread Jarkko Hietaniemi
/reports/tr10/ Similarly to normalization, any string operations should primarily update the C of a C and invalidate the C by first releasing the storage for the sort key and then setting the C pointer in the primary string to a NULL pointer. When the collated version of the string is called f