Fixed, thanks!
# New Ticket Created by Klaas-Jan Stol
# Please include the string: [perl #54704]
# in the subject line of all future correspondence about this issue.
# http://rt.perl.org/rt3/Ticket/Display.html?id=54704
on http://www.parrotcode.org/docs/ there is still a link to Parrot
strings. T
On Sat, Jan 26, 2002 at 10:42:17AM, David Chan wrote:
> Which Parrot strings are supposed to be false in a boolean context?
Thinking more deeply about this, I guess it depends entirely on the
language, although we can provide string_bool as a sensible default.
> For instance, is "
On Sat, 26 Jan 2002, David Chan wrote:
> Hi,
>
> Which Parrot strings are supposed to be false in a boolean context?
> For instance, is "\x{FF10}" (FULLWIDTH DIGIT ZERO) false?
>
> docs/strings.pod says[1] a string is false if it "consists of one
> digit chara
Hi,
Which Parrot strings are supposed to be false in a boolean context?
For instance, is "\x{FF10}" (FULLWIDTH DIGIT ZERO) false?
docs/strings.pod says[1] a string is false if it "consists of one
digit character whose numeric value (as decided by its character type)
is zero"
On Wed, Jan 23, 2002 at 06:06:00PM +0200, Kai Henningsen wrote:
> People do get confused sometimes.
I'm confused as to why this is still on p6i. Followups to alt.usage.german?
Thanks.
--
DESPAIR:
It's Always Darkest Just Before it Gets Pitch Black
[EMAIL PROTECTED] (Russ Allbery) wrote on 22.01.02 in
<[EMAIL PROTECTED]>:
> Kai Henningsen <[EMAIL PROTECTED]> writes:
>
> > A case that (in a slightly different context) recently came up on
> > alt.usage.german (I don't remember if this particular point was made,
> > but it belongs):
>
> > "b
On Monday 21 January 2002 17:11, Russ Allbery wrote:
> No, pretty much all of the time. There are differences between proper
> nouns and common nouns, but those are differences routinely quashed as a
> typesetting decision; if you write both proper nouns and common nouns in
> all caps as part of
Bryan C Warnock <[EMAIL PROTECTED]> writes:
> On Monday 21 January 2002 16:43, Russ Allbery wrote:
>> Changing the capitalization of C does not change the word.
> Er, most of the time.
No, pretty much all of the time. There are differences between proper
nouns and common nouns, but those are
AIL PROTECTED]]
Sent: Monday, January 21, 2002 04:10 PM
Cc: [EMAIL PROTECTED]
Subject: RE: on parrot strings
> > But e` and e are different letters man. And re`sume` and resume are
> > different words come to that. If the user wants something that'll
> > match 'em both the
On Monday 21 January 2002 16:43, Russ Allbery wrote:
> Changing the capitalization of C does not change the word.
Er, most of the time.
--
Bryan C. Warnock
[EMAIL PROTECTED]
Hong Zhang <[EMAIL PROTECTED]> writes:
> I disagree. The difference between 'e' and 'e`' is similar to 'c'
> and 'C'.
No, it's not.
In many languages, an accented character is a completely different letter.
It's alphabetized separately, it's pronounced differently, and there are
many words that
> > But e` and e are different letters man. And re`sume` and resume are
> > different words come to that. If the user wants something that'll
> > match 'em both then the pattern should surely be:
> >
> >/r[ee`]sum[ee`]/
>
> I disagree. The difference between 'e' and 'e`' is similar to 'c
From: Hong Zhang [mailto:[EMAIL PROTECTED]]
>
> > But e` and e are different letters man. And re`sume` and resume are
> > different words come to that. If the user wants something that'll
> > match 'em both then the pattern should surely be:
> >
> >/r[ee`]sum[ee`]/
>
> I disagree. The diffe
> Yes, that's somewhat problematic. Making up "a byte CEF" would be
> Wrong, though, because there is, by definition, no CCS to map, and
> we would be dangerously close to conflating in CES, too...
> ACR-CCS-CEF-CES. Read the character model. Understand the character
> model. Embrace the chara
> But e` and e are different letters man. And re`sume` and resume are
> different words come to that. If the user wants something that'll
> match 'em both then the pattern should surely be:
>
>/r[ee`]sum[ee`]/
I disagree. The difference between 'e' and 'e`' is similar to 'c'
and 'C'. The Uni
On Mon, Jan 21, 2002 at 05:09:06PM, Dave Mitchell wrote:
> Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> > > In the good ol'days, one could usefully use regexes on 8-bit binary data,
> > > eg
> > >
> > > open G, 'myfile.gif' or die;
> > > read G, $buf, 8192 or die;
> > > if ($buf =~ /^GIF8
Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> > In the good ol'days, one could usefully use regexes on 8-bit binary data,
> > eg
> >
> > open G, 'myfile.gif' or die;
> > read G, $buf, 8192 or die;
> > if ($buf =~ /^GIF89a\x08\x02/) {
> > .
> >
> > where it was clear to everyone that we
On Mon, Jan 21, 2002 at 04:37:46PM, Dave Mitchell wrote:
> Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> > There is no string type built out of native eight-bit bytes.
>
> In the good ol'days, one could usefully use regexes on 8-bit binary data,
> eg
>
> open G, 'myfile.gif' or die;
> rea
Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> There is no string type built out of native eight-bit bytes.
In the good ol'days, one could usefully use regexes on 8-bit binary data,
eg
open G, 'myfile.gif' or die;
read G, $buf, 8192 or die;
if ($buf =~ /^GIF89a\x08\x02/) {
.
where it wa
On Sat, Jan 19, 2002 at 07:08:25PM +0200, Jarkko Hietaniemi wrote:
> http://www.aw.com/catalog/academic/product/1,4096,0201700522,00.html
> The book looks really promising-- unfortunately it's not yet published.
Isn't this, uhm, http://www.concentric.net/~rtgillam/pubs/unibook/index.html ?
--
S
I believe IBM uses inversion lists in their ICU library for sets of
Unicode characters.
Graham.
On Sat, Jan 19, 2002 at 07:08:25PM +0200, Jarkko Hietaniemi wrote:
> Honour where honour is due: I've got some questions about inversion
> lists. Where I saw them mentioned by that name were some draft
Honour where honour is due: I've got some questions about inversion
lists. Where I saw them mentioned by that name were some drafts of
this:
http://www.aw.com/catalog/academic/product/1,4096,0201700522,00.html
The book looks really promising-- unfortunately it's not yet published.
--
$jhi++;
Hong Zhang <[EMAIL PROTECTED]> writes:
>> > preprocessing. Another example, if I want to search for /resume/e,
>> > (equivalent matching), the regex engine can normalize the case, fully
>> > decompose input string, strip off any combining character, and do 8-bit
>>
>> Hmmm. The above sounds co
Anyone have any objection to adding a couple of calls to terminate
and/or return null terminated strings from Parrot strings for places
where an API expects a standard C string?
I'm not sure of the preferred way to handle this. It would be nice to
at least try to terminate the current s
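For the copying variant of the idea above, here is a minimal sketch in C.
It assumes a made-up length-counted string type; counted_str and
counted_str_to_cstring are hypothetical names for illustration, not
Parrot's actual STRING structure or API. It shows only the "return a
null-terminated copy" option, not terminating the string in place.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-counted string; NOT Parrot's actual STRING struct. */
typedef struct {
    const char *buf;   /* bytes, not necessarily NUL-terminated */
    size_t      len;   /* number of bytes in buf */
} counted_str;

/* Returns a freshly malloc'ed NUL-terminated copy, or NULL on failure.
 * The caller must free() the result. Embedded NUL bytes are copied
 * as-is, so a C API will see the string end at the first one. */
char *counted_str_to_cstring(const counted_str *s)
{
    char *out;

    assert(s != NULL);
    if (s->len == SIZE_MAX)
        return NULL;                 /* len + 1 would overflow */

    out = (char *)malloc(s->len + 1);
    if (out == NULL)
        return NULL;

    if (s->len > 0)
        memcpy(out, s->buf, s->len); /* copy the payload bytes */
    out[s->len] = '\0';              /* add the terminator */
    return out;
}
```

A copy costs an allocation per call, but it leaves the original buffer
untouched, which matters if the buffer is shared or has no spare byte
for a terminator.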
On Fri, Jan 18, 2002 at 11:40:17PM, Nicholas Clark wrote:
> On Fri, Jan 18, 2002 at 05:24:00PM +0200, Jarkko Hietaniemi wrote:
>
> > > As for character encodings, we're forcing everything to UTF-32 in
> > > regular expressions. No exceptions. If you use a string in a regex,
> > > it'll be
On Fri, Jan 18, 2002 at 05:24:00PM +0200, Jarkko Hietaniemi wrote:
> > As for character encodings, we're forcing everything to UTF-32 in
> > regular expressions. No exceptions. If you use a string in a regex,
> > it'll be transcoded. I honestly can't think of a better way to
> > guarantee effi
> > > We *do* want to have (with some notation)
> > > [[:digit:]\p{FunkyLooking}aeiou except 7], right?
> >
> > Of course. But that is all resolvable in regex compile time.
> > No expression tree needed.
>
> My point was that if inversion lists are insufficient for describing
> all the characte
On Sat, Jan 19, 2002 at 12:28:15AM +0200, Jarkko Hietaniemi wrote:
> On Fri, Jan 18, 2002 at 02:22:49PM -0800, Steve Fink wrote:
> > On Sat, Jan 19, 2002 at 12:11:06AM +0200, Jarkko Hietaniemi wrote:
> > > Complement of an inversion list is neat: insert 0 at the beginning
> > > (and append max+1),
On Fri, Jan 18, 2002 at 02:22:49PM -0800, Steve Fink wrote:
> On Sat, Jan 19, 2002 at 12:11:06AM +0200, Jarkko Hietaniemi wrote:
> > Complement of an inversion list is neat: insert 0 at the beginning
> > (and append max+1), unless there already is one, in which case delete
> > the 0 (and shift the
On Sat, Jan 19, 2002 at 12:11:06AM +0200, Jarkko Hietaniemi wrote:
> Complement of an inversion list is neat: insert 0 at the beginning
> (and append max+1), unless there already is one, in which case delete
> the 0 (and shift the list and delete the max+1). Again, O(N).
> (One could of course h
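The complement rule quoted above can be sketched in C roughly as follows.
This assumes the list is a sorted array of boundaries that always closes
its last range, with max+1 taken to be 0x110000; INVLIST_LIMIT and
invlist_complement are hypothetical names, not anything from the Parrot
sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVLIST_LIMIT 0x110000u  /* assumed max+1 for Unicode code points */

/* Writes the complement of in[0..n) to out and returns its length.
 * out needs room for n + 2 entries and must not overlap in. */
size_t invlist_complement(const uint32_t *in, size_t n, uint32_t *out)
{
    size_t first = 0, last = n, k = 0, i;

    assert(n % 2 == 0);               /* every range must be closed */

    if (n > 0 && in[0] == 0)
        first = 1;                    /* leading 0 present: delete it */
    else
        out[k++] = 0;                 /* otherwise insert 0 at the front */

    if (n > 0 && in[n - 1] == INVLIST_LIMIT)
        last = n - 1;                 /* trailing max+1 present: delete it */

    for (i = first; i < last; i++)
        out[k++] = in[i];             /* interior boundaries are unchanged */

    if (last == n)
        out[k++] = INVLIST_LIMIT;     /* otherwise append max+1 */

    return k;
}
```

The single copy loop is where the O(N) cost comes from. Complementing
twice gives back the original list.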
On Fri, Jan 18, 2002 at 01:40:26PM -0800, Steve Fink wrote:
> On Fri, Jan 18, 2002 at 10:08:40PM +0200, Jarkko Hietaniemi wrote:
> > ints, or 176 bytes. Searching for membership in an inversion list is
> > O(N log N) (binary search). "Encoding the whole range" is a non-issue
> > bordering on a jo
On Fri, Jan 18, 2002 at 10:08:40PM +0200, Jarkko Hietaniemi wrote:
> ints, or 176 bytes. Searching for membership in an inversion list is
> O(N log N) (binary search). "Encoding the whole range" is a non-issue
> bordering on a joke: two ints, or 8 bytes.
[Clarification from a noncombatant] You m
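For reference, a minimal C sketch of a membership test on an inversion
list. With a plain binary search over N boundaries, each lookup takes
O(log N) comparisons. The array layout and the name invlist_contains are
assumptions made for illustration, not code from the thread or from
Parrot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: a sorted array of code points at which membership
 * flips. Even indices open a range and odd indices close it (exclusive),
 * so {0x30, 0x3A} encodes the ASCII digits '0'..'9'. */
int invlist_contains(const uint32_t *bounds, size_t n, uint32_t cp)
{
    size_t lo = 0, hi = n;            /* search window is [lo, hi) */

    assert(n % 2 == 0);               /* every range must be closed */

    /* Binary search for the number of boundaries <= cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (bounds[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* An odd count means cp lies inside an open range. */
    return (lo % 2) == 1;
}
```

Under this layout the whole code point range is just the two boundaries
{0, 0x110000}, which matches the "two ints" remark in the quoted message.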
On Fri, Jan 18, 2002 at 12:20:53PM -0800, Hong Zhang wrote:
> > > My proposal is that we use a mixed method. The Unicode standard classes,
> > > such as \p{IsLu}, can be handled by a standard splitbin table. Please
> > > see Java java.lang.Character or Python unicodedata_db.h. I did
> > > measurements
> > My proposal is that we use a mixed method. The Unicode standard classes,
> > such as \p{IsLu}, can be handled by a standard splitbin table. Please
> > see Java java.lang.Character or Python unicodedata_db.h. I did
> > measurements on it: to handle all Unicode categories, simple casing,
> > and decim
> > preprocessing. Another example, if I want to search for /resume/e,
> > (equivalent matching), the regex engine can normalize the case, fully
> > decompose input string, strip off any combining character, and do 8-bit
>
> Hmmm. The above sounds complicated not quite what I had in mind
> for
On Fri, Jan 18, 2002 at 11:44:00AM -0800, Hong Zhang wrote:
> > (1) There are 5.125 bytes in Unicode, not four.
> > (2) I think the above would suffer from the same problem as one common
> > suggestion, two-level bitmaps (though I think the above would suffer
> > less, being of finer granu
> I don't think UTF-32 will save you much. The Unicode case map is variable
> length; combining characters, canonical equivalence, and many other things
> will require variable-length mapping. For example, if I only want to
This is true.
> parse /[0-9]+/, why you want to convert everything to UTF-
> (1) There are 5.125 bytes in Unicode, not four.
> (2) I think the above would suffer from the same problem as one common
> suggestion, two-level bitmaps (though I think the above would suffer
> less, being of finer granularity): the problem is that a lot of
> space is wasted, since t
> Since I seem to be the main regex hacker for Parrot, I'll respond to
> this as best I can.
>
> Currently, we are using bitmaps for character classes. Well, sort of.
> A Bitmap in Parrot is defined like this:
>
> typedef struct bitmap_t {
> char* bmp;
>
On Fri, Jan 18, 2002 at 04:51:07AM -0500, Bryan C. Warnock wrote:
> Thanks, Jarkko.
>
> On Thursday 17 January 2002 23:21, Jarkko Hietaniemi wrote:
> > The most important message is that give up on 8-bit bytes, already.
> > Time to move on, chop chop.
>
> Do you think/feel/wish/demand that the t
Thanks, Jarkko.
On Thursday 17 January 2002 23:21, Jarkko Hietaniemi wrote:
> The most important message is that give up on 8-bit bytes, already.
> Time to move on, chop chop.
Do you think/feel/wish/demand that the textual (string) APIs should differ
from the binary (byte) APIs? (Both from an
Jarkko Hietaniemi:
About the implementation of character classes: since the Unicode code
point range is big, a single big bitmap won't work any more: firstly,
it would be big. Secondly, for most cases, it would be wastefully
sparse. A balanced binary tree of (begin, end) points of ranges is
sug
/reports/tr10/
Similarly to normalization, any string operations should primarily
update the C of a C and invalidate the
C by first releasing the storage for the sort key and
then setting the C pointer in the primary string to
a NULL pointer. When the collated version of the string is called
f