Re: The .bytes/.codepoints/.graphemes methods

2004-07-13 Thread David Green

In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Larry Wall) wrote: On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote: : :u0 # use bytes (. is byte) : :u1 # level 1 support (. is codepoint) : :u2 # level 1 support (. is

RE: The .bytes/.codepoints/.graphemes methods

2004-07-12 Thread Austin Hastings

-Original Message- From: Jonadab the Unsightly One [mailto:[EMAIL PROTECTED] Austin Hastings [EMAIL PROTECTED] writes: I think this is something that we'll want as a mode, a la case-insensitivity. Think of it as mark insensitivity. Makes sense to me, but... Maybe it can just

Re: The .bytes/.codepoints/.graphemes methods

2004-07-12 Thread Jonadab the Unsightly One

Luke Palmer [EMAIL PROTECTED] writes: Or, god forbid, a word? m:base/que mas/ We're not mathematicians: we're allowed to use more than one letter in a row to designate something :-) Well, if it were *me*, *I* would have voted for keeping the core language 100% pure ASCII, untainted by

Re: The .bytes/.codepoints/.graphemes methods

2004-07-10 Thread Jonadab the Unsightly One

Austin Hastings [EMAIL PROTECTED] writes: I think this is something that we'll want as a mode, a la case-insensitivity. Think of it as mark insensitivity. Makes sense to me, but... Maybe it can just roll into :i? It will probably get used in _conjunction_ with case-insensitivity quite a
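Editorial illustration (not part of the thread): the "mark insensitivity" mode discussed here can be approximated outside Perl 6 by decomposing the text and dropping combining marks before comparing. The sketch below is in Python; the names `strip_marks` and `contains` are hypothetical, and treating "mark" as "any code point with a nonzero canonical combining class" is an assumption, not something the thread specifies. Case folding is kept as a separate switch, mirroring the question of whether the mode should roll into :i.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Return text with combining marks removed (base characters only)."""
    # NFD splits precomposed letters such as U+00E9 into base + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)

def contains(needle: str, haystack: str,
             ignore_marks: bool = True, ignore_case: bool = False) -> bool:
    """Substring test with independent mark- and case-insensitivity switches."""
    if ignore_marks:
        needle, haystack = strip_marks(needle), strip_marks(haystack)
    if ignore_case:
        needle, haystack = needle.casefold(), haystack.casefold()
    return needle in haystack
```

Keeping the two switches separate means a caller can ask for mark insensitivity alone, case insensitivity alone, or both together.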

Re: The .bytes/.codepoints/.graphemes methods

2004-07-10 Thread Luke Palmer

Jonadab the Unsightly One writes: Austin Hastings [EMAIL PROTECTED] writes: I think this is something that we'll want as a mode, a la case-insensitivity. Think of it as mark insensitivity. Makes sense to me, but... Maybe it can just roll into :i? It will probably get used in

Re: The .bytes/.codepoints/.graphemes methods

2004-07-08 Thread Austin Hastings

--- Larry Wall [EMAIL PROTECTED] wrote: On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote: : Or was that to imply that a literal a in the RE would be : interpreted as a grapheme a when :u2 is active? I don't know what you mean by grapheme a there. If you mean, Does it

Re: The .bytes/.codepoints/.graphemes methods

2004-07-07 Thread Larry Wall

On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote: : On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote: : This has no direct bearing on p6l, since performance is a p6i issue. : But perhaps in the interests of performance as well as hackery we : should explicitly

Re: The .bytes/.codepoints/.graphemes methods

2004-07-07 Thread Larry Wall

On Wed, Jul 07, 2004 at 08:09:51PM -0700, Larry Wall wrote: : On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote: : : On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote: : : This has no direct bearing on p6l, since performance is a p6i issue. : : But perhaps in the

Re: The .bytes/.codepoints/.graphemes methods

2004-07-03 Thread Brent 'Dax' Royal-Gordon

Aaron Sherman wrote: On Tue, 2004-06-29 at 11:34, Austin Hastings wrote: (2) Perl6 should equitably support all its target locales; (3) we should set out to make sure the performance is damn fast no matter what locale we're using. Well, that's a nice theory, but you can prove that low-level

Re: The .bytes/.codepoints/.graphemes methods

2004-07-02 Thread Aaron Sherman

On Tue, 2004-06-29 at 11:34, Austin Hastings wrote: [...] when you switch to LC_ALL= pick your favorite language, you just get really slow performance: Apparently the 'C' locale is such a totally special case that the performance of LC_ALL=C is one or more orders of magnitude better than

Re: The .bytes/.codepoints/.graphemes methods

2004-07-01 Thread Matt Diephouse

Larry Wall wrote: On Sat, Jun 26, 2004 at 12:27:38PM -0700, Brent 'Dax' Royal-Gordon wrote: : Issues: : * Limits lvalue substr (doesn't allow it to be a different size) : unless splice is used (or a substr method is also provided). That all has to be looked at anyway. What does 5 mean when

Re: The .bytes/.codepoints/.graphemes methods

2004-07-01 Thread Juerd

Matt Diephouse skribis 2004-06-30 20:51 (-0400): my $string = "Hello, World!"; say $string[0..4]; # prints "Hello\n" $string[7...] = "Larry!"; say $string; # prints "Hello, Larry!\n" And that array is one of bytes? graphemes? In general, I like the idea. In [EMAIL PROTECTED], almost the same was

Re: The .bytes/.codepoints/.graphemes methods

2004-07-01 Thread Matt Diephouse

Juerd wrote: Matt Diephouse skribis 2004-06-30 20:51 (-0400): my $string = "Hello, World!"; say $string[0..4]; # prints "Hello\n" $string[7...] = "Larry!"; say $string; # prints "Hello, Larry!\n" And that array is one of bytes? graphemes? I'm not really up on my Unicode, but I think .chars is what I have

Re: The .bytes/.codepoints/.graphemes methods

2004-07-01 Thread John Williams

On Thu, 1 Jul 2004, Juerd wrote: Matt Diephouse skribis 2004-06-30 20:51 (-0400): my $string = "Hello, World!"; say $string[0..4]; # prints "Hello\n" $string[7...] = "Larry!"; say $string; # prints "Hello, Larry!\n" And that array is one of bytes? graphemes? In general, I like the idea.

Re: The .bytes/.codepoints/.graphemes methods

2004-06-29 Thread Jonadab the Unsightly One

Dan Sugalski [EMAIL PROTECTED] writes: Hmm. Suppose that I have a system that is friendly to 80 byte records. I want to output meaningful strings, so I want to partition a buffer into 80-ish byte substrings, but preserve any graphemes (i.e., store the data in a legible format). How would I
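Editorial illustration (not part of the thread): the 80-byte-record question can be sketched in Python as a greedy packer that only breaks between grapheme clusters. The function names are hypothetical. Two assumptions are made loudly: the output encoding is UTF-8, and a "grapheme" is approximated as a base code point plus any following combining marks, which is simpler than the full Unicode segmentation rules.

```python
import unicodedata

def grapheme_clusters(text: str):
    """Yield approximate graphemes: a base code point plus trailing combining marks."""
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster          # a new base character closes the previous cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster

def split_records(text: str, limit: int = 80) -> list:
    """Pack text into UTF-8 records of at most `limit` bytes without splitting a cluster."""
    records, current = [], b""
    for cluster in grapheme_clusters(text):
        encoded = cluster.encode("utf-8")
        if len(encoded) > limit:
            raise ValueError("a single grapheme exceeds the record size")
        if len(current) + len(encoded) > limit:
            records.append(current)  # flush before the record would overflow
            current = encoded
        else:
            current += encoded
    if current:
        records.append(current)
    return records
```

Records may come out shorter than the limit, which matches the "80-ish" wording: the packer gives up a few bytes rather than cut a character in half.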

Re: The .bytes/.codepoints/.graphemes methods

2004-06-29 Thread Jonadab the Unsightly One

Austin Hastings [EMAIL PROTECTED] writes: A couple of alternatives: substr.bytes($string, 2, 4) = $substitute; Well, that's arguably better than bsubstr. substr($string.bytes, 2, 4) = $substitute; I could live with that, although it doesn't allow mixing units. (Someone will pop in here

Re: The .bytes/.codepoints/.graphemes methods

2004-06-29 Thread Austin Hastings

--- Jonadab the Unsightly One [EMAIL PROTECTED] wrote: Have the implications of the bytes/codepoints/graphemes/woohickies distinction for the regular expression engine been discussed already? Not enough. One of my current clients just rolled on to redhat 9, and what a steaming pile of

Re: The .bytes/.codepoints/.graphemes methods

2004-06-29 Thread Jonadab the Unsightly One

Juerd [EMAIL PROTECTED] writes: substr($string, 2 but graphemes, 4 but bytes); I think "but" even makes sense, if substr defaults to something. That could be combined with a smart substr that only needs the units once (err, only needs a position object for one of the args) and knows how to

Re: The .bytes/.codepoints/.graphemes methods

2004-06-29 Thread Jonathan Scott Duff

On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote: This has no direct bearing on p6l, since performance is a p6i issue. But perhaps in the interests of performance as well as hackery we should explicitly provide some sort of variant regex behavior: /a./ :bytes /a./
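Editorial illustration (not part of the thread): what /a./ would consume under byte, codepoint, and grapheme semantics can be shown with a small Python sketch. The function `match_a_dot` and its `level` argument are hypothetical stand-ins for the proposed modifiers. Levels 0 and 1 use Python's `re` module on bytes and on str; level 2 is hand-written and approximates a grapheme as a base code point plus following combining marks, which is an assumption rather than the thread's definition. UTF-8 is assumed for the byte view.

```python
import re
import unicodedata

def match_a_dot(text: str, level: int):
    """Return what the pattern a. matches at the start of text, or None."""
    if level == 0:
        # Byte semantics: the dot consumes exactly one byte of the UTF-8 encoding.
        m = re.match(rb"a.", text.encode("utf-8"), re.DOTALL)
        return m.group() if m else None
    if level == 1:
        # Codepoint semantics: the dot consumes exactly one code point.
        m = re.match(r"a.", text, re.DOTALL)
        return m.group() if m else None
    if level == 2:
        # Grapheme semantics (approximate): the dot consumes a base code point
        # together with every combining mark that follows it.
        if len(text) < 2 or text[0] != "a" or unicodedata.combining(text[1]):
            return None
        end = 2
        while end < len(text) and unicodedata.combining(text[end]):
            end += 1
        return text[:end]
    raise ValueError("level must be 0, 1, or 2")
```

For the input "ae" followed by U+0301 and "x", levels 0 and 1 both stop after "e" and leave the accent behind, while level 2 keeps the accent attached to its base.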

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Jonadab the Unsightly One

Larry Wall [EMAIL PROTECTED] writes: That all has to be looked at anyway. What does 5 mean when you pass it to substr, anyway? I was just going to ask about substrings, and then didn't because I figured that had been hashed out already and I'd missed it... (I've been trying to make it

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Dave Whipp

Jonadab The Unsightly One [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] It would be possible to have right-associative operators (that bind at least more tightly than comma and possibly very tightly) and convert a number to one of these objects, so that we can do stuff like this:

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Dan Sugalski

On Mon, 28 Jun 2004, Larry Wall wrote: On Mon, Jun 28, 2004 at 11:26:32AM -0400, Jonadab the Unsightly One wrote: : You could coin the abbreviation ligs, for Language Independent : Graphemes. Then some ingenious rascal can create a pragma or whatever : that allows $str.b, $str.c, $str.g, and

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Juerd

Dave Whipp skribis 2004-06-28 9:55 (-0700): substr($string, 2 bytes, 4 bytes) = $substitute; substr($string, 2, 4 :bytes) substr($string, 2 but graphemes, 4 but bytes); I think "but" even makes sense, if substr defaults to something. Juerd
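Editorial illustration (not part of the thread): the idea of telling substr which unit its offsets are in can be sketched in Python with an explicit `unit` argument. The function `substr` and its parameter names are hypothetical, the sketch uses one unit for both offset and length (it does not attempt the mixed-unit form), UTF-8 is assumed for the byte view, and graphemes are approximated as a base code point plus trailing combining marks.

```python
import unicodedata

def grapheme_clusters(text: str) -> list:
    """Split text into approximate graphemes (base code point + combining marks)."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # attach the mark to the preceding base
        else:
            clusters.append(ch)
    return clusters

def substr(text: str, start: int, length: int, unit: str = "codepoints"):
    """Extract `length` units beginning at `start`, counted in the requested unit."""
    if unit == "bytes":
        # Returns bytes: a byte slice may cut a character in half, so it is not decoded.
        return text.encode("utf-8")[start:start + length]
    if unit == "codepoints":
        return text[start:start + length]
    if unit == "graphemes":
        return "".join(grapheme_clusters(text)[start:start + length])
    raise ValueError("unit must be 'bytes', 'codepoints', or 'graphemes'")
```

The same numeric arguments select different text depending on the unit, which is the ambiguity behind the question of what a bare number passed to substr should mean.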

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Dan Sugalski

On Mon, 28 Jun 2004, Juerd wrote: Dave Whipp skribis 2004-06-28 9:55 (-0700): substr($string, 2 bytes, 4 bytes) = $substitute; substr($string, 2, 4 :bytes) substr($string, 2 but graphemes, 4 but bytes); I think "but" even makes sense, if substr defaults to something. I think mixing

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Austin Hastings

--- Dan Sugalski [EMAIL PROTECTED] wrote: On Mon, 28 Jun 2004, Juerd wrote: Dave Whipp skribis 2004-06-28 9:55 (-0700): substr($string, 2 bytes, 4 bytes) = $substitute; substr($string, 2, 4 :bytes) substr($string, 2 but graphemes, 4 but bytes); I think "but" even makes sense, if

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Dan Sugalski

On Mon, 28 Jun 2004, Austin Hastings wrote: --- Dan Sugalski [EMAIL PROTECTED] wrote: On Mon, 28 Jun 2004, Juerd wrote: Dave Whipp skribis 2004-06-28 9:55 (-0700): substr($string, 2 bytes, 4 bytes) = $substitute; substr($string, 2, 4 :bytes) substr($string, 2 but

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Austin Hastings

--- Jonadab the Unsightly One [EMAIL PROTECTED] wrote: Larry Wall [EMAIL PROTECTED] writes: (I've been trying to make it assume some implicit unit based on the current lexical scope's Unicode level, but issues remain.) We have magical string positions that have different numeric values

Re: The .bytes/.codepoints/.graphemes methods

2004-06-26 Thread Larry Wall

On Sat, Jun 26, 2004 at 12:27:38PM -0700, Brent 'Dax' Royal-Gordon wrote: : As currently designed, the String::bytes, String::codepoints, and : String::graphemes methods return the number of bytes, codepoints, : and graphemes, respectively, in the string they were called on. I : would like to
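Editorial illustration (not part of the thread): the three counts that the String::bytes, String::codepoints, and String::graphemes methods are described as returning can diverge for the same string. The Python sketch below shows this; `count_units` is a hypothetical helper, the byte count assumes UTF-8, and the grapheme count is an approximation that counts every code point that is not a combining mark.

```python
import unicodedata

def count_units(text: str) -> dict:
    """Count one string in three units: bytes (UTF-8), code points, graphemes."""
    return {
        "bytes": len(text.encode("utf-8")),
        "codepoints": len(text),
        # Approximation: each non-combining code point starts one grapheme.
        "graphemes": sum(1 for ch in text if unicodedata.combining(ch) == 0),
    }
```

For "e" followed by the combining acute accent U+0301, this gives 3 bytes, 2 code points, and 1 grapheme, so any API that takes a bare length has to say which of the three it means.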

28 matches
