In article [EMAIL PROTECTED],
[EMAIL PROTECTED] (Larry Wall) wrote:
On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote:
: :u0 # use bytes (. is byte)
: :u1 # level 1 support (. is codepoint)
: :u2 # level 2 support (. is
-----Original Message-----
From: Jonadab the Unsightly One [mailto:[EMAIL PROTECTED]
Austin Hastings [EMAIL PROTECTED] writes:
I think this is something that we'll want as a mode, a la
case-insensitivity. Think of it as mark insensitivity.
Makes sense to me, but...
Maybe it can just
Luke Palmer [EMAIL PROTECTED] writes:
Or, god forbid, a word?
m:base/que mas/
We're not mathematicians: we're allowed to use more than one letter
in a row to designate something :-)
Well, if it were *me*, *I* would have voted for keeping the core
language 100% pure ASCII, untainted by
Austin Hastings [EMAIL PROTECTED] writes:
I think this is something that we'll want as a mode, a la
case-insensitivity. Think of it as mark insensitivity.
Makes sense to me, but...
Maybe it can just roll into :i?
It will probably get used in _conjunction_ with case-insensitivity
quite a
Jonadab the Unsightly One writes:
Austin Hastings [EMAIL PROTECTED] writes:
I think this is something that we'll want as a mode, a la
case-insensitivity. Think of it as mark insensitivity.
Makes sense to me, but...
Maybe it can just roll into :i?
It will probably get used in
--- Larry Wall [EMAIL PROTECTED] wrote:
On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote:
: Or was that to imply that a literal a in the RE would be
: interpreted as a grapheme a when :u2 is active?
I don't know what you mean by grapheme a there. If you mean, Does
it
On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote:
: On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote:
: This has no direct bearing on p6l, since performance is a p6i issue.
: But perhaps in the interests of performance as well as hackery we
: should explicitly
On Wed, Jul 07, 2004 at 08:09:51PM -0700, Larry Wall wrote:
: On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote:
: : On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote:
: : This has no direct bearing on p6l, since performance is a p6i issue.
: : But perhaps in the
Aaron Sherman wrote:
On Tue, 2004-06-29 at 11:34, Austin Hastings wrote:
(2) Perl6 should equitably support all its target
locales; (3) we should set out to make sure the performance is damn
fast no matter what locale we're using.
Well, that's a nice theory, but you can prove that low-level
On Tue, 2004-06-29 at 11:34, Austin Hastings wrote:
[...] when you switch to LC_ALL=<pick your favorite
language>, you just get really slow performance: Apparently the 'C'
locale is such a totally special case that the performance of LC_ALL=C
is one or more orders of magnitude better than
Larry Wall wrote:
On Sat, Jun 26, 2004 at 12:27:38PM -0700, Brent 'Dax' Royal-Gordon wrote:
: Issues:
: * Limits lvalue substr (doesn't allow it to be a different size)
: unless splice is used (or a substr method is also provided).
That all has to be looked at anyway. What does 5 mean when
Matt Diephouse skribis 2004-06-30 20:51 (-0400):
my $string = "Hello, World!";
say $string[0..4]; # prints "Hello\n"
$string[7...] = "Larry!";
say $string; # prints "Hello, Larry!\n"
And that array is one of bytes? graphemes?
In general, I like the idea. In [EMAIL PROTECTED], almost
the same was
Juerd wrote:
Matt Diephouse skribis 2004-06-30 20:51 (-0400):
my $string = "Hello, World!";
say $string[0..4]; # prints "Hello\n"
$string[7...] = "Larry!";
say $string; # prints "Hello, Larry!\n"
And that array is one of bytes? graphemes?
I'm not really up on my unicode, but I think .chars is what I have
On Thu, 1 Jul 2004, Juerd wrote:
Matt Diephouse skribis 2004-06-30 20:51 (-0400):
my $string = "Hello, World!";
say $string[0..4]; # prints "Hello\n"
$string[7...] = "Larry!";
say $string; # prints "Hello, Larry!\n"
And that array is one of bytes? graphemes?
In general, I like the idea.
Dan Sugalski [EMAIL PROTECTED] writes:
Hmm. Suppose that I have a system that is friendly to 80 byte
records. I want to output meaningful strings, so I want to
partition a buffer into 80-ish byte substrings, but preserve any
graphemes (i.e., store the data in a legible format).
How would I
Austin Hastings [EMAIL PROTECTED] writes:
A couple of alternatives:
substr.bytes($string, 2, 4) = $substitute;
Well, that's arguably better than bsubstr.
substr($string.bytes, 2, 4) = $substitute;
I could live with that, although it doesn't allow mixing units.
(Someone will pop in here
--- Jonadab the Unsightly One [EMAIL PROTECTED] wrote:
Have the implications of the bytes/codepoints/graphemes/woohickies
distinction for the regular expression engine been discussed already?
Not enough.
One of my current clients just rolled on to redhat 9, and what a
steaming pile of
Juerd [EMAIL PROTECTED] writes:
substr($string, 2 but graphemes, 4 but bytes);
I think but even makes sense, if substr defaults to something.
That could be combined with a smart substr that only needs the units
once (err, only needs a position object for one of the args) and knows
how to
On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote:
This has no direct bearing on p6l, since performance is a p6i issue.
But perhaps in the interests of performance as well as hackery we
should explicitly provide some sort of variant regex behavior:
/a./ :bytes
/a./
Larry Wall [EMAIL PROTECTED] writes:
That all has to be looked at anyway. What does 5 mean when you
pass it to substr, anyway?
I was just going to ask about substrings, and then didn't because I
figured that had been hashed out already and I'd missed it...
(I've been trying to make it
Jonadab The Unsightly One [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
It would be possible to have right-associative operators (that bind at
least more tightly than comma and possibly very tightly) and convert a
number to one of these objects, so that we can do stuff like this:
On Mon, 28 Jun 2004, Larry Wall wrote:
On Mon, Jun 28, 2004 at 11:26:32AM -0400, Jonadab the Unsightly One wrote:
: You could coin the abbreviation ligs, for Language Independent
: Graphemes. Then some ingenious rascal can create a pragma or whatever
: that allows $str.b, $str.c, $str.g, and
Dave Whipp skribis 2004-06-28 9:55 (-0700):
substr($string, 2 bytes, 4 bytes) = $substitute;
substr($string, 2, 4 :bytes)
substr($string, 2 but graphemes, 4 but bytes);
I think but even makes sense, if substr defaults to something.
Juerd
On Mon, 28 Jun 2004, Juerd wrote:
Dave Whipp skribis 2004-06-28 9:55 (-0700):
substr($string, 2 bytes, 4 bytes) = $substitute;
substr($string, 2, 4 :bytes)
substr($string, 2 but graphemes, 4 but bytes);
I think but even makes sense, if substr defaults to something.
I think mixing
--- Dan Sugalski [EMAIL PROTECTED] wrote:
On Mon, 28 Jun 2004, Juerd wrote:
Dave Whipp skribis 2004-06-28 9:55 (-0700):
substr($string, 2 bytes, 4 bytes) = $substitute;
substr($string, 2, 4 :bytes)
substr($string, 2 but graphemes, 4 but bytes);
I think but even makes sense, if
On Mon, 28 Jun 2004, Austin Hastings wrote:
--- Dan Sugalski [EMAIL PROTECTED] wrote:
On Mon, 28 Jun 2004, Juerd wrote:
Dave Whipp skribis 2004-06-28 9:55 (-0700):
substr($string, 2 bytes, 4 bytes) = $substitute;
substr($string, 2, 4 :bytes)
substr($string, 2 but
--- Jonadab the Unsightly One [EMAIL PROTECTED] wrote:
Larry Wall [EMAIL PROTECTED] writes:
(I've been trying to make it assume some implicit unit based on the
current lexical scope's Unicode level, but issues remain.) We have
magical string positions that have different numeric values
On Sat, Jun 26, 2004 at 12:27:38PM -0700, Brent 'Dax' Royal-Gordon wrote:
: As currently designed, the String::bytes, String::codepoints, and
: String::graphemes methods return the number of bytes, codepoints,
: and graphemes, respectively, in the string they were called on. I
: would like to
28 matches