mplest form of this is (?(1)yes|no). This is rather harder to
emulate with other mechanisms without running to eval. OTTOMH it is
equivalent to (??{ defined($1) ? 'yes' : 'no' }).
Hugo
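As a rough illustration of the (?(1)yes|no) form in Perl 5: the sketch below uses a made-up pattern (an optional opening angle bracket that, when present, demands a closing one). The pattern and the test strings are assumptions chosen for the example; they do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical pattern: a word, optionally wrapped in angle brackets.
# (?(1)>) requires a closing '>' only if group 1 captured an opening '<'.
my $re = qr/^(<)?\w+(?(1)>)$/;

my %result;
for my $s ('<tag>', 'tag', '<tag', 'tag>') {
    $result{$s} = $s =~ $re ? 'match' : 'no match';
    printf "%-6s %s\n", $s, $result{$s};
}
```

Only the first two strings match: the conditional adds a requirement exactly when the earlier group took part in the match.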
posal needs a MIGRATION section to
add scalar() around existing uses in a list context. Such use is
quite common in parsing contexts; C::Scan does this for example:
push @out, pos $txt;
Hugo
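A minimal sketch of what such a migration might look like, assuming the only change needed is an explicit scalar() around pos where it appears in list context. The sample text and loop are invented for illustration; under Perl 5 the wrapped and unwrapped forms behave the same.

```perl
use strict;
use warnings;

my $txt = "foo bar";   # hypothetical input
my @out;

# Record the end offset of each word. scalar() makes the intended
# context explicit even though push supplies list context.
while ($txt =~ /\w+/g) {
    push @out, scalar(pos($txt));
}

print "@out\n";   # prints "3 7"
```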
ngent limits on this.
I'm sorry I was unable to give an updated summary this week; if anyone
was waiting to discover specific information from it, please ask me.
Hugo
---
RFC 72: The regexp engine should go backward as well as forward. (Peter Heslin)
RFC 112: Assignment within a regex (Richar
y it feels uncommonly liberating to me.
:You may have to completely rewrite your script. So much for code reuse.
I don't believe that it need be so painful to take advantage of it
in existing code. We can ease that by providing a selection of
helpful ready-rolled routines for common tasks.
Hugo
ou want to remove support for (?(...)) completely,
you need to address the utility and options for migration for all
the available uses of it, not just the one addressed by the new
handling of (?{...}).
Hugo
In <[EMAIL PROTECTED]>, Bart Lateur writes:
:On Fri, 29 Sep 2000 13:19:47 +0100, Hugo wrote:
:
:>I think that involves
:>rewriting your /p example something like:
:> if (/^$pat$/z) {
:>     print "found a complete match";
:> } elsif (defined pos) {
:>     print
er
was the last value set. Please can you make sure this is clearly
explained in the next version of the RFC?
Hugo
In <[EMAIL PROTECTED]>, Bart Lateur writes:
:On Fri, 29 Sep 2000 00:29:31 +0100, Hugo wrote:
:>:I originally had thought of providing a separate, dedicated regex
:>:modifier, just for the match prefix, but I don't think too many people
:>:need this that desperately. Yo
is identical to $$ by
:definition.
Well, not quite. First, writing $var as ${var} is the usual and common
way to disambiguate where there is a problem; second, $$ is rarely
used in a regexp pattern. We can easily migrate perl5 scripts by
translating $$ to ${$} throughout. There is no problem here.
Hugo
t once
a month I find myself unsure enough about which is /m and which
is /s that I need to check the top of perlre to be sure. I think
we've appreciated for some time that it was a mistake to name them
as if they were opposites, but if anything I'd like to reduce the
need for them rather than to increase it.
Hugo
it has been stated before that (?Q is reserved along with
other letters for possible regexp flags.
Hugo
om within the regex compiler so that the
:regex can expand as and where appropriate. Changing this should not affect
:any existing behaviour.
That may not be necessary for this case; it may be enough just to tweak
the parser slightly, to detect '(?$' (and maybe '(?\$'). Don't forget
that the parser already successfully skips past '$' when we need it to.
Hugo
e have a nasty
hack, it is true, but it could allow us to defer the much trickier
proper solution. Of course it breaks every other use of the string
value, and I'm not sure how big a problem that might be.
Hugo
're in over your head,
:anyway. ;-)
I don't understand this paragraph.
Hugo
to? Yes: you can always
have a (?{ local $a = new Object }) with a DESTROY method. It may not
necessarily be the cleanest possible way to write everything, though.
Hugo
ng that this changes the definition of every . in their
regexp as well. I like the idea of $$ better - this is a natural
and obvious extension to $, which adds a new capability without
messing with any existing capability. Furthermore people who find
that they have a problem in their existing regexp because $ does
not mean what they thought will not set themselves up for new and
different problems when they apply the obvious one-byte fix.
Hugo
e extension that would allow us to refer back to either variables
(RFC 112) or hash keys (RFC 150). I don't think switching to $1 is any
help for those, though.
Hugo
it will
become easier to read. And comments are a wonderful thing.
Hugo
on a forward match but is executed when the
:code is unwound due to backtracking.
The support in (?{...}) for localisation is (as I understand it) the
intended mechanism for permitting such effects. Can you describe some
specific problems you are trying to solve here?
Hugo
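For reference, the localisation support mentioned above can be sketched as follows, closely following the counter example in perlre. The variable names are illustrative; they are package variables because local cannot be applied to lexicals.

```perl
use strict;
use warnings;

our ($cnt, $res);

my $str = 'a' x 8;

# Each 'a' consumed by the star bumps a localised copy of $cnt.
# The trailing 'aaaa' forces the star to give back four characters,
# and the localised increments are undone as the engine backtracks.
$str =~ m<
    (?{ $cnt = 0 })
    (?: a (?{ local $cnt = $cnt + 1 }) )*
    aaaa
    (?{ $res = $cnt })
>x;

print "res = $res\n";   # prints "res = 4"
```

The counter climbs to 8 on the greedy pass and is unwound to 4 by the time the final block runs, so side effects made through local follow the backtracking.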
In <[EMAIL PROTECTED]>, Bart Lateur writes:
:On 25 Sep 2000 20:14:52 -, Perl6 RFC Librarian wrote:
:
:>Remove C<(?{...})>, C<(??{...})> and friends.
:
:I'm putting the finishing touches on an RFC to drop (?{...}) and replace
:it with something far more localized, hence cleaner: assertions, also in
:Perl code. That
quite unreasonable. Why do you _want_ to restrict someone
from calling isKeyword($1) within the regexp, which will then read
the keyword patterns from a file and check $1 against those patterns
using regexps? It seems like an entirely reasonable and useful thing
to do.
Hugo
In <[EMAIL PROTECTED]>, Perl6 RFC Librarian writes:
:=head1 ABSTRACT
:
:Remove C<(?{...})>, C<(??{...})> and friends.
Whoops, I missed this bit - what 'friends' do you mean?
Hugo
way of expressing simple
recursive regexps such as the above without resorting to a costly
eval. When I've tried to come up with an appropriate restriction,
however, I find it very difficult to pick a dividing line.
Hugo
r I suggest the rest of the
semantics warrant - that backreferences are localised within a qr().
I lie: the other reason qr{} currently doesn't behave like that is that
when we interpolate a compiled regexp into a context that requires it be
recompiled, we currently ignore the compiled form and act only on the
original string. Perhaps this is also an insufficiently intelligent thing
to do.
Hugo
nding RFCs shortly to
encourage them to work towards freezing them as soon as practical.
Hugo
RFC 72: The regexp engine should go backward as well as
forward. (Peter Heslin)
Peter says (edited):
:If the regexp code is unlikely to be rewritten from the ground up, then
:there may be little
input
medium, then we could turn on the special meaning of $ only in such
cases and define it as $$ above in all other cases. I think this
would be more confusing, though.
We could also consider changing the base definition to (?=($/)?\z),
particularly if $/ is to be seen as a regexp.
I think I like $$ the best.
Hugo
mike mulligan writes:
:From: Hugo <[EMAIL PROTECTED]>
:Sent: Tuesday, September 12, 2000 2:54 PM
:
:> 3. The regexp is matched left to right: first the lookbehind, then 'X',
:> then '[yz]'.
:
:Thanks for the insight - I was stuck in my bad assumption that the
t be resolved in this way; I thought
it might be of interest nonetheless.
Hugo
m where we are now. In reality,
the regexp engine needs a lot of work just to catch up with the rest
of perl as it is now; by the time that is done, the code will look
different enough that I'd probably be reactionary about completely
different things ...
Hugo
In <085601c01cc8$2c94f390$[EMAIL PROTECTED]>, "mike mulligan" writes:
:From: Hugo <[EMAIL PROTECTED]>
:Sent: Monday, September 11, 2000 11:59 PM
:
:
:> mike mulligan replied to Peter Heslin:
:> : ... it is greedy in the sense of the forward matching "*" or &
$re = qr{
(?{ $c++ ? 'foo' : '' })
|
(??{ $re }) (??{ $re })
}x;
And no, I have no idea what strings that will match. That's what makes
this job so much fun. :)
Hugo
Peter Heslin writes:
:On Wed, Aug 30, 2000 at 11:54:29PM -0400, Mark-Jason Dominus wrote:
:> Perhaps Hugo van der Sanden
:> would be willing to discuss this with you in more detail?
:
:I am not acquainted with the gentleman you name. Please do solicit
:the input of others you know who mi
and everything to do with
left-to-rightness. The regexp engine does not look for x* except
in those positions where the lookbehind has already matched.
Hugo
ready get backreferences out of zero-width
assertions. Like the Tardis, it only looks zero-width from the
outside:
"test" =~ /(?<=(t))(.)/ and print $1, $2;
Hugo
unnecessary copy.
The other problem with this, of course, is that the compiler may not
yet have seen the $& we intend to use:
crypt% perl -wle '$_="foo"; /.*/; $_="bar"; print eval q{$&}'
bar
crypt%
.. and I think coredumps may be possible from this. (Hmm, perlbug
upcoming.)
Hugo
es it to recurse into the
second branch until you hit REG_INFTY or overflow the stack. Swap
second and third branches and you have a better chance:
$re = qr{
\( (??{$re}) \)
|
(?> [^()]+)
|
(??{$re}) (??{$re})
}x;
(I haven't checked that there aren't other problems with it, though.)
Hugo
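For comparison, here is a sketch of a different formulation, the shape shown in perlre rather than the one discussed in this thread. It replaces the juxtaposed self-references with a star inside the parentheses, so every recursive call first consumes an opening parenthesis. The wrapper pattern $whole and the test strings are assumptions added for the example.

```perl
use strict;
use warnings;

# Declare first so the code block can see the variable it recurses into.
my $re;
$re = qr{
    \(
    (?:
        (?> [^()]+ )
      |
        (??{ $re })
    )*
    \)
}x;

# Hypothetical wrapper that anchors one balanced group to the whole string.
my $whole = qr{ \A (??{ $re }) \z }x;

my %balanced;
for my $s ('(a(b)c)', '()', '(a(b)c') {
    $balanced{$s} = $s =~ $whole ? 1 : 0;
    printf "%-8s %s\n", $s, $balanced{$s} ? 'balanced' : 'not balanced';
}
```

Because each level of recursion must match a literal '(' before recursing again, the engine cannot re-enter the pattern at the same position without making progress.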
ng multi-punctuation
signifiers, so it may be time to go for a new paradigm with more room
for expansion: (+keyword) or (*keyword) would seem to be candidates.
Hugo
unintuitive. I don't see an alternative definition that doesn't
lead to other (probably worse) problems; certainly there is room
for lots of confusion here.
One approach to reducing confusion would be to disallow mixing:
if you use (?%name) in a regexp, you may not also use normal paren
backrefs, nor refer back to them with numbers. That may be too
restrictive and kill the utility of the things in the first place,
of course.
Hugo
In <[EMAIL PROTECTED]>, Hugo writes:
:Apologies in advance for so rudely dumping this lot and _still_ not
:joining the list [...]
Ah, I've now discovered the archives, and seen that this list is not
so frighteningly busy as I had anticipated. Now joined.
Hugo
ific.
I've tried in particular to add a note about implementation issues
in each case.
Enjoy,
Hugo
---
RFC 72: Variable-length lookbehind: the regexp engine should also go backward.
==
This is an interesting idea. However, it is not obvious to me that
there is any practical differ