In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/b97b7b693710c4cd463688d004cec42f02c42158?hp=a0dd190b98da5a187fe751068264c32007af3931>
- Log ----------------------------------------------------------------- commit b97b7b693710c4cd463688d004cec42f02c42158 Author: Father Chrysostomos <spr...@cpan.org> Date: Thu Oct 11 22:51:44 2012 -0700 Disable const repl optimisation for empty pattern s//$a/ cannot assume that the $a expression is going to return the same value at each iteration, because the last-used pattern may con- tain code blocks that clobber *a. M op.c M t/lib/warnings/9uninit M t/re/subst.t commit b29368b8882319244d301988a33bc3432075a9f7 Author: Father Chrysostomos <spr...@cpan.org> Date: Thu Oct 11 20:26:23 2012 -0700 defins.t: Suppress uninit warning and make the no-warnings test pass. M t/op/defins.t commit 6a6a13d6acb473fcb1725db25f34d300f5dd4776 Author: Brad Gilbert <b2gi...@gmail.com> Date: Tue Oct 9 14:24:38 2012 -0500 Move tests from t/op/while_readdir.t to t/op/defins.t It turns out that some of what t/op/while_readdir.t was testing was also tested by t/op/defins.t M MANIFEST M t/op/defins.t D t/op/while_readdir.t commit ef90d20ae1b70bb24316828c3249daee27459a7b Author: Father Chrysostomos <spr...@cpan.org> Date: Thu Oct 11 20:22:08 2012 -0700 Use const repl optimisation with s///e where possible In those cases where s///e contains a single variable or a sequence that is folded to a const op, we can do away with substcont. PMf_EVAL means that there was an /e. But we donât actually need to check that; instead we can just examine the op tree, which we have to do anyway. The op tree that s//$x/e and s//"constant"/e compile down to have a null (a do-block) containing a scope op (block with a single state- ment, as opposed to op_leave which represents multiple statements) containing a null followed by the constant or variable. M dist/B-Deparse/Deparse.pm M dist/B-Deparse/t/deparse.t M op.c M t/lib/warnings/9uninit commit fa4533d036f96ddb26c693c5b0c642b7cbc7d667 Author: Father Chrysostomos <spr...@cpan.org> Date: Thu Oct 11 19:55:20 2012 -0700 perl5180delta: B::Generate is fixed M Porting/perl5180delta.pod commit bb933b9ba3a34a454c83adeaf265bd1d4eb466a1 Author: Father Chrysostomos <spr...@cpan.org> Date: Thu Oct 11 14:38:31 2012 -0700 [perl #49190] Donât prematurely optimise s/foo/bar$baz/ $baz could be aliased to a package variable, so we do need to recon- catenate for every iteration of s///g. For s/// without /g, only one more op will be executed, so the speed difference is negligible. The only cases we can optimise in terms of skipping the evaluation of the ops on the rhs (by eliminating the substconst op) are s//constant/ and s//$single_variable/. Anything more complicated causes bugs. A recent commit made s/foo/$bar/g re-stringify $bar for each iteration (though without having to reevaluate the ops that return $bar). So we no longer have to special-case match vars at compile time. This means that s/foo/bar$baz/g will be slower (and less buggy), but s/foo/$1/g will be faster. This also caused an existing taint but in pp_subst to surface. If get-magic turns off taint on a replacement string, it should not be considered tainted. So the taint check on the replacement should come *after* the stringification. This applies to the constant replacement optimisation. pp_substcont was already doing this correctly. M ext/B/t/optree_misc.t M ext/B/t/walkoptree.t M op.c M pp_hot.c M t/re/subst.t commit 52c47e1631d20f2f6b5ebaf188f61d5470d887f3 Author: Father Chrysostomos <spr...@cpan.org> Date: Thu Oct 11 18:01:40 2012 -0700 Donât taint return value of s///e based on replacement According to the comments about how taint works above pp_subst in pp_hot.c, the return value of s/// should not be tainted based on the taintedness of the replacement. That makes sense, because the replacement does not affect how many iterations there were. (The return value is the number of iterations). It only applies, however, to the cases where the âconstant replace- mentâ optimisation applies. That means /e taints its return value: $ perl5.16.0 -MDevel::Peek -Te '$_ = "abcd"; $x = s//$^X/; Dump $x' SV = PVMG(0x822ff4) at 0x824dc0 REFCNT = 1 FLAGS = (pIOK) IV = 1 NV = 0 PV = 0 $ perl5.16.0 -MDevel::Peek -Te '$_ = "abcd"; $x = s//$^X/e; Dump $x' SV = PVMG(0x823010) at 0x824dc0 REFCNT = 1 FLAGS = (GMG,SMG,pIOK) IV = 1 NV = 0 PV = 0 MAGIC = 0x201940 MG_VIRTUAL = &PL_vtbl_taint MG_TYPE = PERL_MAGIC_taint(t) MG_LEN = 1 The number pushed on to the stack was becoming tainted due to the set- ting of PL_tainted. PL_tainted is assigned to and the return value explicitly tainted if appropriate shortly after the mPUSHi (which implies sv_setiv, which taints when PL_tainted is true), so setting PL_tainted to 0 just before the mPUSHi is safe. M pp_ctl.c M t/op/taint.t commit 266d9182a2c7501b6d5686442b9273ed85be5e6f Author: Father Chrysostomos <spr...@cpan.org> Date: Thu Oct 11 14:05:29 2012 -0700 Remove PMf_MAYBE_CONST It was added in ce862d02d but has never been used. M op.c M op.h commit 1754320d025df69f39aefec8568947369f4b13cb Author: Father Chrysostomos <spr...@cpan.org> Date: Thu Oct 11 02:03:35 2012 -0700 [perl #49190] Stringify repl repeatedly in s///g pm_runtime in op.c examines the rhs of s/// to see whether it is safe to execute that set of ops just once. If it sees a match var or an expression with side effects, it creates a pp_substcont op, which results in the rhs being executed multiple times. If the rhs seems constant enough, pp_subst does the substitution in a tight loop. This unfortunately causes s/a/$a/ to fail if *a has been aliased to *1. Furthermore, $REGMARK and $REGERROR did not count as match vars. pp_subst actually has two separate loops. One of them modifies the target in place. The other appends to a new scalar and then copies it back to the target. The first loop is used if it seems safe. This commit makes $REGMARK, $REGERROR and aliases to match vars work= when the replacement consists solely of the variable. It does this by setting PL_curpm before stringifying the replacement, so that $1 et al. see the right pattern. It also stringifies the variable for each iteration of the second loop, so that $1 and $REGMARK update. The first loop, which requires the rhs to be constant, is skipped if the regexp contains the special backtracking control verbs that mod- ify $REGMARK and $REGERROR. M pp_hot.c M t/re/pat_rt_report.t M t/re/subst.t commit 2ed8c61f1921fabddfeb2a099733b977cb66ae42 Author: Father Chrysostomos <spr...@cpan.org> Date: Thu Oct 11 08:37:44 2012 -0700 RXf_MODIFIES_VARS regcomp.c sets this new flag whenever regops that could modify $REGMARK or $REGERROR have been seen. pp_subst will use this to tell whether it should repeatedly stringify the replacement. M regcomp.c M regexp.h M regnodes.h commit cccd1425414e6518c1fc8b7bcaccfb119320c513 Author: Father Chrysostomos <spr...@cpan.org> Date: Thu Oct 11 09:27:18 2012 -0700 Define RXf_SPLIT and RXf_SKIPWHITE as 0 They are on longer used in core, and we need room for more flags. The only CPAN modules that use them check whether RXf_SPLIT is set (which no longer happens) before setting RXf_SKIPWHITE (which is ignored). M dump.c M regexp.h M regnodes.h commit 26a745235f6f2b641f21239df4e7bfefc93119ca Author: Father Chrysostomos <spr...@cpan.org> Date: Thu Oct 11 00:54:56 2012 -0700 pp_hot.c:pp_subst: add comment M pp_hot.c commit 64534138a19b24be2454457bdf70ffb0d983b248 Author: Father Chrysostomos <spr...@cpan.org> Date: Thu Oct 11 00:24:18 2012 -0700 Simplify the fix for bug #41530 We donât need to upgrade the target string and redo the pattern match if the replacement is in utf8. We can simply convert during concate- nation, using the more recently added SV_CATUTF8 and SV_CATBYTES flags to sv_catpvn_flags. This should make things faster, too, as sv_catpvn_flags does not need to allocate extra SVs or string buffers. This happened to trigger an existing COW bug, causing test failures. SvIsCOW and sv_force_normal_flags were being called on TARG before get-magic. So a magical scalar returning a COW could have that COW modified in place. I added a test for something I nearly broke. M pp_hot.c M sv.h M t/re/subst.t ----------------------------------------------------------------------- Summary of changes: MANIFEST | 1 - Porting/perl5180delta.pod | 4 - dist/B-Deparse/Deparse.pm | 12 ++-- dist/B-Deparse/t/deparse.t | 2 + dump.c | 4 - ext/B/t/optree_misc.t | 22 +++--- ext/B/t/walkoptree.t | 2 +- op.c | 75 +++++++----------- op.h | 3 - pp_ctl.c | 1 + pp_hot.c | 73 ++++++++---------- regcomp.c | 3 + regexp.h | 30 ++++---- regnodes.h | 6 +- sv.h | 2 + t/lib/warnings/9uninit | 13 +-- t/op/defins.t | 36 +++++++++- t/op/taint.t | 4 +- t/op/while_readdir.t | 180 -------------------------------------------- t/re/pat_rt_report.t | 6 +- t/re/subst.t | 35 ++++++++- 21 files changed, 186 insertions(+), 328 deletions(-) delete mode 100644 t/op/while_readdir.t diff --git a/MANIFEST b/MANIFEST index 8f8f4bd..60e4726 100644 --- a/MANIFEST +++ b/MANIFEST @@ -5408,7 +5408,6 @@ t/op/vec.t See if vectors work t/op/ver.t See if v-strings and the %v format flag work t/op/wantarray.t See if wantarray works t/op/warn.t See if warn works -t/op/while_readdir.t See if while(readdir) works t/op/write.t See if write works (formats work) t/op/yadayada.t See if ... works t/perl.supp Perl valgrind suppressions diff --git a/Porting/perl5180delta.pod b/Porting/perl5180delta.pod index 2605953..b0bc82a 100644 --- a/Porting/perl5180delta.pod +++ b/Porting/perl5180delta.pod @@ -341,10 +341,6 @@ soon: =item * -L<B::Generate> version 1.46 - -=item * - L<B::Hooks::Parser> version 0.11 =back diff --git a/dist/B-Deparse/Deparse.pm b/dist/B-Deparse/Deparse.pm index 2a768c0..07386d5 100644 --- a/dist/B-Deparse/Deparse.pm +++ b/dist/B-Deparse/Deparse.pm @@ -4726,19 +4726,19 @@ sub pp_subst { my $flags = ""; my $pmflags = $op->pmflags; if (null($op->pmreplroot)) { - $repl = $self->dq($kid); + $repl = $kid; $kid = $kid->sibling; } else { $repl = $op->pmreplroot->first; # skip substcont - while ($repl->name eq "entereval") { + } + while ($repl->name eq "entereval") { $repl = $repl->first; $flags .= "e"; - } - if ($pmflags & PMf_EVAL) { + } + if ($pmflags & PMf_EVAL) { $repl = $self->deparse($repl->first, 0); - } else { + } else { $repl = $self->dq($repl); - } } my $extended = ($pmflags & PMf_EXTENDED); if (null $kid) { diff --git a/dist/B-Deparse/t/deparse.t b/dist/B-Deparse/t/deparse.t index 3500d5b..d1c6cb0 100644 --- a/dist/B-Deparse/t/deparse.t +++ b/dist/B-Deparse/t/deparse.t @@ -322,6 +322,8 @@ my $foo = "Ab\x{100}\200\x{200}\237Cd\000Ef\x{1000}\cA\x{2000}\cZ"; #### # s///e s/x/'y';/e; +s/x/$a;/e; +s/x/complex_expression();/e; #### # block { my $x; } diff --git a/dump.c b/dump.c index 830ab4b..72f7c4e 100644 --- a/dump.c +++ b/dump.c @@ -672,8 +672,6 @@ S_pm_description(pTHX_ const PMOP *pm) if (RX_EXTFLAGS(regex) & RXf_CHECK_ALL) sv_catpv(desc, ",ALL"); } - if (RX_EXTFLAGS(regex) & RXf_SKIPWHITE) - sv_catpv(desc, ",SKIPWHITE"); } append_flags(desc, pmflags, pmflags_flags_names); @@ -1417,12 +1415,10 @@ const struct flag_to_name regexp_flags_names[] = { {RXf_USE_INTUIT_NOML, "USE_INTUIT_NOML,"}, {RXf_USE_INTUIT_ML, "USE_INTUIT_ML,"}, {RXf_INTUIT_TAIL, "INTUIT_TAIL,"}, - {RXf_SPLIT, "SPLIT,"}, {RXf_COPY_DONE, "COPY_DONE,"}, {RXf_TAINTED_SEEN, "TAINTED_SEEN,"}, {RXf_TAINTED, "TAINTED,"}, {RXf_START_ONLY, "START_ONLY,"}, - {RXf_SKIPWHITE, "SKIPWHITE,"}, {RXf_WHITE, "WHITE,"}, {RXf_NULL, "NULL,"}, }; diff --git a/ext/B/t/optree_misc.t b/ext/B/t/optree_misc.t index 7a52cc3..5b623f5 100644 --- a/ext/B/t/optree_misc.t +++ b/ext/B/t/optree_misc.t @@ -66,25 +66,23 @@ checkOptree ( name => 'PMOP children', code => sub { $foo =~ s/(a)/$1/ }, strip_open_hints => 1, expect => <<'EOT_EOT', expect_nt => <<'EONT_EONT'); -# 6 <1> leavesub[1 ref] K/REFC,1 ->(end) -# - <@> lineseq KP ->6 +# 5 <1> leavesub[1 ref] K/REFC,1 ->(end) +# - <@> lineseq KP ->5 # 1 <;> nextstate(main 1 -e:1) v:>,<,%,{ ->2 -# 3 </> subst(/"(a)"/ replstart->4) KS ->6 +# 4 </> subst(/"(a)"/) KS ->5 # - <1> ex-rv2sv sKRM/1 ->3 # 2 <#> gvsv[*foo] s ->3 -# 5 <|> substcont(other->3) sK/1 ->(end) -# - <1> ex-rv2sv sK/1 ->5 -# 4 <#> gvsv[*1] s ->5 +# - <1> ex-rv2sv sK/1 ->4 +# 3 <#> gvsv[*1] s ->4 EOT_EOT -# 6 <1> leavesub[1 ref] K/REFC,1 ->(end) -# - <@> lineseq KP ->6 +# 5 <1> leavesub[1 ref] K/REFC,1 ->(end) +# - <@> lineseq KP ->5 # 1 <;> nextstate(main 1 -e:1) v:>,<,%,{ ->2 -# 3 </> subst(/"(a)"/ replstart->4) KS ->6 +# 4 </> subst(/"(a)"/) KS ->5 # - <1> ex-rv2sv sKRM/1 ->3 # 2 <$> gvsv(*foo) s ->3 -# 5 <|> substcont(other->3) sK/1 ->(end) -# - <1> ex-rv2sv sK/1 ->5 -# 4 <$> gvsv(*1) s ->5 +# - <1> ex-rv2sv sK/1 ->4 +# 3 <$> gvsv(*1) s ->4 EONT_EONT } #skip diff --git a/ext/B/t/walkoptree.t b/ext/B/t/walkoptree.t index fbdc50f..3648835 100644 --- a/ext/B/t/walkoptree.t +++ b/ext/B/t/walkoptree.t @@ -33,7 +33,7 @@ sub B::OP::walkoptree_debug { my $victim = sub { # This gives us a substcont, which gets to the second recursive call # point (in the if statement in the XS code) - $_[0] =~ s/(a)/$1/; + $_[0] =~ s/(a)/ $1/; # PMOP_pmreplroot(cPMOPo) is NULL for this $_[0] =~ s/(b)//; # This gives an OP_PUSHRE diff --git a/op.c b/op.c index 65fb457..8fccf71 100644 --- a/op.c +++ b/op.c @@ -4752,65 +4752,48 @@ Perl_pmruntime(pTHX_ OP *o, OP *expr, bool isreg, I32 floor) } if (repl) { - OP *curop; + OP *curop = repl; + bool konst; if (pm->op_pmflags & PMf_EVAL) { - curop = NULL; if (CopLINE(PL_curcop) < (line_t)PL_parser->multi_end) CopLINE_set(PL_curcop, (line_t)PL_parser->multi_end); } - else if (repl->op_type == OP_CONST) - curop = repl; - else { - OP *lastop = NULL; - for (curop = LINKLIST(repl); curop!=repl; curop = LINKLIST(curop)) { - if (curop->op_type == OP_SCOPE - || curop->op_type == OP_LEAVE - || (PL_opargs[curop->op_type] & OA_DANGEROUS)) { - if (curop->op_type == OP_GV) { - GV * const gv = cGVOPx_gv(curop); - repl_has_vars = 1; - if (strchr("&`'123456789+-\016\022", *GvENAME(gv))) - break; - } - else if (curop->op_type == OP_RV2CV) - break; - else if (curop->op_type == OP_RV2SV || - curop->op_type == OP_RV2AV || - curop->op_type == OP_RV2HV || - curop->op_type == OP_RV2GV) { - if (lastop && lastop->op_type != OP_GV) /*funny deref?*/ - break; - } - else if (curop->op_type == OP_PADSV || - curop->op_type == OP_PADAV || - curop->op_type == OP_PADHV || - curop->op_type == OP_PADANY) - { - repl_has_vars = 1; - } - else if (curop->op_type == OP_PUSHRE) - NOOP; /* Okay here, dangerous in newASSIGNOP */ - else - break; - } - else if ((PL_opargs[curop->op_type] & OA_CLASS_MASK) - == OA_LOGOP) - break; - lastop = curop; - } - } - if (curop == repl + /* If we are looking at s//.../e with a single statement, get past + the implicit do{}. */ + if (curop->op_type == OP_NULL && curop->op_flags & OPf_KIDS + && cUNOPx(curop)->op_first->op_type == OP_SCOPE + && cUNOPx(curop)->op_first->op_flags & OPf_KIDS) { + OP *kid = cUNOPx(cUNOPx(curop)->op_first)->op_first; + if (kid->op_type == OP_NULL && kid->op_sibling + && !kid->op_sibling->op_sibling) + curop = kid->op_sibling; + } + if (curop->op_type == OP_CONST) + konst = TRUE; + else if (( (curop->op_type == OP_RV2SV || + curop->op_type == OP_RV2AV || + curop->op_type == OP_RV2HV || + curop->op_type == OP_RV2GV) + && cUNOPx(curop)->op_first + && cUNOPx(curop)->op_first->op_type == OP_GV ) + || curop->op_type == OP_PADSV + || curop->op_type == OP_PADAV + || curop->op_type == OP_PADHV + || curop->op_type == OP_PADANY) { + repl_has_vars = 1; + konst = TRUE; + } + else konst = FALSE; + if (konst && !(repl_has_vars && (!PM_GETRE(pm) + || !RX_PRELEN(PM_GETRE(pm)) || RX_EXTFLAGS(PM_GETRE(pm)) & RXf_EVAL_SEEN))) { pm->op_pmflags |= PMf_CONST; /* const for long enough */ op_prepend_elem(o->op_type, scalar(repl), o); } else { - if (curop == repl && !PM_GETRE(pm)) { /* Has variables. */ - pm->op_pmflags |= PMf_MAYBE_CONST; - } NewOp(1101, rcop, 1, LOGOP); rcop->op_type = OP_SUBSTCONT; rcop->op_ppaddr = PL_ppaddr[OP_SUBSTCONT]; diff --git a/op.h b/op.h index 3ddce78..399c02e 100644 --- a/op.h +++ b/op.h @@ -408,9 +408,6 @@ struct pmop { * OP_MATCH and OP_QR */ #define PMf_ONCE (1<<(PMf_BASE_SHIFT+1)) -/* replacement contains variables */ -#define PMf_MAYBE_CONST (1<<(PMf_BASE_SHIFT+2)) - /* PMf_ONCE has matched successfully. Not used under threading. */ #define PMf_USED (1<<(PMf_BASE_SHIFT+3)) diff --git a/pp_ctl.c b/pp_ctl.c index 19c2417..23847c4 100644 --- a/pp_ctl.c +++ b/pp_ctl.c @@ -250,6 +250,7 @@ PP(pp_substcont) SvUTF8_on(targ); SvPV_set(dstr, NULL); + PL_tainted = 0; mPUSHi(saviters - 1); (void)SvPOK_only_UTF8(targ); diff --git a/pp_hot.c b/pp_hot.c index 0ea4c66..7994992 100644 --- a/pp_hot.c +++ b/pp_hot.c @@ -2062,7 +2062,7 @@ PP(pp_subst) int force_on_match = 0; const I32 oldsave = PL_savestack_ix; STRLEN slen; - bool doutf8 = FALSE; + bool doutf8 = FALSE; /* whether replacement is in utf8 */ #ifdef PERL_OLD_COPY_ON_WRITE bool is_cow; #endif @@ -2081,6 +2081,7 @@ PP(pp_subst) EXTEND(SP,1); } + SvGETMAGIC(TARG); /* must come before cow check */ #ifdef PERL_OLD_COPY_ON_WRITE /* Awooga. Awooga. "bool" types that are actually char are dangerous, because they make integers such as 256 "false". */ @@ -2100,8 +2101,7 @@ PP(pp_subst) Perl_croak_no_modify(aTHX); PUTBACK; - s = SvPV_mutable(TARG, len); - setup_match: + s = SvPV_nomg(TARG, len); if (!SvPOKp(TARG) || SvTYPE(TARG) == SVt_PVGV || SvVOK(TARG)) force_on_match = 1; @@ -2168,29 +2168,10 @@ PP(pp_subst) RETURN; } + PL_curpm = pm; + /* known replacement string? */ if (dstr) { - if (SvTAINTED(dstr)) - rxtainted |= SUBST_TAINT_REPL; - - /* Upgrade the source if the replacement is utf8 but the source is not, - * but only if it matched; see - * http://www.nntp.perl.org/group/perl.perl5.porters/2010/04/msg158809.html - */ - if (DO_UTF8(dstr) && ! DO_UTF8(TARG)) { - char * const orig_pvx = SvPOKp(TARG) ? SvPVX(TARG) : NULL; - const STRLEN new_len = sv_utf8_upgrade_nomg(TARG); - - /* If the lengths are the same, the pattern contains only - * invariants, can keep going; otherwise, various internal markers - * could be off, so redo */ - if (new_len != len || orig_pvx != SvPVX(TARG)) { - /* Do this here, to avoid multiple FETCHes. */ - s = SvPV_nomg(TARG, len); - goto setup_match; - } - } - /* replacement needing upgrading? */ if (DO_UTF8(TARG) && !doutf8) { nsv = sv_newmortal(); @@ -2206,6 +2187,9 @@ PP(pp_subst) c = SvPV_const(dstr, clen); doutf8 = DO_UTF8(dstr); } + + if (SvTAINTED(dstr)) + rxtainted |= SUBST_TAINT_REPL; } else { c = NULL; @@ -2219,7 +2203,7 @@ PP(pp_subst) #endif && (I32)clen <= RX_MINLENRET(rx) && (once || !(r_flags & REXEC_COPY_STR)) - && !(RX_EXTFLAGS(rx) & RXf_LOOKBEHIND_SEEN) + && !(RX_EXTFLAGS(rx) & (RXf_LOOKBEHIND_SEEN|RXf_MODIFIES_VARS)) && (!doutf8 || SvUTF8(TARG)) && !(rpm->op_pmflags & PMf_NONDESTRUCT)) { @@ -2236,7 +2220,6 @@ PP(pp_subst) goto force_it; } d = s; - PL_curpm = pm; if (once) { if (RX_MATCH_TAINTED(rx)) /* run time pattern taint, eg locale */ rxtainted |= SUBST_TAINT_PAT; @@ -2306,6 +2289,8 @@ PP(pp_subst) } } else { + bool first; + SV *repl; if (force_on_match) { force_on_match = 0; if (rpm->op_pmflags & PMf_NONDESTRUCT) { @@ -2324,8 +2309,8 @@ PP(pp_subst) #endif if (RX_MATCH_TAINTED(rx)) /* run time pattern taint, eg locale */ rxtainted |= SUBST_TAINT_PAT; + repl = dstr; dstr = newSVpvn_flags(m, s-m, SVs_TEMP | (DO_UTF8(TARG) ? SVf_UTF8 : 0)); - PL_curpm = pm; if (!c) { PERL_CONTEXT *cx; SPAGAIN; @@ -2338,6 +2323,7 @@ PP(pp_subst) RETURNOP(cPMOP->op_pmreplrootu.op_pmreplroot); } r_flags |= REXEC_IGNOREPOS | REXEC_NOT_FIRST; + first = TRUE; do { if (iters++ > maxiters) DIE(aTHX_ "Substitution loop"); @@ -2352,21 +2338,30 @@ PP(pp_subst) strend = s + (strend - m); } m = RX_OFFS(rx)[0].start + orig; - if (doutf8 && !SvUTF8(dstr)) - sv_catpvn_nomg_utf8_upgrade(dstr, s, m - s, nsv); - else - sv_catpvn_nomg(dstr, s, m-s); + sv_catpvn_nomg_maybeutf8(dstr, s, m - s, DO_UTF8(TARG)); s = RX_OFFS(rx)[0].end + orig; - if (clen) - sv_catpvn_nomg(dstr, c, clen); + if (first) { + /* replacement already stringified */ + if (clen) + sv_catpvn_nomg_maybeutf8(dstr, c, clen, doutf8); + first = FALSE; + } + else { + if (PL_encoding) { + if (!nsv) nsv = sv_newmortal(); + sv_copypv(nsv, repl); + if (!DO_UTF8(nsv)) sv_recode_to_utf8(nsv, PL_encoding); + sv_catsv(dstr, nsv); + } + else sv_catsv(dstr, repl); + if (SvTAINTED(repl)) + rxtainted |= SUBST_TAINT_REPL; + } if (once) break; } while (CALLREGEXEC(rx, s, strend, orig, s == m, TARG, NULL, r_flags)); - if (doutf8 && !DO_UTF8(TARG)) - sv_catpvn_nomg_utf8_upgrade(dstr, s, strend - s, nsv); - else - sv_catpvn_nomg(dstr, s, strend - s); + sv_catpvn_nomg_maybeutf8(dstr, s, strend - s, DO_UTF8(TARG)); if (rpm->op_pmflags & PMf_NONDESTRUCT) { /* From here on down we're using the copy, and leaving the original @@ -2391,7 +2386,7 @@ PP(pp_subst) SvPV_set(TARG, SvPVX(dstr)); SvCUR_set(TARG, SvCUR(dstr)); SvLEN_set(TARG, SvLEN(dstr)); - doutf8 |= DO_UTF8(dstr); + SvFLAGS(TARG) |= SvUTF8(dstr); SvPV_set(dstr, NULL); SPAGAIN; @@ -2401,8 +2396,6 @@ PP(pp_subst) if (!(rpm->op_pmflags & PMf_NONDESTRUCT)) { (void)SvPOK_only_UTF8(TARG); - if (doutf8) - SvUTF8_on(TARG); } /* See "how taint works" above */ diff --git a/regcomp.c b/regcomp.c index 0cc711f..960d7c4 100644 --- a/regcomp.c +++ b/regcomp.c @@ -6360,7 +6360,10 @@ reStudy: if (RExC_seen & REG_SEEN_CANY) r->extflags |= RXf_CANY_SEEN; if (RExC_seen & REG_SEEN_VERBARG) + { r->intflags |= PREGf_VERBARG_SEEN; + r->extflags |= RXf_MODIFIES_VARS; + } if (RExC_seen & REG_SEEN_CUTGROUP) r->intflags |= PREGf_CUTGROUP_SEEN; if (pm_flags & PMf_USE_RE_EVAL) diff --git a/regexp.h b/regexp.h index c515667..f631db9 100644 --- a/regexp.h +++ b/regexp.h @@ -365,6 +365,19 @@ get_regex_charset_name(const U32 flags, STRLEN* const lenp) } } +/* + Two flags no longer used. + RXf_SPLIT used to be set in Perl_pmruntime if op_flags & OPf_SPECIAL, + i.e., split. It was used by the regex engine to check whether it should + set RXf_SKIPWHITE. Regexp plugins on CPAN also have done the same thing + historically, so we leave these flags defined. +*/ +#ifndef PERL_CORE +# define RXf_SPLIT 0 +# define RXf_SKIPWHITE 0 +#endif + + /* Anchor and GPOS related stuff */ #define RXf_ANCH_BOL (1<<(RXf_BASE_SHIFT+0)) #define RXf_ANCH_MBOL (1<<(RXf_BASE_SHIFT+1)) @@ -393,19 +406,10 @@ get_regex_charset_name(const U32 flags, STRLEN* const lenp) #define RXf_USE_INTUIT_NOML (1<<(RXf_BASE_SHIFT+12)) #define RXf_USE_INTUIT_ML (1<<(RXf_BASE_SHIFT+13)) #define RXf_INTUIT_TAIL (1<<(RXf_BASE_SHIFT+14)) - -/* - This used to be set in Perl_pmruntime if op_flags & OPf_SPECIAL, i.e. - split. It was used by the regex engine to check whether it should set - RXf_SKIPWHITE. Regexp plugins on CPAN also have done the same thing - historically, so we leave this flag defined, even though it is never set. -*/ -#if !defined(PERL_CORE) || defined(PERL_IN_DUMP_C) -# define RXf_SPLIT (1<<(RXf_BASE_SHIFT+15)) -#endif - #define RXf_USE_INTUIT (RXf_USE_INTUIT_NOML|RXf_USE_INTUIT_ML) +#define RXf_MODIFIES_VARS (1<<(RXf_BASE_SHIFT+15)) + /* Copy and tainted info */ #define RXf_COPY_DONE (1<<(RXf_BASE_SHIFT+16)) @@ -417,10 +421,6 @@ get_regex_charset_name(const U32 flags, STRLEN* const lenp) /* Flags indicating special patterns */ #define RXf_START_ONLY (1<<(RXf_BASE_SHIFT+19)) /* Pattern is /^/ */ -/* No longer used, but CPAN modules still mention it. */ -#if !defined(PERL_CORE) || defined(PERL_IN_DUMP_C) -# define RXf_SKIPWHITE (1<<(RXf_BASE_SHIFT+20)) /* Pattern is for a split " " */ -#endif #define RXf_WHITE (1<<(RXf_BASE_SHIFT+21)) /* Pattern is /\s+/ */ #define RXf_NULL (1U<<(RXf_BASE_SHIFT+22)) /* Pattern is // */ #if RXf_BASE_SHIFT+22 > 31 diff --git a/regnodes.h b/regnodes.h index 42f2452..b53487d 100644 --- a/regnodes.h +++ b/regnodes.h @@ -785,7 +785,7 @@ EXTCONST char * const PL_reg_name[] = { EXTCONST char * PL_reg_extflags_name[]; #else EXTCONST char * const PL_reg_extflags_name[] = { - /* Bits in extflags defined: 11111111111111111111111011111111 */ + /* Bits in extflags defined: 11011111111111111111111011111111 */ "MULTILINE", /* 0x00000001 */ "SINGLELINE", /* 0x00000002 */ "FOLD", /* 0x00000004 */ @@ -810,12 +810,12 @@ EXTCONST char * const PL_reg_extflags_name[] = { "USE_INTUIT_NOML", /* 0x00200000 */ "USE_INTUIT_ML", /* 0x00400000 */ "INTUIT_TAIL", /* 0x00800000 */ - "SPLIT", /* 0x01000000 */ + "MODIFIES_VARS", /* 0x01000000 */ "COPY_DONE", /* 0x02000000 */ "TAINTED_SEEN", /* 0x04000000 */ "TAINTED", /* 0x08000000 */ "START_ONLY", /* 0x10000000 */ - "SKIPWHITE", /* 0x20000000 */ + "UNUSED_BIT_29", /* 0x20000000 */ "WHITE", /* 0x40000000 */ "NULL", /* 0x80000000 */ }; diff --git a/sv.h b/sv.h index 69a7380..4cac64a 100644 --- a/sv.h +++ b/sv.h @@ -1819,6 +1819,8 @@ mg.c:1024: warning: left-hand operand of comma expression has no effect sv_utf8_upgrade(nsv); \ sv_catsv_nomg(dsv, nsv); \ } STMT_END +#define sv_catpvn_nomg_maybeutf8(dsv, sstr, slen, is_utf8) \ + sv_catpvn_flags(dsv, sstr, slen, (is_utf8)?SV_CATUTF8:SV_CATBYTES) #ifdef PERL_CORE # define sv_or_pv_len_utf8(sv, pv, bytelen) \ diff --git a/t/lib/warnings/9uninit b/t/lib/warnings/9uninit index 717e7f6..43cb670 100644 --- a/t/lib/warnings/9uninit +++ b/t/lib/warnings/9uninit @@ -830,9 +830,8 @@ Use of uninitialized value $m1 in regexp compilation at - line 10. Use of uninitialized value $_ in substitution (s///) at - line 10. Use of uninitialized value $_ in substitution (s///) at - line 10. Use of uninitialized value $_ in substitution (s///) at - line 11. -Use of uninitialized value $g1 in substitution (s///) at - line 11. Use of uninitialized value $_ in substitution (s///) at - line 11. -Use of uninitialized value $g1 in substitution (s///) at - line 11. +Use of uninitialized value $g1 in substitution iterator at - line 11. Use of uninitialized value $m1 in regexp compilation at - line 12. Use of uninitialized value $_ in substitution (s///) at - line 12. Use of uninitialized value $_ in substitution (s///) at - line 12. @@ -849,9 +848,8 @@ Use of uninitialized value $m1 in regexp compilation at - line 21. Use of uninitialized value $_ in substitution (s///) at - line 21. Use of uninitialized value $_ in substitution (s///) at - line 21. Use of uninitialized value $_ in substitution (s///) at - line 22. -Use of uninitialized value $g1 in substitution (s///) at - line 22. Use of uninitialized value $_ in substitution (s///) at - line 22. -Use of uninitialized value $g1 in substitution (s///) at - line 22. +Use of uninitialized value $g1 in substitution iterator at - line 22. Use of uninitialized value $m1 in regexp compilation at - line 23. Use of uninitialized value $_ in substitution (s///) at - line 23. Use of uninitialized value $_ in substitution (s///) at - line 23. @@ -868,9 +866,8 @@ Use of uninitialized value $m1 in regexp compilation at - line 31. Use of uninitialized value $g2 in substitution (s///) at - line 31. Use of uninitialized value $g2 in substitution (s///) at - line 31. Use of uninitialized value $g2 in substitution (s///) at - line 32. -Use of uninitialized value $g1 in substitution (s///) at - line 32. Use of uninitialized value $g2 in substitution (s///) at - line 32. -Use of uninitialized value $g1 in substitution (s///) at - line 32. +Use of uninitialized value $g1 in substitution iterator at - line 32. Use of uninitialized value $m1 in regexp compilation at - line 33. Use of uninitialized value $g2 in substitution (s///) at - line 33. Use of uninitialized value $g2 in substitution (s///) at - line 33. @@ -880,10 +877,10 @@ Use of uninitialized value in transliteration (tr///) at - line 35. Use of uninitialized value $m1 in regexp compilation at - line 38. Use of uninitialized value $g1 in regexp compilation at - line 39. Use of uninitialized value $m1 in regexp compilation at - line 41. -Use of uninitialized value $g1 in substitution (s///) at - line 42. +Use of uninitialized value $g1 in substitution iterator at - line 42. Use of uninitialized value $m1 in regexp compilation at - line 43. Use of uninitialized value $g1 in substitution iterator at - line 43. -Use of uninitialized value $m1 in substitution iterator at - line 44. +Use of uninitialized value $m1 in substitution (s///) at - line 44. Use of uninitialized value in substitution iterator at - line 47. ######## use warnings 'uninitialized'; diff --git a/t/op/defins.t b/t/op/defins.t index 5b26bf8..d3d50fb 100644 --- a/t/op/defins.t +++ b/t/op/defins.t @@ -10,7 +10,7 @@ BEGIN { $SIG{__WARN__} = sub { $warns++; warn $_[0] }; } require 'test.pl'; -plan( tests => 23 ); +plan( tests => 26 ); my $unix_mode = 1; @@ -33,6 +33,15 @@ if ($^O eq 'VMS') { $unix_mode = 1 if $drop_dot && unix_rpt; } +# $wanted_filename should be 0 for readdir() and glob() tests. +# This is because it is the only valid filename that is false in a boolean test. + +# $filename = '0'; +# print "hi\n" if $filename; # doesn't print + +# In the case of VMS, '0' isn't always the filename that you get. +# Which makes those particular tests pointless. + $wanted_filename = $unix_mode ? '0' : '0.'; $saved_filename = './0'; @@ -106,6 +115,31 @@ while ($where{$seen} = readdir(DIR)) } cmp_ok($seen,'==',1,'saw file in hash while()'); +rewinddir(DIR); +$seen = 0; +$_ = 'not 0'; +while (readdir(DIR)) + { + $seen++ if $_ eq $wanted_filename; + } +cmp_ok($seen,'==',1,'saw file in bare while(readdir){...}'); + +rewinddir(DIR); +$seen = 0; +$_ = 'not 0'; + +$_ eq $wanted_filename && $seen++ while readdir(DIR); +cmp_ok($seen,'==',1,'saw file in bare "... while readdir"'); + +rewinddir(DIR); +$seen = 0; +$_ = ""; # suppress uninit warning +do + { + $seen++ if $_ eq $wanted_filename; + } while (readdir(DIR)); +cmp_ok($seen,'==',1,'saw file in bare do{...}while(readdir)'); + $seen = 0; while (my $name = glob('*')) { diff --git a/t/op/taint.t b/t/op/taint.t index 0e89c1f..d621de6 100644 --- a/t/op/taint.t +++ b/t/op/taint.t @@ -561,7 +561,7 @@ my $TEST = 'TEST'; $one = $1; } is_tainted($s, "$desc: s tainted"); - is_tainted($res, "$desc: res tainted"); + isnt_tainted($res, "$desc: res tainted"); isnt_tainted($one, "$desc: \$1 not tainted"); is($s, '123', "$desc: s value"); is($res, 3, "$desc: res value"); @@ -918,7 +918,7 @@ my $TEST = 'TEST'; $one = $1; } is_tainted($s, "$desc: s tainted"); - is_tainted($res, "$desc: res tainted"); + isnt_tainted($res, "$desc: res tainted"); isnt_tainted($one, "$desc: \$1 not tainted"); is($s, '123', "$desc: s value"); is($res, 3, "$desc: res value"); diff --git a/t/op/while_readdir.t b/t/op/while_readdir.t deleted file mode 100644 index 63f8d92..0000000 --- a/t/op/while_readdir.t +++ /dev/null @@ -1,180 +0,0 @@ -#!./perl - -BEGIN { - chdir 't' if -d 't'; - @INC = '../lib'; - require './test.pl'; -} - -use strict; -use warnings; - -plan 10; - -# Need to run this in a quiet private directory as it assumes that it can read -# the contents twice and get the same result. -my $tempdir = tempfile; - -mkdir $tempdir, 0700 or die "Can't mkdir '$tempdir': $!"; -chdir $tempdir or die die "Can't chdir '$tempdir': $!"; - -my $cleanup = 1; -my %tempfiles; - -END { - if ($cleanup) { - foreach my $file (keys %tempfiles) { - # We only wrote each of these once so 1 delete should work: - if (unlink $file) { - warn "unlink tempfile '$file' passed but it's still there" - if -e $file; - } else { - warn "Couldn't unlink tempfile '$file': $!"; - } - } - chdir '..' or die "Couldn't chdir .. for cleanup: $!"; - rmdir $tempdir or die "Couldn't unlink tempdir '$tempdir': $!"; - } -} - -# This is intentionally not random (per run), but intentionally will try to -# give different file names for different people running this test. -srand $< * $]; - -my @chars = ('A' .. 'Z', 'a' .. 'z', 0 .. 9); - -sub make_file { - my $name = shift; - - return if $tempfiles{$name}++; - - print "# Writing to $name in $tempdir\n"; - - open my $fh, '>', $name or die "Can't open '$name' for writing: $!\n"; - print $fh <<'FILE0'; -This file is here for testing - -while(readdir $dir){...} -... while readdir $dir - -etc -FILE0 - close $fh or die "Can't close '$name': $!"; -} - -sub make_some_files { - for (1..int rand 10) { - my $name; - $name .= $chars[rand $#chars] for 1..int(10 + rand 5); - make_file($name); - } -} - -make_some_files(); -make_file('0'); -make_some_files(); - -ok(-f '0', "'0' file is here"); - -opendir my $dirhandle, '.' - or die "Failed test: unable to open directory: $!\n"; - -my @dir = readdir $dirhandle; -rewinddir $dirhandle; - -{ - my @list; - while(readdir $dirhandle){ - push @list, $_; - } - ok( eq_array( \@dir, \@list ), 'while(readdir){push}' ); - rewinddir $dirhandle; -} - -{ - my @list; - push @list, $_ while readdir $dirhandle; - ok( eq_array( \@dir, \@list ), 'push while readdir' ); - rewinddir $dirhandle; -} - -{ - my $tmp; - my @list; - push @list, $tmp while $tmp = readdir $dirhandle; - ok( eq_array( \@dir, \@list ), 'push $dir while $dir = readdir' ); - rewinddir $dirhandle; -} - -{ - my @list; - while( my $dir = readdir $dirhandle){ - push @list, $dir; - } - ok( eq_array( \@dir, \@list ), 'while($dir=readdir){push}' ); - rewinddir $dirhandle; -} - - -{ - my @list; - my $sub = sub{ - push @list, $_; - }; - $sub->($_) while readdir $dirhandle; - ok( eq_array( \@dir, \@list ), '$sub->($_) while readdir' ); - rewinddir $dirhandle; -} - -{ - my $works = 0; - while(readdir $dirhandle){ - $_ =~ s/\.$// if defined $_ && $^O eq 'VMS'; # may have zero-length extension - if( defined $_ && $_ eq '0'){ - $works = 1; - last; - } - } - ok( $works, 'while(readdir){} with file named "0"' ); - rewinddir $dirhandle; -} - -{ - my $works = 0; - my $sub = sub{ - $_ =~ s/\.$// if defined $_ && $^O eq 'VMS'; # may have zero-length extension - if( defined $_ && $_ eq '0' ){ - $works = 1; - } - }; - $sub->($_) while readdir $dirhandle; - ok( $works, '$sub->($_) while readdir; with file named "0"' ); - rewinddir $dirhandle; -} - -{ - my $works = 0; - while( my $dir = readdir $dirhandle ){ - $dir =~ s/\.$// if defined $dir && $^O eq 'VMS'; # may have zero-length extension - if( defined $dir && $dir eq '0'){ - $works = 1; - last; - } - } - ok( $works, 'while($dir=readdir){} with file named "0"'); - rewinddir $dirhandle; -} - -{ - my $tmp; - my $ok; - my @list; - while( $tmp = readdir $dirhandle ){ - $tmp =~ s/\.$// if defined $tmp && $^O eq 'VMS'; # may have zero-length extension - last if defined($tmp)&& !$tmp && ($ok=1) - } - ok( $ok, '$dir while $dir = readdir; with file named "0"' ); - rewinddir $dirhandle; -} - -closedir $dirhandle; diff --git a/t/re/pat_rt_report.t b/t/re/pat_rt_report.t index 453e5ab..262e8d3 100644 --- a/t/re/pat_rt_report.t +++ b/t/re/pat_rt_report.t @@ -22,7 +22,7 @@ BEGIN { } -plan tests => 2525; # Update this when adding/deleting tests. +plan tests => 2527; # Update this when adding/deleting tests. run_tests() unless caller; @@ -921,6 +921,10 @@ sub run_tests { $_ = "CCCCBAA"; ok(s/(*:X)A+|(*:Y)B+|(*:Z)C+/$REGMARK/g, $message); is($_, "ZYX", $message); + # Use a longer name to force reallocation of $REGMARK. + $_ = "CCCCBAA"; + ok(s/(*:X)A+|(*:YYYYYYYYYYYYYYYY)B+|(*:Z)C+/$REGMARK/g, $message); + is($_, "ZYYYYYYYYYYYYYYYYX", $message); } { diff --git a/t/re/subst.t b/t/re/subst.t index 0016843..b139812 100644 --- a/t/re/subst.t +++ b/t/re/subst.t @@ -7,7 +7,7 @@ BEGIN { require './test.pl'; } -plan( tests => 201 ); +plan( tests => 205 ); $_ = 'david'; $a = s/david/rules/r; @@ -841,6 +841,7 @@ pass("s/// on tied var returning a cow"); } +# Test problems with constant replacement optimisation # [perl #26986] logop in repl resulting in incorrect optimisation "g" =~ /(.)/; @l{'a'..'z'} = 'A'..':'; @@ -848,3 +849,35 @@ $_ = "hello"; { s/(.)/$l{my $a||$1}/g } is $_, "HELLO", 'logop in s/// repl does not result in "constant" repl optimisation'; +# Aliases to match vars +"g" =~ /(.)/; +$_ = "hello"; +{ + local *a = *1; + s/(.)\1/$a/g; +} +is $_, 'helo', 's/pat/$alias_to_match_var/'; +"g" =~ /(.)/; +$_ = "hello"; +{ + local *a = *1; + s/e(.)\1/a$a/g; +} +is $_, 'halo', 's/pat/$alias_to_match_var/'; +# Last-used pattern containing re-evals that modify "constant" rhs +{ + local *a; + $x = "hello"; + $x =~ /(?{*a = \"a"})./; + undef *a; + $x =~ s//$a/g; + is $x, 'aaaaa', + 'last-used pattern disables constant repl optimisation'; +} + + +$_ = "\xc4\x80"; +$a = ""; +utf8::upgrade $a; +$_ =~ s/$/$a/; +is $_, "\xc4\x80", "empty utf8 repl does not result in mangled utf8"; -- Perl5 Master Repository