In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/53fdf12aede2fd58b0c5ee236a58025184f49def?hp=ca31f56c9ea43ef6c05c38da5542fb95a322f2c1>
- Log ----------------------------------------------------------------- commit 53fdf12aede2fd58b0c5ee236a58025184f49def Author: Karl Williamson <k...@cpan.org> Date: Tue May 30 21:19:20 2017 -0600 Relax fatal circumstances of unescaped '{' After the 5.26.0 code freeze, it came out that an application that many others depend on, GNU Autoconf, has an unescaped '{' in it. Commit 7335cb814c19345052a23bc4462c701ce734e6c5 created a kludge that was minimal, and designed to get just that one application to work. I originally proposed a less kludgy patch that was applicable across a larger set of applications. The proposed patch didn't fatalize uses of unesacped '{' where we don't anticipate using it for something other than its literal self. That approach worked for Autoconf, but also far more instances, but was more complicated, and was rejected as being too risky during code freeze. Now this commit implements my original suggestion. I am putting it in now, to let it soak in blead, in case something else surfaces besides Autoconf, that we need to work around. By having experience with the patch live, we can be more confident about using it, if necessary, in a dot release. M embed.fnc M embed.h M pod/perldelta.pod M proto.h M regcomp.c M t/re/reg_mesg.t commit d30277c7df69b4aca40edeb2ae3bf9bb529b01e8 Author: Karl Williamson <k...@cpan.org> Date: Wed May 31 13:08:33 2017 -0600 t/re/reg_mesg.t: Add override of warning default on/off This .t needs an overhaul to more cleanly accommodate the extra tasks it has been given over the years. But until then, this is a minimal enhancement that will be useful in the commit after this one. This adds the ability to specify that a particular pattern being tested should generate a message which is raised by default vs one that isn't. The messages are currently grouped in categories whose default is determined by the category itself. This commit avoids having to create a new category when a message comes along that doesn't quite fit into the existing ones. M t/re/reg_mesg.t commit 9b5d391f84ebc32466959f6f8f90891b3930c4b9 Author: Karl Williamson <k...@cpan.org> Date: Wed May 31 13:44:20 2017 -0600 regcomp.c: Don't set variable within an 'if' Sometimes it is convenient/and or necessary to do an assignment within a clause of an 'if', but it adds a little cognitive load. In this case, it's entirely unnecessary. This patch changes to do the assignment before the 'if'. M regcomp.c commit e613617cd28e5ec6a6555339c1300e9fbdbde425 Author: Karl Williamson <k...@cpan.org> Date: Tue May 23 20:54:06 2017 -0600 perlguts: Add some C<> M pod/perlguts.pod commit fc2ecde5b62c6497fe698bfcc9a7e299db105832 Author: Karl Williamson <k...@cpan.org> Date: Tue May 23 20:53:11 2017 -0600 charnames: Clarify comment M lib/_charnames.pm commit 754e15cfc175f0e2e1299d27fd387a0d868c2764 Author: Karl Williamson <k...@cpan.org> Date: Tue May 23 20:51:56 2017 -0600 charnames: Remove obsolete pod about NBSP This is illegal since 5.26, and the text should have been removed then, but was overlooked. M lib/_charnames.pm M lib/charnames.pm commit a1c583cae893f7b865a4436117411a5c84ed09c3 Author: Karl Williamson <k...@cpan.org> Date: Tue May 23 20:50:44 2017 -0600 perlmodinstall: Make a link for http text M pod/perlmodinstall.pod commit a6951642ede4abe605dcf0e94b74948e0a60a56b Author: Karl Williamson <k...@cpan.org> Date: Wed Apr 26 10:29:58 2017 -0600 utf8.h: Add assertions for macros that take chars This is inspired by [perl #131190]. The UTF-8 macros whose parameters are characters now have assertions that verify they are not being called with something that won't fit in a char. These assertions should be getting optimized out if the input type is a char or U8. M utf8.h commit a5ba252751fc7fd7b9d43d0ad4491eb68a14a4a6 Author: Karl Williamson <k...@cpan.org> Date: Thu Apr 20 08:33:42 2017 -0600 Change formal parameter for newSVpvn This fixes a discrepancy in perlapi. See http://nntp.perl.org/group/perl.perl5.porters/243384 M embed.fnc M proto.h commit a25f3052bac2e69bc82d661cc97662078e096e3f Author: Karl Williamson <k...@cpan.org> Date: Tue Apr 11 11:37:23 2017 -0600 AUTHORS: Update Jim Shneider's email M AUTHORS M Porting/checkAUTHORS.pl commit 06b22f95a86a59966b460870bdffe91363ea4822 Author: Karl Williamson <k...@cpan.org> Date: Tue Apr 11 14:11:40 2017 -0600 t/op/fork.t: Don't output shell warning If the shell doesn't support 'ulimit -u', it can cause unexpected warnings that can cause the tests to fail. This happens on s/390. M t/op/fork.t commit 20d1d40b2ce87fc39895935ec95221e8b9e36392 Author: Karl Williamson <k...@cpan.org> Date: Tue Apr 11 11:11:24 2017 -0600 APItest/numeric.xs: Fix uninit error valgrind shows that a variable could be used unininitialized. M ext/XS-APItest/APItest.pm M ext/XS-APItest/numeric.xs commit 7d24ce13858cd4a442f7b74228dfbbe808147063 Author: Karl Williamson <k...@cpan.org> Date: Tue Mar 21 21:52:33 2017 -0600 t/harness: Run APItests in parallel This commit changes these tests to be run like the tests in t/lib, in parallel with each other, when available. This is the longest running directory, and prior to this commit, on many-core systems it can be the final thing chugging along, a test at-a-time, while the other cores are idle. M t/harness commit 704b574dcdee24984d4fe89fd31a01cb6fc2ac72 Author: Karl Williamson <k...@cpan.org> Date: Tue Mar 21 21:41:34 2017 -0600 t/harness: Remove useless sort Instead move its effect to the sort that overrides the first one. This is because the tests are executed in the order of the rules to TAP::Harness, not in the order of the test list. M t/harness commit eb7486de5e577623ae719baf1b442bde992cb7c7 Author: Karl Williamson <k...@cpan.org> Date: Mon Jan 30 14:52:46 2017 -0700 Slightly change -Dr output of regex ANYOF nodes This changes to precede each literal '[' in a [...] class with a backslash to better make is standout as a literal M regcomp.c M t/re/anyof.t ----------------------------------------------------------------------- Summary of changes: AUTHORS | 2 +- Porting/checkAUTHORS.pl | 1 + embed.fnc | 3 +- embed.h | 1 + ext/XS-APItest/APItest.pm | 2 +- ext/XS-APItest/numeric.xs | 2 +- lib/_charnames.pm | 6 +-- lib/charnames.pm | 10 ++--- pod/perldelta.pod | 17 +++++++++ pod/perlguts.pod | 4 +- pod/perlmodinstall.pod | 2 +- proto.h | 7 +++- regcomp.c | 97 +++++++++++++++++++++++++++++++++++++---------- t/harness | 8 ++-- t/op/fork.t | 2 +- t/re/anyof.t | 4 +- t/re/reg_mesg.t | 28 ++++++++++---- utf8.h | 28 +++++++++----- 18 files changed, 162 insertions(+), 62 deletions(-) diff --git a/AUTHORS b/AUTHORS index 35e3068d76..aa0a22a2fe 100644 --- a/AUTHORS +++ b/AUTHORS @@ -594,7 +594,7 @@ Jim Cromie <jcro...@cpan.org> Jim Meyering <meyer...@asic.sc.ti.com> Jim Miner <j...@winternet.com> Jim Richardson -Jim Schneider <jschn...@netilla.com> +Jim Schneider <james.schnei...@db.com> Jirka HruÅ¡ka <ji...@fud.cz> Joachim Huober Joaquin Ferrero <explo...@joaquinferrero.com> diff --git a/Porting/checkAUTHORS.pl b/Porting/checkAUTHORS.pl index 9ee55f7724..57582d70bc 100755 --- a/Porting/checkAUTHORS.pl +++ b/Porting/checkAUTHORS.pl @@ -692,6 +692,7 @@ jpeacock\100messagesystems.com john.peacock\100havurah-software.org + jpeacock\100jpeacock-hp.doesntexist.org + jpeacock\100cpan.org + jpeacock\100rowman.com +james.schneider\100db.com jschneid\100netilla.com jpl.jpl\100gmail.com jpl\100research.att.com jql\100accessone.com jql\100jql.accessone.com jsm28\100hermes.cam.ac.uk jsm28\100cam.ac.uk diff --git a/embed.fnc b/embed.fnc index d0c9953273..bb107fa91d 100644 --- a/embed.fnc +++ b/embed.fnc @@ -1138,7 +1138,7 @@ ApdR |SV* |newSViv |const IV i ApdR |SV* |newSVuv |const UV u ApdR |SV* |newSVnv |const NV n ApdR |SV* |newSVpv |NULLOK const char *const s|const STRLEN len -ApdR |SV* |newSVpvn |NULLOK const char *const s|const STRLEN len +ApdR |SV* |newSVpvn |NULLOK const char *const buffer|const STRLEN len ApdR |SV* |newSVpvn_flags |NULLOK const char *const s|const STRLEN len|const U32 flags ApdR |SV* |newSVhek |NULLOK const HEK *const hek ApdR |SV* |newSVpvn_share |NULLOK const char* s|I32 len|U32 hash @@ -1655,6 +1655,7 @@ EMRs |SV* |_make_exactf_invlist |NN RExC_state_t *pRExC_state \ |NN regnode *node EsMR |SV* |invlist_contents|NN SV* const invlist \ |const bool traditional_style +EsRn |bool |new_regcurly |NN const char *s|NN const char *e #endif #if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_UTF8_C) EXmM |void |_invlist_intersection |NN SV* const a|NN SV* const b|NN SV** i diff --git a/embed.h b/embed.h index 2fa77c6fda..54b8fbddc8 100644 --- a/embed.h +++ b/embed.h @@ -1042,6 +1042,7 @@ #define is_ssc_worth_it S_is_ssc_worth_it #define join_exact(a,b,c,d,e,f,g) S_join_exact(aTHX_ a,b,c,d,e,f,g) #define make_trie(a,b,c,d,e,f,g,h) S_make_trie(aTHX_ a,b,c,d,e,f,g,h) +#define new_regcurly S_new_regcurly #define nextchar(a) S_nextchar(aTHX_ a) #define output_or_return_posix_warnings(a,b,c) S_output_or_return_posix_warnings(aTHX_ a,b,c) #define parse_lparen_question_flags(a) S_parse_lparen_question_flags(aTHX_ a) diff --git a/ext/XS-APItest/APItest.pm b/ext/XS-APItest/APItest.pm index 1be011660b..a5dd1332f6 100644 --- a/ext/XS-APItest/APItest.pm +++ b/ext/XS-APItest/APItest.pm @@ -5,7 +5,7 @@ use strict; use warnings; use Carp; -our $VERSION = '0.88'; +our $VERSION = '0.89'; require XSLoader; diff --git a/ext/XS-APItest/numeric.xs b/ext/XS-APItest/numeric.xs index 0ce9e080ff..fac81ba3e0 100644 --- a/ext/XS-APItest/numeric.xs +++ b/ext/XS-APItest/numeric.xs @@ -40,7 +40,7 @@ grok_atoUV(number, endsv) const char *pv = SvPV(number, len); UV value = 0xdeadbeef; bool result; - const char* endptr; + const char* endptr = NULL; PPCODE: EXTEND(SP,2); if (endsv == &PL_sv_undef) { diff --git a/lib/_charnames.pm b/lib/_charnames.pm index 50fdd85c5d..c6169d16f8 100644 --- a/lib/_charnames.pm +++ b/lib/_charnames.pm @@ -6,7 +6,7 @@ package _charnames; use strict; use warnings; -our $VERSION = '1.44'; +our $VERSION = '1.45'; use unicore::Name; # mktables-generated algorithmically-defined names use bytes (); # for $bytes::hint_bits @@ -22,8 +22,8 @@ $Carp::Internal{ (__PACKAGE__) } = 1; # The official names with their code points are stored in a table in # lib/unicore/Name.pl which is read in as a large string (almost 3/4 Mb in # Unicode 6.0). Each code point/name combination is separated by a \n in the -# string. (Some of the CJK and the Hangul syllable names are determined -# instead algorithmically via subroutines stored instead in +# string. (Some of the CJK and the Hangul syllable names are instead +# determined algorithmically via subroutines stored instead in # lib/unicore/Name.pm). Because of the large size of this table, it isn't # converted into hashes for faster lookup. # diff --git a/lib/charnames.pm b/lib/charnames.pm index a0f3227f4d..e22c71913c 100644 --- a/lib/charnames.pm +++ b/lib/charnames.pm @@ -1,7 +1,7 @@ package charnames; use strict; use warnings; -our $VERSION = '1.44'; +our $VERSION = '1.45'; use unicore::Name; # mktables-generated algorithmically-defined names use _charnames (); # The submodule for this where most of the work gets done @@ -278,11 +278,9 @@ mean C<"B">, etc. Aliases must begin with a character that is alphabetic. After that, each may contain any combination of word (C<\w>) characters, SPACE (U+0020), -HYPHEN-MINUS (U+002D), LEFT PARENTHESIS (U+0028), RIGHT PARENTHESIS (U+0029), -and NO-BREAK SPACE (U+00A0). These last three should never have been allowed -in names, and are retained for backwards compatibility only; NO-BREAK SPACE IS -currently deprecated and scheduled for removal in Perl v5.26; the other two -may also be +HYPHEN-MINUS (U+002D), LEFT PARENTHESIS (U+0028), and RIGHT PARENTHESIS +(U+0029). These last two should never have been allowed +in names, and are retained for backwards compatibility only, and may be deprecated and removed in future releases of Perl, so don't use them for new names. (More precisely, the first character of a name you specify must be something that matches all of C<\p{ID_Start}>, C<\p{Alphabetic}>, and diff --git a/pod/perldelta.pod b/pod/perldelta.pod index 13b9cdcd18..584474d729 100644 --- a/pod/perldelta.pod +++ b/pod/perldelta.pod @@ -59,6 +59,23 @@ respectively. XXX Any deprecated features, syntax, modules etc. should be listed here. +=head2 Some uses of unescaped C<"{"> are no longer fatal + +Perl 5.26.0 fatalized some uses of an unescaped left brace, but an +exception was made at the last minute, specifically crafted to be a +minimal change to allow GNU Autoconf to work. This code is heavily +depended upon, and continues to use the deprecated usage. Its use of an +unescaped left brace is one where we have no intention of repurposing +C<"{"> to be something other than itself. + +That exception is now generalized to include various other such cases +where the C<"{"> will not be repurposed. This is to get real experience +with this more complicated change now, in case we need to issue a dot +release if we find other things like Autoconf that are important to work +around. + +Note that these uses continue to raise a deprecation message. + =head2 Module removals XXX Remove this section if inapplicable. diff --git a/pod/perlguts.pod b/pod/perlguts.pod index 2da946ce46..56eee93164 100644 --- a/pod/perlguts.pod +++ b/pod/perlguts.pod @@ -56,7 +56,7 @@ The seven routines are: SV* newSVpvf(const char*, ...); SV* newSVsv(SV*); -C<STRLEN> is an integer type (Size_t, usually defined as size_t in +C<STRLEN> is an integer type (C<Size_t>, usually defined as C<size_t> in F<config.h>) guaranteed to be large enough to represent the size of any string that perl can handle. @@ -2675,7 +2675,7 @@ whatever the compiler has. If you are printing addresses of pointers, use UVxf combined with PTR2UV(), do not use %lx or %p. -=head2 Formatted Printing of Size_t and SSize_t +=head2 Formatted Printing of C<Size_t> and C<SSize_t> The most general way to do this is to cast them to a UV or IV, and print as in the diff --git a/pod/perlmodinstall.pod b/pod/perlmodinstall.pod index 39c410df30..86f9cf948a 100644 --- a/pod/perlmodinstall.pod +++ b/pod/perlmodinstall.pod @@ -416,7 +416,7 @@ don't send me mail asking for help on how to install your modules. There are too many modules, and too few Orwants, for me to be able to answer or even acknowledge all your questions. Contact the module author instead, ask someone familiar with Perl on your operating -system, or if all else fails, file a ticket at http://rt.cpan.org/. +system, or if all else fails, file a ticket at L<http://rt.cpan.org/>. =head1 AUTHOR diff --git a/proto.h b/proto.h index 8307c6d6a9..f60564370a 100644 --- a/proto.h +++ b/proto.h @@ -2313,7 +2313,7 @@ PERL_CALLCONV SV* Perl_newSVpvf(pTHX_ const char *const pat, ...) #define PERL_ARGS_ASSERT_NEWSVPVF \ assert(pat) -PERL_CALLCONV SV* Perl_newSVpvn(pTHX_ const char *const s, const STRLEN len) +PERL_CALLCONV SV* Perl_newSVpvn(pTHX_ const char *const buffer, const STRLEN len) __attribute__warn_unused_result__; PERL_CALLCONV SV* Perl_newSVpvn_flags(pTHX_ const char *const s, const STRLEN len, const U32 flags) @@ -5138,6 +5138,11 @@ STATIC U32 S_join_exact(pTHX_ RExC_state_t *pRExC_state, regnode *scan, UV *min_ STATIC I32 S_make_trie(pTHX_ RExC_state_t *pRExC_state, regnode *startbranch, regnode *first, regnode *last, regnode *tail, U32 word_count, U32 flags, U32 depth); #define PERL_ARGS_ASSERT_MAKE_TRIE \ assert(pRExC_state); assert(startbranch); assert(first); assert(last); assert(tail) +STATIC bool S_new_regcurly(const char *s, const char *e) + __attribute__warn_unused_result__; +#define PERL_ARGS_ASSERT_NEW_REGCURLY \ + assert(s); assert(e) + STATIC void S_nextchar(pTHX_ RExC_state_t *pRExC_state); #define PERL_ARGS_ASSERT_NEXTCHAR \ assert(pRExC_state) diff --git a/regcomp.c b/regcomp.c index f9d56c1813..6fc3716951 100644 --- a/regcomp.c +++ b/regcomp.c @@ -119,7 +119,7 @@ typedef struct scan_frame { /* Certain characters are output as a sequence with the first being a * backslash. */ -#define isBACKSLASHED_PUNCT(c) strchr("-]\\^", c) +#define isBACKSLASHED_PUNCT(c) strchr("-[]\\^", c) struct RExC_state_t { @@ -12398,6 +12398,52 @@ S_alloc_maybe_populate_EXACT(pTHX_ RExC_state_t *pRExC_state, } } +STATIC bool +S_new_regcurly(const char *s, const char *e) +{ + /* This is a temporary function designed to match the most lenient form of + * a {m,n} quantifier we ever envision, with either number omitted, and + * spaces anywhere between/before/after them. + * + * If this function fails, then the string it matches is very unlikely to + * ever be considered a valid quantifier, so we can allow the '{' that + * begins it to be considered as a literal */ + + bool has_min = FALSE; + bool has_max = FALSE; + + PERL_ARGS_ASSERT_NEW_REGCURLY; + + if (s >= e || *s++ != '{') + return FALSE; + + while (s < e && isSPACE(*s)) { + s++; + } + while (s < e && isDIGIT(*s)) { + has_min = TRUE; + s++; + } + while (s < e && isSPACE(*s)) { + s++; + } + + if (*s == ',') { + s++; + while (s < e && isSPACE(*s)) { + s++; + } + while (s < e && isDIGIT(*s)) { + has_max = TRUE; + s++; + } + while (s < e && isSPACE(*s)) { + s++; + } + } + + return s < e && *s == '}' && (has_min || has_max); +} /* Parse backref decimal value, unless it's too big to sensibly be a backref, * in which case return I32_MAX (rather than possibly 32-bit wrapping) */ @@ -12832,6 +12878,12 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth) /* FALLTHROUGH */ finish_meta_pat: + if ( UCHARAT(RExC_parse + 1) == '{' + && UNLIKELY(! new_regcurly(RExC_parse + 1, RExC_end))) + { + RExC_parse += 2; + vFAIL("Unescaped left brace in regex is illegal here"); + } nextchar(pRExC_state); Set_Node_Length(ret, 2); /* MJD */ break; @@ -13381,22 +13433,25 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth) } /* End of switch on '\' */ break; case '{': - /* Currently we don't care if the lbrace is at the start - * of a construct. This catches it in the middle of a - * literal string, or when it's the first thing after - * something like "\b" */ - if (len || (p > RExC_start && isALPHA_A(*(p -1)))) { - - /* GNU Autoconf is depended on by a lot of code, and - * can't seem to release a new version that avoids the - * deprecation now made fatal. (A commit to do so has - * been in its repository since early 2013; only one - * pattern is affected.) As a work-around, don't - * fatalize this if the pattern being compiled is the - * precise one that trips up Autoconf. See [perl - * #130497] for more details. */ - if (memNEs(RExC_start, RExC_end - RExC_start, - "\\${[^\\}]*}")) + /* Currently we allow an lbrace at the start of a construct + * without raising a warning. This is because we think we + * will never want such a brace to be meant to be other + * than taken literally. */ + if (len || (p > RExC_start && isALPHA_A(*(p - 1)))) { + + /* But, we raise a fatal warning otherwise, as the + * deprecation cycle has come and gone. Except that it + * turns out that some heavily-relied on upstream + * software, notably GNU Autoconf, have failed to fix + * their uses. For these, don't make it fatal unless + * we anticipate using the '{' for something else. + * This happens after any alpha, and for a looser {m,n} + * quantifier specification */ + if ( RExC_strict + || ( p > parse_start + 1 + && isALPHA_A(*(p - 1)) + && *(p - 2) == '\\') + || new_regcurly(p, RExC_end)) { RExC_parse = p + 1; vFAIL("Unescaped left brace in regex is " @@ -13446,10 +13501,10 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth) * this character again next time through, when it will be the * only thing in its new node */ - if ((next_is_quantifier = ( LIKELY(p < RExC_end) - && UNLIKELY(ISMULT2(p)))) - && LIKELY(len)) - { + next_is_quantifier = LIKELY(p < RExC_end) + && UNLIKELY(ISMULT2(p)); + + if (next_is_quantifier && LIKELY(len)) { p = oldp; goto loopdone; } diff --git a/t/harness b/t/harness index b46582ddd9..3fc3ae082e 100644 --- a/t/harness +++ b/t/harness @@ -145,7 +145,7 @@ if (@ARGV) { push @seq, $next; my @last; - push @last, sort { lc $a cmp lc $b } + push @last, _tests_from_manifest($Config{extensions}, $Config{known_extensions}); my %times; if ($state) { @@ -163,7 +163,9 @@ if (@ARGV) { s,\\,/,g; # canonicalize path }; # Treat every file matching lib/*.t as a "directory" - m!\A(\.\./lib/[^/]+\.t\z|.*[/])! or die "'$_'"; + m! \A ( \.\. / (?: lib | ext/XS-APItest/t ) + / [^/]+ \.t \z | .* [/] ) !x + or die "'$_'"; push @{$dir{$1}}, $_; $total_time{$1} += $times{$_} || 0; } @@ -174,7 +176,7 @@ if (@ARGV) { # sequentially. push @seq, { par => [ map { s!/$!/*!; { seq => $_ } } sort { # Directories, ordered by total time descending then name ascending - $total_time{$b} <=> $total_time{$a} || $a cmp $b + $total_time{$b} <=> $total_time{$a} || lc $a cmp lc $b } keys %dir ] }; $rules = { seq => \@seq }; diff --git a/t/op/fork.t b/t/op/fork.t index b69a929fe0..be3125d673 100644 --- a/t/op/fork.t +++ b/t/op/fork.t @@ -22,7 +22,7 @@ SKIP: { skip "This test can only be run under bash or zsh" unless $shell =~ m{/(?:ba|z)sh$}; my $probe = qx{ - $shell -c 'ulimit -u 1 2>&1 && echo good' + $shell -c 'ulimit -u 1 2>/dev/null && echo good' }; chomp $probe; skip "Can't set ulimit -u on this system: $probe" diff --git a/t/re/anyof.t b/t/re/anyof.t index 12ae043fc3..84f2881e0d 100644 --- a/t/re/anyof.t +++ b/t/re/anyof.t @@ -29,10 +29,10 @@ BEGIN { # skipped and not skipped. my @tests = ( - '[[{]' => 'ANYOF[[\{]', + '[[{]' => 'ANYOF[\[\{]', '[^\S ]' => 'ANYOFD[\t\n\x0B\f\r{utf8}\x85\xA0][1680 2000-200A 2028-2029 202F 205F 3000]', '[^\n\r]' => 'ANYOF[^\n\r][0100-INFINITY]', - '[^\/\|,\$\%%\@\ \%"\<\>\:\#\&\*\{\}\[\]\(\)]' => 'ANYOF[^ "#$%&()*,/:<>@[\]\{|\}][0100-INFINITY]', + '[^\/\|,\$\%%\@\ \%"\<\>\:\#\&\*\{\}\[\]\(\)]' => 'ANYOF[^ "#$%&()*,/:<>@\[\]\{|\}][0100-INFINITY]', '[^[:^print:][:^ascii:]]' => 'ANYOF[\x20-\x7E]', '[ [:blank:]]' => 'ANYOFD[\t {utf8}\xA0][1680 2000-200A 202F 205F 3000]', '[_[:^blank:]]' => 'ANYOFD[^\t {utf8}\xA0][0100-167F 1681-1FFF 200B-202E 2030-205E 2060-2FFF 3001-INFINITY]', diff --git a/t/re/reg_mesg.t b/t/re/reg_mesg.t index b80b692e62..090eccbbb4 100644 --- a/t/re/reg_mesg.t +++ b/t/re/reg_mesg.t @@ -106,9 +106,14 @@ my $high_mixed_digit = ('A' lt '0') ? '0' : 'A'; my $colon_hex = sprintf "%02X", ord(":"); my $tab_hex = sprintf "%02X", ord("\t"); -## -## Key-value pairs of code/error of code that should have fatal errors. -## +# Key-value pairs of strings eval'd as patterns => warn/error messages that +# they should generate. In some cases, the value is an array of multiple +# messages. Some groups have the message(s) be default on; others, default +# off. This can be overridden on an individual key basis by preceding the +# pattern string with either 'default_on' or 'default_off' +# +# The first set are those that should be fatal errors. + my @death = ( '/[[=foo=]]/' => 'POSIX syntax [= =] is reserved for future extensions {#} m/[[=foo=]{#}]/', @@ -284,8 +289,6 @@ my @death = '/\w{/' => 'Unescaped left brace in regex is illegal here {#} m/\w{{#}/', '/\q{/' => 'Unescaped left brace in regex is illegal here {#} m/\q{{#}/', '/\A{/' => 'Unescaped left brace in regex is illegal here {#} m/\A{{#}/', - '/:{4,a}/' => 'Unescaped left brace in regex is illegal here {#} m/:{{#}4,a}/', - '/xa{3\,4}y/' => 'Unescaped left brace in regex is illegal here {#} m/xa{{#}3\,4}y/', '/abc/xix' => "", '/(?xmsixp:abc)/' => "", '/(?xmsixp)abc/' => "", @@ -372,6 +375,12 @@ my @death_only_under_strict = ( => 'False [] range "[:digit:]-" {#} m/[[:digit:]-{#}[:alpha:]]\x{100}/', '/[a\zb]\x{100}/' => 'Unrecognized escape \z in character class passed through {#} m/[a\z{#}b]\x{100}/', => 'Unrecognized escape \z in character class {#} m/[a\z{#}b]\x{100}/', + 'default_on/:{4,a}/' => 'Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through {#} m/:{{#}4,a}/', + => 'Unescaped left brace in regex is illegal here {#} m/:{{#}4,a}/', + 'default_on/xa{3\,4}y/' => 'Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through {#} m/xa{{#}3\,4}y/', + => 'Unescaped left brace in regex is illegal here {#} m/xa{{#}3\,4}y/', + 'default_on/\\${[^\\}]*}/' => 'Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through {#} m/\\${{#}[^\\}]*}/', + => 'Unescaped left brace in regex is illegal here {#} m/\\${{#}[^\\}]*}/', ); # These need the character 'ã' as a marker for mark_as_utf8() @@ -652,7 +661,6 @@ my @deprecated = ( '/.{/' => 'Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through {#} m/.{{#}/', '/[x]{/' => 'Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through {#} m/[x]{{#}/', '/\p{Latin}{/' => 'Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through {#} m/\p{Latin}{{#}/', - '/\\${[^\\}]*}/' => 'Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through {#} m/\\${{#}[^\\}]*}/', ); for my $strict ("", "use re 'strict';") { @@ -673,7 +681,7 @@ for my $strict ("", "use re 'strict';") { } } for (my $i = 0; $i < @death; $i += 2) { - my $regex = $death[$i]; + my $regex = $death[$i] =~ s/ default_ (on | off) //rx; my $expect = fixup_expect($death[$i+1], $strict); no warnings 'experimental::regex_sets'; no warnings 'experimental::re_strict'; @@ -736,7 +744,11 @@ for my $strict ("", "no warnings 'experimental::re_strict'; use re 'strict';") $default_on = 1; } for (my $i = 0; $i < @$ref; $i += 2) { + my $this_default_on = $default_on; my $regex = $ref->[$i]; + if ($regex =~ s/ default_ (on | off) //x) { + $this_default_on = $1 eq 'on'; + } my @expect = fixup_expect($ref->[$i+1], $strict); # A length-1 array with an empty warning means no warning gets @@ -790,7 +802,7 @@ for my $strict ("", "no warnings 'experimental::re_strict'; use re 'strict';") eval "$strict $regex" }); # Warning should be on as well if is testing # '(?[...])' which turns on strict - if ($default_on || grep { $_ =~ /\Q(?[/ } @expect ) { + if ($this_default_on || grep { $_ =~ /\Q(?[/ } @expect ) { ok @warns > 0, "... and the warning is on by default"; } else { diff --git a/utf8.h b/utf8.h index affa2d67f5..b2e338a80a 100644 --- a/utf8.h +++ b/utf8.h @@ -266,13 +266,15 @@ C<cp> is Unicode if above 255; otherwise is platform-native. /* Misleadingly named: is the UTF8-encoded byte 'c' part of a variant sequence * in UTF-8? This is the inverse of UTF8_IS_INVARIANT. The |0 makes sure this * isn't mistakenly called with a ptr argument */ -#define UTF8_IS_CONTINUED(c) (((U8)((c) | 0)) & UTF_CONTINUATION_MARK) +#define UTF8_IS_CONTINUED(c) (__ASSERT_(FITS_IN_8_BITS(c)) \ + ((U8)((c) | 0)) & UTF_CONTINUATION_MARK) /* Is the byte 'c' the first byte of a multi-byte UTF8-8 encoded sequence? * This doesn't catch invariants (they are single-byte). It also excludes the * illegal overlong sequences that begin with C0 and C1. The |0 makes sure * this isn't mistakenly called with a ptr argument */ -#define UTF8_IS_START(c) (((U8)((c) | 0)) >= 0xc2) +#define UTF8_IS_START(c) (__ASSERT_(FITS_IN_8_BITS(c)) \ + ((U8)((c) | 0)) >= 0xc2) /* For use in UTF8_IS_CONTINUATION() below */ #define UTF_IS_CONTINUATION_MASK 0xC0 @@ -280,20 +282,22 @@ C<cp> is Unicode if above 255; otherwise is platform-native. /* Is the byte 'c' part of a multi-byte UTF8-8 encoded sequence, and not the * first byte thereof? The |0 makes sure this isn't mistakenly called with a * ptr argument */ -#define UTF8_IS_CONTINUATION(c) \ - ((((U8)((c) | 0)) & UTF_IS_CONTINUATION_MASK) == UTF_CONTINUATION_MARK) +#define UTF8_IS_CONTINUATION(c) (__ASSERT_(FITS_IN_8_BITS(c)) \ + (((U8)((c) | 0)) & UTF_IS_CONTINUATION_MASK) == UTF_CONTINUATION_MARK) /* Is the UTF8-encoded byte 'c' the first byte of a two byte sequence? Use * UTF8_IS_NEXT_CHAR_DOWNGRADEABLE() instead if the input isn't known to * be well-formed. Masking with 0xfe allows the low bit to be 0 or 1; thus * this matches 0xc[23]. The |0 makes sure this isn't mistakenly called with a * ptr argument */ -#define UTF8_IS_DOWNGRADEABLE_START(c) ((((U8)((c) | 0)) & 0xfe) == 0xc2) +#define UTF8_IS_DOWNGRADEABLE_START(c) (__ASSERT_(FITS_IN_8_BITS(c)) \ + (((U8)((c) | 0)) & 0xfe) == 0xc2) /* Is the UTF8-encoded byte 'c' the first byte of a sequence of bytes that * represent a code point > 255? The |0 makes sure this isn't mistakenly * called with a ptr argument */ -#define UTF8_IS_ABOVE_LATIN1(c) (((U8)((c) | 0)) >= 0xc4) +#define UTF8_IS_ABOVE_LATIN1(c) (__ASSERT_(FITS_IN_8_BITS(c)) \ + ((U8)((c) | 0)) >= 0xc4) /* This is the number of low-order bits a continuation byte in a UTF-8 encoded * sequence contributes to the specification of the code point. In the bit @@ -309,7 +313,8 @@ C<cp> is Unicode if above 255; otherwise is platform-native. * problematic in some contexts. This allows code that needs to check for * those to to quickly exclude the vast majority of code points it will * encounter */ -#define isUTF8_POSSIBLY_PROBLEMATIC(c) ((U8) c >= 0xED) +#define isUTF8_POSSIBLY_PROBLEMATIC(c) (__ASSERT_(FITS_IN_8_BITS(c)) \ + (U8) c >= 0xED) /* A helper macro for isUTF8_CHAR, so use that one instead of this. This was * generated by regen/regcharclass.pl, and then moved here. Then it was @@ -529,7 +534,8 @@ encoded as UTF-8. C<cp> is a native (ASCII or EBCDIC) code point if less than * that this is asymmetric on EBCDIC platforms, in that the 'new' parameter is * the UTF-EBCDIC byte, whereas the 'old' parameter is a Unicode (not EBCDIC) * code point in process of being generated */ -#define UTF8_ACCUMULATE(old, new) (((old) << UTF_ACCUMULATION_SHIFT) \ +#define UTF8_ACCUMULATE(old, new) (__ASSERT_(FITS_IN_8_BITS(new)) \ + ((old) << UTF_ACCUMULATION_SHIFT) \ | ((NATIVE_UTF8_TO_I8((U8)new)) \ & UTF_CONTINUATION_MASK)) @@ -571,8 +577,10 @@ encoded as UTF-8. C<cp> is a native (ASCII or EBCDIC) code point if less than * Note that the result can be larger than 255 if the input character is not * downgradable */ #define TWO_BYTE_UTF8_TO_NATIVE(HI, LO) \ - ( __ASSERT_(PL_utf8skip[HI] == 2) \ - __ASSERT_(UTF8_IS_CONTINUATION(LO)) \ + (__ASSERT_(FITS_IN_8_BITS(HI)) \ + __ASSERT_(FITS_IN_8_BITS(LO)) \ + __ASSERT_(PL_utf8skip[HI] == 2) \ + __ASSERT_(UTF8_IS_CONTINUATION(LO)) \ UNI_TO_NATIVE(UTF8_ACCUMULATE((NATIVE_UTF8_TO_I8(HI) & UTF_START_MASK(2)), \ (LO)))) -- Perl5 Master Repository