[perl.git] branch blead, updated. v5.27.0-49-g53fdf12aed

Karl Williamson Thu, 01 Jun 2017 06:09:01 -0700

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/53fdf12aede2fd58b0c5ee236a58025184f49def?hp=ca31f56c9ea43ef6c05c38da5542fb95a322f2c1>


- Log -----------------------------------------------------------------
commit 53fdf12aede2fd58b0c5ee236a58025184f49def
Author: Karl Williamson <k...@cpan.org>
Date:   Tue May 30 21:19:20 2017 -0600

    Relax fatal circumstances of unescaped '{'
    
    After the 5.26.0 code freeze, it came out that an application that many
    others depend on, GNU Autoconf, has an unescaped '{' in it.  Commit
    7335cb814c19345052a23bc4462c701ce734e6c5 created a kludge that was
    minimal, and designed to get just that one application to work.
    
    I originally proposed a less kludgy patch that was applicable across a
    larger set of applications.  The proposed patch didn't fatalize uses
    of unesacped '{' where we don't anticipate using it for something other
    than its literal self.  That approach worked for Autoconf, but also far
    more instances, but was more complicated, and was rejected as being too
    risky during code freeze.
    
    Now this commit implements my original suggestion.  I am putting it in
    now, to let it soak in blead, in case something else surfaces besides
    Autoconf, that we need to work around.  By having experience with the
    patch live, we can be more confident about using it, if necessary, in a
    dot release.

M       embed.fnc
M       embed.h
M       pod/perldelta.pod
M       proto.h
M       regcomp.c
M       t/re/reg_mesg.t

commit d30277c7df69b4aca40edeb2ae3bf9bb529b01e8
Author: Karl Williamson <k...@cpan.org>
Date:   Wed May 31 13:08:33 2017 -0600

    t/re/reg_mesg.t: Add override of warning default on/off
    
    This .t needs an overhaul to more cleanly accommodate the extra tasks it
    has been given over the years.  But until then, this is a minimal
    enhancement that will be useful in the commit after this one.
    
    This adds the ability to specify that a particular pattern being tested
    should generate a message which is raised by default vs one that isn't.
    The messages are currently grouped in categories whose default is
    determined by the category itself.  This commit avoids having to create
    a new category when a message comes along that doesn't quite fit into
    the existing ones.

M       t/re/reg_mesg.t

commit 9b5d391f84ebc32466959f6f8f90891b3930c4b9
Author: Karl Williamson <k...@cpan.org>
Date:   Wed May 31 13:44:20 2017 -0600

    regcomp.c: Don't set variable within an 'if'
    
    Sometimes it is convenient/and or necessary to do an assignment within a
    clause of an 'if', but it adds a little cognitive load.  In this case,
    it's entirely unnecessary.  This patch changes to do the assignment
    before the 'if'.

M       regcomp.c

commit e613617cd28e5ec6a6555339c1300e9fbdbde425
Author: Karl Williamson <k...@cpan.org>
Date:   Tue May 23 20:54:06 2017 -0600

    perlguts: Add some C<>

M       pod/perlguts.pod

commit fc2ecde5b62c6497fe698bfcc9a7e299db105832
Author: Karl Williamson <k...@cpan.org>
Date:   Tue May 23 20:53:11 2017 -0600

    charnames: Clarify comment

M       lib/_charnames.pm

commit 754e15cfc175f0e2e1299d27fd387a0d868c2764
Author: Karl Williamson <k...@cpan.org>
Date:   Tue May 23 20:51:56 2017 -0600

    charnames: Remove obsolete pod about NBSP
    
    This is illegal since 5.26, and the text should have been removed then,
    but was overlooked.

M       lib/_charnames.pm
M       lib/charnames.pm

commit a1c583cae893f7b865a4436117411a5c84ed09c3
Author: Karl Williamson <k...@cpan.org>
Date:   Tue May 23 20:50:44 2017 -0600

    perlmodinstall: Make a link for http text

M       pod/perlmodinstall.pod

commit a6951642ede4abe605dcf0e94b74948e0a60a56b
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Apr 26 10:29:58 2017 -0600

    utf8.h: Add assertions for macros that take chars
    
    This is inspired by [perl #131190].  The UTF-8 macros whose parameters
    are characters now have assertions that verify they are not being called
    with something that won't fit in a char.  These assertions should be
    getting optimized out if the input type is a char or U8.

M       utf8.h

commit a5ba252751fc7fd7b9d43d0ad4491eb68a14a4a6
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Apr 20 08:33:42 2017 -0600

    Change formal parameter for newSVpvn
    
    This fixes a discrepancy in perlapi.  See
    http://nntp.perl.org/group/perl.perl5.porters/243384

M       embed.fnc
M       proto.h

commit a25f3052bac2e69bc82d661cc97662078e096e3f
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Apr 11 11:37:23 2017 -0600

    AUTHORS: Update Jim Shneider's email

M       AUTHORS
M       Porting/checkAUTHORS.pl

commit 06b22f95a86a59966b460870bdffe91363ea4822
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Apr 11 14:11:40 2017 -0600

    t/op/fork.t: Don't output shell warning
    
    If the shell doesn't support 'ulimit -u', it can cause unexpected
    warnings that can cause the tests to fail.  This happens on s/390.

M       t/op/fork.t

commit 20d1d40b2ce87fc39895935ec95221e8b9e36392
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Apr 11 11:11:24 2017 -0600

    APItest/numeric.xs: Fix uninit error
    
    valgrind shows that a variable could be used unininitialized.

M       ext/XS-APItest/APItest.pm
M       ext/XS-APItest/numeric.xs

commit 7d24ce13858cd4a442f7b74228dfbbe808147063
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Mar 21 21:52:33 2017 -0600

    t/harness: Run APItests in parallel
    
    This commit changes these tests to be run like the tests in t/lib, in
    parallel with each other, when available.  This is the longest running
    directory, and prior to this commit, on many-core systems it can be the
    final thing chugging along, a test at-a-time, while the other cores are
    idle.

M       t/harness

commit 704b574dcdee24984d4fe89fd31a01cb6fc2ac72
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Mar 21 21:41:34 2017 -0600

    t/harness: Remove useless sort
    
    Instead move its effect to the sort that overrides the first one.  This
    is because the tests are executed in the order of the rules to
    TAP::Harness, not in the order of the test list.

M       t/harness

commit eb7486de5e577623ae719baf1b442bde992cb7c7
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 30 14:52:46 2017 -0700

    Slightly change -Dr output of regex ANYOF nodes
    
    This changes to precede each literal '[' in a [...] class with a
    backslash to better make is standout as a literal

M       regcomp.c
M       t/re/anyof.t
-----------------------------------------------------------------------

Summary of changes:
 AUTHORS                   |  2 +-
 Porting/checkAUTHORS.pl   |  1 +
 embed.fnc                 |  3 +-
 embed.h                   |  1 +
 ext/XS-APItest/APItest.pm |  2 +-
 ext/XS-APItest/numeric.xs |  2 +-
 lib/_charnames.pm         |  6 +--
 lib/charnames.pm          | 10 ++---
 pod/perldelta.pod         | 17 +++++++++
 pod/perlguts.pod          |  4 +-
 pod/perlmodinstall.pod    |  2 +-
 proto.h                   |  7 +++-
 regcomp.c                 | 97 +++++++++++++++++++++++++++++++++++++----------
 t/harness                 |  8 ++--
 t/op/fork.t               |  2 +-
 t/re/anyof.t              |  4 +-
 t/re/reg_mesg.t           | 28 ++++++++++----
 utf8.h                    | 28 +++++++++-----
 18 files changed, 162 insertions(+), 62 deletions(-)

diff --git a/AUTHORS b/AUTHORS
index 35e3068d76..aa0a22a2fe 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -594,7 +594,7 @@ Jim Cromie                  <jcro...@cpan.org>
 Jim Meyering                   <meyer...@asic.sc.ti.com>
 Jim Miner                      <j...@winternet.com>
 Jim Richardson
-Jim Schneider                  <jschn...@netilla.com>
+Jim Schneider                  <james.schnei...@db.com>
 Jirka HruÅ¡ka                  <ji...@fud.cz>
 Joachim Huober
 Joaquin Ferrero                        <explo...@joaquinferrero.com>
diff --git a/Porting/checkAUTHORS.pl b/Porting/checkAUTHORS.pl
index 9ee55f7724..57582d70bc 100755
--- a/Porting/checkAUTHORS.pl
+++ b/Porting/checkAUTHORS.pl
@@ -692,6 +692,7 @@ jpeacock\100messagesystems.com          
john.peacock\100havurah-software.org
 +                                       jpeacock\100jpeacock-hp.doesntexist.org
 +                                       jpeacock\100cpan.org
 +                                       jpeacock\100rowman.com
+james.schneider\100db.com               jschneid\100netilla.com
 jpl.jpl\100gmail.com                    jpl\100research.att.com
 jql\100accessone.com                    jql\100jql.accessone.com
 jsm28\100hermes.cam.ac.uk               jsm28\100cam.ac.uk
diff --git a/embed.fnc b/embed.fnc
index d0c9953273..bb107fa91d 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1138,7 +1138,7 @@ ApdR      |SV*    |newSViv        |const IV i
 ApdR   |SV*    |newSVuv        |const UV u
 ApdR   |SV*    |newSVnv        |const NV n
 ApdR   |SV*    |newSVpv        |NULLOK const char *const s|const STRLEN len
-ApdR   |SV*    |newSVpvn       |NULLOK const char *const s|const STRLEN len
+ApdR   |SV*    |newSVpvn       |NULLOK const char *const buffer|const STRLEN 
len
 ApdR   |SV*    |newSVpvn_flags |NULLOK const char *const s|const STRLEN 
len|const U32 flags
 ApdR   |SV*    |newSVhek       |NULLOK const HEK *const hek
 ApdR   |SV*    |newSVpvn_share |NULLOK const char* s|I32 len|U32 hash
@@ -1655,6 +1655,7 @@ EMRs      |SV*    |_make_exactf_invlist   |NN 
RExC_state_t *pRExC_state \
                                        |NN regnode *node
 EsMR   |SV*    |invlist_contents|NN SV* const invlist              \
                                 |const bool traditional_style
+EsRn   |bool   |new_regcurly   |NN const char *s|NN const char *e
 #endif
 #if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_UTF8_C)
 EXmM   |void   |_invlist_intersection  |NN SV* const a|NN SV* const b|NN SV** i
diff --git a/embed.h b/embed.h
index 2fa77c6fda..54b8fbddc8 100644
--- a/embed.h
+++ b/embed.h
@@ -1042,6 +1042,7 @@
 #define is_ssc_worth_it                S_is_ssc_worth_it
 #define join_exact(a,b,c,d,e,f,g)      S_join_exact(aTHX_ a,b,c,d,e,f,g)
 #define make_trie(a,b,c,d,e,f,g,h)     S_make_trie(aTHX_ a,b,c,d,e,f,g,h)
+#define new_regcurly           S_new_regcurly
 #define nextchar(a)            S_nextchar(aTHX_ a)
 #define output_or_return_posix_warnings(a,b,c) 
S_output_or_return_posix_warnings(aTHX_ a,b,c)
 #define parse_lparen_question_flags(a) S_parse_lparen_question_flags(aTHX_ a)
diff --git a/ext/XS-APItest/APItest.pm b/ext/XS-APItest/APItest.pm
index 1be011660b..a5dd1332f6 100644
--- a/ext/XS-APItest/APItest.pm
+++ b/ext/XS-APItest/APItest.pm
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use Carp;
 
-our $VERSION = '0.88';
+our $VERSION = '0.89';
 
 require XSLoader;
 
diff --git a/ext/XS-APItest/numeric.xs b/ext/XS-APItest/numeric.xs
index 0ce9e080ff..fac81ba3e0 100644
--- a/ext/XS-APItest/numeric.xs
+++ b/ext/XS-APItest/numeric.xs
@@ -40,7 +40,7 @@ grok_atoUV(number, endsv)
        const char *pv = SvPV(number, len);
        UV value = 0xdeadbeef;
        bool result;
-       const char* endptr;
+       const char* endptr = NULL;
     PPCODE:
        EXTEND(SP,2);
        if (endsv == &PL_sv_undef) {
diff --git a/lib/_charnames.pm b/lib/_charnames.pm
index 50fdd85c5d..c6169d16f8 100644
--- a/lib/_charnames.pm
+++ b/lib/_charnames.pm
@@ -6,7 +6,7 @@
 package _charnames;
 use strict;
 use warnings;
-our $VERSION = '1.44';
+our $VERSION = '1.45';
 use unicore::Name;    # mktables-generated algorithmically-defined names
 
 use bytes ();          # for $bytes::hint_bits
@@ -22,8 +22,8 @@ $Carp::Internal{ (__PACKAGE__) } = 1;
 # The official names with their code points are stored in a table in
 # lib/unicore/Name.pl which is read in as a large string (almost 3/4 Mb in
 # Unicode 6.0).  Each code point/name combination is separated by a \n in the
-# string.  (Some of the CJK and the Hangul syllable names are determined
-# instead algorithmically via subroutines stored instead in
+# string.  (Some of the CJK and the Hangul syllable names are instead
+# determined algorithmically via subroutines stored instead in
 # lib/unicore/Name.pm).  Because of the large size of this table, it isn't
 # converted into hashes for faster lookup.
 #
diff --git a/lib/charnames.pm b/lib/charnames.pm
index a0f3227f4d..e22c71913c 100644
--- a/lib/charnames.pm
+++ b/lib/charnames.pm
@@ -1,7 +1,7 @@
 package charnames;
 use strict;
 use warnings;
-our $VERSION = '1.44';
+our $VERSION = '1.45';
 use unicore::Name;    # mktables-generated algorithmically-defined names
 use _charnames ();    # The submodule for this where most of the work gets done
 
@@ -278,11 +278,9 @@ mean C<"B">, etc.
 
 Aliases must begin with a character that is alphabetic.  After that, each may
 contain any combination of word (C<\w>) characters, SPACE (U+0020),
-HYPHEN-MINUS (U+002D), LEFT PARENTHESIS (U+0028), RIGHT PARENTHESIS (U+0029),
-and NO-BREAK SPACE (U+00A0).  These last three should never have been allowed
-in names, and are retained for backwards compatibility only; NO-BREAK SPACE IS
-currently deprecated and scheduled for removal in Perl v5.26; the other two
-may also be
+HYPHEN-MINUS (U+002D), LEFT PARENTHESIS (U+0028), and RIGHT PARENTHESIS
+(U+0029).  These last two should never have been allowed
+in names, and are retained for backwards compatibility only, and may be
 deprecated and removed in future releases of Perl, so don't use them for new
 names.  (More precisely, the first character of a name you specify must be
 something that matches all of C<\p{ID_Start}>, C<\p{Alphabetic}>, and
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index 13b9cdcd18..584474d729 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -59,6 +59,23 @@ respectively.
 
 XXX Any deprecated features, syntax, modules etc. should be listed here.
 
+=head2 Some uses of unescaped C<"{"> are no longer fatal
+
+Perl 5.26.0 fatalized some uses of an unescaped left brace, but an
+exception was made at the last minute, specifically crafted to be a
+minimal change to allow GNU Autoconf to work.  This code is heavily
+depended upon, and continues to use the deprecated usage.  Its use of an
+unescaped left brace is one where we have no intention of repurposing
+C<"{"> to be something other than itself.
+
+That exception is now generalized to include various other such cases
+where the C<"{"> will not be repurposed.  This is to get real experience
+with this more complicated change now, in case we need to issue a dot
+release if we find other things like Autoconf that are important to work
+around.
+
+Note that these uses continue to raise a deprecation message.
+
 =head2 Module removals
 
 XXX Remove this section if inapplicable.
diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index 2da946ce46..56eee93164 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -56,7 +56,7 @@ The seven routines are:
     SV*  newSVpvf(const char*, ...);
     SV*  newSVsv(SV*);
 
-C<STRLEN> is an integer type (Size_t, usually defined as size_t in
+C<STRLEN> is an integer type (C<Size_t>, usually defined as C<size_t> in
 F<config.h>) guaranteed to be large enough to represent the size of
 any string that perl can handle.
 
@@ -2675,7 +2675,7 @@ whatever the compiler has.
 If you are printing addresses of pointers, use UVxf combined
 with PTR2UV(), do not use %lx or %p.
 
-=head2 Formatted Printing of Size_t and SSize_t
+=head2 Formatted Printing of C<Size_t> and C<SSize_t>
 
 The most general way to do this is to cast them to a UV or IV, and
 print as in the
diff --git a/pod/perlmodinstall.pod b/pod/perlmodinstall.pod
index 39c410df30..86f9cf948a 100644
--- a/pod/perlmodinstall.pod
+++ b/pod/perlmodinstall.pod
@@ -416,7 +416,7 @@ don't send me mail asking for help on how to install your 
modules.
 There are too many modules, and too few Orwants, for me to be able to
 answer or even acknowledge all your questions.  Contact the module
 author instead, ask someone familiar with Perl on your operating
-system, or if all else fails, file a ticket at http://rt.cpan.org/.
+system, or if all else fails, file a ticket at L<http://rt.cpan.org/>.
 
 =head1 AUTHOR
 
diff --git a/proto.h b/proto.h
index 8307c6d6a9..f60564370a 100644
--- a/proto.h
+++ b/proto.h
@@ -2313,7 +2313,7 @@ PERL_CALLCONV SV* Perl_newSVpvf(pTHX_ const char *const 
pat, ...)
 #define PERL_ARGS_ASSERT_NEWSVPVF      \
        assert(pat)
 
-PERL_CALLCONV SV*      Perl_newSVpvn(pTHX_ const char *const s, const STRLEN 
len)
+PERL_CALLCONV SV*      Perl_newSVpvn(pTHX_ const char *const buffer, const 
STRLEN len)
                        __attribute__warn_unused_result__;
 
 PERL_CALLCONV SV*      Perl_newSVpvn_flags(pTHX_ const char *const s, const 
STRLEN len, const U32 flags)
@@ -5138,6 +5138,11 @@ STATIC U32       S_join_exact(pTHX_ RExC_state_t 
*pRExC_state, regnode *scan, UV *min_
 STATIC I32     S_make_trie(pTHX_ RExC_state_t *pRExC_state, regnode 
*startbranch, regnode *first, regnode *last, regnode *tail, U32 word_count, U32 
flags, U32 depth);
 #define PERL_ARGS_ASSERT_MAKE_TRIE     \
        assert(pRExC_state); assert(startbranch); assert(first); assert(last); 
assert(tail)
+STATIC bool    S_new_regcurly(const char *s, const char *e)
+                       __attribute__warn_unused_result__;
+#define PERL_ARGS_ASSERT_NEW_REGCURLY  \
+       assert(s); assert(e)
+
 STATIC void    S_nextchar(pTHX_ RExC_state_t *pRExC_state);
 #define PERL_ARGS_ASSERT_NEXTCHAR      \
        assert(pRExC_state)
diff --git a/regcomp.c b/regcomp.c
index f9d56c1813..6fc3716951 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -119,7 +119,7 @@ typedef struct scan_frame {
 
 /* Certain characters are output as a sequence with the first being a
  * backslash. */
-#define isBACKSLASHED_PUNCT(c)  strchr("-]\\^", c)
+#define isBACKSLASHED_PUNCT(c)  strchr("-[]\\^", c)
 
 
 struct RExC_state_t {
@@ -12398,6 +12398,52 @@ S_alloc_maybe_populate_EXACT(pTHX_ RExC_state_t 
*pRExC_state,
     }
 }
 
+STATIC bool
+S_new_regcurly(const char *s, const char *e)
+{
+    /* This is a temporary function designed to match the most lenient form of
+     * a {m,n} quantifier we ever envision, with either number omitted, and
+     * spaces anywhere between/before/after them.
+     *
+     * If this function fails, then the string it matches is very unlikely to
+     * ever be considered a valid quantifier, so we can allow the '{' that
+     * begins it to be considered as a literal */
+
+    bool has_min = FALSE;
+    bool has_max = FALSE;
+
+    PERL_ARGS_ASSERT_NEW_REGCURLY;
+
+    if (s >= e || *s++ != '{')
+       return FALSE;
+
+    while (s < e && isSPACE(*s)) {
+        s++;
+    }
+    while (s < e && isDIGIT(*s)) {
+        has_min = TRUE;
+        s++;
+    }
+    while (s < e && isSPACE(*s)) {
+        s++;
+    }
+
+    if (*s == ',') {
+       s++;
+        while (s < e && isSPACE(*s)) {
+            s++;
+        }
+        while (s < e && isDIGIT(*s)) {
+            has_max = TRUE;
+            s++;
+        }
+        while (s < e && isSPACE(*s)) {
+            s++;
+        }
+    }
+
+    return s < e && *s == '}' && (has_min || has_max);
+}
 
 /* Parse backref decimal value, unless it's too big to sensibly be a backref,
  * in which case return I32_MAX (rather than possibly 32-bit wrapping) */
@@ -12832,6 +12878,12 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, 
U32 depth)
             /* FALLTHROUGH */
 
           finish_meta_pat:
+            if (   UCHARAT(RExC_parse + 1) == '{'
+                && UNLIKELY(! new_regcurly(RExC_parse + 1, RExC_end)))
+            {
+                RExC_parse += 2;
+                vFAIL("Unescaped left brace in regex is illegal here");
+            }
            nextchar(pRExC_state);
             Set_Node_Length(ret, 2); /* MJD */
            break;
@@ -13381,22 +13433,25 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth)
                    } /* End of switch on '\' */
                    break;
                case '{':
-                   /* Currently we don't care if the lbrace is at the start
-                    * of a construct.  This catches it in the middle of a
-                    * literal string, or when it's the first thing after
-                    * something like "\b" */
-                   if (len || (p > RExC_start && isALPHA_A(*(p -1)))) {
-
-                        /* GNU Autoconf is depended on by a lot of code, and
-                         * can't seem to release a new version that avoids the
-                         * deprecation now made fatal.  (A commit to do so has
-                         * been in its repository since early 2013; only one
-                         * pattern is affected.)  As a work-around, don't
-                         * fatalize this if the pattern being compiled is the
-                         * precise one that trips up Autoconf.  See [perl
-                         * #130497] for more details. */
-                        if (memNEs(RExC_start, RExC_end - RExC_start,
-                                   "\\${[^\\}]*}"))
+                    /* Currently we allow an lbrace at the start of a construct
+                     * without raising a warning.  This is because we think we
+                     * will never want such a brace to be meant to be other
+                     * than taken literally. */
+                   if (len || (p > RExC_start && isALPHA_A(*(p - 1)))) {
+
+                        /* But, we raise a fatal warning otherwise, as the
+                         * deprecation cycle has come and gone.  Except that it
+                         * turns out that some heavily-relied on upstream
+                         * software, notably GNU Autoconf, have failed to fix
+                         * their uses.  For these, don't make it fatal unless
+                         * we anticipate using the '{' for something else.
+                         * This happens after any alpha, and for a looser {m,n}
+                         * quantifier specification */
+                        if (      RExC_strict
+                            || (  p > parse_start + 1
+                                && isALPHA_A(*(p - 1))
+                                && *(p - 2) == '\\')
+                            || new_regcurly(p, RExC_end))
                         {
                             RExC_parse = p + 1;
                             vFAIL("Unescaped left brace in regex is "
@@ -13446,10 +13501,10 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth)
                  * this character again next time through, when it will be the
                  * only thing in its new node */
 
-                if ((next_is_quantifier = (   LIKELY(p < RExC_end)
-                                           && UNLIKELY(ISMULT2(p))))
-                    && LIKELY(len))
-               {
+                next_is_quantifier =    LIKELY(p < RExC_end)
+                                     && UNLIKELY(ISMULT2(p));
+
+                if (next_is_quantifier && LIKELY(len)) {
                     p = oldp;
                     goto loopdone;
                 }
diff --git a/t/harness b/t/harness
index b46582ddd9..3fc3ae082e 100644
--- a/t/harness
+++ b/t/harness
@@ -145,7 +145,7 @@ if (@ARGV) {
        push @seq, $next;
 
        my @last;
-       push @last,  sort { lc $a cmp lc $b }
+       push @last,
            _tests_from_manifest($Config{extensions}, 
$Config{known_extensions});
        my %times;
        if ($state) {
@@ -163,7 +163,9 @@ if (@ARGV) {
                s,\\,/,g; # canonicalize path
            };
            # Treat every file matching lib/*.t as a "directory"
-           m!\A(\.\./lib/[^/]+\.t\z|.*[/])! or die "'$_'";
+           m! \A ( \.\. / (?: lib | ext/XS-APItest/t )
+                         / [^/]+ \.t \z | .* [/] ) !x
+                or die "'$_'";
            push @{$dir{$1}}, $_;
            $total_time{$1} += $times{$_} || 0;
        }
@@ -174,7 +176,7 @@ if (@ARGV) {
        # sequentially.
        push @seq, { par => [ map { s!/$!/*!; { seq => $_ } } sort {
            # Directories, ordered by total time descending then name ascending
-           $total_time{$b} <=> $total_time{$a} || $a cmp $b
+           $total_time{$b} <=> $total_time{$a} || lc $a cmp lc $b
        } keys %dir ] };
 
        $rules = { seq => \@seq };
diff --git a/t/op/fork.t b/t/op/fork.t
index b69a929fe0..be3125d673 100644
--- a/t/op/fork.t
+++ b/t/op/fork.t
@@ -22,7 +22,7 @@ SKIP: {
     skip "This test can only be run under bash or zsh"
         unless $shell =~ m{/(?:ba|z)sh$};
     my $probe = qx{
-        $shell -c 'ulimit -u 1 2>&1 && echo good'
+        $shell -c 'ulimit -u 1 2>/dev/null && echo good'
     };
     chomp $probe;
     skip "Can't set ulimit -u on this system: $probe"
diff --git a/t/re/anyof.t b/t/re/anyof.t
index 12ae043fc3..84f2881e0d 100644
--- a/t/re/anyof.t
+++ b/t/re/anyof.t
@@ -29,10 +29,10 @@ BEGIN {
 # skipped and not skipped.
 
 my @tests = (
-    '[[{]' => 'ANYOF[[\{]',
+    '[[{]' => 'ANYOF[\[\{]',
     '[^\S ]' => 'ANYOFD[\t\n\x0B\f\r{utf8}\x85\xA0][1680 2000-200A 2028-2029 
202F 205F 3000]',
     '[^\n\r]' => 'ANYOF[^\n\r][0100-INFINITY]',
-    '[^\/\|,\$\%%\@\ \%"\<\>\:\#\&\*\{\}\[\]\(\)]' => 'ANYOF[^ 
"#$%&()*,/:<>@[\]\{|\}][0100-INFINITY]',
+    '[^\/\|,\$\%%\@\ \%"\<\>\:\#\&\*\{\}\[\]\(\)]' => 'ANYOF[^ 
"#$%&()*,/:<>@\[\]\{|\}][0100-INFINITY]',
     '[^[:^print:][:^ascii:]]' => 'ANYOF[\x20-\x7E]',
     '[ [:blank:]]' => 'ANYOFD[\t {utf8}\xA0][1680 2000-200A 202F 205F 3000]',
     '[_[:^blank:]]' => 'ANYOFD[^\t {utf8}\xA0][0100-167F 1681-1FFF 200B-202E 
2030-205E 2060-2FFF 3001-INFINITY]',
diff --git a/t/re/reg_mesg.t b/t/re/reg_mesg.t
index b80b692e62..090eccbbb4 100644
--- a/t/re/reg_mesg.t
+++ b/t/re/reg_mesg.t
@@ -106,9 +106,14 @@ my $high_mixed_digit = ('A' lt '0') ? '0' : 'A';
 my $colon_hex = sprintf "%02X", ord(":");
 my $tab_hex = sprintf "%02X", ord("\t");
 
-##
-## Key-value pairs of code/error of code that should have fatal errors.
-##
+# Key-value pairs of strings eval'd as patterns => warn/error messages that
+# they should generate.  In some cases, the value is an array of multiple
+# messages.  Some groups have the message(s) be default on; others, default
+# off.  This can be overridden on an individual key basis by preceding the
+# pattern string with either 'default_on' or 'default_off'
+#
+# The first set are those that should be fatal errors.
+
 my @death =
 (
  '/[[=foo=]]/' => 'POSIX syntax [= =] is reserved for future extensions {#} 
m/[[=foo=]{#}]/',
@@ -284,8 +289,6 @@ my @death =
  '/\w{/' => 'Unescaped left brace in regex is illegal here {#} m/\w{{#}/',
  '/\q{/' => 'Unescaped left brace in regex is illegal here {#} m/\q{{#}/',
  '/\A{/' => 'Unescaped left brace in regex is illegal here {#} m/\A{{#}/',
- '/:{4,a}/' => 'Unescaped left brace in regex is illegal here {#} 
m/:{{#}4,a}/',
- '/xa{3\,4}y/' => 'Unescaped left brace in regex is illegal here {#} 
m/xa{{#}3\,4}y/',
  '/abc/xix' => "",
  '/(?xmsixp:abc)/' => "",
  '/(?xmsixp)abc/' => "",
@@ -372,6 +375,12 @@ my @death_only_under_strict = (
                                      => 'False [] range "[:digit:]-" {#} 
m/[[:digit:]-{#}[:alpha:]]\x{100}/',
     '/[a\zb]\x{100}/' => 'Unrecognized escape \z in character class passed 
through {#} m/[a\z{#}b]\x{100}/',
                       => 'Unrecognized escape \z in character class {#} 
m/[a\z{#}b]\x{100}/',
+    'default_on/:{4,a}/'     => 'Unescaped left brace in regex is deprecated 
here (and will be fatal in Perl 5.30), passed through {#} m/:{{#}4,a}/',
+                             => 'Unescaped left brace in regex is illegal here 
{#} m/:{{#}4,a}/',
+    'default_on/xa{3\,4}y/'  => 'Unescaped left brace in regex is deprecated 
here (and will be fatal in Perl 5.30), passed through {#} m/xa{{#}3\,4}y/',
+                             => 'Unescaped left brace in regex is illegal here 
{#} m/xa{{#}3\,4}y/',
+  'default_on/\\${[^\\}]*}/' => 'Unescaped left brace in regex is deprecated 
here (and will be fatal in Perl 5.30), passed through {#} m/\\${{#}[^\\}]*}/',
+                             => 'Unescaped left brace in regex is illegal here 
{#} m/\\${{#}[^\\}]*}/',
 );
 
 # These need the character 'ã' as a marker for mark_as_utf8()
@@ -652,7 +661,6 @@ my @deprecated = (
  '/.{/'         => 'Unescaped left brace in regex is deprecated here (and will 
be fatal in Perl 5.30), passed through {#} m/.{{#}/',
  '/[x]{/'       => 'Unescaped left brace in regex is deprecated here (and will 
be fatal in Perl 5.30), passed through {#} m/[x]{{#}/',
  '/\p{Latin}{/' => 'Unescaped left brace in regex is deprecated here (and will 
be fatal in Perl 5.30), passed through {#} m/\p{Latin}{{#}/',
- '/\\${[^\\}]*}/' => 'Unescaped left brace in regex is deprecated here (and 
will be fatal in Perl 5.30), passed through {#} m/\\${{#}[^\\}]*}/',
 );
 
 for my $strict ("", "use re 'strict';") {
@@ -673,7 +681,7 @@ for my $strict ("", "use re 'strict';") {
         }
     }
     for (my $i = 0; $i < @death; $i += 2) {
-        my $regex = $death[$i];
+        my $regex = $death[$i] =~ s/ default_ (on | off) //rx;
         my $expect = fixup_expect($death[$i+1], $strict);
         no warnings 'experimental::regex_sets';
         no warnings 'experimental::re_strict';
@@ -736,7 +744,11 @@ for my $strict ("",  "no warnings 
'experimental::re_strict'; use re 'strict';")
             $default_on = 1;
         }
         for (my $i = 0; $i < @$ref; $i += 2) {
+            my $this_default_on = $default_on;
             my $regex = $ref->[$i];
+            if ($regex =~ s/ default_ (on | off) //x) {
+                $this_default_on = $1 eq 'on';
+            }
             my @expect = fixup_expect($ref->[$i+1], $strict);
 
             # A length-1 array with an empty warning means no warning gets
@@ -790,7 +802,7 @@ for my $strict ("",  "no warnings 
'experimental::re_strict'; use re 'strict';")
                                                         eval "$strict $regex" 
});
                         # Warning should be on as well if is testing
                         # '(?[...])' which turns on strict
-                        if ($default_on || grep { $_ =~ /\Q(?[/ } @expect ) {
+                        if ($this_default_on || grep { $_ =~ /\Q(?[/ } @expect 
) {
                            ok @warns > 0, "... and the warning is on by 
default";
                         }
                         else {
diff --git a/utf8.h b/utf8.h
index affa2d67f5..b2e338a80a 100644
--- a/utf8.h
+++ b/utf8.h
@@ -266,13 +266,15 @@ C<cp> is Unicode if above 255; otherwise is 
platform-native.
 /* Misleadingly named: is the UTF8-encoded byte 'c' part of a variant sequence
  * in UTF-8?  This is the inverse of UTF8_IS_INVARIANT.  The |0 makes sure this
  * isn't mistakenly called with a ptr argument */
-#define UTF8_IS_CONTINUED(c)        (((U8)((c) | 0)) &  UTF_CONTINUATION_MARK)
+#define UTF8_IS_CONTINUED(c)  (__ASSERT_(FITS_IN_8_BITS(c))                 \
+                               ((U8)((c) | 0)) &  UTF_CONTINUATION_MARK)
 
 /* Is the byte 'c' the first byte of a multi-byte UTF8-8 encoded sequence?
  * This doesn't catch invariants (they are single-byte).  It also excludes the
  * illegal overlong sequences that begin with C0 and C1.  The |0 makes sure
  * this isn't mistakenly called with a ptr argument */
-#define UTF8_IS_START(c)            (((U8)((c) | 0)) >= 0xc2)
+#define UTF8_IS_START(c)      (__ASSERT_(FITS_IN_8_BITS(c))                 \
+                               ((U8)((c) | 0)) >= 0xc2)
 
 /* For use in UTF8_IS_CONTINUATION() below */
 #define UTF_IS_CONTINUATION_MASK    0xC0
@@ -280,20 +282,22 @@ C<cp> is Unicode if above 255; otherwise is 
platform-native.
 /* Is the byte 'c' part of a multi-byte UTF8-8 encoded sequence, and not the
  * first byte thereof?  The |0 makes sure this isn't mistakenly called with a
  * ptr argument */
-#define UTF8_IS_CONTINUATION(c)                                             \
-    ((((U8)((c) | 0)) & UTF_IS_CONTINUATION_MASK) == UTF_CONTINUATION_MARK)
+#define UTF8_IS_CONTINUATION(c)     (__ASSERT_(FITS_IN_8_BITS(c))           \
+     (((U8)((c) | 0)) & UTF_IS_CONTINUATION_MASK) == UTF_CONTINUATION_MARK)
 
 /* Is the UTF8-encoded byte 'c' the first byte of a two byte sequence?  Use
  * UTF8_IS_NEXT_CHAR_DOWNGRADEABLE() instead if the input isn't known to
  * be well-formed.  Masking with 0xfe allows the low bit to be 0 or 1; thus
  * this matches 0xc[23].  The |0 makes sure this isn't mistakenly called with a
  * ptr argument */
-#define UTF8_IS_DOWNGRADEABLE_START(c) ((((U8)((c) | 0)) & 0xfe) == 0xc2)
+#define UTF8_IS_DOWNGRADEABLE_START(c) (__ASSERT_(FITS_IN_8_BITS(c))       \
+                                         (((U8)((c) | 0)) & 0xfe) == 0xc2)
 
 /* Is the UTF8-encoded byte 'c' the first byte of a sequence of bytes that
  * represent a code point > 255?  The |0 makes sure this isn't mistakenly
  * called with a ptr argument */
-#define UTF8_IS_ABOVE_LATIN1(c)     (((U8)((c) | 0)) >= 0xc4)
+#define UTF8_IS_ABOVE_LATIN1(c)     (__ASSERT_(FITS_IN_8_BITS(c))           \
+                                     ((U8)((c) | 0)) >= 0xc4)
 
 /* This is the number of low-order bits a continuation byte in a UTF-8 encoded
  * sequence contributes to the specification of the code point.  In the bit
@@ -309,7 +313,8 @@ C<cp> is Unicode if above 255; otherwise is platform-native.
  * problematic in some contexts.  This allows code that needs to check for
  * those to to quickly exclude the vast majority of code points it will
  * encounter */
-#define isUTF8_POSSIBLY_PROBLEMATIC(c) ((U8) c >= 0xED)
+#define isUTF8_POSSIBLY_PROBLEMATIC(c) (__ASSERT_(FITS_IN_8_BITS(c))        \
+                                        (U8) c >= 0xED)
 
 /* A helper macro for isUTF8_CHAR, so use that one instead of this.  This was
  * generated by regen/regcharclass.pl, and then moved here.  Then it was
@@ -529,7 +534,8 @@ encoded as UTF-8.  C<cp> is a native (ASCII or EBCDIC) code 
point if less than
  * that this is asymmetric on EBCDIC platforms, in that the 'new' parameter is
  * the UTF-EBCDIC byte, whereas the 'old' parameter is a Unicode (not EBCDIC)
  * code point in process of being generated */
-#define UTF8_ACCUMULATE(old, new) (((old) << UTF_ACCUMULATION_SHIFT)           
\
+#define UTF8_ACCUMULATE(old, new) (__ASSERT_(FITS_IN_8_BITS(new))              
\
+                                   ((old) << UTF_ACCUMULATION_SHIFT)           
\
                                    | ((NATIVE_UTF8_TO_I8((U8)new))             
\
                                        & UTF_CONTINUATION_MASK))
 
@@ -571,8 +577,10 @@ encoded as UTF-8.  C<cp> is a native (ASCII or EBCDIC) 
code point if less than
  * Note that the result can be larger than 255 if the input character is not
  * downgradable */
 #define TWO_BYTE_UTF8_TO_NATIVE(HI, LO) \
-    ( __ASSERT_(PL_utf8skip[HI] == 2)                                          
 \
-      __ASSERT_(UTF8_IS_CONTINUATION(LO))                                      
 \
+    (__ASSERT_(FITS_IN_8_BITS(HI))                                             
 \
+     __ASSERT_(FITS_IN_8_BITS(LO))                                             
 \
+     __ASSERT_(PL_utf8skip[HI] == 2)                                           
 \
+     __ASSERT_(UTF8_IS_CONTINUATION(LO))                                       
 \
      UNI_TO_NATIVE(UTF8_ACCUMULATE((NATIVE_UTF8_TO_I8(HI) & 
UTF_START_MASK(2)), \
                                    (LO))))
 

--
Perl5 Master Repository

[perl.git] branch blead, updated. v5.27.0-49-g53fdf12aed

Reply via email to