Author: larry
Date: Fri Apr 21 19:18:36 2006
New Revision: 8906

Modified:
   doc/trunk/design/syn/S02.pod
   doc/trunk/design/syn/S04.pod
   doc/trunk/design/syn/S05.pod
   doc/trunk/design/syn/S06.pod

Log: Finished rule => regex conversion.

Modified: doc/trunk/design/syn/S02.pod
==============================================================================
--- doc/trunk/design/syn/S02.pod (original)
+++ doc/trunk/design/syn/S02.pod Fri Apr 21 19:18:36 2006
@@ -14,7 +14,7 @@
  Date: 10 Aug 2004
  Last Modified: 21 Apr 2006
  Number: 2
- Version: 27
+ Version: 28
 
 This document summarizes Apocalypse 2, which covers small-scale
 lexical items and typological issues. (These Synopses also contain
@@ -338,7 +338,7 @@
 
 Built-in object types start with an uppercase letter. This includes
 immutable types (e.g. C<Int>, C<Num>, C<Complex>, C<Rational>, C<Str>,
-C<Bit>, C<Rule>, C<Set>, C<Junction>, C<Code>, B<Block>, C<List>,
+C<Bit>, C<Regex>, C<Set>, C<Junction>, C<Code>, B<Block>, C<List>,
 C<Tuple>), as well as mutable (container) types, such as C<Scalar>,
 C<Array>, C<Hash>, C<Buf>, C<Routine>, C<Module>, etc.
 
@@ -1179,7 +1179,7 @@
 punctuation anywhere a single adverb is acceptible. When used as named
 arguments, you may put comma between. See S06.
 
-The negated form (C<$!a>) and the sigiled forms (C<:$a>, C<:@a>,
+The negated form (C<:!a>) and the sigiled forms (C<:$a>, C<:@a>,
 C<:%a>) never take an argument and don't care what the next
 character is. They are considered complete.
 
@@ -1267,10 +1267,11 @@
 
     macro quote:<qX> (*%adverbs) {...}
 
-Note: macro adverbs are automatically evaluated at macro call
-time if the adverbs are included in the parse. If the adverbs are
-to affect the parsing of the quoted text of the macro, then the text must
-be parsed by the body of the macro rather than by an C<is parsed> rule.
+Note: macro adverbs are automatically evaluated at macro call time if
+the adverbs are included in the parse. If an adverb needs to affect
+the parsing of the quoted text of the macro, then an explicit named
+parameter may be passed on as a parameter to the C<is parsed> subrule,
+or used to select which subrule to invoke.
 
 =item *
 
@@ -1457,7 +1458,7 @@
 
 Backslash sequences still interpolate, but there's no longer any C<\v>
 to mean I<vertical tab>, whatever that is... (C<\v> now match vertical
-whitespace in a rule.)
+whitespace in a regex.)
 
 =item *
 
@@ -1822,11 +1823,11 @@
     postfix:<++>                $x++
     circumfix:<[ ]>             [ @x ]
     postcircumfix:<[ ]>         $x[$y] or $x .[$y]
-    rule_metachar:<,>           /,/
-    rule_backslash:<w>          /\w/ and /\W/
-    rule_assertion:<*>          /<*stuff>/
-    rule_mod_internal:<perl5>   m:/ ... :perl5 ... /
-    rule_mod_external:<nth>     m:nth(3)/ ... /
+    regex_metachar:<,>          /,/
+    regex_backslash:<w>         /\w/ and /\W/
+    regex_assertion:<*>         /<*stuff>/
+    regex_mod_internal:<perl5>  m:/ ... :perl5 ... /
+    regex_mod_external:<nth>    m:nth(3)/ ... /
     trait_verb:<handles>        has $.tail handles <wag>
     trait_auxiliary:<shall>     my $x shall conform<TR123>
     scope_declarator:<has>      has $.x;

Modified: doc/trunk/design/syn/S04.pod
==============================================================================
--- doc/trunk/design/syn/S04.pod (original)
+++ doc/trunk/design/syn/S04.pod Fri Apr 21 19:18:36 2006
@@ -12,9 +12,9 @@
 
  Maintainer: Larry Wall <[EMAIL PROTECTED]>
  Date: 19 Aug 2004
- Last Modified: 15 Apr 2006
+ Last Modified: 21 Apr 2006
  Number: 4
- Version: 15
+ Version: 17
 
 This document summarizes Apocalypse 4, which covers the block and
 statement syntax of Perl.
@@ -752,13 +752,13 @@
     Hash      Array       hash value slice truth    match if $_{any(@$x)}
     Hash      any(list)   hash key slice existence  match if exists $_{any(list)}
     Hash      all(list)   hash key slice existence  match if exists $_{all(list)}
-    Hash      Rule        hash key grep             match if any($_.keys) ~~ /$x/
+    Hash      Regex       hash key grep             match if any($_.keys) ~~ /$x/
     Hash      Any         hash entry existence      match if exists $_{$x}
     Hash      .{Any}      hash element truth*       match if $_{Any}
     Hash      .<string>   hash element truth*       match if $_<string>
     Array     Array       arrays are identical      match if $_ »~~« $x
     Array     any(list)   list intersection         match if any(@$_) ~~ any(list)
-    Array     Rule        array grep                match if any(@$_) ~~ /$x/
+    Array     Regex       array grep                match if any(@$_) ~~ /$x/
     Array     Num         array contains number     match if any($_) == $x
     Array     Str         array contains string     match if any($_) eq $x
     Array     .[number]   array element truth*      match if $_[number]
@@ -771,7 +771,7 @@
     Any       Num         numeric equality          match if $_ == $x
     Any       Str         string equality           match if $_ eq $x
     Any       .method     method truth*             match if $_.method
-    Any       Rule        pattern match             match if $_ ~~ /$x/
+    Any       Regex       pattern match             match if $_ ~~ /$x/
     Any       subst       substitution match*       match if $_ ~~ subst
     Any       boolean     simple expression truth*  match if true given $_
     Any       undef       undefined                 match unless defined $_
@@ -846,10 +846,10 @@
 C<fail> in such a case to return an exception object. Exception objects
 also behave like undefined generators in list context. In any case,
 returning an unthrown exception is considered failure
-from the standpoint of C<let>. Backtracking over a closure in a rule
+from the standpoint of C<let>. Backtracking over a closure in a regex
 is also considered failure of the closure, which is how hypothetical
-variables are managed by rules. (And on the flip side, use of C<fail>
-within a rule closure initiates backtracking of the rule.)
+variables are managed by regexes. (And on the flip side, use of C<fail>
+within a regex closure initiates backtracking of the regex.)
 
 =head1 When is a closure not a closure
 

Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Fri Apr 21 19:18:36 2006
@@ -2,7 +2,7 @@
 
 =head1 TITLE
 
-Synopsis 5: Rules
+Synopsis 5: Regexes and Rules
 
 =head1 AUTHORS
 
@@ -16,7 +16,7 @@
  Date: 24 Jun 2002
  Last Modified: 21 Apr 2006
  Number: 5
- Version: 19
+ Version: 20
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax. We now try to call them I<regex> because they haven't been
@@ -726,7 +726,7 @@
 a Regex object, it is not recompiled. If it is a string, the compiled
 form is cached with the string so that it is not recompiled next
 time you use it unless the string changes. (Any external lexical
-variable names must be rebound each time though.) Rules may not be
+variable names must be rebound each time though.) Subrules may not be
 interpolated with unbalanced bracketing. An interpolated subrule
 keeps its own inner C<$/>, so its parentheses never count toward the
 outer regexes groupings. (In other words, parenthesis numbering is always
@@ -1017,7 +1017,7 @@
 
     sub my_grep($selector, [EMAIL PROTECTED]) {
         given $selector {
-            when Rule { ... }
+            when Regex { ... }
             when Code { ... }
             when Hash { ... }
             # etc.
@@ -1025,7 +1025,7 @@
     }
 
 Using C<{...}> or C</.../> in the scalar context of the first argument
-causes it to produce a C<Code> or C<Rule> object, which the switch
+causes it to produce a C<Code> or C<Regex> object, which the switch
 statement then selects upon.
 
 =item *

Modified: doc/trunk/design/syn/S06.pod
==============================================================================
--- doc/trunk/design/syn/S06.pod (original)
+++ doc/trunk/design/syn/S06.pod Fri Apr 21 19:18:36 2006
@@ -15,7 +15,7 @@
  Date: 21 Mar 2003
  Last Modified: 21 Apr 2006
  Number: 6
- Version: 28
+ Version: 29
 
 
 This document summarizes Apocalypse 6, which covers subroutines and the
@@ -34,12 +34,17 @@
 subroutines masquerading as methods. They have an invocant and belong
 to a particular kind or class.
 
-B<Rules> (keyword: C<rule>) are methods (of a grammar) that perform
+B<Regexes> (keyword: C<regex>) are methods (of a grammar) that perform
 pattern matching. Their associated block has a special syntax (see
-Synopsis 5).
+Synopsis 5). (We also use the term "regex" for anonymous patterns
+of the traditional form.)
 
-B<Tokens> (keyword: C<token>) are rules that perform low-level
-pattern matching (and also enable rules to do whitespace dwimmery).
+B<Tokens> (keyword: C<token>) are regexes that perform low-level
+non-backtracking (by default) pattern matching.
+
+B<Rules> (keyword: C<rule>) are regexes that perform non-backtracking
+(by default) pattern matching (and also enable rules to do whitespace
+dwimmery).
 
 B<Macros> (keyword: C<macro>) are routines whose calls execute as soon
 as they are parsed (i.e. at compile-time). Macros may return another
@@ -297,13 +302,13 @@
 within a C<< <...> >> or C<«...»> slice, as in the example above).
 
 A null operator name does not define a null or whitespace operator, but
-a default matching rule for that syntactic category, which is useful when
+a default matching subrule for that syntactic category, which is useful when
 there is no fixed string that can be recognized, such as tokens beginning
 with digits. Such an operator I<must> supply an C<is parsed> trait.
-The Perl grammar uses a default rule for the C<:1st>, C<:2nd>, C<:3rd>,
-etc. rule modifiers, something like this:
+The Perl grammar uses a default subrule for the C<:1st>, C<:2nd>, C<:3rd>,
+etc. regex modifiers, something like this:
 
-    sub rule_mod_external:<> ($x) is parsed(rx:p/\d+[st|nd|rd|th]/) {...}
+    sub regex_mod_external:<> ($x) is parsed(token { \d+[st|nd|rd|th] }) {...}
 
 Such default rules are attempted in the order declared. (They always
 follow any rules with a known prefix, by the longest-token-first rule.)
@@ -1181,7 +1186,7 @@
     my Dog $ ($fido, $spot) := twodogs(); # one twodog object
     my Dog :($fido, $spot) := twodogs(); # one twodog object
 
-Subsignatures can be matched directly with rules by using C<:(...)>
+Subsignatures can be matched directly within regexes by using C<:(...)>
 notation.
 
     push @a, "foo";
@@ -1202,7 +1207,7 @@
 within the signature. Otherwise it will try to bind an external C<$i>
 instead, and fail if no such variable is declared.
 
-Note that unlike a sub declaration, a rule-embedded signature has no
+Note that unlike a sub declaration, a regex-embedded signature has no
 associated "returns" syntactic slot, so you have to use C<< --> >>
 within the signature to specify the type of the tuple, or match as
 an arglist:
@@ -1234,7 +1239,7 @@
 
 and match a tuple-ish item with a single value of type Dog.
 
-Note also that bare C<\(1,2,3)> is never legal in a rule since the
+Note also that bare C<\(1,2,3)> is never legal in a regex since the
 first paren would try to match literally.
 
 =head2 Attributive parameters
@@ -1348,7 +1353,7 @@
     Method      Perl method
     Submethod   Perl subroutine acting like a method
     Macro       Perl compile-time subroutine
-    Rule        Perl pattern
+    Regex       Perl pattern
     Match       Perl match, usually produced by applying a pattern
     Package     Perl 5 compatible namespace
     Module      Perl 6 standard namespace
@@ -1603,7 +1608,9 @@
 
 =item C<is parsed>
 
-Specifies the rule by which a macro call is parsed.
+Specifies the subrule by which a macro call is parsed. The parse
+always starts after the macro token, but the token may be referred
+to within the subrule as C<< $<KEY> >>.
 
 =item C<is cached>
 
@@ -1795,9 +1802,11 @@
 
 =head2 The C<leave> function
 
-A C<return> statement causes the innermost surrounding subroutine, method,
-rule, macro, or multimethod to return. Only declarations with an explicit
-keyword such as "sub" may be returned from.
+A C<return> statement causes the innermost surrounding subroutine,
+method, rule, token, regex (as a keyword), macro, or multimethod
+to return. Only declarations with an explicit keyword such as "sub"
+may be returned from. You may not return from a quotelike operator such
+as C<rx//>.
 
 To return from other types of code structures, the C<leave> function
 is used:
@@ -1807,6 +1816,13 @@
     leave &foo <== 1,2,3;        # Return from innermost surrounding call to &foo
     leave Loop, :label<COUNT>;   # Same as: last COUNT;
 
+Note that the last is equivalent to
+
+    last COUNT;
+
+and, in fact, you can return a final loop value that way:
+
+    last COUNT <== 42;
 
 =head2 Temporization
 
@@ -2106,7 +2122,7 @@
 delimiters in a row should not be a problem. It has to be a special
 grammar rule, though, not a fixed token, since we need to be able to nest
 code blocks with different delimiters. Likewise when parsing the
-inner expression, the inner parser rule is parameterized to know that
+inner expression, the inner parser subrule is parameterized to know that
 C<}}}> or whatever is its closing delimiter.)
 
 Unquoted expressions are inserted appropriately depending on the