Hi,
I've read and reread the macro explanation but I'm still not entirely
clear on a number of things. The questions and thoughts below are based
on my (mis)understanding.
On 03/02/06 02:05, Larry Wall wrote:
Macros are functions or operators that are called by the compiler as
soon as their arguments are parsed (if not sooner). The syntactic
effect of a macro declaration or importation is always lexically
scoped, even if the name of the macro is visible elsewhere.
And presumably they can be lexically unimported, or whatever the verb is
for what "no" does.
As with
ordinary operators, macros may be classified by their grammatical
category. For a given grammatical category, a default parsing rule or
set of rules is used, but those rules that have not yet been "used"
by the time the macro keyword or token is seen can be replaced by
use of "is parsed" trait. (This means, for instance, that an infix
operator can change the parse rules for its right operand but not
its left operand.)
In the absence of a signature to the contrary, a macro is called as
if it were a method on the current match object returned from the
grammar rule being reduced; that is, all the current parse information
is available by treating C<self> as if it were a C<$/> object.
Is this a :keepall match object?
Or is the Perl6 grammar conserving by default?
(The "Syntax trees [...] are reversible" suggests so)
Or is this one of the "signature to the contrary" possibilities?
[Conjecture: alternate representations may be available if arguments
are declared with particular AST types.]
Macros may return either a string to be reparsed, or a syntax tree
that needs no further parsing. The textual form is handy, but the
syntax tree form is generally preferred because it allows the parser
and debugger to give better error messages. Textual substitution
on the other hand tends to yield error messages that are opaque to
the user. Syntax trees are also better in general because they are
reversible, so things like syntax highlighters can get back to the
original language and know which parts of the derived program come
from which parts of the user's view of the program.
In aid of returning syntax tree, Perl provides a "quasiquoting"
mechanism using the keyword "CODE", followed by a block intended to
represent an AST:
return CODE { say $a };
I guess the string form is C<eval "CODE { $str }">
If CODE may enclose arbitrary source text of whatever DSL people invent,
alternate braces would probably be useful. Either q()-like, HERE-doc
or pod's C<< >> nesting style.
[Conjecture: Other keywords are possible if we have more than one
AST type.]
Ocaml and camlp4 are probably a good source of ideas for quasiquoting.
I've only perused the documentation; has anyone here actually used Ocaml?
See: http://caml.inria.fr/pub/docs/tutorial-camlp4/tutorial004.html
Rather than misrepresenting Ocaml with my sketchy understanding,
I'll just mention some possibly interesting features:
Specific expander rules from the grammar can be used, <:rulename< ... >>
They have a C -> AST expander. I can imagine a SQL -> AST expander
would find some use in Perl. I don't think the same AST type is used but
that's just a guess.
Two of the "p"s in p4 stand for pretty-printer, which is the AST->source
conversion. In addition to aiding debugging and reformatting, it allows
interconversion between different syntaxes (sp?). Ocaml comes with two
grammars, one is backwards compatible and the other has jettisoned
the baggage.
Within a quasiquote, variable and function names resolve first of
all according to the lexical scope of the macro definition, and if
unrecognized in that scope, are assumed to be bound from the scope
of the macro call each time it is called. If they cannot be bound
from the scope of the macro call, a compile-time exception is thrown.
Variables that resolve from the lexical scope of the macro definition
will be inserted appropriately depending on the type of the variable,
which may be either a syntax tree or a string. (Again, syntax tree
is preferred.) The case is similar to that of a macro called from
within the quasiquote, insofar as reparsing only happens with the
string version of interpolation, except that such a reparse happens
at macro call time rather than macro definition time, so its result
cannot change the parser's expectations about what follows the
interpolated variable.
Is there any cpp-like protection against self-referential expansions
when using the string-returning form?
The last S06 sentence above overflowed my mental stack, so I'm unsure whether
self-referential expansions are somehow impossible.
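To make the question concrete, here is the kind of guard I have in mind,
sketched in Python rather than Perl 6. Nothing in it comes from S06: the
MACROS table and expand() are made up for illustration, and the rule (a
macro name is not expanded again while its own expansion is in progress)
is the one cpp uses.

```python
import re

# Hypothetical table of string-returning macros: name -> replacement text.
MACROS = {
    "foo": "foo + 1",    # self-referential, like `#define foo foo + 1`
    "twice": "foo * 2",  # refers to another macro
}

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def expand(text, active=frozenset()):
    """Expand macro names in `text`, rescanning each replacement.

    `active` holds the names of the macros currently being expanded.
    A name found in that set is left untouched, which is what stops
    a self-referential macro from expanding forever.
    """
    def replace(match):
        name = match.group(0)
        if name in MACROS and name not in active:
            # Rescan the replacement text with this macro marked active.
            return expand(MACROS[name], active | {name})
        return name

    return IDENTIFIER.sub(replace, text)


print(expand("foo"))    # foo + 1
print(expand("twice"))  # foo + 1 * 2
```

The second result also shows the usual precedence surprise of textual
substitution, which is one more argument for the syntax tree form.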
Hence, while the quasiquote itself is being parsed, the syntactic
interpolation of a variable into the quasiquote always results in
the expectation of an operator following the variable. (You must
use a call to a submacro if you want to expect something else.)
Of course, the macro definition as a whole can expect whatever it
likes afterwards, according to its syntactic category. (Generally,
a term expects a following postfix or infix operator, and an operator
expects a following term or prefix operator.)
Do @arrays of ASTs interpolate/splice?
Lisp needs ,@ (comma-at) to do splatty interpolation, that is, to remove the
outer pair of parens. Depending on what the ASTs look like and how they
splice together, such a form may or may not be necessary.
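The distinction I mean, shown with Python's ast module since I can't
show it in Perl 6 yet. This is only an analogy: the node types are
Python's, not anything S06 defines, and ast.unparse needs Python 3.9 or
later. The same list of argument trees is inserted once as a single
nested value (Lisp's comma) and once spliced in place (Lisp's comma-at).

```python
import ast

# Two argument expressions, parsed into syntax trees.
args = [ast.parse(src, mode="eval").body for src in ("x", "y + 1")]

callee = ast.Name("f", ast.Load())

# Plain interpolation: the whole list becomes ONE argument.
nested = ast.Call(
    func=callee,
    args=[ast.List(elts=args, ctx=ast.Load())],
    keywords=[],
)

# Splicing: the list's elements become the arguments themselves.
spliced = ast.Call(func=callee, args=args, keywords=[])

print(ast.unparse(nested))   # f([x, y + 1])
print(ast.unparse(spliced))  # f(x, y + 1)
```

Whether Perl 6 needs separate syntax for the two cases presumably
depends on whether an @array of ASTs flattens when interpolated.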
In case of name ambiguity, prefix with C<COMPILING::> to indicate a
name in the compiling scope, and anything else (such as C<OUTER::>)
to indicate a name in the macro definition's scope, since that's the
default. In particular, any variable declared within the quasiquote
block is assumed to scope to the quasiquote; to scope the declaration
to the macro call's scope, you must say
my COMPILING::<$foo> = 123;
env COMPILING::<@bar> = ();
our COMPILING::<%baz>;
or some such if you wish to force the compiler to install the variable
into the symbol table being constructed by the macro call.
"COMPILING" here means the scope in which the macro is being expanded, rather
than the scope in which the macro itself is being compiled. Is that correct?
Perhaps a twigil would be clearer? Such huffmanization is probably
undeserved and would be seen as encouraging promiscuous lexical intercourse...
What are the variable visibility rules when interpolating in quasiquotes?
Does a variable unbound in a spliced AST bind to one in the enclosing
quasiquote?
The consequences of this when inserting an AST from a parsed parameter
need to be considered. If the enclosing quote's variables are visible
then an unintended binding may occur.
[Conjecture: Due to these dwimmy scoping rules, there is no need of
a special "unquote" construct as in Scheme et al.]
No gensym shenanigans either. The scoping rules seem to be hygienic:
no unintended variable leaking. Unintended variable capture seems unlikely
too; everything will go haywire only if you forget to declare a variable
in the macro definition and coincidentally declare the same variable in
the scope where the macro is used.
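For anyone who hasn't been bitten by it, this is the capture problem
and the gensym workaround that the S06 scoping rules appear to make
unnecessary. It is a Python sketch of textual macros, not Perl 6; the
swap macros, gensym() and run() are all invented for the example.

```python
import itertools

_counter = itertools.count()


def gensym(base):
    """Return a fresh name that user code is unlikely to have used."""
    return f"__{base}_{next(_counter)}"


def swap_unhygienic(a, b):
    # The temporary is always called `tmp`, so it collides with a
    # caller's variable of the same name.
    return f"tmp = {a}; {a} = {b}; {b} = tmp"


def swap_gensym(a, b):
    # Same expansion, but the temporary gets a generated name.
    t = gensym("tmp")
    return f"{t} = {a}; {a} = {b}; {b} = {t}"


def run(code, variables):
    """Execute expanded code against a copy of the caller's variables."""
    env = dict(variables)
    exec(code, {}, env)
    return env


caller = {"tmp": 1, "x": 2}

broken = run(swap_unhygienic("tmp", "x"), caller)
print(broken["tmp"], broken["x"])    # 2 2  (the swap is lost)

working = run(swap_gensym("tmp", "x"), caller)
print(working["tmp"], working["x"])  # 2 1
```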
Brad
--
That one's own district is unsophisticated and unpolished is a great
treasure. Imitating another style is simply a sham.
-- Hagakure http://bereft.net/hagakure/