On 13.03.2012 17:07, Sergiu Ivanov wrote:
Thank you for the very good overview; I wish I had been given
something like this in my course on compilers.

You're welcome :-)

Some of that knowledge came with bruises.

A parser generator is required for LL and LR variants.
The problem with these is that they usually come with their own syntax, so
using them requires learning new skills.

I think the effort required to acquire these new skills is still less
than that of mastering a certain parsing technique and then
implementing it.  (Just for reference, I remember once learning the
basics of JavaCC in a day.)

You learnt the syntax of the tool.
The hard part is dealing with conflicts.
And to deal with conflicts, you need to have mastered the parsing technique, in the sense of knowing what it actually does internally.
At which point you're already fully equipped to roll your own parser.

Of course, writing a parser engine is still more work than using a prewritten one. But it's not THAT much; 500-1000 lines of code could be enough. Besides, there are tons of recipes on the internet - the real challenge would be deciding which one is best for our purposes.

On the other hand, you can switch parsing techniques much more easily
if you are using parser generators.

No, not at all.
All parser generators have a different input syntax, different mechanisms to attach semantic actions, etc.
That's a decision that cannot be reversed easily.

  I'm not sure about the state of parser
generators for Python, but this page
http://wiki.python.org/moin/LanguageParsing may provide some
information.


A parser that's embedded into Python as a DSL would be an option.

I'm not sure I can properly understand your idea; why do we need to
*embed* the DSL into Python?

Wikipedia:
"embedded (or internal) domain-specific languages, implemented as libraries which exploit the syntax of their host general purpose language or a subset thereof, while adding domain-specific language elements (data types, routines, methods, macros etc.)."

That is, provide a few functions, and the grammar becomes a set of function calls that either build the grammar directly or generate the data that drives the parser engine.
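Roughly something like this (Lit, Seq and Alt are names I'm making up here, not any existing library; the calls just build nested tuples that a separate parser engine would interpret):

    def Lit(text):      return ('lit', text)
    def Seq(*parts):    return ('seq', parts)
    def Alt(*options):  return ('alt', options)

    expr = Lit('x^2')   # placeholder sub-grammar, just for this sketch
    query = Alt(
        Seq(Lit('integrate'), expr, Lit('by'), Lit('x')),
        Seq(Lit('solve'), expr, Lit('for'), Lit('x')),
    )
    print(query)        # nested tuples: the data that drives the engine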

I've checked the link I provided one more time; it lists a number of
implementations that look quite appealing, including
http://www.acooke.org/lepl/

- Recursive descent cannot (usually) handle infix operators.
+ Parsers are Python code, defined in Python itself.
  (i.e. it is an embedded DSL.)

> and http://pypi.python.org/pypi/modgrammar

- Cannot handle left recursion (BLOCKER)

Left-associative operators need left recursion.
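To illustrate why (a hand-rolled sketch, not modgrammar code): a grammar expresses a left-associative chain like 1 - 2 - 3 with a left-recursive rule (expr : expr '-' term), while a hand-written parser can fake the same thing with a loop that folds to the left:

    def parse_term(tokens):
        return tokens.pop(0)                  # terms are just numbers here

    def parse_expr(tokens):
        node = parse_term(tokens)
        while tokens and tokens[0] == '-':
            tokens.pop(0)
            node = ('-', node, parse_term(tokens))   # fold to the left
        return node

    print(parse_expr(['1', '-', '2', '-', '3']))
    # ('-', ('-', '1', '2'), '3'), i.e. (1 - 2) - 3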

How so?  I can't see how it is hard to replace ("roots" or
"solutions") with "solve" before parsing.

Oh. So you'd replace "rootset" with "solveet".

It's easier to build alternate definitions into the semantic processing that
happens after the parse.

Hm, I think I can see your point now.  It is indeed more correct to
handle alternate definitions in the after-parse semantic
processing.  However, I'm not sure the resulting code is going to be
simpler.

If you parse things, you need a place to link up a subtree with some semantics. Providing alternate syntaxes for the same semantics just means you have to assign the same semantics to some different kinds of subtrees. You just reuse the existing semantic object (callable, class, whatever). Maybe with a small adapter if the semantic object assumes a specific structure in the tree (most parser toolkits abstract away even from that).
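As a rough sketch of what that reuse looks like (the names and the subtree shape here are assumptions, just for illustration):

    def do_solve(expr, var):                  # the single semantic object
        return 'solve(%s, %s)' % (expr, var)

    SEMANTICS = {
        'solve':     do_solve,
        'roots':     do_solve,                # alternate syntax, same action
        'solutions': do_solve,
    }

    def evaluate(tree):
        head, expr, var = tree                # assumed subtree shape
        return SEMANTICS[head](expr, var)

    print(evaluate(('roots', 'x**2 - 1', 'x')))   # solve(x**2 - 1, x)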

Having a larger grammar isn't so much a maintenance problem either if each
rule follows a pattern from a small set of standard patterns.

I think I don't understand this.  Could you please give more details?

It's no problem if you have a+b, a-b, a*b, a^b, a&&b, etc. etc.
It *is* a problem if you have a-b, a*b, and -a, and want -a to bind tighter than a*b.

It is not a problem to have a gazillion control structures like while...end, switch, etc. It *is* a problem if you have if...then...else... (without a closing endif). Google for "dangling else".
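To put the precedence half of that point as a (purely hypothetical) table: operators that all follow the binary-infix pattern can be added by extending a list, while something that breaks the pattern needs its own rule:

    BINARY_LEVELS = [        # one entry per precedence level
        ['||'],
        ['&&'],
        ['+', '-'],
        ['*', '/'],
        ['^'],
    ]
    # Adding another binary operator is a one-entry change here.
    # A unary minus that has to interact with '*' and '^' does not fit
    # this pattern; it needs its own rule and explicit decisions about
    # inputs like '-a^b' or '- -a'.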

Spell checking, again, is something that is easily done on the parse tree
after it has been created.

I'm not sure I agree.  Consider the (supposedly valid) sentences
"integrate sin(x) by x" and "limit sin(x) when x goes to zero".  I
don't think I'd recommend parsing these two sentences with one
(general) rule, which means that the words "integrate" and "limit"
actually determine which of the rules to use.  If spell checking
doesn't happen before lexing, the necessary difference between
"integrate" and "limit" may not be detected.

Where does spell checking factor in here?

The preprocessor could also drop
incomprehensible (and thus supposedly meaningless) words, like in
"find the integral of x^2".

Experience with this kind of approach has been that it tends to reduce
predictability. In this case, the user might consider a word meaningless, but
the program has some obscure definition for it and suddenly spits out error
messages that refer to stuff the user doesn't know about.

I don't think this is a problem because the user is not supposed to
purposefully input meaningless words in normal scenarios.

Then I don't understand what purpose the mechanism serves.
