Am 14.03.2012 14:15, schrieb Sergiu Ivanov:
On the other hand, you can switch parsing techniques much more easily
if you are using parser generators.

No, not at all.
All parser generators have a different input syntax, different mechanisms to
attach semantic actions, etc.
That's a decision that cannot be reversed easily.

I see.  I thought the differences in syntax were not that influential.

Well, it's not just syntax. The way you express semantic actions can have a deep impact. It's like the similarities and differences between C and Pascal: both are statically typed, imperative languages with little thought for modularity. Still, you wouldn't want to rewrite programs from one language to the other, not even ones as small as 500 lines.

and http://pypi.python.org/pypi/modgrammar

- Cannot handle left recursion (BLOCKER)

Hm, thanks for pointing that out.  I tried to check that yesterday and I
still cannot find the information in their docs (at least directly
searching for "recursion" doesn't work for me).

http://packages.python.org/modgrammar/tutorial.html#left-recursion

Last paragraph of the section.

The point here is that while a left-recursive and a right-recursive grammar are equivalent in the set of inputs that they accept, they generate different parse trees.
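To make that concrete, here is a tiny hypothetical sketch (not taken from modgrammar, just an illustration for a subtraction grammar): both parses accept "a - b - c", but the left-recursive rule yields the left-associative tree and the right-recursive one yields the right-associative tree.

```python
# Hypothetical mini-parsers for "a - b - c", showing how left- vs
# right-recursive grammars produce different parse trees.

def parse_left(tokens):
    """expr := expr "-" term | term  (left recursion, simulated by a loop)."""
    tree = tokens[0]
    for term in tokens[1:]:
        tree = ("-", tree, term)  # builds ((a - b) - c)
    return tree

def parse_right(tokens):
    """expr := term "-" expr | term  (right recursion)."""
    if len(tokens) == 1:
        return tokens[0]
    return ("-", tokens[0], parse_right(tokens[1:]))  # builds (a - (b - c))

print(parse_left(["a", "b", "c"]))   # ('-', ('-', 'a', 'b'), 'c')
print(parse_right(["a", "b", "c"]))  # ('-', 'a', ('-', 'b', 'c'))
```

For subtraction the distinction matters semantically: (a - b) - c and a - (b - c) are different values, even though both grammars accept the same strings.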

Spell checking, again, is something that is easily done on the parse tree
after it has been created.


I'm not sure I agree.  Consider the (supposedly valid) sentences
"integrate sin(x) by x" and "limit sin(x) when x goes to zero".  I
don't think I'd recommend parsing these two sentences with one
(general) rule, which means that the words "integrate" and "limit"
actually determine which of the rules to use.  If spell checking
doesn't happen before lexing, the necessary difference between
"integrate" and "limit" may not be detected.
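A minimal sketch of what I mean, with hypothetical rule names and sentence shapes (none of this is existing SymPy code): the leading keyword picks the rule, so a misspelled keyword fails before any rule ever runs.

```python
# Hypothetical keyword-dispatched parser: the first word selects the
# grammar rule that parses the rest of the sentence.

def parse_integrate(words):
    # expects: integrate <expr> by <var>
    by = words.index("by")
    return ("Integral", " ".join(words[:by]), words[by + 1])

def parse_limit(words):
    # expects: limit <expr> when <var> goes to <value>
    when = words.index("when")
    return ("Limit", " ".join(words[:when]), words[when + 1], words[-1])

RULES = {"integrate": parse_integrate, "limit": parse_limit}

def parse(sentence):
    head, *rest = sentence.split()
    try:
        rule = RULES[head]  # a typo like "intgrate" raises KeyError here
    except KeyError:
        raise SyntaxError(f"unknown command: {head!r}")
    return rule(rest)

print(parse("integrate sin(x) by x"))          # ('Integral', 'sin(x)', 'x')
print(parse("limit sin(x) when x goes to 0"))  # ('Limit', 'sin(x)', 'x', '0')
```

With "intgrate sin(x) by x" the lookup fails at the very first token, which is exactly the point: no rule is ever selected, so no later stage can recover.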


Where does spell checking factor in here?

I wanted to say that if "integrate" and "limit" belong to different
classes of symbols, then, should the lexer encounter "intgrate", it
wouldn't be able to correctly classify it as "integrate".

Hmmm... that usually doesn't end well. You never know whether a particular misspelling was intentional or not.

E.g. is "limits" a misspelled keyword, or a free variable because the user is doing some first-order logic on interval arithmetic (where they might be manipulating sets of limits, i.e. upper/lower bounds)?

The standard advice to programming language designers is that this is One Of Those Well-Intentioned Bad Ideas From The '50s. Other such ideas were "make a programming language look like English" (which gave us Cobol) and "put all the good ideas into one language to give us the One True Language That Has It All" (which gave us PL/I).

That doesn't mean that this kind of advice is always valid, but you need to know what you're doing and why those approaches failed, to avoid falling into the same traps that have been known for over half a century now.

The preprocessor could also drop
incomprehensible (and thus supposedly meaningless) words, like in
"find the integral of x^2".
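A hypothetical sketch of such a preprocessor (the vocabulary set and the `looks_like_math` heuristic are made up for illustration):

```python
# Hypothetical preprocessor: drop words that are neither in the known
# vocabulary nor math-looking, so filler like "find" and "the" vanishes.

KNOWN = {"integral", "limit", "of", "by", "when", "goes", "to"}

def looks_like_math(word):
    # crude guess: anything containing digits, parentheses or operators
    return any(c in word for c in "0123456789()^+-*/")

def preprocess(sentence):
    return [w for w in sentence.lower().split()
            if w in KNOWN or looks_like_math(w)]

print(preprocess("find the integral of x^2"))  # ['integral', 'of', 'x^2']
```

Note that this sketch also illustrates the predictability problem raised above: whether a word survives depends on a vocabulary list the user cannot see.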

Experience with this kind of approach is that it tends to reduce
predictability. In this case, the user might consider a word meaningless,
but the program has some obscure definition for it and suddenly spits out
error messages that refer to stuff the user doesn't know about.

I don't think this is a problem because the user is not supposed to
purposefully input meaningless words in normal scenarios.

Then I don't understand what purpose the mechanism serves.

I was thinking about the situation when the user intuitively enters
some text which includes elements of a natural language (like "find"
and "the" in "find the limit sin(x)/x when x goes to 0").  In this
case the user thinks that all words are meaningful, but, for the
application, "find" and "the" bear no meaning.

Well, it's really hard to write a parser that does this kind of stuff well enough to be worth it. I do not know of a single project that successfully implemented such a thing, but I know of several that failed abysmally.

You're trying to get a DWIM thing implemented.
See http://catb.org/jargon/html/D/DWIM.html .
The first rule for this kind of thing is: Don't guess what the problems are, go out and watch what problems bite the users in reality. Then add some (cautious) spell checking and other DWIM mechanisms, and iteratively correct them until they do *far* more good than harm. It's nothing that can be designed. You need to deal with what people actually do (and, hence, it is probably too much work for the current size of the SymPy community).

We could start that by adding some statistics code to SymPy: which functions are used, what kinds of errors happen, where do they originate. Define a mail address to send them to, write the tools to extract information from the raw data. Think about how to ask the user about whether collecting usage data is okay for them (some might disagree, an NSA mathematician working with SymPy on some secret research would most certainly disagree).
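As a rough illustration of the opt-in idea (everything here is hypothetical; SymPy has no such mechanism, and the names are invented), a decorator could count calls and errors locally, and the collected data would only ever leave the machine with explicit consent:

```python
# Hypothetical sketch of opt-in usage statistics: count which functions
# are called and which error types occur, locally, for later review.

import functools

STATS = {"calls": {}, "errors": {}}
OPTED_IN = True  # would be a user-controlled setting

def track(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if OPTED_IN:
            STATS["calls"][func.__name__] = STATS["calls"].get(func.__name__, 0) + 1
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            if OPTED_IN:
                name = type(exc).__name__
                STATS["errors"][name] = STATS["errors"].get(name, 0) + 1
            raise
    return wrapper

@track
def integrate(expr, var):
    if not var:
        raise ValueError("missing variable")
    return f"Integral({expr}, {var})"

integrate("x**2", "x")
try:
    integrate("x**2", "")
except ValueError:
    pass
print(STATS)  # {'calls': {'integrate': 2}, 'errors': {'ValueError': 1}}
```

The hard parts are not the counting but exactly what you list: transport, tooling to digest the raw data, and asking for consent in a way users can trust.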

--
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To post to this group, send email to sympy@googlegroups.com.
To unsubscribe from this group, send email to 
sympy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sympy?hl=en.
