Am 14.03.2012 14:15, schrieb Sergiu Ivanov:
On the other hand, you can switch parsing techniques much more easily
if you are using parser generators.

No, not at all.
All parser generators have a different input syntax, different mechanisms to
attach semantic actions, etc.
That's a decision that cannot be reversed easily.

I see.  I thought the differences in syntax were not that influential.

Well, it's not just syntax. The way you express semantic actions can have a deep impact. It's like the similarities and differences between C and Pascal: both are statically typed, imperative languages with little thought for modularity. Still, you wouldn't want to rewrite programs from one language to the other, not even ones as small as 500 lines.

and http://pypi.python.org/pypi/modgrammar

- Cannot handle left recursion (BLOCKER)

Hm, thanks for pointing that out.  I tried to check that yesterday and I
still cannot find the information in their docs (at least directly
searching for "recursion" doesn't work for me).

http://packages.python.org/modgrammar/tutorial.html#left-recursion

Last paragraph of the section.

The point here is that while a left-recursive and a right-recursive grammar are equivalent in the set of inputs that they accept, they generate different parse trees.
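To make that concrete, here is a tiny hypothetical sketch (not taken from modgrammar, just an illustration for a subtraction grammar): both parses accept "a - b - c", but the left-recursive rule yields the left-associative tree and the right-recursive one yields the right-associative tree.

```python
# Hypothetical mini-parsers for "a - b - c", showing how left- vs
# right-recursive grammars produce different parse trees.

def parse_left(tokens):
    """expr := expr "-" term | term  (left recursion, simulated by a loop)."""
    tree = tokens[0]
    for term in tokens[1:]:
        tree = ("-", tree, term)  # builds ((a - b) - c)
    return tree

def parse_right(tokens):
    """expr := term "-" expr | term  (right recursion)."""
    if len(tokens) == 1:
        return tokens[0]
    return ("-", tokens[0], parse_right(tokens[1:]))  # builds (a - (b - c))

print(parse_left(["a", "b", "c"]))   # ('-', ('-', 'a', 'b'), 'c')
print(parse_right(["a", "b", "c"]))  # ('-', 'a', ('-', 'b', 'c'))
```

For subtraction the distinction matters semantically: (a - b) - c and a - (b - c) are different values, even though both grammars accept the same strings.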

Spell checking, again, is something that is easily done on the parse tree
after it has been created.


I'm not sure I agree.  Consider the (supposedly valid) sentences
"integrate sin(x) by x" and "limit sin(x) when x goes to zero".  I
don't think I'd recommend parsing these two sentences with one
(general) rule, which means that the words "integrate" and "limit"
actually determine which of the rules to use.  If spell checking
doesn't happen before lexing, the necessary difference between
"integrate" and "limit" may not be detected.
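A minimal sketch of what I mean, with hypothetical rule names and sentence shapes (none of this is existing SymPy code): the leading keyword picks the rule, so a misspelled keyword fails before any rule ever runs.

```python
# Hypothetical keyword-dispatched parser: the first word selects the
# grammar rule that parses the rest of the sentence.

def parse_integrate(words):
    # expects: integrate <expr> by <var>
    by = words.index("by")
    return ("Integral", " ".join(words[:by]), words[by + 1])

def parse_limit(words):
    # expects: limit <expr> when <var> goes to <value>
    when = words.index("when")
    return ("Limit", " ".join(words[:when]), words[when + 1], words[-1])

RULES = {"integrate": parse_integrate, "limit": parse_limit}

def parse(sentence):
    head, *rest = sentence.split()
    try:
        rule = RULES[head]  # a typo like "intgrate" raises KeyError here
    except KeyError:
        raise SyntaxError(f"unknown command: {head!r}")
    return rule(rest)

print(parse("integrate sin(x) by x"))          # ('Integral', 'sin(x)', 'x')
print(parse("limit sin(x) when x goes to 0"))  # ('Limit', 'sin(x)', 'x', '0')
```

With "intgrate sin(x) by x" the lookup fails at the very first token, which is exactly the point: no rule is ever selected, so no later stage can recover.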


Where does spell checking factor in here?

I wanted to say that if "integrate" and "limit" belong to different
classes of symbols, then, should the lexer encounter "intgrate", it
wouldn't be able to correctly classify it as "integrate".

Hmmm... that usually doesn't end well. You never know whether a particular misspelling was intentional or not.

E.g. is "limits" a misspelled keyword, or a free variable because the user is doing some first-order logic on interval arithmetic (where they might be manipulating sets of limits, i.e. upper/lower bounds)?

The standard advice to programming language designers is that this is One Of Those Well-Intentioned Bad Ideas From The '50s. Other such ideas were "make a programming language look like English" (which gave us Cobol) and "put all the good ideas into one language to give us the One True Language That Has It All" (which gave us PL/I).

That doesn't mean that this kind of advice is always valid, but you need to know what you're doing and why those approaches failed, to avoid falling into the same traps that have been known for over half a century now.

The preprocessor could also drop
incomprehensible (and thus supposedly meaningless) words, like in
"find the integral of x^2".
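A hypothetical sketch of such a preprocessor (the vocabulary set and the `looks_like_math` heuristic are made up for illustration):

```python
# Hypothetical preprocessor: drop words that are neither in the known
# vocabulary nor math-looking, so filler like "find" and "the" vanishes.

KNOWN = {"integral", "limit", "of", "by", "when", "goes", "to"}

def looks_like_math(word):
    # crude guess: anything containing digits, parentheses or operators
    return any(c in word for c in "0123456789()^+-*/")

def preprocess(sentence):
    return [w for w in sentence.lower().split()
            if w in KNOWN or looks_like_math(w)]

print(preprocess("find the integral of x^2"))  # ['integral', 'of', 'x^2']
```

Note that this sketch also illustrates the predictability problem raised above: whether a word survives depends on a vocabulary list the user cannot see.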

Experience with this kind of approach is that it tends to reduce
predictability. In this case, the user might consider a word meaningless,
but the program has some obscure definition for it and suddenly spits out
error messages that refer to stuff the user doesn't know about.

I don't think this is a problem because the user is not supposed to
purposefully input meaningless words in normal scenarios.

Then I don't understand what purpose the mechanism serves.

I was thinking about the situation when the user intuitively enters
some text which includes elements of a natural language (like "find"
and "the" in "find the limit sin(x)/x when x goes to 0").  In this
case the user thinks that all words are meaningful, but, for the
application, "find" and "the" bear no meaning.

Well, it's really hard to write a parser that does this kind of stuff well enough to be worth it. I do not know of a single project that successfully implemented such a thing, but I know of several that failed abysmally.

You're trying to get a DWIM thing implemented.
See http://catb.org/jargon/html/D/DWIM.html .
The first rule for this kind of thing is: Don't guess what the problems are, go out and watch what problems bite the users in reality. Then add some (cautious) spell checking and other DWIM mechanisms, and iteratively correct them until they do *far* more good than harm. It's nothing that can be designed. You need to deal with what people actually do (and, hence, it is probably too much work for the current size of the SymPy community).

We could start that by adding some statistics code to SymPy: which functions are used, what kinds of errors happen, where do they originate. Define a mail address to send them to, write the tools to extract information from the raw data. Think about how to ask the user about whether collecting usage data is okay for them (some might disagree, an NSA mathematician working with SymPy on some secret research would most certainly disagree).
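As a rough illustration of the opt-in idea (everything here is hypothetical; SymPy has no such mechanism, and the names are invented), a decorator could count calls and errors locally, and the collected data would only ever leave the machine with explicit consent:

```python
# Hypothetical sketch of opt-in usage statistics: count which functions
# are called and which error types occur, locally, for later review.

import functools

STATS = {"calls": {}, "errors": {}}
OPTED_IN = True  # would be a user-controlled setting

def track(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if OPTED_IN:
            STATS["calls"][func.__name__] = STATS["calls"].get(func.__name__, 0) + 1
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            if OPTED_IN:
                name = type(exc).__name__
                STATS["errors"][name] = STATS["errors"].get(name, 0) + 1
            raise
    return wrapper

@track
def integrate(expr, var):
    if not var:
        raise ValueError("missing variable")
    return f"Integral({expr}, {var})"

integrate("x**2", "x")
try:
    integrate("x**2", "")
except ValueError:
    pass
print(STATS)  # {'calls': {'integrate': 2}, 'errors': {'ValueError': 1}}
```

The hard parts are not the counting but exactly what you list: transport, tooling to digest the raw data, and asking for consent in a way users can trust.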

--
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To post to this group, send email to sympy@googlegroups.com.
To unsubscribe from this group, send email to 
sympy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sympy?hl=en.
