Re: [sympy] feedback for GSOC 2012 idea

Sergiu Ivanov Sun, 11 Mar 2012 13:40:26 -0700

On Sun, Mar 11, 2012 at 6:58 PM, Bharath M R <catchmrbhar...@gmail.com> wrote:
>
> On Sunday, March 11, 2012 7:00:15 PM UTC+5:30, Joachim Durchholz wrote:
>>
>> Am 11.03.2012 05:27, schrieb Bharath M R:
>> > @Sergiu I was looking at top down operator precedence parsing
>> > for the task
>>
>> That's the technique where the grammar requires the least contortions.
>> It also composes well: you can define sublanguages (such as arithmetic
>> and boolean logic), and as long as their operator sets do not overlap,
>> you can combine them without any problems.
>>
>
> Thanks for the reply. The parser is for building the web interface
> similar to Wolfram Alpha for SymPy.  Basically I want to parse things like
> integrate x**2 dx
> roots x**2==1


I've never had any serious experience with operator precedence
parsing, but my intuition is that this technique will become quite
unwieldy once we go beyond simple expressions like the ones you have
shown.

I'd advocate a more general approach to this problem, namely
something like LL(1), SLR or LALR(1).  Written from scratch, such a
parser obviously requires more effort than operator precedence, but in
return it covers a much wider class of languages.  That said, I'd
recommend using a parser generator rather than writing the parser
by hand.  It may introduce some slowdown, but it will make the whole
thing a lot easier to write, maintain and, more importantly, extend.
I'm not sure about the state of parser generators for Python, but this
page http://wiki.python.org/moin/LanguageParsing may provide some
information.
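
To give an idea of what an LL(1)-style grammar for the expression part
might look like, here is a minimal hand-written recursive descent
sketch.  The grammar, the token names and the tuple-based tree are my
own assumptions for illustration; a parser generator would produce the
equivalent code from the grammar rules given in the comments.

```python
import re

# Hypothetical token set: integers, names, and a few operators.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\*\*|[-+*/()]))")


def tokenize(text):
    """Split the input into (kind, value) pairs, ending with an END token."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError("unexpected character at position %d" % pos)
        number, name, op = m.groups()
        if number is not None:
            tokens.append(("NUM", number))
        elif name is not None:
            tokens.append(("NAME", name))
        else:
            tokens.append(("OP", op))
        pos = m.end()
    tokens.append(("END", ""))
    return tokens


class Parser:
    """One method per grammar rule; one token of lookahead decides each step.

    expr  := term  (('+' | '-') term)*
    term  := power (('*' | '/') power)*
    power := atom ('**' power)?
    atom  := NUM | NAME | '(' expr ')'
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def accept(self, *ops):
        # Consume the next token only if it is one of the given operators.
        kind, value = self.tokens[self.i]
        if kind == "OP" and value in ops:
            self.i += 1
            return value
        return None

    def parse(self):
        tree = self.expr()
        if self.tokens[self.i][0] != "END":
            raise SyntaxError("unexpected token %r" % (self.tokens[self.i][1],))
        return tree

    def expr(self):
        node = self.term()
        while True:
            op = self.accept("+", "-")
            if op is None:
                return node
            node = (op, node, self.term())

    def term(self):
        node = self.power()
        while True:
            op = self.accept("*", "/")
            if op is None:
                return node
            node = (op, node, self.power())

    def power(self):
        base = self.atom()
        if self.accept("**"):
            return ("**", base, self.power())  # right-associative
        return base

    def atom(self):
        kind, value = self.tokens[self.i]
        self.i += 1
        if kind in ("NUM", "NAME"):
            return value
        if kind == "OP" and value == "(":
            node = self.expr()
            if not self.accept(")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError("unexpected token %r" % (value,))


print(Parser("x**2 + 1").parse())  # ('+', ('**', 'x', '2'), '1')
```

Adding a new construct means adding a rule and a method, which is the
kind of extensibility I have in mind.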

I can see a different problem here, though: choosing a parsing method
and producing the grammar of the language (or whatever information the
parser requires) may not be enough to get the desired functionality.
What we are aiming for should incorporate something similar to natural
language processing.  Therefore, before actually starting the parsing
process, I think the input should be brought to some kind of canonical
form.  I strongly suspect that Wolfram|Alpha does exactly this (but
obviously in a lot more detail).

One simple thing I can think of is detecting synonyms.  For example,
"roots x**2 == 1", "solutions x**2 == 1" and "solve x**2 == 1" all
mean the same thing.  Therefore, it may be useful to start by
substituting each of these synonyms with some canonical word ("solve",
for example).  One could argue that this can be handled by
introducing additional rules into the grammar.  However, I am inclined
to believe that this will make the grammar unnecessarily large and,
even worse, slow down the parser.
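
A rough sketch of such a substitution pass, assuming whitespace-separated
input; the table and the function name are hypothetical, and only the
"solve" group is taken from the examples above:

```python
# Hypothetical synonym table: maps each synonym to its canonical keyword.
SYNONYMS = {
    "roots": "solve",
    "solutions": "solve",
}


def canonicalize_keywords(text):
    """Replace known synonyms with their canonical keyword.

    Words are compared case-insensitively; anything not in the table
    (expressions, variable names, canonical keywords) is left untouched.
    """
    return " ".join(SYNONYMS.get(word.lower(), word) for word in text.split())


print(canonicalize_keywords("roots x**2 == 1"))  # solve x**2 == 1
```

With this in place the grammar only needs a single rule for "solve",
and new synonyms are added to the table rather than to the grammar.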

Another simple thing is detecting spelling errors.  Suppose I type
"intgrate" or any of the other (numerous) possible misspellings.  I
think it would be very nice to have such mistakes detected and
corrected automatically.  PyEnchant
(http://packages.python.org/pyenchant/) seems relevant here.
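
As a minimal illustration of the idea, here is a sketch that uses
difflib from the standard library instead of pyenchant, matching each
word against a small keyword list.  The keyword list, the function
names and the cutoff value are my assumptions, not a tested design:

```python
import difflib

# Hypothetical list of keywords the interface understands.
KEYWORDS = ["integrate", "solve", "roots", "solutions"]


def correct_word(word, cutoff=0.8):
    """Return the closest keyword if the word looks like a misspelling."""
    lowered = word.lower()
    # Leave expressions such as "x**2" and already valid keywords alone.
    if not word.isalpha() or lowered in KEYWORDS:
        return word
    matches = difflib.get_close_matches(lowered, KEYWORDS, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_spelling(text):
    """Apply correct_word to every whitespace-separated word."""
    return " ".join(correct_word(word) for word in text.split())


print(correct_spelling("intgrate x**2 dx"))  # integrate x**2 dx
```

The cutoff is deliberately high so that short variable names are not
"corrected" into keywords; a real dictionary-based checker would need
a similar safeguard.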

I'd also recommend showing the substitutions and the resulting
canonical form.  In this way the user will be able to see how their
input was interpreted and, maybe, change their formulation to get
nearer to what they wanted.

The list of actions the preprocessor could perform can be extended
arbitrarily, I guess.  For example, it could try to fix unbalanced
parentheses.  It might also try to cope with a different order of
keywords in the string, as in "sin(x) integral".  It would be nice to
parse single-argument functions written without parentheses ("sin x"
instead of "sin(x)").  The preprocessor could also drop
incomprehensible (and thus supposedly meaningless) words, as in "find
the integral of x^2".

Clearly, the items in this list should also be given priorities,
because some of them are essential (synonyms, for example, as I see
it), while others are less critical.
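
Three of the preprocessing steps mentioned above (fixing parentheses,
"sin x" style calls, and dropping meaningless words) could be sketched
roughly as follows.  The function list and the filler-word list are
hypothetical, and a fixed stop list is a simplification of "dropping
incomprehensible words":

```python
import re

# Hypothetical lists; a real preprocessor would need far larger ones.
FUNCTION_NAMES = ("sin", "cos", "tan", "exp", "log")
FILLER_WORDS = {"find", "the", "of"}

BARE_CALL_RE = re.compile(r"\b(%s)\s+(\w+)" % "|".join(FUNCTION_NAMES))


def balance_parens(text):
    """Drop stray closing parentheses and append missing closing ones."""
    out, depth = [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                continue  # nothing to close: skip this character
            depth -= 1
        out.append(ch)
    return "".join(out) + ")" * depth


def add_function_parens(text):
    """Rewrite 'sin x' as 'sin(x)' for the known single-argument functions."""
    return BARE_CALL_RE.sub(r"\1(\2)", text)


def drop_filler_words(text):
    """Remove words from the stop list, keeping everything else in order."""
    return " ".join(w for w in text.split() if w.lower() not in FILLER_WORDS)


print(balance_parens("sin(x + (1"))                   # sin(x + (1))
print(add_function_parens("sin x + 1"))               # sin(x) + 1
print(drop_filler_words("find the integral of x^2"))  # integral x^2
```

Each step is a separate function, so they can be ordered or switched
off according to whatever priorities are chosen.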

Sergiu

