On Sun, Mar 11, 2012 at 6:58 PM, Bharath M R <catchmrbhar...@gmail.com> wrote: > > On Sunday, March 11, 2012 7:00:15 PM UTC+5:30, Joachim Durchholz wrote: >> >> Am 11.03.2012 05:27, schrieb Bharath M R: >> > @Sergiu I was looking at top down operator precedence parsing for the >> > task >> >> That's the technique where the grammar requires the least contortions. >> It also composes well: you can define sublanguages (such as arithmetic >> and boolean logic), and as long as their operator sets do not overlap, >> you can combine them without any problems. >> > > Thanks for the reply. The parser is for building the web interface similar > to > Wolfram Alpha for SymPy. Basically I want to parse things like > integrate x**2 dx > roots x**2==1
I've never had any serious experience with operator precedence parsing, but I have the intuition that this technique is going to be quite unwieldy if we would like to go beyond simple expressions like the ones you have shown. I'd advocate a more general approach to this problem, and namely something like LL(1), SLR or LALR(1). When writing from scratch, this is obviously going to require more effort than operator precedence but instead we will have a much wider class of languages covered. However, I'd recommend using a parser generator instead of writing a parser from scratch yourself. It will introduce some slowdown, but will instead make the whole thing a lot easier to write, maintain and, more importantly, to extend. I'm not sure about the state of parser generators for Python, but this page http://wiki.python.org/moin/LanguageParsing may provide some information. I can see a different problem here though: choosing a parsing method and producing the grammar of the language (or what information the parser would require) may not be enough to get the desired functionality. What we are trying to get at should incorporate something similar to natural language processing. Therefore, before actually starting the parsing process, I think the input should be brought to some kind of canonical form. I am strongly inclined to believe that Wolfram|Alpha takes exactly this way of doing things (but obviously with a lot more details). One simple thing I can think of is detecting synonyms. For example, "roots x**2 == 1" and "solutions x**2 == 1" and "solve x**2 == 1" are one and the same thing. Therefore, it may be useful to start with substituting each of these synonyms with some canonical word ("solve", for example). It is possible to argue that this can be solved by introducing additional rules into the grammar. However, I am inclined to believe that this will make the grammar unnecessarily large and, even worse, slow down the parser. Another simple thing is detecting spelling errors. Suppose I type "intgrate" or any of the other (numerous) wrong possibilities. I think it would be very nice to have such mistakes detected and corrected automatically. This http://packages.python.org/pyenchant/ seems on topic. I'd also recommend showing the substitutions and the resulting canonical form. In this way the user will be able to see how their input was interpreted and, maybe, change their formulation to get nearer to what they wanted. The list of actions the preprocessor should can be extended arbitrarily, I guess. For example, it could try to fix wrongly balanced parentheses. It might also try to cope with a different order of keywords in the string, like in "sin(x) integral". It would be nice to parse single-argument functions written without parentheses ("sin x" instead of "sin(x)"). The preprocessor could also drop incomprehensible (and thus supposedly meaningless) words, like in "find the integral of x^2". Apparently, the elements in this list should also be given priorities, because some of them are essential (synonyms, for example, as I see it), others are less critical. Sergiu -- You received this message because you are subscribed to the Google Groups "sympy" group. To post to this group, send email to sympy@googlegroups.com. To unsubscribe from this group, send email to sympy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sympy?hl=en.