On Tue, Mar 13, 2012 at 2:58 PM, Joachim Durchholz <j...@durchholz.org> wrote:
> On 13.03.2012 21:17, Aaron Meurer wrote:
>
>> So would it help to start a wiki page where we list all the things we
>> want to support, in the order of importance?
>
>> Here's a beginning of that list (in order):
>>
>> - SymPy syntax: This is probably obvious, but correct SymPy/Python
>> syntax should always be parsed exactly as it is given. If the
>> heuristic parser has ambiguity problems that would prevent this, we
>> can just preparse with a call to sympify(), and only use heuristics if
>> that fails.
>>
>> - Mathematica, Maple, Maxima, etc. syntax. Where they conflict, we
>> should pick the more popular variant, or if that's nontrivial, we can
>> just let it be decided as a side effect of the implementation (i.e., leave
>> the behavior undefined in that case).
>>
>> - LaTeX. The ability to parse LaTeX math into something that can be
>> computed would be very useful. WolframAlpha has support for this.
>
> It's almost guaranteed that combining syntaxes from different sources gives
> an ambiguous grammar. The only techniques that can deal with that would be
> those in the lineage of the Earley parser.
>
> I see that http://en.wikipedia.org/wiki/Earley_parser lists four different
> Python implementations, one of them just 150 lines.

Just about all of them are relatively short. I suppose it wouldn't be
hard, then, to just implement this from scratch.

>> - Text based math. What I mean here is, it should support parsing
>> things as you would type them in plain text without trying to follow
>> any kind of set grammar. Stuff like 3x^2 + 2, d/dx x^2.
>
> That's really hard to do well. Most of the time, the user's guess of the
> parser's guess will be quite different from the actual guess of the parser.
>
>> - Special symbols: Support stuff like √x or ∫x^2 dx. Allow, to some
>> degree, pasting in stuff from the SymPy pretty printer (in particular,
>> things that are not printed on more than one line, like 3⋅x₃).
>
> That's simple. Just plop in the appropriate grammar rules. Make √ a prefix
> operator, ∫...dx a "circumfix" one.
> ₃ would probably have to be lexed as <sub>3<endsub>, where <sub> and
> <endsub> are synthetic lexer symbols.

Or preparse and replace ∫ with "integrate" and so on. Subscripts have no
syntactic meaning, so those should actually just be considered part of
the Symbol name (maybe translated from "₃" to "_3").

>> - Text based functions: Stuff like "integrate x^2 dx", "limit x^2
>> with respect to x as x approaches infinity".
>>
>> - Natural language processing: The line between this and the last
>> bullet point is blurry. What I mean here is that it tries to guess what is
>> meant from a plain text description without using a set grammar. This
>> could even support stuff like "the integral of x squared with respect
>> to x".
>
> The same caveats as with "text-based math" apply.
>
>> Shall I start a wiki page? I know there have been other things
>> discussed here, like unary minus and bra-ket, that can be problems
>> that are important to consider.
>
> I see two things that need a decision:
>
> 1) Whether supporting such a wide array of syntaxes is such an important
> goal.
> If yes, Earley parsing it is.
> If no, it would be defining our own syntax, possibly similar to existing
> symbolic math languages, but still a separate syntax.
>
> From the user's perspective, the consequence of an Earley parser would be
> an additional error mode: the input text might have multiple valid parses.
> (How to best present that to the user might be one or more GSoC projects.)
>
> The consequence of a non-Earley parser, regardless of technology, would be
> that we'd have to drastically cut down on the allowed syntax.
> Essentially, we'd have to resolve all potential syntactic ambiguities when
> writing the grammar.
>
> I think this decision does not benefit from a Wiki.

If you take the information you put in the earlier post, put it together
on a wiki, and format it so it's nice and readable, it can make things
easier to go through than trying to read through this thread again.

> 2) Which grammar rules to support.
>
> This is a bit tedious: look up what the various syntaxes have, write it
> down.
>
> A wiki page would be useful for that.

OK, I started https://github.com/sympy/sympy/wiki/parsing. I included
what I said above, and also your bit about the different parsers. Feel
free to edit that page however you want.

Aaron Meurer
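
For reference, here is a minimal sketch of the preparse idea from the special-symbols discussion above, assuming a plain string rewrite that runs before the text reaches the real parser. The function name `preparse` and the replacement table are hypothetical and not part of SymPy. The sketch only renames symbols: it does not interpret the trailing "dx" or deal with operator precedence.

```python
import re

# Hypothetical replacement table covering only the symbols from this thread.
SYMBOL_WORDS = {
    "\u222b": "integrate ",  # INTEGRAL sign
    "\u221a": "sqrt ",       # SQUARE ROOT sign
    "\u22c5": "*",           # DOT OPERATOR, as printed by the pretty printer
}

# Map the Unicode subscript digits U+2080..U+2089 to ASCII digits 0..9.
SUBSCRIPT_DIGITS = {0x2080 + i: str(i) for i in range(10)}


def preparse(text):
    """Rewrite special symbols so that an ASCII-only parser can take over."""
    # A run of subscript digits becomes "_<digits>", so it stays part of
    # the symbol name instead of getting a syntactic meaning of its own.
    text = re.sub(
        "[\u2080-\u2089]+",
        lambda m: "_" + m.group(0).translate(SUBSCRIPT_DIGITS),
        text,
    )
    # Replace each remaining special symbol with its ASCII spelling.
    for symbol, word in SYMBOL_WORDS.items():
        text = text.replace(symbol, word)
    return text
```

With this, `preparse("3⋅x₃")` gives `3*x_3` and `preparse("∫x^2 dx")` gives `integrate x^2 dx`, which would then go to whichever grammar handles the text-based functions.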