On Tue, Mar 13, 2012 at 2:58 PM, Joachim Durchholz <j...@durchholz.org> wrote:
> On 13.03.2012 21:17, Aaron Meurer wrote:
>
>> So would it help to start a wiki page where we list all the things we
>> want to support, in the order of importance?
>
>> Here's a beginning of that list (in order):
>>
>>
>> - SymPy syntax:  This is probably obvious, but correct SymPy/Python
>> syntax should always be parsed exactly as it is given.  If the
>> heuristic parser has ambiguity problems that would prevent this, we
>> can just preparse with a call to sympify(), and only use heuristics if
>> that fails.
>>
>> - Mathematica, Maple, Maxima, etc. syntax. Where they conflict, we
>> should pick the more popular variant, or if that's nontrivial, we can
>> just let it decide as a side-effect of the implementation (i.e., leave
>> the behavior undefined in that case).
>>
>> - LaTeX.  The ability to parse LaTeX math into something that can be
>> computed would be very useful.  WolframAlpha has support for this.
>
>
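
To make the first bullet concrete, here is a rough sketch of the "strict
first, heuristics only on failure" idea.  It uses Python's own ast module
as a stand-in for sympify(), and the names looks_like_python,
heuristic_rewrite and normalize are made up for illustration.  The two
rewrite rules are just examples, not a proposal for the real heuristics.

```python
import ast
import re

def looks_like_python(text):
    """Return True if `text` is already a syntactically valid Python expression."""
    try:
        ast.parse(text, mode="eval")
        return True
    except SyntaxError:
        return False

def heuristic_rewrite(text):
    # Hypothetical heuristics, for illustration only:
    # insert "*" between a digit and a following letter or "(",
    # then treat "^" as exponentiation.
    text = re.sub(r"(\d)([A-Za-z(])", r"\1*\2", text)
    return text.replace("^", "**")

def normalize(text):
    # Valid input is passed through untouched; heuristics are the fallback.
    return text if looks_like_python(text) else heuristic_rewrite(text)
```

With this, "3x^2 + 2" is rewritten to "3*x**2 + 2", while "x^2" is left
alone because it is already valid Python (xor), which is exactly the
"parsed as given" behavior from the first bullet.
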
> It's almost guaranteed that combining syntaxes from different sources gives
> an ambiguous grammar. The only techniques that can deal with that would be
> those descended from the Earley parser.
>
> I see that http://en.wikipedia.org/wiki/Earley_parser lists four different
> Python implementations, one of them just 150 lines.

Just about all of them are relatively short.  I suppose it wouldn't be
hard, then, to just implement this from scratch.
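
For a sense of scale, here is a minimal sketch of an Earley recognizer (it
only answers yes/no and builds no parse trees).  The names Item,
earley_recognize and GRAMMAR are mine, the toy grammar is deliberately
ambiguous, and the code assumes there are no empty rule bodies.  It is not
taken from any of the implementations listed on Wikipedia.

```python
from collections import namedtuple

# An Earley item: a rule (head -> body), a dot position, and the index of
# the chart column in which the item was started.
Item = namedtuple("Item", "head body dot origin")

# Toy ambiguous grammar: E -> E + E | E * E | n
GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("n",)]}

def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    `grammar` maps each nonterminal to a list of rule bodies (tuples of
    symbols).  Any symbol that is not a key of `grammar` is a terminal.
    Assumes no rule body is empty.
    """
    chart = [[] for _ in range(len(tokens) + 1)]

    def add(col, item):
        if item not in chart[col]:
            chart[col].append(item)

    for body in grammar[start]:
        add(0, Item(start, body, 0, 0))

    for i in range(len(tokens) + 1):
        j = 0
        while j < len(chart[i]):  # the column grows while we process it
            item = chart[i][j]
            j += 1
            if item.dot < len(item.body):
                sym = item.body[item.dot]
                if sym in grammar:
                    # Predict: start every rule for the expected nonterminal.
                    for body in grammar[sym]:
                        add(i, Item(sym, body, 0, i))
                elif i < len(tokens) and tokens[i] == sym:
                    # Scan: the next token matches the expected terminal.
                    add(i + 1, item._replace(dot=item.dot + 1))
            else:
                # Complete: advance every item that was waiting for this head.
                for parent in list(chart[item.origin]):
                    if (parent.dot < len(parent.body)
                            and parent.body[parent.dot] == item.head):
                        add(i, parent._replace(dot=parent.dot + 1))

    return any(it.head == start and it.dot == len(it.body) and it.origin == 0
               for it in chart[len(tokens)])
```

Calling earley_recognize(GRAMMAR, "E", ["n", "+", "n", "*", "n"]) accepts
the input even though the grammar is left-recursive and ambiguous.
Reporting the different parses would need a parse forest on top of this.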

>
>
>> - Text based math.  What I mean here is, it should support parsing
>> things as you would type them in plain text without trying to follow
>> any kind of set grammar.  Stuff like 3x^2 + 2, d/dx x^2.
>
>
> That's really hard to do well. Most of the time, the user's guess of the
> parser's guess will be quite different from the actual guess of the parser.
>
>
>> - Special symbols: Support stuff like √x or ∫x^2 dx.  Allow, to some
>> degree, pasting in stuff from the SymPy pretty printer (in particular,
>> things that are not printed on more than one line, like 3⋅x₃).
>
>
> That's simple. Just plop in the appropriate grammar rules. Make √ a prefix
> operator, ∫...dx a "circumfix" one.
> ₃ would probably have to be lexed as <sub>3<endsub>, where <sub> and
> <endsub> are synthetic lexer symbols.

Or preparse and replace ∫ with "integrate" and so on.

Subscripts have no syntactical meaning, so those should actually just
be considered part of the Symbol name (maybe translated from "₃" to
"_3").
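
A rough sketch of such a preparse step, using only the standard library.
The replacement table and the name preparse are hypothetical, and the
output is meant to be fed to whatever parser comes next, not to be valid
Python by itself.

```python
import re

# Hypothetical replacement table; extend as needed.
SYMBOLS = {"√": "sqrt ", "∫": "integrate ", "⋅": "*"}

# Maps the Unicode subscript digits to plain ASCII digits.
SUBSCRIPT_DIGITS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")

def preparse(text):
    # Fold a run of subscript digits into the preceding name: x₃ -> x_3
    text = re.sub(r"[₀-₉]+",
                  lambda m: "_" + m.group(0).translate(SUBSCRIPT_DIGITS),
                  text)
    # Replace special symbols with their plain-text names.
    for symbol, name in SYMBOLS.items():
        text = text.replace(symbol, name)
    return text
```

So preparse("∫x^2 dx") gives "integrate x^2 dx" and preparse("3⋅x₃")
gives "3*x_3", with the subscript ending up as part of the Symbol name.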

>
>
>> - Text based functions:  Stuff like "integrate x^2 dx", "limit x^2
>> with respect to x as x approaches infinity".
>>
>> - Natural language processing:  The boundary between this and the
>> last bullet point is vague.  What I mean here is that it tries to
>> guess what is meant from a plain text description without using a set
>> grammar.  This could even support stuff like "the integral of x
>> squared with respect to x".
>
>
> The same caveat as with "text-based math" applies.
>
>
>> Shall I start a wiki page?  I know there have been other things
>> discussed here, like unary minus and bra-ket, that can be problems
>> that are important to consider.
>
>
> I see two things that need a decision:
>
> 1) Whether supporting such a wide array of syntaxes is such an important
> goal.
> If yes, Earley parsing it is.
> If no, it would be defining our own syntax, possibly similar to existing
> symbolic math languages, but still a separate syntax.
>
> From the user's perspective, the consequence of an Earley parser would be
> an additional error mode: the input text might have multiple valid parses.
> (How to best present that to the user might be one or more GSoC projects.)
>
> The consequence of a non-Earley parser, regardless of technology, would be
> that we'd have to drastically cut down on the allowed syntax.
> Essentially, we'd have to resolve all potential syntactic ambiguities when
> writing the grammar.
>
> I think this decision does not benefit from a Wiki.

If you collect the information from your earlier post on a wiki page,
and format it so it's nice and readable, it will be easier to go
through than rereading this thread.

>
> 2) Which grammar rules to support.
>
> This is a bit tedious: look up what the various syntaxes have, write it
> down.
>
> A wiki page would be useful for that.

OK, I started https://github.com/sympy/sympy/wiki/parsing.  I included
what I said above, and also your bit about the different parsers.
Feel free to edit that page however you want.

Aaron Meurer
