Ideally, parsing a LaTeX expression should return candidate SymPy expressions, each with a matching probability. If the match is unambiguous, only one candidate should have a high probability. If it is ambiguous, two or more candidates should have high probability.
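As a rough sketch of what I mean: the names `Candidate`, `normalize` and `is_ambiguous`, the raw scores, and the 0.3 threshold below are all made up for illustration. Nothing like this exists in SymPy's LaTeX parser today, and the candidates are built by hand rather than parsed.

```python
from dataclasses import dataclass

import sympy as sp


@dataclass(frozen=True)
class Candidate:
    """One possible reading of a LaTeX string (hypothetical container)."""
    expr: sp.Basic
    probability: float


def normalize(scored):
    """Turn (expr, raw_score) pairs into candidates whose probabilities
    sum to 1, sorted from most to least likely."""
    total = sum(score for _, score in scored)
    if total <= 0:
        raise ValueError("raw scores must sum to a positive number")
    candidates = [Candidate(expr, score / total) for expr, score in scored]
    return sorted(candidates, key=lambda c: c.probability, reverse=True)


def is_ambiguous(candidates, threshold=0.3):
    """Ambiguous means two or more readings reach the (assumed) threshold."""
    return sum(c.probability >= threshold for c in candidates) >= 2


a, b, x = sp.symbols("a b x")

# The LaTeX string "ab" could be the product a*b or a single symbol "ab".
# The raw scores are invented; a real parser would have to estimate them.
ab_candidates = normalize([(a * b, 3.0), (sp.Symbol("ab"), 2.0)])

# "x + 1" has only one sensible reading.
x_candidates = normalize([(x + 1, 1.0)])

for c in ab_candidates:
    print(c.expr, round(c.probability, 2))
print("ab ambiguous:", is_ambiguous(ab_candidates))
print("x + 1 ambiguous:", is_ambiguous(x_candidates))
```

The caller would then decide what to do with an ambiguous result, for example ask the user or fall back on extra context.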
Topic also matters. If you have a physics paper, you'd probably want the parser to favor the subset of expressions typical of that field.

On Tuesday, 26 May 2020 13:33:14 UTC+2, Ben wrote:
> On Tuesday, May 26, 2020 at 7:23:42 AM UTC-4, David Bailey wrote:
>> On 25/05/2020 23:42, Ben wrote:
>>> You're totally correct -- LaTeX is ambiguous. I don't find your observation discouraging since it is perfectly reasonable.
>>>
>>> The issue I'm interested in tackling is the conversion of math presented in Physics papers (e.g., .tex files on arxiv.org) to a semantically meaningful and unambiguous representation (e.g., SymPy).
>>>
>>> This issue would be moot if Physics papers were written in SymPy. I don't have insight on how to construct incentives that would lead to use of SymPy in Physics papers, so I'm working on the LaTeX-to-SymPy approach.
>>
>> Right - well in that case, maybe a system of hints that the user could add to your parser would be really useful. For example, if a user could tell your parser that superscripts were usually tensor indices rather than exponents (or alternatively that certain symbols used as superscripts would never mean exponents), you could come out with a better translation. Another useful hint might be a list of the multi-letter symbols in use - sin, cos, exp, ln etc. - so that you could resolve your ambiguity of what ab means. I mean, sometimes sin(x) might mean s*i*n(x), and that could be handled by the user specifying that only certain multi-letter symbols were in use.
>>
>> David
>
> Yeah, in talking this over with a collaborator, we think there are various sources to help with parsing.
>
> - within the math LaTeX string to parse, what can be deduced about the expected context?
> - given other math expressions in the same paper, what would be consistent?
> - given the text in a paper surrounding the math expressions, what would be expected based on keywords?
> - given other papers in the same domain or based on citations, what would be likely?
> - what is statistically likely given the corpus of all articles?
>
> This is, in some sense, the same process a human goes through to decode the intended meaning of any given math expression in a scientific paper. We are looking to encode that process as a Python program. (That's beyond the scope of SymPy but is context for the issue.)