On Sun, 30 Apr 2006, Daniel Watkins wrote:
> I'm currently working on a program to parse LaTeX style maths > expressions and provide an answer. For example, I have the expression > "2^\frac{1}{2}". I'm trying to work out a way to split this into it's > most basic blocks of LaTeX (i.e. 2^ and \frac{1}{2}) while maintaining a > record of the depth of the expression (i.e. (2^,0),(\frac{1}{2},1)). Hi Daniel, Forgive me for going off a side tangent here, but do we have to work directly on LaTeX? I don't mean this to be a silly question! What kind of problem are we trying to solve? It sounds like we're trying to evaluate LaTeX equations, so that presents the problem of parsing LaTeX. Diagrammically: LaTeX source ---- parsing --> evaluated expressions We could talk about parsing LaTeX. But could we have equations written in something easier to parse, and generate LaTeX out of that? That is: some easy-to-parse equations | +-- simple parser --> LaTeX writer ---> LaTeX source | +-- simple parser --> evaluator --> evaluated expressions If we have the freedom to do this, we may want to take this route, because it frees us from being so tied to LaTeX. For example, although this will look silly at first, we could do something like: text = """<para> Hello world. <math name="expr1"> <fraction> <numer>3</numer> <denom>6</denom> </fraction> </math> equals: <evaluated ref="expr1"/> Does that look ok? </para>""" This is just an ad-hoc XML document I've cooked up. The idea is that, given something like the above, we can: * write a XML-processing program to turn this into legitimate LaTeX source, or * write a program to evaluate all the math expressions and give us their values. If we take this route, the parsing problem here is already solved for us because there are XML-parsing modules like minidom or ElementTree. http://docs.python.org/lib/module-xml.dom.minidom.html http://effbot.org/zone/element-index.htm For example, here's a quick and dirty one that doesn't do a perfect job, but it's a start. (For this example, I'm using minidom because it handles text nodes as uniformly as other nodes, whereas ElementTree treats them as weird special cases.) ################################################################## import xml.dom.minidom def node_to_latex(node): """node_to_latex: node -> string Converts an arbitrary node to LaTeX source.""" if node.nodeType == node.TEXT_NODE: return text_to_latex(node) elif node.tagName == 'para': return para_to_latex(node) elif node.tagName == 'math': return math_to_latex(node) elif node.tagName == 'evaluated': return evaluated_to_latex(node) def text_to_latex(node): """text_to_latex: node -> string Assuming node is a text node, returns its textual data.""" return node.data def para_to_latex(para_node): """para_to_latex: para -> string Converts a paragraph node to LaTeX source.""" results = [] for sub_node in para_node.childNodes: results.append(node_to_latex(sub_node)) results.append('\n') return ''.join(results) def math_to_latex(math): """math_to_latex: node -> string Converts a math node to latex source.""" results = ['\n\\begin{math}\n'] for sub_math_node in math.childNodes: results.append(math_subnode_to_latex(sub_math_node)) results.append('\n\\end{math}\n') return ''.join(results) def math_subnode_to_latex(sub_node): """math_subnode_to_latex: node -> string Converst a math subnode into LaTeX. At the moment, we handle only fractions and text.""" results = [] if sub_node.nodeType == sub_node.TEXT_NODE: results.append(sub_node.data) elif sub_node.tagName == 'fraction': results.append(fraction_to_latex(sub_node)) return ''.join(results) def fraction_to_latex(node): """fraction_to_latex: fraction -> string Converts fractions to LaTeX.""" results = ['\\frac'] results.append('{') for child_math in ( node.getElementsByTagName('numer')[0].childNodes): results.append(math_subnode_to_latex(child_math)) results.append('}') results.append('{') for child_math in ( node.getElementsByTagName('denom')[0].childNodes): results.append(math_subnode_to_latex(child_math)) results.append('}') return ''.join(results) def evaluated_to_latex(node): """evaluated_to_latex: evaluated -> string Converts evaluated to LaTeX. Doesn't quite work right yet.""" results = [] results.append("EVALUATED-NOT-HANDLED-YET") return ''.join(results) if __name__ == '__main__': text = """<para> Hello world. <math name="expr1"> <fraction> <numer>3</numer> <denom>6</denom> </fraction> </math> equals: <evaluated ref="expr1"/> Does that look ok? </para>""" tree = xml.dom.minidom.parseString(text) print node_to_latex(tree.documentElement) ################################################################## My apologies for the length here, but I needed all those functions. This does a recursive traversal through the structure of the XML tree. It handles text, paragraphs, and very simple math. The XML document is neutral in the ways we can interpret it. Not only can we use it as a source for making LaTeX documents, but we can also extract all the math expressions in there by doing a similar kind of recursive traversal. ##################################################################### def find_math(node): """find_math: node -> (listof node) Returns a list of all the math sub-nodes embedded in the node.""" if node.nodeType == node.TEXT_NODE: return [] elif node.tagName == 'math': return [node] else: results = [] for child in node.childNodes: results.extend(find_math(child)) return results ##################################################################### Evaluating these math nodes should just be a matter of doing similar kinds of recursive traversals on their structure. I'm sorry I went really fast on this, but does this make sense so far? _______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor