2010/3/23 Roan Kattouw <roan.katt...@gmail.com>:
> 2010/3/23 Aryeh Gregor <simetrical+wikil...@gmail.com>:
>> This much I know, but is LaTeX actually a regular language?
>>
> I don't know; I was just making the point that writing a DFA parser in
> PHP is probably not very useful.

Sorry, I got confused and wrote DFA when I should have written LALR.
DFAs cannot parse even the allowed subset of AMS-LaTeX, because there
are some permitted environments.

Without claiming to know much formal language theory, a rule of thumb is
that languages with matched delimiters were never regular, because of
the pumping lemma:
    http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages

So, for example, it's theoretically impossible to check that parentheses
nested correctly using regular expressions, and similarly it'd be
impossible to check that the \begin and \end commands matched up.

In practice there might be ways to hack around that by using multiple
regular expressions and manually tracking how they nest, but at that
point we're basically writing half of a bad LALR parser.

Fortunately, though, Python has parser generators! And if we're really
concerned about speed, there's PyBison, which does the parsing in C and
apparently produces (at least) five-fold improvements over Python-native
alternatives.

Yours,
Damon Wang

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Reply via email to