Hi group, this is not really a Python question, but I use Python to lex/parse some input. In particular, I use the amazing TPG (http://cdsoft.fr/tpg/). However, I'm now stuck at a point and am sure I'm not doing something correctly -- since there's a bunch of really smart people here, I hope to get some insights. Here we go:

I've created a minimal example in which I'm trying to parse some tokens (strings and ints in the minimal example). Strings are delimited by parentheses, ( and ). Therefore

    (Foo) -> "Foo"

Parentheses inside a string are taken literally when balanced. If they are not balanced, it's a parsing error.

    (Foo (Bar)) -> "Foo (Bar)"

Parentheses may be escaped:

    (Foo \)Bar) -> "Foo )Bar"

In my first (naive) attempt, I ignored the escaping and went with lexing and then these rules:

    token string_token '[^()]*';

    [...]

    String/s -> start_string      $ s = ""
        (   string_token/e        $ s += e
        |   String/e              $ s += "(" + e + ")"
        )*
        end_string
    ;

This *somewhat* worked, admittedly with some erroneous parsing.

In my second attempt, I tried to do it properly. I omitted the tokenization and instead used inline terminals (which have precedence in TPG):

    String/s -> start_string      $ s = ""
        (   '\\.'/e               $ s += "ESCAPED[" + e + "]"
        |   '[^\\()]+'/e          $ s += e
        |   String/e              $ s += "(" + e + ")"
        )*
        end_string
    ;

(The "ESCAPED" part is just for demonstration, to get the idea across.)

While the latter parser parses all strings perfectly, it now isn't able to parse anything else anymore (including integer values!).
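
To pin down the string semantics I'm after independently of TPG, here is a minimal hand-written sketch in plain Python. The function name parse_string and the choice to raise ValueError are only my illustration (nothing TPG provides), and it handles strings only, not integers:

```python
def parse_string(text, pos=0):
    """Parse one parenthesized string starting at text[pos].

    Returns (value, next_pos). Raises ValueError on malformed input.
    Hypothetical helper for illustration; not part of TPG.
    """
    if pos >= len(text) or text[pos] != "(":
        raise ValueError("expected '(' at position %d" % pos)
    pos += 1
    parts = []
    while pos < len(text):
        ch = text[pos]
        if ch == "\\":
            # Escape: take the following character literally.
            if pos + 1 >= len(text):
                raise ValueError("dangling backslash at end of input")
            parts.append(text[pos + 1])
            pos += 2
        elif ch == "(":
            # Nested string: recurse and keep its parentheses in the result.
            inner, pos = parse_string(text, pos)
            parts.append("(" + inner + ")")
        elif ch == ")":
            # Closing parenthesis of the current string.
            return "".join(parts), pos + 1
        else:
            parts.append(ch)
            pos += 1
    raise ValueError("unbalanced parentheses: missing ')'")
```

Called on the three examples above, parse_string("(Foo)"), parse_string("(Foo (Bar))") and parse_string(r"(Foo \)Bar)") return the values "Foo", "Foo (Bar)" and "Foo )Bar" respectively, while an unbalanced input such as "(Foo (Bar)" raises ValueError.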

When trying to parse "12345", for example, it appears to match the inline terminal '[^\\()]+' against my integer and then dies:

    [ 1][ 3]START.Expression.Value: (1,1) _tok_2 12345 != integer
    [ 2][ 3]START.Expression.String: (1,1) _tok_2 12345 != start_string
    Traceback (most recent call last):
      File "example.py", line 56, in <module>
        print(Parser()(example))
      File "example/tpg.py", line 942, in __call__
        return self.parse('START', input, *args, **kws)
      File "example/tpg.py", line 1125, in parse
        return Parser.parse(self, axiom, input, *args, **kws)
      File "example/tpg.py", line 959, in parse
        value = getattr(self, axiom)(*args, **kws)
      File "<string>", line 3, in START
      File "<string>", line 14, in Expression
    UnboundLocalError: local variable 'e' referenced before assignment

"_tok_2" seems to correspond to one of the inline terminal symbols; the only one that fits is '[^\\()]+'. But why would that *ever* match? I thought it would only be tried once a "start_string" had been encountered (which is not the case here).

Since I'm the parsing noob, I don't think TPG (which is FREAKING AMAZING, seriously!) is at fault, but rather my understanding of TPG. Can someone help me with this?

I've uploaded a complete working example to play around with here:

http://wikisend.com/download/642120/example.tar.gz

(If it's not working, please tell me and I'll look for some other place to host it.)

Thank you so much for your help,
Best regards,
Johannes

--
>> Where exactly did you predict the earthquake again?
> At least not publicly!
Ah, the latest and to this day most brilliant stroke of our great
cosmologists: the secret prediction.
 - Karl Kaos about Rüdiger Thomas in dsa <hidbv3$om2$1...@speranza.aioe.org>