Hi, I'm new to working with ANTLR and ANTLRWorks. I really appreciate what ANTLR and ANTLRWorks offer, and I purchased "The Definitive ANTLR Reference" and "Language Implementation Patterns" to get a better understanding of how to use ANTLR. I have some background in lexers, parsers, and EBNF.
I'd like to write a lexer/parser that can recognize a character-delimited format with nested field groups and transform the data into XML-style data. Here are some examples of what the data looks like:

Example 1:
Obj1^Verb1^field1^field2^

Example 2:
Obj1^Verb1^field1^field2^1^Obj2^Verb2^field3^field4^

Example 3:
Obj1^Verb1^field1^field2^2^Obj2^Verb2^field3_1^field4_1^^Obj2^Verb2^field3_2^field4_2^

Example 4:
Obj1^Verb1^field1^field2^2^Obj2^Verb2^field3_1^field4_1^^Obj2^Verb2^field3_2^field4_2^1^Obj3^Verb3^field5^

The core grammar behind this looks like this:

    object SEP verb SEP (fieldContents SEP)+ (recordCount SEP (object SEP verb SEP (fieldContents SEP)+)+)*

where SEP is the delimiter ('^' in this case) and recordCount is an integer that indicates how many (sub)records follow it.

From my understanding, this grammar is LL(*) because "recordCount" can occur after an arbitrary number of fields, due to this part of the rule: (fieldContents SEP)+.

I managed to write a grammar that parses example 1 but fails on all the other examples:

    grammar DLM;

    data : objectGroup subObjectGroup* ;

    objectGroup : objectName SEP verbName SEP (fieldData SEP)+ ;

    subObjectGroup : recordCount SEP objectGroup+ ;

    objectName : 'Obj1' | 'Obj2' | 'Obj3' ;

    verbName : 'Verb1' | 'Verb2' | 'Verb3' ;

    fieldData : NONSEP* ; // field can be empty;

    recordCount : INT ;

    NONSEP : ~('^')+ ;

    SEP : '^';

    fragment INT : '0'..'9'+;

This grammar just stops when it reaches the token "Obj2". I rewrote rule "data" like this:

    data : objectGroup subObjectGroup+
         | objectGroup
         ;

This time it failed at token "Obj2" with a NoViableAltException. I tried using options {backtrack=true; memoize=true;} both for the whole grammar and for rule "data" only, but this didn't help. I also tried predicates like this:

    subObjectGroup : (INT SEP objectName) => recordCount SEP objectGroup+ ;

but this didn't help either. So I'd really appreciate some hints on how to make the other examples parse. Thanks.
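For comparison only, here is a minimal hand-written recursive-descent sketch in Java (not ANTLR-generated, and not a fix for the grammar above) of one way the recordCount can drive the parse. It assumes the object names are exactly Obj1..Obj3 from the objectName rule, that no field value ever equals an object name, and that the input always ends with '^'; the class name DlmSketch and the XML element layout are hypothetical. The idea it illustrates: a numeric token followed by an object name is read as a record count only where a new group may legally start (after the top-level record's fields, or after the last record the previous count announced). Anywhere else it is treated as an ordinary field, so the count, not the token shape, settles the ambiguity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hand-written sketch of the '^'-delimited format (NOT ANTLR-generated).
// Hypothetical assumptions: object names are exactly those in the objectName rule,
// no field value equals an object name, every field is terminated by '^'.
// The XML layout produced here is made up for illustration.
public class DlmSketch {
    private static final Set<String> OBJECTS = Set.of("Obj1", "Obj2", "Obj3");

    private final List<String> toks;
    private int pos = 0;
    private final StringBuilder xml = new StringBuilder();

    public DlmSketch(String input) {
        if (!input.endsWith("^")) {
            throw new IllegalArgumentException("input must end with '^'");
        }
        // limit -1 keeps empty fields such as the one in "field4_1^^Obj2";
        // the piece after the final '^' is always empty, so drop it.
        String[] parts = input.split("\\^", -1);
        toks = Arrays.asList(parts).subList(0, parts.length - 1);
    }

    // record (recordCount SEP record{recordCount})*
    public String parse() {
        String top = openRecord(true, "");
        while (pos < toks.size()) {
            int count = Integer.parseInt(next());
            if (count < 1) throw new IllegalStateException("record count must be >= 1");
            for (int i = 0; i < count; i++) {
                // Only the last record of a group may be followed by another "count ^ ObjN".
                String sub = openRecord(i == count - 1, "  ");
                xml.append("  </").append(sub).append(">\n");
            }
        }
        xml.append("</").append(top).append(">\n");
        return xml.toString();
    }

    // object SEP verb SEP (field SEP)+ ; leaves the element open and returns its name.
    private String openRecord(boolean countMayFollow, String indent) {
        String obj = next();
        if (!OBJECTS.contains(obj)) {
            throw new IllegalStateException("expected object name, got '" + obj + "'");
        }
        String verb = next();
        xml.append(indent).append('<').append(obj)
           .append(" verb=\"").append(esc(verb)).append("\">\n");
        do { // at least one field, possibly empty
            xml.append(indent).append("  <field>").append(esc(next())).append("</field>\n");
        } while (!recordEnds(countMayFollow));
        return obj;
    }

    // Two tokens of lookahead: a record ends at end of input, at an object name, or
    // (only where a new group is legal) at an integer followed by an object name.
    private boolean recordEnds(boolean countMayFollow) {
        if (pos == toks.size()) return true;
        String t = toks.get(pos);
        if (OBJECTS.contains(t)) return true;
        return countMayFollow && t.matches("\\d+")
                && pos + 1 < toks.size() && OBJECTS.contains(toks.get(pos + 1));
    }

    private String next() {
        if (pos >= toks.size()) throw new IllegalStateException("unexpected end of input");
        return toks.get(pos++);
    }

    // Minimal XML escaping for field and verb text.
    private static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String example4 = "Obj1^Verb1^field1^field2^2^Obj2^Verb2^field3_1^field4_1^^"
                + "Obj2^Verb2^field3_2^field4_2^1^Obj3^Verb3^field5^";
        System.out.print(new DlmSketch(example4).parse());
    }
}
```

Running main prints the XML for example 4, with the two Obj2 records and the Obj3 record nested inside the Obj1 element, and the empty field between the two Obj2 records kept as an empty field element.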
Best regards,
Florian

List: http://www.antlr.org/mailman/listinfo/antlr-interest