[il-antlr-interest: 31329] Re: [antlr-interest] Catching errors

Victor Giordano Wed, 02 Feb 2011 19:36:52 -0800

[Updated] What I see when I use the generated lexer and parser 
(generated from the LinearMath grammar below) in a Java application is 
that they do emit some kind of warning about two things:


1) extraneous input '<some_token>' expecting EOF
   *Only when I append the EOF token at the end of the rule*
2) required (...)+ loop did not match anything at input '<some_token>'
   *Only when I use the '+' quantifier*

where <some_token> is the actual offending token.
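
To illustrate why the first message only shows up once EOF is part of the rule, here is a toy hand-written recognizer. It is a sketch under assumptions, not what ANTLR generates: the class PrefixDemo and its methods are hypothetical, the rule is simplified to ID ((PLUS|MINUS) ID)*, and it throws where a real ANTLR parser would report and try to recover.

```java
import java.util.List;

/** Toy recognizer (hypothetical) for: linexpr : ID ((PLUS|MINUS) ID)* ; */
class PrefixDemo {

    /** Matches the longest valid linexpr prefix and returns the number of tokens consumed. */
    static int matchLinexpr(List<String> tokens) {
        int pos = 0;
        if (pos >= tokens.size() || !isId(tokens.get(pos))) {
            throw new IllegalArgumentException("expected ID at token " + pos);
        }
        pos++;
        // ((PLUS|MINUS) ID)* : keep looping only while the next token is an operator.
        while (pos < tokens.size() && isOp(tokens.get(pos))) {
            pos++;
            if (pos >= tokens.size() || !isId(tokens.get(pos))) {
                throw new IllegalArgumentException("expected ID at token " + pos);
            }
            pos++;
        }
        // Anything left over is simply not looked at, so no error is raised here.
        return pos;
    }

    /** Same rule followed by EOF: leftover tokens now count as an error. */
    static void matchStart(List<String> tokens) {
        int consumed = matchLinexpr(tokens);
        if (consumed != tokens.size()) {
            throw new IllegalArgumentException(
                    "extraneous input '" + tokens.get(consumed) + "' expecting EOF");
        }
    }

    static boolean isId(String t) {
        return t.matches("[A-Za-z_][A-Za-z0-9_]*");
    }

    static boolean isOp(String t) {
        return t.equals("+") || t.equals("-");
    }
}
```

With the tokens x, x, x, matchLinexpr consumes one token and returns quietly, while matchStart fails on the second x because it insists on reaching the end of the input.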

In fact, the warnings are just strings sent to standard error.

The question is, again: how do I handle those errors so that they 
interrupt the normal flow as a real exception that I can treat like one?
That's all for now.
Sorry for the flood of emails! Thanks in advance.
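
One direction that might work is to record the messages instead of printing them, and to throw once parsing has finished. As far as I know, ANTLR 3 recognizers inherit emitErrorMessage(String) from BaseRecognizer, and that method is what prints to standard error by default. The sketch below does not use the ANTLR runtime at all: ReportingParser, StrictParser and SyntaxErrorException are hypothetical stand-ins, and the reported message is a made-up sample, so treat this as an outline of the idea rather than working ANTLR code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a generated parser; it is NOT the ANTLR runtime. */
class ReportingParser {
    /** Default behaviour as observed above: the message goes to standard error. */
    public void emitErrorMessage(String msg) {
        System.err.println(msg);
    }

    /** Pretends to parse "x<  y x": reports one problem, recovers, returns normally. */
    public void inecuation() {
        emitErrorMessage("line 1:6 extraneous input 'x' expecting EOF"); // sample message
    }
}

/** Hypothetical unchecked exception carrying every collected message. */
class SyntaxErrorException extends RuntimeException {
    private final List<String> errors;

    SyntaxErrorException(List<String> errors) {
        super(String.join("; ", errors));
        this.errors = Collections.unmodifiableList(new ArrayList<>(errors));
    }

    List<String> getErrors() {
        return errors;
    }
}

/** Records messages instead of printing them, then fails once parsing is done. */
class StrictParser extends ReportingParser {
    private final List<String> errors = new ArrayList<>();

    @Override
    public void emitErrorMessage(String msg) {
        errors.add(msg);
    }

    public void parseOrThrow() {
        inecuation();
        if (!errors.isEmpty()) {
            throw new SyntaxErrorException(errors);
        }
    }
}
```

The calling code would then wrap parseOrThrow() in an ordinary try/catch and show getErrors() to the end user. In a real grammar the override would presumably go into an @members section of the parser.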
Víctor.




On 02/02/2011 11:22 p.m., Victor Giordano wrote:
> Okay. So adding an EOF forces the parser to go to the end of the input
> in search of further tokens in the correct order.
>
> 1) But I still have a problem. Consider the following grammar:
>
> grammar LinearMath;
>
> tokens
> {
>       PLUS     = '+';
>       MINUS     = '-';
>       MUL        = '*';
>       DIV        = '/';
> }
>
> inecuation:   linexpr ((RELATIONSHIP) linexpr)+ EOF!;
> catch [UnwantedTokenException ute]
> {
>       System.out.println ("inecuation UnwantedTokenException  " +
> ute.toString());
>       throw ute;
> }
>
> linexpr : (MINUS|PLUS)? linterm ((PLUS|MINUS) linterm)* EOF;
>
> linterm : factor? ID;
>
> expr returns [double value]
>       : e=term {$value = $e.value;}
>       (    PLUS e=term {$value += $e.value;}
>       |    MINUS e=term {$value -= $e.value;}
>       )*;
>
> term returns [double value]
>       : f=factor {$value = $f.value;}
>       (    MUL f=factor {$value *= $f.value;}
>       |    DIV f=factor {$value /= $f.value;}
>       )*;
>
> factor returns [double value]
>       : DOUBLE {$value = Double.parseDouble($DOUBLE.text);}
>       | '(' e=expr ')'{$value = $e.value;};
>
> ID  :    ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
>
> DOUBLE
>       :   ('0'..'9')+
>       |    ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
>         |   '.' ('0'..'9')+ EXPONENT?
>         |   ('0'..'9')+ EXPONENT
>         ;
>
> fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
>
> NEWLINE:'\r'? '\n' { $channel = HIDDEN; };
>
> WS  :   (' '|'\t'|'\n'|'\r')+ { $channel = HIDDEN; };
>
>
> RELATIONSHIP :        '<'|'<='|'='|'>'|'>=';
>
> and the following input: "x<  y x"
> This isn't a valid inequation, because "y x" needs a binary
> arithmetic operator (PLUS or MINUS) between the two terms. The parser
> does its job very well: it consumes "x", then "<", then "y", and when it
> reaches the second "x" it emits an "UnwantedTokenException". The thing
> is that I am not able to catch it and display an error to the end user.
> Note that I am parsing that input with the "inecuation" rule.
>
> I hope someone can help me with this again.
>
> 2) The other thing is about invalid tokens. I managed to handle them by
> overriding a member function of the lexer called nextToken(), like this:
>
> @lexer::members
> {
>       @Override
>       public Token nextToken()
>       {
>               while (true) {
>                       state.token = null;
>                       state.channel = Token.DEFAULT_CHANNEL;
>                       state.tokenStartCharIndex = input.index();
>                       state.tokenStartCharPositionInLine = 
> input.getCharPositionInLine();
>                       state.tokenStartLine = input.getLine();
>                       state.text = null;
>                       if ( input.LA(1)==CharStream.EOF ) {
>                               return Token.EOF_TOKEN;
>                       }
>                       try {
>                               mTokens();
>                               if ( state.token==null ) {
>                                       emit();
>                               }
>                               else if ( state.token==Token.SKIP_TOKEN ) {
>                                       continue;
>                               }
>                               return state.token;
>                       }
>                       catch (RecognitionException re) {
>                               reportError(re);
>                               // or throw Error
>                               throw new RuntimeException(
>                                       "Invalid Character : " + (char) (re.c));
>                       }
>               }
>       }
> }
>
> Is that the correct way?
>
> Well, that is all!
> Thanks in advance!
> Victor
>
> On 02/02/2011 05:32 p.m., John B. Brodie wrote:
>> Your grammar does not mention the EOF token. (more below...)
>> On Wed, 2011-02-02 at 16:18 -0300, Victor Giordano wrote:
>>> Hi there. I am having trouble with error handling.
>>> I have a grammar for recognizing linear expressions, and it works great!
>>> The grammar for a linear expression is the following:
>>>
>>> tokens
>>> {
>>>     PLUS    = '+';
>>>     MINUS   = '-';
>>>     MUL             = '*';
>>>     DIV             = '/';
>>> }
>>>
>>> linexpr : (MINUS|PLUS)? linterm ((PLUS|MINUS) linterm)*;
>>> linterm : factor? ID;
>>>
>>> expr returns [double value]
>>>     : e=term {$value = $e.value;}
>>>     (       PLUS e=term {$value += $e.value;}
>>>     |       MINUS e=term {$value -= $e.value;}
>>>     )*;
>>>
>>> term returns [double value]
>>>     : f=factor {$value = $f.value;}
>>>     (       MUL f=factor {$value *= $f.value;}
>>>     |       DIV f=factor {$value /= $f.value;}
>>>     )*;
>>>
>>> factor returns [double value]
>>>     : DOUBLE {$value = Double.parseDouble($DOUBLE.text);}
>>>     | '(' e=expr ')'{$value = $e.value;};
>>>     
>>> ID  :       ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
>>>
>>> DOUBLE
>>>     :   ('0'..'9')+
>>>     |       ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
>>>        |   '.' ('0'..'9')+ EXPONENT?
>>>        |   ('0'..'9')+ EXPONENT
>>>        ;
>>>
>>> fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
>>>
>>> NEWLINE:'\r'? '\n' { $channel = HIDDEN; };
>>>
>>> WS  :   (' '|'\t'|'\n'|'\r')+ { $channel = HIDDEN; };
>>>
>>>
>>> But the problem occurs when, for example, I have:
>>> "x x x"
>>>
>>> Then the parser stops after processing the first "x".
>>> How do I correctly emit an invalid-syntax error?
>>> I tried catching EarlyExitException, but it doesn't work.
>>> Inside my Java application, I want to catch this and show it to the end
>>> user.
>>> Something like this...
>>> // line is equal to the user input...
>>>
>>>                CharStream cs = new ANTLRStringStream(line);
>>>                LinearExpressionLexer lexer = new LinearExpressionLexer(cs);
>>>                CommonTokenStream tokens = new CommonTokenStream(lexer);
>>>                LinearExpressionParser parser =
>>>                        new LinearExpressionParser(tokens);
>>>                // Here it is supposed to fail, but it doesn't.
>>>                res = parser.linexpr();
>>> Actually, linexpr returns data whose type is a
>>> custom class called LinearExpresion. I omitted the return value from
>>> the linexpr parser rule to simplify things.
>>>
>>> I hope someone can help me.
>>> Greetings, and thanks in advance.
>>
>> Greetings!
>>
>> By design ANTLR parsers stop after consuming the longest possible VALID
>> input sequence. I believe the rationale for this is that any remaining
>> input will be available for some other tool to process.
>>
>> If you want ANTLR to try to process the entire input, reporting and
>> recovering from syntax errors in the input; you must tell it to do that.
>>
>> Referring to the EOF token (a special built-in token) in your
>> top-most rule will cause ANTLR to consume the entire input string. That
>> is, the parse will not have a valid input until the EOF is seen, and so
>> it will consume all of the input sentence.
>>
>> I suggest adding a top-level rule similar to:
>>
>> start : linexpr EOF! ;
>>
>> and then call parser.start() instead of parser.linexpr() in your driver.
>>
>> (note the ! meta-character after the EOF token above will keep the EOF
>> out of any AST produced, but you do not seem to be building an AST so it
>> won't make any difference...)
>>
>> Hope this helps...
>>      -jbb
>>
>>
>>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: 
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address


