[il-antlr-interest: 33298] Re: [antlr-interest] 'Dude' error in v3.4 and possible bugs explained [was: on "crap" grammars]

Justin Murray Thu, 21 Jul 2011 12:28:03 -0700

I think that Vlad may be onto something here. From what I can tell from the 
parser generated for my grammar, this only affects exceptions of type 
ANTLR3_MISMATCHED_SET_EXCEPTION. My grammar has several hundred parser rules, 
but an ANTLR3_MISMATCHED_SET_EXCEPTION is generated in only 4 cases. In all 4 
cases expectingSet is set to NULL, and it is not set to NULL anywhere else. 
I agree that this would be improved if changed as Vlad described.


It just so happens that the way I implemented my exception handling, I treat 
ANTLR3_MISMATCHED_SET_EXCEPTION the same as ANTLR3_RECOGNITION_EXCEPTION, and 
don't bother to display the expectingSet, so I never would have discovered this 
problem.
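
A minimal sketch of that kind of handling, for illustration only. The toy_* 
types and functions are hypothetical stand-ins and are not part of the ANTLR3 
C runtime; the point is that a handler which lumps both exception types 
together never reads the expected set, while one that wants to list the 
expected tokens has to check for NULL first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the runtime's exception type codes. */
typedef enum {
    TOY_RECOGNITION_EXCEPTION    = 1,
    TOY_MISMATCHED_SET_EXCEPTION = 2
} toy_exception_type;

/* Hypothetical, cut-down exception record. */
typedef struct {
    toy_exception_type type;
    const void        *expecting_set;   /* may be NULL, as in the generated code */
} toy_exception;

/* Both types get the same generic message, so the set is never looked at. */
static const char *toy_summary(const toy_exception *ex)
{
    assert(ex != NULL);
    switch (ex->type) {
    case TOY_MISMATCHED_SET_EXCEPTION:  /* deliberately shares the generic case */
    case TOY_RECOGNITION_EXCEPTION:
        return "unexpected input";
    }
    return "unknown error";
}

/* A handler that wants to print the expected tokens must guard against NULL. */
static int toy_can_list_expected(const toy_exception *ex)
{
    return ex != NULL
        && ex->type == TOY_MISMATCHED_SET_EXCEPTION
        && ex->expecting_set != NULL;
}
```

With this shape, a mismatched-set exception whose set is NULL produces the 
same generic message as any other recognition error, which is why the missing 
set goes unnoticed.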

Since I recently figured out how the C template works, I decided to take a 
peek. I found the following in 
antlr-3.4-complete-no-antlrv2.jar/org/antlr/codegen/templates/C/C.stg:

<if(PARSER)>
EXCEPTION->expectingSet = NULL;
<! use following code to make it recover inline;
EXCEPTION->expectingSet = &FOLLOW_set_in_<ruleName><elementIndex>;
!>
<endif>

So it appears that this was done explicitly at some point. You could edit C.stg 
to replace the NULL assignment with the commented-out line above, and I imagine 
that it would then generate the correct follow set pointer. Perhaps Jim knows 
why it was written this way? The NULL may be there to avoid some other problem, 
so I don't know how safe this change would be.

- Justin

On 7/21/2011 2:45 PM, Vlad wrote: 

        Previously I was on the 3.2 runtime. It occurred to me to try 3.4,
        released a day ago, so I switched to the 3.4-beta4 runtime as well.
        Using one of the testerrors.g grammars with non-inlined int/float
        tokens and a parser generated by antlr-3.4-complete.jar, I now get
        this on the input string "name : bad":

        <string>(1)  : error 4 : Unexpected token, at offset 6
            near [Index: 4 (Start: 31458399-Stop: 31458401) ='bad', type<6> Line: 1 LinePos:6]
             : unexpected input...
          expected one of : Actually dude, we didn't seem to be expecting anything here, or at least
        I could not work out what I was expecting, like so many of us these days!

        (This required switching from antlr3NewAsciiStringInPlaceStream() to
        antlr3StringStreamNew(), as Jim posted here:
        http://groups.google.com/group/il-antlr-interest/browse_thread/thread/981a79239e352c89
        As mentioned in that thread, the last argument must not be NULL, to
        avoid a segfault.)

        So, this is better because at least the offending token is identified
        correctly. The reason the expected set is still not identified
        correctly (the 'Dude' part) is that the generated error path for the
        'type' non-terminal always sets the exception's expectingSet to NULL:

                {
                    if ( ((LA(1) >= AT_FLOAT_) && (LA(1) <= AT_INT_)) )
                    {
                        CONSUME();
                        PERRORRECOVERY=ANTLR3_FALSE;
                    }
                    else
                    {
                        CONSTRUCTEX();
                        EXCEPTION->type         = ANTLR3_MISMATCHED_SET_EXCEPTION;
                        EXCEPTION->name         = (void *)ANTLR3_MISMATCHED_SET_NAME;
                        EXCEPTION->expectingSet = NULL; // <--- ????

                        goto ruletypeEx;
                    }
                }

        I might be called names again, but I'd say this error handling does
        not look correct: the rule knows exactly which token set it expects
        right here, yet it discards that information when filling in the
        exception (what's the point of indicating ANTLR3_MISMATCHED_SET_NAME
        if the set is always NULL?).

        Examining the generated parser code, I in fact see what appears to be
        a correct set that would be FOLLOW(':'): it has bits set for AT_FLOAT_
        and AT_INT_ and is FOLLOWPUSH()ed before the rule is entered.

        By manually doctoring the parser code to set EXCEPTION->expectingSet
        to point to this FOLLOW set, I get rid of the 'Dude' message but hit
        another bug in displayRecognitionError(), which prints the wrong two
        token names:

        <string>(1)  : error 4 : Unexpected token, at offset 6
            near [Index: 4 (Start: 13845599-Stop: 13845601) ='bad', type<6> Line: 1 LinePos:6]
             : unexpected input...
          expected one of : <EOR>, <DOWN>

        Looking at the stock displayRecognitionError() code, it is clear that
        the loop over the set bits is not correct (the TODO is right). Fixing
        it by adding errBits->isMember(errBits, bit):

        for (bit = 1; bit < numbits && count < 8 && count < size; bit++)
        {
            // TODO: This doesn;t look right - should be asking if the bit is set!!
            //
            // <--- ???? was missing bitset member check
            if  (errBits->isMember(errBits, bit) && tokenNames[bit])
            {
                ANTLR3_FPRINTF(stderr, "%s%s", count > 0 ? ", " : "", tokenNames[bit]);
                count++;
            }
        }

        finally gets me the error message that makes sense:

        <string>(1)  : error 4 : Unexpected token, at offset 6
            near [Index: 4 (Start: 30442591-Stop: 30442593) ='bad', type<6> Line: 1 LinePos:6]
             : unexpected input...
          expected one of : AT_FLOAT_, AT_INT_
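
The effect of the membership check can be reproduced outside the runtime. The 
following is a self-contained sketch, not the runtime's code: toy_bitset, 
toy_is_member(), toy_append() and toy_format_expected() are hypothetical 
stand-ins, and the token numbering in the usage note is assumed purely for 
illustration. It walks the token types the way the loop above does, but only 
emits names whose bit is actually set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the runtime bitset: bit N set means type N is expected. */
typedef struct {
    uint64_t bits;
} toy_bitset;

static int toy_is_member(const toy_bitset *set, unsigned bit)
{
    return bit < 64 && ((set->bits >> bit) & 1u) != 0;
}

/* Append src to the NUL-terminated string in dst; returns 0 if it does not fit. */
static int toy_append(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t need = strlen(src);

    if (need >= dst_size - used) {
        return 0;
    }
    memcpy(dst + used, src, need + 1);   /* copies the terminating NUL too */
    return 1;
}

/* Write up to max_names expected token names into out; returns how many were written. */
static int toy_format_expected(const toy_bitset *set, const char *const *names,
                               unsigned num_names, int max_names,
                               char *out, size_t out_size)
{
    int count = 0;

    assert(set != NULL && names != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    for (unsigned bit = 1; bit < num_names && count < max_names; bit++) {
        size_t mark = strlen(out);

        /* The membership test is the whole fix: skip types not in the set. */
        if (!toy_is_member(set, bit) || names[bit] == NULL) {
            continue;
        }
        if ((count > 0 && !toy_append(out, out_size, ", ")) ||
            !toy_append(out, out_size, names[bit])) {
            out[mark] = '\0';            /* drop a dangling separator */
            break;
        }
        count++;
    }
    return count;
}
```

Given a names table whose first entries are "<invalid>", "<EOR>", "<DOWN>", 
"<UP>" and an assumed AT_FLOAT_ and AT_INT_ at indices 4 and 5, a set with 
only bits 4 and 5 yields "AT_FLOAT_, AT_INT_". Without the toy_is_member() 
test, the same loop would stop after the first two named entries, which 
matches the "<EOR>, <DOWN>" symptom shown earlier.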


        "Crap" grammars, I hear somebody said? Hmm, I don't think so...




