I may have hit a bug in the lemon parser generator.

It looks like lookahead symbols aren't propagated properly, in some cases.

The consequence is that in some states, valid symbols are rejected with
a syntax error.

The complete grammar can be found at

    http://www.zweije.nl/~vzweije/parser.y

I'm processing the grammar with lemon version 3.7.7-2ubuntu2. I've
compared that version to the current head in sqlite, and it only differs
in non-essential areas (calls to fclose recently added in trunk, calls
to snprintf added in ubuntu). I'm pretty sure the current head version
of lemon will exhibit the problem as well.

For completeness' sake, this was found on an armv7l system.

I have just reproduced the problem with the head lemon.c on an i386
system. Valgrind reports no stray memory accesses (though quite a few
memory leaks, but those are probably harmless).

The command to process the parser is:

lemon -c -s parser.y

The generated parser has, among others, the following state:

State 244:
          constructor_def ::= existental_quant_variables_opt upper_name * 
fix_opt type_clos
    (200) fix_opt ::= *
          fix_opt ::= * Fix

                          Open reduce 200
                     CurlyOpen reduce 200
                          Hash reduce 200
                      RectOpen reduce 200
                             A reduce 200
                           Dot reduce 200
                       Exclaim reduce 200
                          Star reduce 200
                           Int reduce 200
                          Real reduce 200
                          Char reduce 200
                          Bool reduce 200
      RectOpenExclaimRectClose reduce 200
                       Dynamic reduce 200
                          File reduce 200
                        String reduce 200
                         World reduce 200
                           Fix shift  486
                   LowerCaseId reduce 200
                   UpperCaseId reduce 200
                       FunnyId reduce 200
                       fix_opt shift  130

As can be seen, fix_opt is nullable - it may derive the empty string.
Therefore, any symbol that can start what follows it (type_clos),
or any lookahead symbol (because type_clos is nullable too), must
be acceptable in this state. However, some tokens (Bar, Semicolon,
LowerCaseId, UpperCaseId, FunnyId) are not.

The next state lists the acceptable symbols there:

State 130:
          type_clos ::= * open_type_clos
          type_clos ::= * closed_type_clos
          open_type_clos ::= * closed_type_clos type_prefix_seq
          closed_type_seq ::= * closed_type_clos type_prefix_clos 
type_expression
          closed_type_clos ::= * closed_type_seq
    (144) closed_type_clos ::= *
          constructor_def ::= existental_quant_variables_opt upper_name fix_opt 
* type_clos

                          Open reduce 144
                     CurlyOpen reduce 144
                           Bar reduce 144
                          Hash reduce 144
                      RectOpen reduce 144
                             A reduce 144
                           Dot reduce 144
                       Exclaim reduce 144
                          Star reduce 144
                           Int reduce 144
                          Real reduce 144
                          Char reduce 144
                          Bool reduce 144
      RectOpenExclaimRectClose reduce 144
                       Dynamic reduce 144
                          File reduce 144
                        String reduce 144
                         World reduce 144
                   LowerCaseId reduce 144
                   UpperCaseId reduce 144
                       FunnyId reduce 144
                     Semicolon reduce 144
                     type_clos shift  563
               closed_type_seq shift  375
              closed_type_clos shift  179
                open_type_clos shift  491

Some of the missing symbols (LowerCaseId, UpperCaseId, FunnyId) are
starters of type_clos:

  164: type_clos: Open CurlyOpen Hash RectOpen A Dot Exclaim Star Int Real Char 
Bool RectOpenExclaimRectClose Dynamic File String World LowerCaseId UpperCaseId 
FunnyId

The other two (Bar, Semicolon) are not; these are followers of the
constructor_def symbol.

Looking at it now, <lambda> is not in the starters of type_clos,
though according to the grammar rules, it is obviously nullable
(type_clos => closed_type_close => <lambda>). Curious.

The problem disappears when fix_opt is deleted from the RHS of the
rule. (Of course, this means that a present Fix token is no longer
accepted.)

Ciao.                                                          Vincent.
-- 
Vincent Zweije <vinc...@zweije.nl>   | "If you're flamed in a group you
<http://www.xs4all.nl/~zweije/>      | don't read, does anybody get burnt?"
[Xhost should be taken out and shot] |            -- Paul Tomblin on a.s.r.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to