The GLSL Language Specification (version 4.30.6) is quite clear about the GLSL
character set and the expected behavior for other characters:

    Section 3.1 Character Set

    The source character set used for the OpenGL shading languages, outside of
    comments, is a subset of UTF-8. It includes the following characters:

        The letters a-z, A-Z, and the underscore ( _ ).

        The numbers 0-9.

        The symbols period (.), plus (+), dash (-), slash (/), asterisk (*),
        percent (%), angled brackets (< and >), square brackets ( [ and ] ),
        parentheses ( ( and ) ), braces ( { and } ), caret (^), vertical bar
        (|), ampersand (&), tilde (~), equals (=), exclamation point (!),
        colon (:), semicolon (;), comma (,), and question mark (?).

        The number sign (#) for preprocessor use.

        The backslash (\) as the line-continuation character when used as the
        last character of a line, just before a new line.

        White space: the space character, horizontal tab, vertical tab, form
        feed, carriage-return, and line-feed.

    A compile-time error will be given if any other character is used outside
    a comment.

Taking the set of all possible 8-bit characters and subtracting the above
gives the set of illegal characters:

    0x00 - 0x08 (^A - ^H)
    0x0E - 0x1F (^N - ^Z, ^[, ^\, ^], ^^, ^_)
    0x22 (")
    0x24 ($)
    0x27 (')
    0x40 (@)
    0x60 (`)
    0x7F (DEL or ^?)
    0x80 - 0xFF (non-ASCII)
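As a cross-check, the list above can be derived mechanically. The small C
sketch below encodes the section 3.1 characters and prints every byte that is
not among them. It leaves out '#' and backslash because those are legal only
in certain positions. The helper names are made up for illustration and are
not part of glcpp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Characters allowed outside comments by GLSL 4.30 section 3.1. The number
 * sign and the backslash are left out on purpose: they are legal only in
 * certain positions and need a context-dependent check instead. */
static const char glsl_allowed_chars[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "_0123456789"
    ".+-/*%<>[](){}^|&~=!:;,?"
    " \t\v\f\r\n";

/* True if byte c can never appear outside a comment. */
static bool glsl_byte_is_illegal(unsigned char c)
{
    if (c == '\0')
        return true;        /* strchr() would match the terminator */
    if (c == '#' || c == '\\')
        return false;       /* position-dependent, checked elsewhere */
    return strchr(glsl_allowed_chars, c) == NULL;
}

/* Writes the illegal bytes to out as hex ranges and returns their count. */
static int print_illegal_ranges(FILE *out)
{
    assert(out != NULL);
    int count = 0;
    for (int c = 0; c < 256; c++) {
        if (!glsl_byte_is_illegal((unsigned char)c))
            continue;
        int start = c;
        /* Extend the run while the next byte is also illegal. */
        while (c + 1 < 256 && glsl_byte_is_illegal((unsigned char)(c + 1)))
            c++;
        count += c - start + 1;
        if (start == c)
            fprintf(out, "0x%02X\n", start);
        else
            fprintf(out, "0x%02X - 0x%02X\n", start, c);
    }
    return count;
}
```

Calling print_illegal_ranges(stdout) should reproduce the ranges listed above,
except that 0x7F and 0x80 - 0xFF are merged into a single 0x7F - 0xFF range.
It returns 161.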

Also illegal are the number sign (#) outside of the uses defined by the
preprocessor (neither starting a directive nor forming part of a legal paste
operator in a replacement list), and the backslash (\) anywhere but at the
end of a line.
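The position-dependent rules for '#' and backslash can be illustrated with a
simplified per-line check in C. This is a hypothetical helper, not glcpp
code. It assumes comments are already stripped and handles one line at a
time. It also treats any "##" on a #define line as a legal paste operator
instead of properly parsing the replacement list.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of glcpp): returns the offset of the first
 * number sign or backslash in a single source line (no trailing newline)
 * that section 3.1 does not allow, or -1 if there is none.
 *
 * Simplifications: comments are assumed to be stripped already, and any
 * "##" on a #define line counts as a legal paste operator. */
static long find_misplaced_hash_or_backslash(const char *line)
{
    assert(line != NULL);
    size_t len = strlen(line);
    size_t first = strspn(line, " \t\v\f\r");  /* first non-space byte */
    int is_define = 0;

    if (line[first] == '#') {
        /* Skip blanks between '#' and the directive name. */
        size_t d = first + 1 + strspn(line + first + 1, " \t");
        is_define = strncmp(line + d, "define", 6) == 0;
    }

    for (size_t i = 0; i < len; i++) {
        if (line[i] == '\\') {
            if (i != len - 1)
                return (long)i;   /* not a line continuation */
        } else if (line[i] == '#') {
            if (i == first)
                continue;         /* introduces a directive */
            if (is_define && line[i + 1] == '#') {
                i++;              /* skip both halves of "##" */
                continue;
            }
            return (long)i;
        }
    }
    return -1;
}
```

For example, "#define CAT(a, b) a ## b" and "#define X 1 \" pass. By contrast,
"float x = 1.0; #" is rejected at the stray '#'.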

So instead of the previous whitelist we had for "OTHER" characters, we now
add a blacklist for "ILLEGAL" characters based on the above, and then use a
simple regular expression of "." to catch any characters that get past the
blacklist.
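The resulting fall-through can be modeled with a toy byte classifier in C.
Anything that the earlier classes (including ILLEGAL) do not claim drops
through to OTHER, much as it reaches the final "<*>." rule in the patch. The
enum and function names are hypothetical. The model looks only at single
bytes and ignores multi-character tokens such as "<<" or "##".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical byte classes, loosely following the rule order in
 * glcpp-lex.l after this patch. */
enum byte_class {
    CLASS_WORD,         /* could start IDENTIFIER or PP_NUMBER */
    CLASS_SPACE,        /* whitespace allowed by section 3.1 */
    CLASS_HASH,         /* '#', legality decided by context */
    CLASS_PUNCTUATION,  /* the PUNCTUATION class */
    CLASS_ILLEGAL,      /* the new ILLEGAL class: report an error */
    CLASS_OTHER         /* catch-all, like the final "<*>." rule */
};

static enum byte_class classify_byte(unsigned char c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9') || c == '_')
        return CLASS_WORD;
    if (c == ' ' || (c >= '\t' && c <= '\r'))   /* HT, LF, VT, FF, CR */
        return CLASS_SPACE;
    if (c == '#')
        return CLASS_HASH;
    if (c != '\0' && strchr("[](){}.&*~!/%<>^|;,=+-", c) != NULL)
        return CLASS_PUNCTUATION;
    /* Same bytes as [\x00-\x08\x0E-\x1F"$'@`\x7F\x80-\xFF\\] */
    if (c <= 0x08 || (c >= 0x0E && c <= 0x1F) || c == '"' || c == '$' ||
        c == '\'' || c == '@' || c == '`' || c == '\\' || c >= 0x7F)
        return CLASS_ILLEGAL;
    return CLASS_OTHER;
}

/* Counts how many of the 256 byte values fall into the given class. */
static int count_bytes_in_class(enum byte_class cls)
{
    assert(cls <= CLASS_OTHER);
    int n = 0;
    for (int c = 0; c < 256; c++)
        if (classify_byte((unsigned char)c) == cls)
            n++;
    return n;
}
```

In this model only ':' and '?' reach CLASS_OTHER, and section 3.1 allows
both of them.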

This approach also means the internal-error rule with "." can no longer be
matched, so it goes away now.
---
 src/glsl/glcpp/glcpp-lex.l | 32 +++++++++++---------------------
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/src/glsl/glcpp/glcpp-lex.l b/src/glsl/glcpp/glcpp-lex.l
index 0dbdab0..790035c 100644
--- a/src/glsl/glcpp/glcpp-lex.l
+++ b/src/glsl/glcpp/glcpp-lex.l
@@ -175,15 +175,7 @@ HASH               #
 IDENTIFIER     [_a-zA-Z][_a-zA-Z0-9]*
 PP_NUMBER      [.]?[0-9]([._a-zA-Z0-9]|[eEpP][-+])*
 PUNCTUATION    [][(){}.&*~!/%<>^|;,=+-]
-
-/* The OTHER class is simply a catch-all for things that the CPP
-parser just doesn't care about. Since flex regular expressions that
-match longer strings take priority over those matching shorter
-strings, we have to be careful to avoid OTHER matching and hiding
-something that CPP does care about. So we simply exclude all
-characters that appear in any other expressions. */
-
-OTHER          [^][_#[:space:]#a-zA-Z0-9(){}.&*~!/%<>^|;,=+-]
+ILLEGAL                [\x00-\x08\x0E-\x1F"$'@`\x7F\x80-\xFF\\]
 
 DIGITS                 [0-9][0-9]*
 DECIMAL_INTEGER                [1-9][0-9]*[uU]?
@@ -276,9 +268,10 @@ HEXADECIMAL_INTEGER        0[xX][0-9a-fA-F]+[uU]?
          * token. */
        if (parser->first_non_space_token_this_line) {
                BEGIN HASH;
+               RETURN_TOKEN_NEVER_SKIP (HASH_TOKEN);
+       } else {
+               glcpp_error(yylloc, yyextra, "Illegal character '#' (not a preprocessing directive)");
        }
-
-       RETURN_TOKEN_NEVER_SKIP (HASH_TOKEN);
 }
 
 <HASH>version{HSPACE}+ {
@@ -505,8 +498,8 @@ HEXADECIMAL_INTEGER 0[xX][0-9a-fA-F]+[uU]?
        RETURN_TOKEN (yytext[0]);
 }
 
-{OTHER}+ {
-       RETURN_STRING_TOKEN (OTHER);
+{ILLEGAL} {
+       glcpp_error(yylloc, yyextra, "Illegal character '%c'", yytext[0]);
 }
 
 {HSPACE} {
@@ -539,14 +532,7 @@ HEXADECIMAL_INTEGER        0[xX][0-9a-fA-F]+[uU]?
                RETURN_TOKEN (NEWLINE);
 }
 
-       /* This is a catch-all to avoid the annoying default flex action which
-        * matches any character and prints it. If any input ever matches this
-        * rule, then we have made a mistake above and need to fix one or more
-        * of the preceding patterns to match that input. */
-
-<*>. {
-       glcpp_error(yylloc, yyextra, "Internal compiler error: Unexpected character: %s", yytext);
-
+<UNREACHABLE>. {
        /* We don't actually use the UNREACHABLE start condition. We
        only have this block here so that we can pretend to call some
        generated functions, (to avoid "defined but not used"
@@ -557,6 +543,10 @@ HEXADECIMAL_INTEGER        0[xX][0-9a-fA-F]+[uU]?
        }
 }
 
+<*>. {
+       RETURN_STRING_TOKEN (OTHER);
+}
+
 %%
 
 void
-- 
2.0.0

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev