The GLSL Language Specification (version 4.30.6) is quite clear about the GLSL character set and the expected behavior for other characters:

    Section 3.1 Character Set

    The source character set used for the OpenGL shading languages,
    outside of comments, is a subset of UTF-8. It includes the following
    characters:

        The letters a-z, A-Z, and the underscore ( _ ).

        The numbers 0-9.

        The symbols period (.), plus (+), dash (-), slash (/), asterisk
        (*), percent (%), angled brackets (< and >), square brackets
        ( [ and ] ), parentheses ( ( and ) ), braces ( { and } ), caret
        (^), vertical bar (|), ampersand (&), tilde (~), equals (=),
        exclamation point (!), colon (:), semicolon (;), comma (,), and
        question mark (?).

        The number sign (#) for preprocessor use.

        The backslash (\) as the line-continuation character when used
        as the last character of a line, just before a new line.

        White space: the space character, horizontal tab, vertical tab,
        form feed, carriage-return, and line-feed.

    A compile-time error will be given if any other character is used
    outside a comment.

By taking the set of all possible 8-bit characters, and subtracting the
above, we have the set of illegal characters:

    0x00 - 0x08 (^@ - ^H)
    0x0E - 0x1F (^N - ^Z, ^[, ^\, ^], ^^, ^_)
    0x22 (")
    0x24 ($)
    0x27 (')
    0x40 (@)
    0x60 (`)
    0x7F (DEL or ^?)
    0x80 - 0xFF (non-ASCII)

As well as (#) outside of uses defined by the preprocessor (not starting
a directive, nor as part of a legal paste operator in a replacement
list), and (\) appearing anywhere but at the end of a line.

So instead of the previous whitelist we had for "OTHER" characters, we
now add a blacklist for "ILLEGAL" characters based on the above, and
then use a simple regular expression of "." to catch any characters
that get past the blacklist.

This approach also means the internal-error rule with "." can no longer
be matched, so it goes away now.
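
As a sanity check on the subtraction above, here is a minimal standalone
C sketch. It is not part of the patch, and the function names
(glsl_charset_allows, print_illegal_ranges) are made up for illustration.
It treats '#' and '\' as allowed, since whether they are legal depends on
context rather than on the byte value alone.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if byte c is in the GLSL source character set quoted
 * from Section 3.1. '#' and '\\' count as allowed here because their
 * legality depends on where they appear, not on the byte itself.
 * (Hypothetical helper, not a glcpp function.) */
static bool glsl_charset_allows(unsigned char c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_')
        return true;
    if (c >= '0' && c <= '9')
        return true;
    /* White space: HT, LF, VT, FF, CR (0x09 - 0x0D) and space. */
    if ((c >= 0x09 && c <= 0x0D) || c == ' ')
        return true;
    /* Symbols, plus the context-dependent '#' and '\\'. The NUL check
     * is needed because strchr() would match the string terminator. */
    return c != '\0' && strchr(".+-/*%<>[](){}^|&~=!:;,?#\\", c) != NULL;
}

/* Prints each maximal run of disallowed byte values and returns the
 * total number of disallowed bytes. (Hypothetical helper.) */
static int print_illegal_ranges(void)
{
    int count = 0;
    int start = -1; /* first byte of the current run, or -1 if none */

    /* Iterate one past 0xFF so that a run ending at 0xFF is flushed. */
    for (int c = 0; c <= 256; c++) {
        bool illegal = c < 256 && !glsl_charset_allows((unsigned char) c);

        if (illegal) {
            count++;
            if (start < 0)
                start = c;
        } else if (start >= 0) {
            if (start == c - 1)
                printf("0x%02X\n", start);
            else
                printf("0x%02X - 0x%02X\n", start, c - 1);
            start = -1;
        }
    }
    return count;
}
```

Calling print_illegal_ranges() from a main() should list the same byte
values as the table above, except that 0x7F and 0x80 - 0xFF are adjacent
and so come out as the single run 0x7F - 0xFF.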
---
 src/glsl/glcpp/glcpp-lex.l | 32 +++++++++++---------------------
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/src/glsl/glcpp/glcpp-lex.l b/src/glsl/glcpp/glcpp-lex.l
index 0dbdab0..790035c 100644
--- a/src/glsl/glcpp/glcpp-lex.l
+++ b/src/glsl/glcpp/glcpp-lex.l
@@ -175,15 +175,7 @@ HASH		#
 IDENTIFIER	[_a-zA-Z][_a-zA-Z0-9]*
 PP_NUMBER	[.]?[0-9]([._a-zA-Z0-9]|[eEpP][-+])*
 PUNCTUATION	[][(){}.&*~!/%<>^|;,=+-]
-
-/* The OTHER class is simply a catch-all for things that the CPP
-parser just doesn't care about. Since flex regular expressions that
-match longer strings take priority over those matching shorter
-strings, we have to be careful to avoid OTHER matching and hiding
-something that CPP does care about. So we simply exclude all
-characters that appear in any other expressions. */
-
-OTHER		[^][_#[:space:]#a-zA-Z0-9(){}.&*~!/%<>^|;,=+-]
+ILLEGAL		[\x00-\x08\x0E-\x1F"$'@`\x7F\x80-\xFF\\]
 
 DIGITS			[0-9][0-9]*
 DECIMAL_INTEGER		[1-9][0-9]*[uU]?
@@ -276,9 +268,10 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
 	 * token. */
 	if (parser->first_non_space_token_this_line) {
 		BEGIN HASH;
+		RETURN_TOKEN_NEVER_SKIP (HASH_TOKEN);
+	} else {
+		glcpp_error(yylloc, yyextra, "Illegal character '#' (not a preprocessing directive)");
 	}
-
-	RETURN_TOKEN_NEVER_SKIP (HASH_TOKEN);
 }
 
 <HASH>version{HSPACE}+	{
@@ -505,8 +498,8 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
 	RETURN_TOKEN (yytext[0]);
 }
 
-{OTHER}+ {
-	RETURN_STRING_TOKEN (OTHER);
+{ILLEGAL} {
+	glcpp_error(yylloc, yyextra, "Illegal character '%c'", yytext[0]);
 }
 
 {HSPACE} {
@@ -539,14 +532,7 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
 	RETURN_TOKEN (NEWLINE);
 }
 
-	/* This is a catch-all to avoid the annoying default flex action which
-	 * matches any character and prints it. If any input ever matches this
-	 * rule, then we have made a mistake above and need to fix one or more
-	 * of the preceding patterns to match that input. */
-
-<*>. {
-	glcpp_error(yylloc, yyextra, "Internal compiler error: Unexpected character: %s", yytext);
-
+<UNREACHABLE>. {
 	/* We don't actually use the UNREACHABLE start condition. We
 	only have this block here so that we can pretend to call some
 	generated functions, (to avoid "defined but not used"
@@ -557,6 +543,10 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
 	}
 }
 
+<*>. {
+	RETURN_STRING_TOKEN (OTHER);
+}
+
 %%
 
 void
-- 
2.0.0