Module Name:	othersrc
Committed By:	agc
Date:		Thu Feb 23 19:16:31 UTC 2023

Modified Files:
	othersrc/external/bsd/elex: README

Log Message:
Fix typos and case issues, and properly hyphenate words.
From Brad Harder.

To generate a diff of this commit:
cvs rdiff -u -r1.1 -r1.2 othersrc/external/bsd/elex/README

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: othersrc/external/bsd/elex/README
diff -u othersrc/external/bsd/elex/README:1.1 othersrc/external/bsd/elex/README:1.2
--- othersrc/external/bsd/elex/README:1.1	Thu Dec 9 04:15:25 2021
+++ othersrc/external/bsd/elex/README	Thu Feb 23 19:16:31 2023
@@ -3,7 +3,7 @@ Elex - an embeddable regexp-based lexer
 
 I have found myself fairly often needing a lexer utility to tokenise
 input (for configuration files, for various file-scanning utilities,
-and for other applciations), but using full-blown lex(1) program to do
+and for other applications), but using full-blown lex(1) program to do
 this is overkill, or designed for a separate process, which doesn't
 fit well with the design - syntax-coloring editors, for example.
 
@@ -35,13 +35,13 @@ another side effect is the ability to us
 such as perl escapes, UTF-8 matching, in-subexpression ignore case, etc.
 
 elex implements start states, similar to flex. These are useful for
-recognising multiline comments (almost any language), or multi-line
+recognising multi-line comments (almost any language), or multi-line
 strings (perl, python, lua etc).
 
 elex dynamically sizes the regmatch arrays used to accommodate the
 largest regexp in the input, and matching subexpressions can be
 returned to the caller. The 0'th subexpression is the whole matching
-expression, and is the same as "yytext".
+expression, and is the same as "yytext" in lex(1).
 
 And so on to an elex definition which recognises C and some C++:
 
@@ -81,7 +81,7 @@ And so on to an elex definition which re
 <INITIAL>#[ \t]*(define|el(se|if)|endif|error|if|ifn?def|include|line|pragma|undef)[^\n]*</> { return PREPROC; }
 
 Start states are explicitly used for rules, since it is easier to read
-in practice. Elex comments are eol-style comments, beginning '#' and
+in practice. elex comments are eol-style comments, beginning '#' and
 ending with '\n'. Types can be defined using the "%type" directive,
 and the unsigned 32bit value they take will be returned. This is more
 work than using magic constants, but much more readable in practice -
@@ -102,7 +102,7 @@ Start states can be defined using the %s
 transitioned to using the BEGIN() action, in the same way as standard
 lex(1).
 
-Elex provides bookmarks, which are numbered numerically from 0.
+elex provides bookmarks, which are numbered numerically from 0.
 Assuming a mark has already been successfully created using
 "set-mark", the bookmark offset can be retrieved by using its index
 using "get-mark", and the user can then seek to that offset. This
@@ -300,7 +300,7 @@ return to the calling program. Since a
 input, it is advised to define return types starting at 1.
 Historically, in lex definitions, the user-defined types started at
 256, and it was common to return ASCII values for single characters up
-to 256. Since this is no longer acceptable in a world with multibyte
+to 256. Since this is no longer acceptable in a world with multi-byte
 characters, and because we tend to tokenise based on types of input
 tokens, hopefully this practice will never be used again.
 
@@ -330,7 +330,7 @@ in this.
 Usually, when tokenising programming language, there would be a number
 of definitions for reserved words, and standard identifiers. There would
 also be definitions for punctuation, and numeric and string constants.
-Some languages have definitions for multiline strings.
+Some languages have definitions for multi-line strings.
 
 Alistair Croooks
 Thu Nov 18 16:57:44 PST 2021