I'm a firm believer in the nextgroup directive for defining syntaxes.
It allows you to define grammars, which I really enjoy doing.
However, one problem is that many languages allow things to appear in
their input that are not part of the language's grammar.  For example,
many languages allow comments to appear almost anywhere in the input;
they are stripped out while the input is lexed into the tokens that are
then fed to the actual parser.  Comments could be made part of the
grammar instead, simply being thrown away at that point in the process,
but that forces you to provide for the possibility of a comment
appearing between basically any two terminals/non-terminals.

Anyway, what I'm actually suggesting is a way to get around this issue
by adding a new directive to the :syntax command that can be used
alongside nextgroup to skip certain syntax groups before trying the
groups defined by nextgroup.  This is much like skipwhite, skipnl, and
skipempty, but for arbitrary syntax groups.

Here's an example of what I intend for it to do:

syn keyword tocTodo
     \ contained
     \ TODO
     \ FIXME
     \ XXX
     \ NOTE

syn match   tocComment
     \ contains=tocTodo,@Spell
     \ '//.*$'

syn keyword tocHeaderKeyword
     \ nextgroup=tocCatalogNumber
     \ skip=tocComment
     \ skipwhite
     \ skipempty
     \ CATALOG

syn match   tocCatalogNumber
     \ contained
     \ '"\d\{13\}"'

This is a partial grammar that matches comments and the CATALOG
keyword in the header part of a cdrdao(1) TOC file (yes, I'm writing a
grammar for such files).  Comments begin with two slashes and can
appear anywhere in the file.  The CATALOG keyword is followed by a
(optional, but let's keep it simple for this example) catalog number.
The idea here is that the skip=tocComment directive on
tocHeaderKeyword tells the syntax highlighting engine to skip any
matches of tocComment that follow tocHeaderKeyword, just as the
skipwhite and skipempty pair tells it to skip whitespace and empty
lines (before and after any tocComments), before it tries to match a
tocCatalogNumber.

I have no idea how hard this would be to implement, but I'm thinking
that it can't be too difficult.  It should "only" be a matter of adding
some handling around the code for skipwhite/skipnl/skipempty that goes
through a list of syntax groups, tries to match them, highlights any
matches, and then tries to highlight whatever is in nextgroup
afterwards.
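To make that loop a little more concrete, here is a toy model of it.
It is written in Python rather than Vim's C and is only a sketch under
simplifying assumptions, not how Vim's syntax engine actually works:
Group, skip_between, and highlight are hypothetical names, each syntax
group is reduced to a single regex, the still fictional skip= list is
modelled as a list of group names, skipwhite skips spaces and tabs, and
skipempty skips line breaks.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Group:
    """Toy stand-in for a :syntax item (hypothetical, not Vim's C structs)."""
    name: str
    pattern: str
    nextgroup: list = field(default_factory=list)
    skip: list = field(default_factory=list)  # the proposed, still fictional skip=
    skipwhite: bool = False
    skipempty: bool = False


def skip_between(text, pos, item, groups, result):
    """Advance past whitespace, line breaks and skip= groups, in any order."""
    while True:
        start = pos
        if item.skipwhite:
            while pos < len(text) and text[pos] in " \t":
                pos += 1
        if item.skipempty and text.startswith("\n", pos):
            pos += 1
        for name in item.skip:
            m = re.compile(groups[name].pattern).match(text, pos)
            if m and m.end() > pos:
                # Skipped groups are still recorded, i.e. highlighted.
                result.append((name, m.group()))
                pos = m.end()
        if pos == start:  # nothing more to skip
            return pos


def highlight(text, first, groups):
    """Follow the nextgroup chain from `first`, returning (group, text) pairs."""
    item = groups[first]
    m = re.compile(item.pattern).match(text)
    if not m:
        return []
    result = [(item.name, m.group())]
    pos = m.end()
    while item.nextgroup:
        pos = skip_between(text, pos, item, groups, result)
        for name in item.nextgroup:
            m = re.compile(groups[name].pattern).match(text, pos)
            if m:
                break
        else:
            break  # no nextgroup matched; the chain ends here
        item = groups[name]
        result.append((name, m.group()))
        pos = m.end()
    return result


# The CATALOG example, translated to Python regexes.
groups = {
    "tocComment": Group("tocComment", r"//[^\n]*"),
    "tocHeaderKeyword": Group("tocHeaderKeyword", r"CATALOG",
                              nextgroup=["tocCatalogNumber"],
                              skip=["tocComment"],
                              skipwhite=True, skipempty=True),
    "tocCatalogNumber": Group("tocCatalogNumber", r'"\d{13}"'),
}

print(highlight('CATALOG // the catalog number\n\n"1234567890123"',
                "tocHeaderKeyword", groups))
# prints [('tocHeaderKeyword', 'CATALOG'), ('tocComment', '// the catalog number'), ('tocCatalogNumber', '"1234567890123"')]
```

With skip= removed from tocHeaderKeyword, the same input stops after
CATALOG, because the comment sits where the catalog number is expected.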

I'm sure there are edge cases to consider, but I can't see why it
should be impossible.  Sadly, I don't understand the Vim syntax
highlighter at all, so someone with more knowledge will have to help
me out.

Comments?  Patches?  Complaints?

 nikolai

P.S.
Yes, I know that this can be solved by tracking the context
explicitly: for each and every :syntax ... X ... definition
(production), add a tocXComment production that carries X's original
nextgroup, and add tocXComment to X's nextgroup.  But that doubles
the number of productions and makes the grammar a lot harder to change
later on.

Here's an example of what that looks like for the grammar above:

syn keyword tocTodo
     \ contained
     \ TODO
     \ FIXME
     \ XXX
     \ NOTE

syn match   tocComment
     \ contains=tocTodo,@Spell
     \ '//.*$'

syn keyword tocHeaderKeyword
     \ nextgroup=
     \   tocCatalogNumberComment,
     \   tocCatalogNumber
     \ skipwhite
     \ skipempty
     \ CATALOG

syn match   tocCatalogNumberComment
     \ nextgroup=tocCatalogNumber
     \ skipwhite
     \ skipempty
     \ contains=tocTodo,@Spell
     \ contained
     \ '//.*$'

syn match   tocCatalogNumber
     \ contained
     \ '"\d\{13\}"'

Of course, all those additional groups can be generated automatically
from a grammar, but again, that makes it harder to follow, harder to
change, and more memory-hungry than a grammar using the (still
fictional) skip directive would be.
D.S.
