2010-08-12 09:30, Andreas Jonsson wrote:
[...]
>
> However, requiring a link to be properly closed in order to be a link
> is fairly complex.  What should the parser do with the link
> title, if it decides that it is not really a link title after all?  It
> may contain tokens.  Thus, the lexer must use lookahead and not
> produce any spurious link open tokens.  To avoid the n^2 worst case, a
> full extra pass to compute hints would be necessary before doing the
> actual lexing.
Replying to myself.  I might have been wrong about the complexity of
finding the closing token.  The lexer hack below may actually do the
trick: a rule that matches the empty string if there is a valid closing
token ahead.  Since it does not search past '[[' tokens, no content will
be scanned more than once by this rule, so the worst-case running time
is still linear.

fragment
LINK_CLOSE_LOOKAHEAD
@init{
         bool success = false;
}:
     (
         ( /*
            * List of all other lexer rules that may contain the strings
            * ']]' or '[['.
            */
              BEGIN_TABLE
            | TABLE_ROW_SEPARATOR
            | TABLE_CELL
            | TABLE_CELL_INLINE
           /*
            * Alternative: don't search beyond other block elements:
            */
            //   ({BOL}?=> '{|')=>   '{|'            {false}?=>
            // | (LIST_ELEMENT)=>    LIST_ELEMENT    {false}?=>
            // | (NEWLINE NEWLINE)=> NEWLINE NEWLINE {false}?=>
           /*
            * Otherwise, anything goes except ']]' or '[['.
            */
            | ~('['|']')
            | {!PEEK(2, '[')}?=> '['
            | {!PEEK(2, ']')}?=> ']'
         )+
         (
              ']]' {(success = true), false}?=>
            | {false}?=>
         )
     )
     |
     {success}?=>
     ;
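The linearity argument can be illustrated outside ANTLR as well.  Below
is a small Python sketch (the function name and signature are my own
invention, not part of any parser) of a lookahead that searches for a
closing ']]' but refuses to scan past the next '[[': because each
lookahead stops at the next link opener, the spans examined by
successive lookaheads do not overlap, so the total work over the whole
input stays linear.

```python
def has_link_close(text, start):
    """Hypothetical lookahead: return True if a ']]' closes the link
    whose '[[' ended just before `start`.

    The scan stops at the next '[[' (mirroring the rule above, which
    never searches past other link-open tokens).  Since every lookahead
    call stops where the next one would begin, no character is scanned
    by more than one call, keeping the lexer's total cost linear.
    """
    i = start
    n = len(text)
    while i < n - 1:
        pair = text[i:i + 2]
        if pair == ']]':      # valid close found: emit a real link-open
            return True
        if pair == '[[':      # another link opens first: give up here
            return False
        i += 1
    return False              # end of input, no close


# Example: a close ahead succeeds, a nested open blocks the search.
print(has_link_close("title]] rest", 0))   # a ']]' is reachable
print(has_link_close("a[[b]] rest", 0))    # '[[' comes first
```

This sketch omits the block-element stop conditions (tables, list
elements, blank lines) that the commented-out alternatives in the
grammar handle; a real implementation would treat those as search
boundaries too.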


_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l