2010-08-30 13:22, Jan Paul Posma skrev:
>>      
>>> It would be really nice to be able to include hooks after the lexer, but 
>>> before actual parsing.
>>>
>>>        
>> That could be done,  but I would not recommend it.  What application do
>> you have in mind?
>>      
> Well, the current implementation of my editor uses a bunch of regexes (like 
> the current parser) to determine where to inject spans or divs into the 
> wikitext. Having a more accurate representation (the tokenized wikitext that 
> the lexer outputs) would allow for more accurate injection. Then again, it 
> would be complicated to interface that with PHP, I guess?
>
>    
Between the lexer and the parser there is just the stream of tokens.
How that relates to the ultimately rendered content is non-trivial.  I
think that you would be much better off working on top of the
listener interface.  It would help you, I'd guess, to introduce the
period character (or, more generally, a localizable sentence separator
character) as its own token and pass that as an event.  That cannot be
efficiently implemented as a "hook", though; it has to be integrated
into the lexer.  But it should be perfectly possible to define
sentences in the event stream even without such a token.
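For instance, a listener could reconstruct sentence boundaries from the
plain text events alone.  A rough Python sketch (all names here --
SentenceListener, on_text -- are invented for illustration and are not
the actual listener interface):

```python
import re

class SentenceListener:
    """Buffers incoming text events and fires a callback per sentence.

    Purely illustrative: a real implementation would take its boundary
    characters from the wiki's language configuration, not a hard-coded
    regex.
    """

    # Naive boundary: . ! or ? followed by whitespace or end of buffer.
    BOUNDARY = re.compile(r'[.!?](?=\s|$)')

    def __init__(self, on_sentence):
        self.on_sentence = on_sentence
        self.buffer = ''

    def on_text(self, text):
        # Text may arrive split across several events, so accumulate
        # first, then emit every complete sentence found so far.
        self.buffer += text
        while True:
            m = self.BOUNDARY.search(self.buffer)
            if not m:
                break
            end = m.end()
            self.on_sentence(self.buffer[:end].strip())
            self.buffer = self.buffer[end:]

sentences = []
listener = SentenceListener(sentences.append)
listener.on_text('First sentence. Second one! Trailing frag')
# sentences == ['First sentence.', 'Second one!']
```

The trailing fragment stays in the buffer until a later event completes
it, which is why buffering across events matters here.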


> How would you handle hooks, tag extensions, parser functions and magic words 
> anyway? Will you leave this to some post-processing stage in PHP or have 
> things interact during parsing?
>    
The listener interface in itself constitutes a collection of hooks.
From the parser's point of view, a tag extension works the same as
<nowiki>.  It's up to the listening application to call the
appropriate function to process the content.  Magic words and parser
functions should be handled by a preprocessor, as the substitution of
these may yield new tokens.
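As a sketch of that division of labour (again in Python, with invented
names -- this is not the real listener API): the parser delivers the
tag's body verbatim, exactly as it would for <nowiki>, and the
listening application looks up and invokes whichever extension is
registered for that tag.

```python
class ExtensionDispatcher:
    """Hypothetical application-side dispatch for tag extensions."""

    def __init__(self):
        self.extensions = {}

    def register(self, tag, func):
        self.extensions[tag] = func

    def on_tag_extension(self, tag, content):
        # The parser never interprets the content; it arrives here as
        # an opaque string, and the application decides what to do.
        handler = self.extensions.get(tag)
        if handler is None:
            return content  # unknown tag: pass body through verbatim
        return handler(content)

dispatcher = ExtensionDispatcher()
dispatcher.register('syntaxhighlight', lambda src: '<pre>%s</pre>' % src)

print(dispatcher.on_tag_extension('syntaxhighlight', 'x = 1'))
# <pre>x = 1</pre>
```

Magic words and parser functions would not fit this shape, since their
substitution can produce new wikitext that must be re-tokenized --
hence the preprocessor.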

/Andreas

_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
