2010-09-23 14:56, Krinkle wrote:
> On 23 Sep 2010, at 14:47, Andreas Jonsson wrote:
>
>> 2010-09-23 14:17, Krinkle wrote:
>>
>>> On 23 Sep 2010, at 14:14, Andreas Jonsson wrote:
>>>
>>>> 2010-09-23 11:34, Bryan Tong Minh wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Pretty awesome work you've done!
>>>>>
>>>>> On Thu, Sep 23, 2010 at 11:27 AM, Andreas Jonsson
>>>>> <andreas.jons...@kreablo.se> wrote:
>>>>>
>>>>>> I think that this demonstrates the feasibility of replacing the
>>>>>> MediaWiki parser. There is still a lot of work to do in order to
>>>>>> turn it into a full replacement, however.
>>>>>
>>>>> Have you already tried to run the parser tests that come with
>>>>> MediaWiki? Do they produce (roughly) the same output as with the
>>>>> PHP parser?
>>>>
>>>> No, I haven't. I have produced my own set of unit tests that are
>>>> based on the original parser. For the features that I have
>>>> implemented, the output should be roughly the same under "normal"
>>>> circumstances.
>>>>
>>>> But the original parser has tons of border cases where the
>>>> behavior is not very well defined. For instance, the table on the
>>>> test page will render very differently with the original parser
>>>> (it will actually turn into two separate tables).
>>>>
>>>> I am employing a consistent and easily understood strategy for
>>>> handling HTML intermixed with wikitext markup; it is easy to
>>>> explain that the |} token is disabled in the context of an HTML
>>>> table. There is no such simple explanation for the behavior of the
>>>> original parser, even though in this particular example the
>>>> produced HTML code happens to be valid (which isn't always the
>>>> case).
>>>>
>>>> So, what I'm trying to say is that for the border cases where my
>>>> implementation differs from the original, the behavior of my
>>>> parser should be considered the correct one. :-)
>>>>
>>>> /Andreas
>>>>
>>>> _______________________________________________
>>>> Wikitech-l mailing list
>>>> Wikitech-l@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>>
>>> Hm...
>>> That depends on how 'edge' those edge cases are, and on how well
>>> known they are. Doing that may render it unusable for established
>>> wikis, and it would never become the default anytime soon, right?
>>
>> We are talking about the edge cases that arise when intermixing
>> wikitext and HTML code in "creative" ways. This, for instance, is OK
>> with the original parser:
>>
>> * item 1<li> item 2
>> * item 3
>>
>> That may seem harmless and easy to handle, but surprise! Explicitly
>> adding the </li> token doesn't work as expected:
>>
>> * item 1<li> item 2</li>
>> * item 3
>>
>> And what happens when you add a new HTML list inside a wikitext list
>> item without closing it?
>>
>> * item 1<ul><li> item 2
>> * item 3
>>
>> Which list should item 3 belong to? You can come up with thousands
>> of situations like this, and without a consistent plan for handling
>> them, you will need to add thousands of border cases to the code to
>> handle them all.
>>
>> I have avoided this by simply disabling all HTML block tokens inside
>> wikitext list items. Of course, it may be that someone is actually
>> relying on being able to mix in this way, but it doesn't seem
>> likely, as the result tends to be strange.
>>
>> /Andreas
>
> I agree that making it consistent is important and will only cause
> good things (such as people getting used to the behaviour and being
> able to predict what something would logically do).
>
> About the HTML-in-wikitext mixup: although it is not done directly,
> it is most certainly done indirectly.
> Imagine a template which consists of a table in wikitext. A certain
> parameter's value is output in a table cell. On some page that
> template is called, and the parameter is filled in with the help of
> a parser function (like #if or #expr). To avoid a mess of escaping
> in templates, the table inside this table cell is in a lot of cases
> built in HTML instead of wikitext (the pipe problem, think {{!}}).
>
> The result is an HTML table in a wikitext table.

Yes, but that is supported by the parser. What isn't supported is
mixing tokens from HTML tables with tokens from a wikitext table. So
you have:
<table><td>this is a cell inside an HTML table, and as such, the |
and |- tokens are disabled. However,
{|
| opens up a wikitext table, which changes the context so that the
<td>, <tr> and </table> tokens are now disabled. But it is still
possible to once again
<table><td>open up an HTML table, and thus the context is switched so
that the |} token is disabled.
</table>
|}
</table>

And here we're back to an ordinary paragraph.

> Or, for example, the thing with whitespace and parser functions /
> template parameters: starting something like a table or a list
> requires the block-level hack (like <br /> or <div></div> after the
> pipe, and then the {| table |} or * list on the next line). To avoid
> those, complex templates often use HTML instead. If such a template
> were called on a page with an already existing wikitext list in
> place, there would be an HTML list inside a wikitext list.

A feasible alternative is to parse these as inline block elements
inside wikitext list items, which I'm already doing for image links
with captions. But I think that it is preferable to just disable
them.

> I don't know in which order the parser works, but I think that if
> this behaviour changes, a lot of complicated templates will break,
> and not just on Wikimedia projects.

That's possible, but I believe that the set of broken templates can
be limited to a great extent. To deploy a new parser on an existing
site, one would need a tool that walks the existing pages and warns
about suspected problems.

/Andreas
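The context-switching rule in the table example above can be sketched
as a small stack machine: each open table pushes a context, and a
table-closing token is only active when it matches the innermost open
table, otherwise it is treated as plain text. This is a hypothetical
Python illustration of that rule (all names are invented; it is not
code from the actual parser, and handling of <td>/<tr> and list items
is omitted):

```python
# Hypothetical sketch of the context rule described above: each open
# table pushes a context, and a closing token is only active when it
# matches the innermost context. Not the actual parser implementation.

OPENERS = {"<table>": "html", "{|": "wikitext"}
CLOSERS = {"</table>": "html", "|}": "wikitext"}

def classify(tokens):
    """Label each token according to the current table context."""
    stack = []   # innermost open table sits at the top of the stack
    labels = []
    for tok in tokens:
        if tok in OPENERS:
            # An opener of either kind is always allowed and nests.
            stack.append(OPENERS[tok])
            labels.append((tok, "opens %s table" % OPENERS[tok]))
        elif tok in CLOSERS:
            if stack and stack[-1] == CLOSERS[tok]:
                stack.pop()
                labels.append((tok, "closes %s table" % CLOSERS[tok]))
            else:
                # e.g. a |} seen while the innermost table is HTML:
                # the token is disabled and rendered as plain text.
                labels.append((tok, "disabled, plain text"))
        else:
            labels.append((tok, "text"))
    return labels

# The nesting from the example above: html > wikitext > html.
example = ["<table>", "{|", "<table>", "|}", "</table>", "|}", "</table>"]
```

Running `classify(example)` marks the inner |} as disabled plain text
(the innermost context at that point is an HTML table), while the
remaining closers each pop their matching context, leaving the stack
empty at the end, i.e. back to an ordinary paragraph.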