Re: [Wikitech-l] Parser implementaton for MediaWiki syntax

Andreas Jonsson Thu, 23 Sep 2010 05:47:30 -0700

2010-09-23 14:17, Krinkle wrote:
> On 23 Sep 2010, at 14:14, Andreas Jonsson wrote:
>
>    
>> 2010-09-23 11:34, Bryan Tong Minh wrote:
>>      
>>> Hi,
>>>
>>>
>>> Pretty awesome work you've done!
>>>
>>> On Thu, Sep 23, 2010 at 11:27 AM, Andreas Jonsson
>>> <[email protected]>   wrote:
>>>
>>>        
>>>> I think that this demonstrates the feasibility of replacing the
>>>> MediaWiki parser.  There is still a lot of work to do in order to
>>>> turn it into a full replacement, however.
>>>>
>>>>
>>>>          
>>> Have you already tried to run the parsertests that come with
>>> MediaWiki? Do they produce (roughly) the same output as with the PHP
>>> parser?
>>>
>>>
>>>        
>> No, I haven't.  I have produced my own set of unit tests that are
>> based on the original parser.  For the features that I have
>> implemented, the output should be roughly the same under "normal"
>> circumstances.
>>
>> But the original parser has tons of border cases where the behavior
>> is not very well defined.  For instance, the table on the test page
>> will render very differently with the original parser (it will
>> actually turn into two separate tables).
>>
>> I am employing a consistent and easily understood strategy for
>> handling html intermixed with wikitext markup; it is easy to explain
>> that the |} token is disabled in the context of an html-table.  There
>> is no such simple explanation for the behavior of the original parser,
>> even though in this particular example the produced html code happens
>> to be valid (which isn't always the case).
>>
>> So, what I'm trying to say is that for the border cases where my
>> implementation differs from the original, the behavior of my parser
>> should be considered the correct one. :-)
>>
>> /Andreas
>>
>>
>> _______________________________________________
>> Wikitech-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>      
> Hm...
> That depends on how 'edge' those edge cases are, and on how well known
> they are.
> Doing that might render it unusable for established wikis, and it
> would not become the default anytime soon, right?
>
>    
We are talking about the edge cases that arise when intermixing
wikitext and html code in "creative" ways.  For instance, this is OK
with the original parser:


* item 1 <li> item 2
* item 3

That may seem harmless and easy to handle, but surprise!  Explicitly
adding the </li> token doesn't work as expected:

* item 1 <li> item 2 </li>
* item 3

And what happens when you add a new html list inside a wikitext list
item without closing it?

* item 1 <ul><li> item 2
* item 3

Which list should item 3 belong to?  You can come up with
thousands of situations like this, and without a consistent plan on
how to handle them, you will need to add thousands of border cases to
the code to handle them all.

I have avoided this by simply disabling all html block tokens inside
wikitext list items.  Of course, it may be that someone is actually
relying on being able to mix in this way, but it doesn't seem likely
as the result tends to be strange.
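
To make the idea concrete, here is a minimal sketch in Python of what
"disabling html block tokens inside wikitext list items" could look
like in a line-oriented tokenizer.  It is only an illustration, not the
code of the parser discussed in this thread: the tag set
HTML_BLOCK_TAGS, the function tokenize_line and the token names are
all hypothetical, and real wikitext lexing involves far more context
than a single line.

```python
import re

# Hypothetical set of HTML block-level tags the lexer would normally
# recognise as tokens (an assumption made for this sketch).
HTML_BLOCK_TAGS = {"ul", "ol", "li", "table", "tr", "td", "div", "p"}

# Matches an opening or closing HTML tag and captures the tag name.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)[^<>]*>")


def tokenize_line(line):
    """Split one line of wikitext into (kind, text) tokens.

    Inside a wikitext list item (a line starting with '*'), HTML block
    tags are not emitted as tokens; they stay part of the plain text.
    """
    tokens = []
    in_list_item = line.startswith("*")
    if in_list_item:
        tokens.append(("LIST_ITEM", "*"))
        line = line[1:]
    pos = 0
    for m in TAG_RE.finditer(line):
        is_block = m.group(1).lower() in HTML_BLOCK_TAGS
        if is_block and in_list_item:
            # Token disabled in this context: leave it in the text run.
            continue
        if m.start() > pos:
            tokens.append(("TEXT", line[pos:m.start()]))
        tokens.append(("HTML_BLOCK" if is_block else "HTML_INLINE", m.group(0)))
        pos = m.end()
    if pos < len(line):
        tokens.append(("TEXT", line[pos:]))
    return tokens


if __name__ == "__main__":
    for example in ("* item 1 <ul><li> item 2", "<ul><li> item 2"):
        print(tokenize_line(example))
```

With a rule like this, the question of which list "item 3" belongs to
never arises: the <ul> and <li> inside the list item are never seen as
list tokens, so only the wikitext list exists.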

/Andreas


