2010-09-23 14:56, Krinkle wrote:
> On 23 Sep 2010, at 14:47, Andreas Jonsson wrote:
>
>> 2010-09-23 14:17, Krinkle wrote:
>>
>>> On 23 Sep 2010, at 14:14, Andreas Jonsson wrote:
>>>
>>>> 2010-09-23 11:34, Bryan Tong Minh wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Pretty awesome work you've done!
>>>>>
>>>>> On Thu, Sep 23, 2010 at 11:27 AM, Andreas Jonsson
>>>>> <andreas.jons...@kreablo.se> wrote:
>>>>>
>>>>>> I think that this demonstrates the feasibility of replacing the
>>>>>> MediaWiki parser. There is still a lot of work to do in order to
>>>>>> turn it into a full replacement, however.
>>>>>
>>>>> Have you already tried to run the parser tests that come with
>>>>> MediaWiki? Do they produce (roughly) the same output as with the
>>>>> PHP parser?
>>>>
>>>> No, I haven't. I have produced my own set of unit tests that are
>>>> based on the original parser. For the features that I have
>>>> implemented, the output should be roughly the same under "normal"
>>>> circumstances.
>>>>
>>>> But the original parser has tons of border cases where the
>>>> behavior is not very well defined. For instance, the table on the
>>>> test page will render very differently with the original parser
>>>> (it will actually turn into two separate tables).
>>>>
>>>> I am employing a consistent and easily understood strategy for
>>>> handling HTML intermixed with wikitext markup; it is easy to
>>>> explain that the |} token is disabled in the context of an HTML
>>>> table. There is no such simple explanation for the behavior of the
>>>> original parser, even though in this particular example the
>>>> produced HTML code happens to be valid (which isn't always the
>>>> case).
>>>>
>>>> So, what I'm trying to say is that for the border cases where my
>>>> implementation differs from the original, the behavior of my
>>>> parser should be considered the correct one. :-)
>>>>
>>>> /Andreas
>>>>
>>>> _______________________________________________
>>>> Wikitech-l mailing list
>>>> Wikitech-l@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>>
>>> Hm...
>>> That depends on how 'edge' those edge cases are, and on how well
>>> known they are. Doing that may render it unusable for established
>>> wikis, and it would never become the default anytime soon, right?
>>
>> We are talking about the edge cases that arise when intermixing
>> wikitext and HTML code in "creative" ways. This, for instance, is OK
>> with the original parser:
>>
>> * item 1<li> item 2
>> * item 3
>>
>> That may seem harmless and easy to handle, but surprise! Explicitly
>> adding the </li> token doesn't work as expected:
>>
>> * item 1<li> item 2</li>
>> * item 3
>>
>> And what happens when you add a new HTML list inside a wikitext list
>> item without closing it?
>>
>> * item 1<ul><li> item 2
>> * item 3
>>
>> Which list should item 3 belong to? You can come up with thousands
>> of situations like this, and without a consistent plan for handling
>> them, you will need to add thousands of border cases to the code to
>> handle them all.
>>
>> I have avoided this by simply disabling all HTML block tokens inside
>> wikitext list items. Of course, it may be that someone is actually
>> relying on being able to mix in this way, but it doesn't seem
>> likely, as the result tends to be strange.
>>
>> /Andreas
>
> I agree that making it consistent is important and will only cause
> good things (such as people getting used to the behaviour and being
> able to predict what something would logically do).
>
> About the HTML-in-wikitext mixup: although it is not done directly,
> it is most certainly done indirectly.
> Imagine a template which consists of a table in wikitext. A certain
> parameter's value is output in a table cell. On some page that
> template is called, and the parameter is filled in with the help of
> a parser function (like #if or #expr). To avoid a mess of escaping
> in templates, the table inside this table cell is in a lot of cases
> built in HTML instead of wikitext (the pipe problem, think {{!}}).
>
> The result is an HTML table in a wikitext table.

Yes, but that is supported by the parser. What isn't supported is
mixing tokens from HTML tables with tokens from a wikitext table. So
you have:
<table><td>this is a cell inside an HTML table, and as such, the |
and |- tokens are disabled. However,
{|
| opens up a wikitext table, which changes the context so that the
<td>, <tr> and </table> tokens are now disabled. But it is still
possible to once again
<table><td>open up an HTML table, and thus the context is switched so
that the |} token is disabled.
</table>
|}
</table>

And here we're back to an ordinary paragraph.

> Or, for example, the thing with whitespace and parser functions /
> template parameters: starting something like a table or a list
> requires the block-level hack (like <br /> or <div></div> after the
> pipe, and then the {| table |} or * list on the next line). To avoid
> those, complex templates often use HTML instead. If such a template
> were called on a page with an already existing wikitext list in
> place, there would be an HTML list inside a wikitext list.

A feasible alternative is to parse these as inline block elements
inside wikitext list items, which I'm already doing for image links
with captions. But I think that it is preferable to just disable
them.

> I don't know in which order the parser works, but I think that if
> this behaviour changes, a lot of complicated templates will break,
> and not just on Wikimedia projects.

That's possible, but I believe that the set of broken templates can
be limited to a great extent. To deploy a new parser on an existing
site, one would need a tool that walks the existing pages and warns
about suspected problems.

/Andreas
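The context-switching rule in the table example above can be sketched
as a small stack machine: each open table pushes a context, and a
table-closing token is only active when it matches the innermost open
table, otherwise it is treated as plain text. This is a hypothetical
Python illustration of that rule (all names are invented; it is not
code from the actual parser, and handling of <td>/<tr> and list items
is omitted):

```python
# Hypothetical sketch of the context rule described above: each open
# table pushes a context, and a closing token is only active when it
# matches the innermost context. Not the actual parser implementation.

OPENERS = {"<table>": "html", "{|": "wikitext"}
CLOSERS = {"</table>": "html", "|}": "wikitext"}

def classify(tokens):
    """Label each token according to the current table context."""
    stack = []   # innermost open table sits at the top of the stack
    labels = []
    for tok in tokens:
        if tok in OPENERS:
            # An opener of either kind is always allowed and nests.
            stack.append(OPENERS[tok])
            labels.append((tok, "opens %s table" % OPENERS[tok]))
        elif tok in CLOSERS:
            if stack and stack[-1] == CLOSERS[tok]:
                stack.pop()
                labels.append((tok, "closes %s table" % CLOSERS[tok]))
            else:
                # e.g. a |} seen while the innermost table is HTML:
                # the token is disabled and rendered as plain text.
                labels.append((tok, "disabled, plain text"))
        else:
            labels.append((tok, "text"))
    return labels

# The nesting from the example above: html > wikitext > html.
example = ["<table>", "{|", "<table>", "|}", "</table>", "|}", "</table>"]
```

Running `classify(example)` marks the inner |} as disabled plain text
(the innermost context at that point is an HTML table), while the
remaining closers each pop their matching context, leaving the stack
empty at the end, i.e. back to an ordinary paragraph.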