On 07/23/2013 06:02 PM, Subramanya Sastry wrote:
On 07/23/2013 05:28 PM, John Vandenberg wrote:
VE and Parsoid devs have put in a lot of effort to recognize
broken wikitext source, and to fix it or isolate it.
My point was that you don't appear to be analyzing how much of all
Wikipedia content is broken; at least I don't see a public document
listing which templates and pages are causing the parser problems, so
that the communities on each Wikipedia can fix them ahead of deployment.
Unfortunately, this is much harder to do. What we can consider is
periodically swapping out our test pages for a fresh batch of
pages so that new kinds of problems show up in automated testing. In some
cases, detecting problems automatically is equivalent to being able to
fix them up automatically as well.
Actually, we do have the beginnings of a page for this that I had
forgotten about:
http://www.mediawiki.org/wiki/Parsoid/Broken_wikitext_tar_pit I don't
think it is very helpful at this time, nor is it quite what you are asking for,
but I'm pointing it out for the record that we've thought about it some.
We are actually beginning to address some of these cases:
* fostered content in top-level pages (we handle fostering from templates)
* handling of templates that produce part of a table cell, or multiple
cells, or multiple attributes of an image.
Ideally, we would not have to support these kinds of use cases, but given
what we are seeing in production now, we might try to deal with some of
them ... Interestingly enough, we do a much better job of
protecting against unclosed tables, content fostered out of tables, etc.
when they come from templates than when such wikitext occurs in
the page content itself. We have a couple of DOM analysis passes that
detect those problems and protect them from editing ... but those need
to be extended to top-level page content.
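To make "fostered content" concrete: the HTML5 tree-building algorithm moves any content that appears directly inside a table, but outside a row, cell, header, or caption, to a position *before* the table in the DOM ("foster parenting"). A detection pass therefore has to find wikitext that would trigger this. The sketch below is not Parsoid's actual code, just a toy illustration of the idea against wikitable markup; the function name and heuristics are my own:

```python
def find_fostered_candidates(wikitext):
    """Toy sketch: flag lines inside {| ... |} wikitable markup that are not
    valid table structure. An HTML5 parser would "foster" such content out
    of the <table>, moving it before the table in the rendered DOM, which
    is why an editor like VE needs to detect it before allowing edits.

    Returns a list of (line_number, text) pairs for the stray lines.
    """
    flagged = []
    table_depth = 0
    for n, line in enumerate(wikitext.splitlines(), start=1):
        s = line.strip()
        if s.startswith("{|"):          # table start
            table_depth += 1
            continue
        if table_depth:
            if s.startswith("|}"):      # table end
                table_depth -= 1
            elif s and not s.startswith(("|", "!")):
                # Not a row (|-), cell (|), header (!), or caption (|+):
                # this text would be fostered out of the table.
                flagged.append((n, s))
    return flagged
```

For example, `find_fostered_candidates("{|\nstray text\n|-\n| cell\n|}")` flags line 2, while the same table without the stray line flags nothing. Parsoid's real passes work on the parsed DOM (and handle template boundaries), but the basic question asked is the same.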
Subbu.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l