Subramanya Sastry wrote:
>* Unclosed HTML tags (very common)
>* Misnested tags
>* Misnesting of tags (ex: links in links .. [http://foo.bar this is a
>[[foobar]] company])
>* Fostered content in tables
>(<table>this-content-will-show-up-outside-the-table<tr><td>....
></td></tr></table>)
>... this has been one of the biggest sources of complexity inside Parsoid
>... in combination with templates, this is nasty.
>* Other ways in which HTML5 content model might be violated. (ex:
><small>\n*a\n*b\n</small>)
>* Look at the parser tests file and see all the tests we've added with
>annotations that say "php parser relies on tidy"

I don't see why we would want to incur the maintenance cost of continuing
to support any of these bad inputs. I think we should look to deprecate,
not replace, Tidy. This is a case of the cure being worse than the disease.

>So, you cannot just rip out Tidy and not replace it with something in
>its place. Even replacing it with a HTML5 parser (as per the current
>plan) is not entirely straightforward simply because of all the other
>unrelated-to-html5-semantics behavior. Part of the task of replacing
>Tidy is to figure out all the ways those pages might break and the best
>way to handle that breakage.

We shouldn't rip out Tidy immediately; we should implement a means of
disabling Tidy on a per-page or per-user basis and allow the wiki process
to correct bad markup over time. Cunningham's Law applies here.
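
As a rough sketch of the opt-out idea (Python for brevity; every name
below is hypothetical and none of it is existing MediaWiki configuration
or API), the decision could be as simple as:

```python
from dataclasses import dataclass, field


@dataclass
class TidySettings:
    """Hypothetical settings; not real MediaWiki configuration names."""
    site_default: bool = True                      # Tidy stays on unless opted out
    pages_without_tidy: set = field(default_factory=set)
    users_without_tidy: set = field(default_factory=set)


def should_run_tidy(settings: TidySettings, page_title: str, user_name: str) -> bool:
    """Return True if Tidy should clean up this page view.

    An opt-out at either the page level or the user level wins over the
    site-wide default, so editors can see the raw, unrepaired output.
    """
    if page_title in settings.pages_without_tidy:
        return False
    if user_name in settings.users_without_tidy:
        return False
    return settings.site_default


# Illustrative values only.
settings = TidySettings(
    pages_without_tidy={"Sandbox"},
    users_without_tidy={"ExampleEditor"},
)
```

Here should_run_tidy(settings, "Sandbox", "SomeoneElse") would be False
because the page opted out, while an ordinary page viewed by an ordinary
user would still get Tidy. Editors who opt out would see the broken
rendering and could fix the underlying markup.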

MZMcBride



_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
