Karl Dubost <karl <at> w3.org> writes:
> I have written a short document to explain the project [Cleaning the
> Web][1].
> It describes what is html5 and what would be the benefits of
> implementing the html 5 parsing algorithm in libxml2 html parser.

There's already an HTML5 implementation in Python (html5lib) which you can use together with lxml (so you can benefit from both HTML5 *and* libxml2 already).

IIRC, there was also a push towards a C implementation, but I'm not sure that really led anywhere. What's in SVN doesn't look very complete:

http://html5lib.googlecode.com/svn/trunk/c/chtml5lib/

IMHO, it's better to stick with higher-level implementations during the specification phase, and to push the work on an optimised, low-level C implementation back until the target is a bit more focussed. But then, maybe that's just me...

I didn't read your proposal, so I'll just assume you meant to extend the existing HTML parser instead of writing a new one. That would sound more promising than a start from scratch.

Stefan
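
For illustration, here is a minimal sketch of the html5lib plus lxml combination mentioned above. It assumes both third-party packages are installed; the helper name `parse_html5` and the sample markup are made up for this example and do not come from either project.

```python
# Sketch only: assumes the third-party packages html5lib and lxml are installed.
import html5lib
from lxml import etree


def parse_html5(markup):
    """Parse markup with html5lib's HTML5 algorithm and return an lxml root element.

    parse_html5 is a hypothetical helper name used for this example.
    """
    # treebuilder="lxml" makes html5lib build an lxml (libxml2-backed) tree.
    # namespaceHTMLElements=False keeps plain tag names such as "p".
    tree = html5lib.parse(markup, treebuilder="lxml", namespaceHTMLElements=False)
    return tree.getroot()


if __name__ == "__main__":
    # Tag soup: no <html>, <head> or <body>, and unclosed <p> elements.
    root = parse_html5("<p>one<p>two")
    print(etree.tostring(root, encoding="unicode"))
    # The result is an ordinary lxml tree, so XPath works on it.
    print(root.xpath("//p/text()"))
```

The parsing rules come from html5lib, while the resulting tree is a regular lxml one, so XPath and serialisation are available on it as usual.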
