Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2

Stefan Behnel Fri, 08 Aug 2008 01:31:00 -0700

Karl Dubost <karl <at> w3.org> writes:
> I have written a short document to explain the project [Cleaning the  
> Web][1].
> It describes what is html5 and what would be the benefits of  
> implementing the html 5 parsing algorithm in libxml2 html parser.


There's already an HTML5 implementation in Python (html5lib) which you can use 
together with lxml (so you can benefit from both HTML5 *and* libxml2 already). 
IIRC, there was also a push towards a C implementation, but I'm not sure that 
really led anywhere. What's in SVN doesn't look very complete:

http://html5lib.googlecode.com/svn/trunk/c/chtml5lib/
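
For illustration, here is a minimal sketch of the html5lib + lxml combination 
mentioned above. It assumes a current html5lib release with lxml installed; 
the API at the time of this post may have looked different, and the helper 
name parse_html5 is made up for this example.

```python
import html5lib


def parse_html5(markup):
    """Parse markup with the HTML5 algorithm and return an lxml root element.

    parse_html5 is a hypothetical helper name, not part of either library.
    """
    # treebuilder="lxml" asks html5lib to build an lxml (libxml2-backed) tree.
    # namespaceHTMLElements=False keeps tag names as plain "html", "p", ...
    tree = html5lib.parse(markup, treebuilder="lxml",
                          namespaceHTMLElements=False)
    return tree.getroot()


if __name__ == "__main__":
    # Deliberately sloppy input: no <html>/<body>, unclosed <p> and <b>.
    root = parse_html5("<title>demo</title><p>Hello <b>world")
    print(root.tag)
    print(root.findtext("head/title"))
    # XPath is evaluated by lxml on the tree that html5lib built.
    print(root.xpath("//b/text()"))
```

The parsing is done in Python by html5lib, while the resulting tree is an 
ordinary lxml tree, so XPath and the other lxml features work on it as usual.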

IMHO, it's better to stick with higher level implementations during the 
specification phase, and to push the work on an optimised, low-level C 
implementation back until the target is a bit more focussed. But then, maybe 
that's just me...

I didn't read your proposal, so I'll just assume you meant to extend the 
existing HTML parser instead of writing a new one. That would sound more 
promising than a start from scratch.

Stefan

