Stefan Behnel wrote:
> Hi,
> 
> Karl Dubost wrote:
>>    Nick Kew weighed in and proposed that we should target [6]libxml
>>    which includes an HTML parser and is already supported by Apache
>>    server and many other tools.
>>
>>       [6] http://xmlsoft.org/html/libxml-HTMLparser.html
>>
>>    From here it would be interesting to implement the HTML 5 parsing
>>    algorithm in libxml2. It would benefit the community at large.
> 
> Have you tried joining forces with the people who started the C implementation
> of html5lib? Maybe they have ideas to contribute or (partially) working code
> that you can look at. You might even be able to win them over to the
> project.
> 
> In any case, looking under the hood of the working implementations in
> Python and Java should get you a lot closer to your goal.

FWIW, I've spent the summer working on a C HTML5 parser called Hubbub[1],
which is approaching stability.  It's about half as fast as libxml2 at
parsing the HTML 5 spec with an O(1) treebuilder, and it's fairly easy to
bind to the libxml2 interfaces (the development branch of NetSurf[2], a
small Web browser, uses it in lieu of the libxml2 HTML parser).  Note that
it a) is not buildable as a shared library and b) has not had a formal
release, but if someone wants an HTML5 parser in C, then it's probably not
a bad bet.

[1] http://www.netsurf-browser.org/projects/hubbub/
[2] http://www.netsurf-browser.org/

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
