On Wed, Nov 03, 2010 at 10:56:17AM -0400, Sam Ruby wrote:
> Retrying...
> 
> -------- Original Message --------
> Subject: HTML5 test cases
> Date: Thu, 21 Oct 2010 13:30:05 -0400
> From: Sam Ruby <[email protected]>
> To: [email protected]
> 
> I've taken a quick look at comparing the output of htmlParseDocument
> (via nokogiri[1]) against the HTML5 test cases, and noted quite a few
> differences:
> 
> http://intertwingly.net/stories/2010/10/21/libxml2-html5-test.out
> http://intertwingly.net/stories/2010/10/21/libxml2-html5-tree-test.out
> 
> Further background on my weblog[2].

  Ah, the W3C tech plenary. I'm not far away, just a one-hour drive,
but since XML Core didn't meet there this year I didn't plan to come.

> Any thoughts on the best path towards making an HTML5-compliant parser
> available?

 Well, if there are now well-defined semantics for what an HTML parser
should do in corner cases, I have no problem with getting patches in!
 The current HTML parser was basically implemented using the HTML4 spec
but without the craziness of trying to mimic what browsers do with
that input. The main usage is screen-scraping or conversion to XML
(at least for me), and mimicking browsers didn't look worth the effort.
  Now if there are decent semantics for what a parser should do with
HTML5 and HTML5-like (that's the problem) kinds of input, then nice.
I'm sure once it gets REC status people will be enthusiastic to
develop small parsers, and maybe libxml2 can be one of them.
  I really welcome HTML5 parser patches. One can probably add
a new parsing option to the existing parser to allow both old and new
behaviour (or switch automatically, but we all know that's error-prone :-)
But I have no time to develop this myself, libvirt is what I'm
working on ATM,

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[email protected]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
