On Wed, Nov 03, 2010 at 04:02:57PM -0400, Sam Ruby wrote:
> On 11/03/2010 02:50 PM, Daniel Veillard wrote:

> This does not need to wait until REC status, the parsing algorithm
> is fairly stable.

  okay

> Some background: Henri wrote a fully compliant HTML parser in Java,
> and has been keeping it in sync with the specification (at times
> even writing bug reports against the HTML5 spec as required):
> 
> http://about.validator.nu/htmlparser/

  okay

> He then wrote a translator which mechanically converts his usage of
> Java into a C++ program with dependencies on some Mozilla libraries:
> 
> http://groups.google.com/group/mozilla.dev.platform/msg/35ace94ab1ae1511?pli=1
> http://mxr.mozilla.org/mozilla-central/source/parser/

  fine for Mozilla, maybe the Java code is easier to maintain

> The result is not only compliant with the HTML5 specification, it is
> the actual parser which will ship with Firefox 4:
> 
> http://hg.mozilla.org/mozilla-central/rev/129e19d979f0
> 
> Oversimplifying, but if this same code could target the underlying
> string and DOM handling routines, the result of a parse would be
> immediately useful to applications which build on top of libxml2.

  Well I see 2 major issues with that even without getting into the
details:
  - that's generated code, which means it cannot be modified/patched
    within the libxml2 project. That's untenable from a maintenance
    POV if it were to be embedded in libxml2
  - the internal string format of Mozilla is UTF-16, and libxml2
    operates on UTF-8, that's already one of the major problems
    we faced when we looked at using libxslt for mozilla
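
  To make the second point concrete, here is a minimal sketch of the
conversion that every string would need when crossing from a UTF-16
parser into UTF-8 based code. It is only an illustration: the function
name utf16_to_utf8 and its signature are hypothetical, it assumes
native byte order input, and it is not taken from libxml2 or Mozilla.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libxml2 or Mozilla).
 * Converts `len` UTF-16 code units in native byte order to UTF-8.
 * Returns the number of bytes written to `out`, or -1 if the input
 * contains an unpaired surrogate or `out` (capacity `cap` bytes) is
 * too small. No terminating NUL byte is written.
 */
long utf16_to_utf8(const uint16_t *in, size_t len,
                   unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);

    while (i < len) {
        uint32_t c = in[i++];

        if (c >= 0xD800 && c <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            if (i >= len || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return -1;
            c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t)(in[i++] - 0xDC00);
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return -1; /* Low surrogate without a preceding high one. */
        }

        if (c < 0x80) {
            /* 1 byte: plain ASCII. */
            if (o + 1 > cap) return -1;
            out[o++] = (unsigned char)c;
        } else if (c < 0x800) {
            /* 2 bytes. */
            if (o + 2 > cap) return -1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            /* 3 bytes. */
            if (o + 3 > cap) return -1;
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            /* 4 bytes: code points reached through a surrogate pair. */
            if (o + 4 > cap) return -1;
            out[o++] = (unsigned char)(0xF0 | (c >> 18));
            out[o++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return (long)o;
}
```

  The per-character work is cheap, but it means an extra pass and an
extra buffer for every name, attribute value and text node handed
across, which is the kind of overhead the UTF-16/UTF-8 mismatch implies.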

  That doesn't sound too easy.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/