On Wed, Nov 03, 2010 at 04:02:57PM -0400, Sam Ruby wrote:
> On 11/03/2010 02:50 PM, Daniel Veillard wrote:
> This does not need to wait until REC status, the parsing algorithm
> is fairly stable.

  okay

> Some background: Henri wrote a fully compliant HTML parser in Java,
> and has been keeping it in sync with the specification (at times
> even writing bug reports against the HTML5 spec as required):
>
> http://about.validator.nu/htmlparser/

  okay

> He then wrote a translator which mechanically converts his usage of
> Java into a C++ program with dependencies on some Mozilla libraries:
>
> http://groups.google.com/group/mozilla.dev.platform/msg/35ace94ab1ae1511?pli=1
> http://mxr.mozilla.org/mozilla-central/source/parser/

  fine for Mozilla, maybe the Java code is easier to maintain

> The result is not only compliant with the HTML5 specification, it is
> the actual parser which will ship with Firefox 4:
>
> http://hg.mozilla.org/mozilla-central/rev/129e19d979f0
>
> Oversimplifying, but if this same code could target the underlying
> string and DOM handling routines, the result of a parse would be
> immediately useful to applications which build on top of libxml2.

  Well, I see 2 major issues with that even without getting into the
details:
    - that's generated code, which means it cannot be modified or
      patched within the libxml2 project. That is untenable from a
      maintenance point of view if it were to be embedded in libxml2
    - the internal string format of Mozilla is UTF-16, and libxml2
      operates on UTF-8; that's already one of the major problems we
      faced when we looked at using libxslt for Mozilla

  That doesn't sound too easy,

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
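
To make the UTF-16 versus UTF-8 mismatch in the second issue concrete, here is a minimal sketch in C of the transcoding step that any bridge between a UTF-16 parser and libxml2's UTF-8 strings would have to perform on every string crossing the boundary. The function name `utf16_to_utf8` and its signature are assumptions made for illustration; it is not a libxml2 or Mozilla API, and it assumes the input is in native byte order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libxml2 or Mozilla): converts `len`
 * UTF-16 code units in native byte order from `in` into UTF-8 bytes in
 * `out`, which can hold `cap` bytes. Returns the number of bytes
 * written, or -1 on a malformed surrogate pair or insufficient space.
 * No terminating NUL byte is written.
 */
long utf16_to_utf8(const uint16_t *in, size_t len,
                   unsigned char *out, size_t cap)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            if (i + 1 >= len || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++; /* consume the low surrogate as well */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1; /* stray low surrogate */
        }

        /* Number of UTF-8 bytes needed for this code point. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (need > cap - o)
            return -1;

        if (need == 1) {
            out[o++] = (unsigned char)cp;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return (long)o;
}
```

The conversion itself is simple; the difficulty raised above is more likely that it would have to run (and allocate) for every tag name, attribute value, and text node handed from one side to the other, or else the generated parser would have to be retargeted to work on UTF-8 directly.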