On Mon, Feb 12, 2007 at 08:38:55AM -0500, Daniel Veillard wrote:
> On Mon, Feb 12, 2007 at 07:42:13AM -0500, Elliotte Harold wrote:
> > I'm working on a book about converting messy old HTML to clean XHTML,
> > and I'm trying to decide exactly how much of each tool to recommend when.
>
>   libxml2's HTML parser has been used in many real-world tools, like HTML
> indexers; it will consume almost anything, but it doesn't try to add many
> correcting recipes on top of that. This was discussed on the list a couple
> of years ago, and that's where the libxml2 HTML parsing error-handling
> principles were set up.
  BTW, now that I think about it, I have done that for years and years,
but slightly differently. The majority of the xmlsoft.org content is kept
in HTML files edited with whatever preferred tool is available, and the
web site is then generated as XHTML1 content using xsltproc's --html
option, which parses the HTML input and feeds it to a stylesheet that
splits, formats, adds presentation, generates indexes, and dumps the
result as XHTML1. The next rule in the Makefile runs xmllint --valid
--noout on the resulting .html files to check them for well-formedness
and validity against the XHTML1 DTDs. This is all in the doc subdir,
starting from the initial xml.html file.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library http://libvirt.org/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

_______________________________________________
xml mailing list, project page http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
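[Editor's note: the two-step pipeline described above can be sketched with a
pair of toy files. The file names (input.html, to-xhtml.xsl, out.html) and
the identity stylesheet are illustrative assumptions, not the actual
xmlsoft.org setup; the real Makefile also passes --valid, which additionally
requires the XHTML1 DTDs to be resolvable, e.g. via an XML catalog.]

```shell
# A deliberately sloppy HTML input (unclosed tags, uppercase names):
cat > input.html <<'EOF'
<HTML><BODY>
<P>First paragraph
<P>Second paragraph with a <BR> break
</BODY></HTML>
EOF

# A tiny XSLT identity stylesheet that copies everything through,
# emitting XML output:
cat > to-xhtml.xsl <<'EOF'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>
EOF

# --html makes xsltproc use libxml2's forgiving HTML parser on the
# input, so the stylesheet sees a repaired tree with closed elements:
xsltproc --html -o out.html to-xhtml.xsl input.html

# Check the result for well-formedness (add --valid to also validate
# against the DTD declared in the output's DOCTYPE):
xmllint --noout out.html && echo "well-formed"
```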
