On Mon, Feb 12, 2007 at 08:38:55AM -0500, Daniel Veillard wrote:
> On Mon, Feb 12, 2007 at 07:42:13AM -0500, Elliotte Harold wrote:
> > I'm working on a book about converting messy old HTML to clean XHTML,
> > and I'm trying to decide exactly how much of each tool to recommend when.
>
>   libxml2's HTML parser has been used in many real-world tools, like HTML
> indexers; it will consume almost anything, but it doesn't try to add many
> correcting recipes on top of that. This was discussed on the list a couple
> of years ago, and that's where the libxml2 HTML parsing error-handling
> principles were set up.
  BTW, now that I think about it, I have done that for years and years,
but slightly differently. The majority of the xmlsoft.org content is kept
in HTML files edited with whatever preferred tool is available, and the
web site is then generated as XHTML1 content using xsltproc's --html
option, which parses the HTML input and feeds it to a stylesheet that
splits, formats, adds presentation, generates indexes, and dumps the
result as XHTML1. The next rule in the Makefile runs xmllint --valid
--noout on the resulting .html files to check them for well-formedness
and validity against the XHTML1 DTDs. This is all in the doc subdir,
starting from the initial xml.html file.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library http://libvirt.org/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

_______________________________________________
xml mailing list, project page http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
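[Editor's note: the two-step pipeline described above can be sketched with a
pair of toy files. The file names (input.html, to-xhtml.xsl, out.html) and
the identity stylesheet are illustrative assumptions, not the actual
xmlsoft.org setup; the real Makefile also passes --valid, which additionally
requires the XHTML1 DTDs to be resolvable, e.g. via an XML catalog.]

```shell
# A deliberately sloppy HTML input (unclosed tags, uppercase names):
cat > input.html <<'EOF'
<HTML><BODY>
<P>First paragraph
<P>Second paragraph with a <BR> break
</BODY></HTML>
EOF

# A tiny XSLT identity stylesheet that copies everything through,
# emitting XML output:
cat > to-xhtml.xsl <<'EOF'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>
EOF

# --html makes xsltproc use libxml2's forgiving HTML parser on the
# input, so the stylesheet sees a repaired tree with closed elements:
xsltproc --html -o out.html to-xhtml.xsl input.html

# Check the result for well-formedness (add --valid to also validate
# against the DTD declared in the output's DOCTYPE):
xmllint --noout out.html && echo "well-formed"
```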
