On Fri, Jun 10, 2011 at 02:26:56PM +0200, Joachim Zobel wrote:
> Hi.
>
> It looks like the xmlReader parser is able to parse HTML. At least it
> accepts doctype at document start. It does however behave differently
> than the SAX/DOM HTML parser. For example it wants closing tags for META
> and LI.
>
> To what extent does xmlReader support HTML? I think a lot of things
> would be easier for me if I could move from SAX to xmlReader, however I
> need to be able to parse HTML.
It doesn't. Right now the reader always operates on top of an XML parser,
not an HTML one, hence your result. Short of modifying the reader to allow
HTML parsing (probably around xmlTextReaderSetup()), the only way would be
to parse the HTML document into an htmlDocPtr and then pass that
htmlDocPtr as the input to xmlReaderWalker(), i.e. iterate over a fully
built document. That's the only solution I can think of without extending
the current code.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine  http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml