Re: [xml] xmlReader and HTML

Daniel Veillard Fri, 10 Jun 2011 07:30:17 -0700

On Fri, Jun 10, 2011 at 02:26:56PM +0200, Joachim Zobel wrote:
> Hi.
> 
> It looks like the xmlReader parser is able to parse HTML. At least it
> accepts doctype at document start. It does however behave differently
> than the SAX/DOM HTML parser. For example it wants closing tags for META
> and LI.
> 
> To what extent does xmlReader support HTML? I think a lot of things
> would be easier for me if I could move from SAX to xmlReader, however I
> need to be able to parse HTML.


  It doesn't. Right now the reader always operates on top of an
XML parser, not an HTML one, hence your result.
  Short of modifying it to allow HTML parsing (probably around
xmlTextReaderSetup()), the only way would be to parse the HTML document
into an htmlDocPtr and then pass that htmlDocPtr as the input to
xmlReaderWalker(), i.e. have the reader iterate over a fully built
document tree. That's the only solution I can think of without extending
the current code.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
