Hello all.  I've hit a problem using libxml2 to parse HTML files.  Usually
everything works great, but on one particular input file the parse hangs,
with the process hogging the CPU indefinitely until killed.  When I run the
file through xmllint I see (aside from a bunch of run-of-the-mill HTML
parsing warnings):

  $ /usr/local/bin/xmllint --html fail.html
  fail.html:927: parser error : Excessive depth in document: change xmlParserMaxDepth = 1024
  marcy playground<br /><option><em>

Then xmllint hangs, using 100% of the CPU until killed.
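
For reference, my own code boils down to something like the sketch below.
(This is a simplified stand-in, not my real program - I'm assuming the
htmlReadFile entry point here.)  It hangs on the bad file just like xmllint
does:

  #include <libxml/HTMLparser.h>

  int main(int argc, char **argv)
  {
      const char *path = (argc > 1) ? argv[1] : "fail.html";
      htmlDocPtr doc;

      /* Parse quietly; warnings are routine for real-world HTML. */
      doc = htmlReadFile(path, NULL,
                         HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING);
      if (doc != NULL)
          xmlFreeDoc(doc);    /* never reached for the killer file */
      return 0;
  }

(Built with: gcc repro.c $(xml2-config --cflags --libs).)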

Another note - my first attempt to work around this was to add an alarm()
call before parsing, hoping to terminate the failed parse if it took too
long.  For some reason that didn't work - the alarm signal never reached my
signal handler.  Any ideas why?  I'm OK with the parser failing on bad
HTML - that's just a fact of life - but I can't allow it to hang
indefinitely!
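
In case it's useful, here's roughly the shape of what I tried - a simplified
sketch, not my real code, using sigsetjmp/siglongjmp to bail out of the
parse.  (Yes, I realize longjmp'ing out of the library would leak whatever
it had allocated; at this point I just wanted the hang to stop.)

  #include <libxml/HTMLparser.h>
  #include <setjmp.h>
  #include <signal.h>
  #include <stdio.h>
  #include <unistd.h>

  static sigjmp_buf parse_env;

  static void on_alarm(int signo)
  {
      (void)signo;
      siglongjmp(parse_env, 1);   /* escape the stuck parse */
  }

  int main(void)
  {
      struct sigaction sa;
      htmlDocPtr doc;

      sa.sa_handler = on_alarm;
      sigemptyset(&sa.sa_mask);
      sa.sa_flags = 0;            /* deliberately no SA_RESTART */
      sigaction(SIGALRM, &sa, NULL);

      if (sigsetjmp(parse_env, 1) == 0) {
          alarm(30);              /* allow the parse 30 seconds */
          doc = htmlReadFile("fail.html", NULL, 0);
          alarm(0);
          if (doc != NULL)
              xmlFreeDoc(doc);
      } else {
          fprintf(stderr, "parse timed out\n");
      }
      return 0;
  }

The handler above never fires for me - the process just keeps spinning.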

This is libxml2 v2.6.30 on Linux:

  $ /usr/local/bin/xmllint --html --version
  /usr/local/bin/xmllint: using libxml version 20630
     compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib


Would you like me to send in the killer file?  It's around 208k, so I didn't
think it would be polite to send it unasked.

Thanks for any help you can give me!

-sam