The ElementTidy library is an add-on to ElementTree. It provides an
alternative tree builder that can read (almost) arbitrary HTML and turn
it into well-formed XHTML element trees.
ElementTidy uses a library version of Dave Raggett's HTML Tidy utility
to do the cleanup (the source code is included), so it does not rely on
external utilities.
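To give an idea of what you get back: the result of the cleanup is an
ordinary element tree, so the usual ElementTree calls apply. The sketch
below does not call ElementTidy at all. It uses the standard library's
xml.etree.ElementTree and a hand-written XHTML string as a stand-in for
tidied output, and it assumes the elements live in the XHTML namespace.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for tidied output (an assumption for this
# sketch; it was not produced by ElementTidy).
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Hello</title></head>
<body><p>Hello, <b>world</b>!</p><p>Second paragraph.</p></body>
</html>"""

# Assumed namespace prefix for the element tags.
NS = "{http://www.w3.org/1999/xhtml}"

root = ET.fromstring(XHTML)

# Look up the document title using namespace-qualified tag names.
title = root.findtext(NS + "head/" + NS + "title")

# Collect the plain text of every paragraph, including nested markup.
paragraphs = ["".join(p.itertext()) for p in root.iter(NS + "p")]

print(title)
print(paragraphs)
```

If the tree you actually get uses unqualified tag names, set NS to an
empty string and the rest of the sketch stays the same.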
The beta 1 release improves support for source document encodings and
tidies more aggressively, so that output is produced even for seriously
malformed HTML.
For downloads and more information, see:
http://effbot.org/downloads#elementtidy
http://effbot.org/zone/element-tidylib.htm
enjoy /F