Stefan Behnel wrote:
> [EMAIL PROTECTED] wrote:
>> I need to parse real-world HTML/XML documents and I found two nice Python
>> solutions: BeautifulSoup and Tidy.
>
> There's also lxml, in case you want a real XML tool.
> http://codespeak.net/lxml/
> http://codespeak.net/lxml/dev/parsing.html#parse
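
For what it's worth, here is a minimal sketch of the lxml route suggested above, assuming lxml is installed (it is a third-party package, hence the guarded import). The names `BROKEN_HTML` and `extract_links` are made up for illustration and come from neither the thread nor lxml. It only shows that lxml's HTML parser accepts markup with unclosed tags. It says nothing about how Gecko, BeautifulSoup or Tidy would treat the same input.

```python
try:
    from lxml import etree  # third-party; assumed to be installed
except ImportError:
    etree = None  # lets the script exit quietly when lxml is missing

# Hypothetical sample input: the <p> and <a> elements are never closed.
BROKEN_HTML = '<html><body><p>First<p>Second <a href="/x">link</body>'


def extract_links(markup):
    """Return the href of every <a> element in possibly malformed HTML."""
    # HTMLParser is lxml's lenient HTML parser; it tries to recover
    # from malformed markup instead of raising a syntax error.
    parser = etree.HTMLParser()
    root = etree.fromstring(markup, parser)
    return [a.get("href") for a in root.iter("a") if a.get("href") is not None]


if etree is not None:
    print(extract_links(BROKEN_HTML))
```

Whether the recovered tree matches what a browser would build is a separate question; the sketch only demonstrates that parsing does not fail on the missing end tags.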
[EMAIL PROTECTED] wrote:
I need to parse real-world HTML/XML documents and I found two nice Python
solutions: BeautifulSoup and Tidy.
However, I also found pyXPCOM, which is a wrapper for Gecko. So I was thinking
Gecko surely handles bad HTML in a more consistent and error-proof way
than BeautifulSoup and Tidy.
I'm interested in using Mozil