Re: Parsing HTML/XML documents

2007-04-26 Thread Max M
Stefan Behnel wrote:
> [EMAIL PROTECTED] wrote:
>> I need to parse real-world HTML/XML documents and I found two nice Python
>> solutions: BeautifulSoup and Tidy.
>
> There's also lxml, in case you want a real XML tool.
> http://codespeak.net/lxml/
> http://codespeak.net/lxml/dev/parsing.html#parse

Re: Parsing HTML/XML documents

2007-04-26 Thread Stefan Behnel
[EMAIL PROTECTED] wrote:
> I need to parse real-world HTML/XML documents and I found two nice Python
> solutions: BeautifulSoup and Tidy.

There's also lxml, in case you want a real XML tool.
http://codespeak.net/lxml/
http://codespeak.net/lxml/dev/parsing.html#parsers

> However I found pyXPCOM th

Parsing HTML/XML documents

2007-04-26 Thread [EMAIL PROTECTED]
I need to parse real-world HTML/XML documents, and I found two nice Python solutions: BeautifulSoup and Tidy. However, I also found pyXPCOM, which is a wrapper for Gecko. So I was thinking that Gecko surely handles bad HTML in a more consistent and error-proof way than BS and Tidy. I'm interested in using Mozil