Re: Parsing HTML--looking for info/comparison of HTMLParser vs. htmllib modules.

2006-07-08 Thread Fredrik Lundh
Fredrik Lundh wrote:
> the only difference between the libs (*) is that HTMLParser is a bit
> stricter

*) "the libs" referring to htmllib and HTMLParser, not htmllib and sgmllib.

--
http://mail.python.org/mailman/listinfo/python-list

Re: Parsing HTML--looking for info/comparison of HTMLParser vs. htmllib modules.

2006-07-08 Thread Fredrik Lundh
Kenneth McDonald wrote:
> The problem I'm having with HTMLParser is simple; I don't seem to be
> getting the actual text in the HTML document. I've implemented the
> do_data method of HTMLParser.HTMLParser in my HTMLParser subclass, but
> it never seems to receive any data. Is there another way

Re: Parsing HTML--looking for info/comparison of HTMLParser vs. htmllib modules.

2006-07-07 Thread wes weston
from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.TokenList = []

    def handle_data(self, data):
        data = data.strip()
        if data and len(data) > 0:
            self.TokenList.append(data)
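For anyone reading the thread later, here is a rough sketch of how a subclass like the one above might be exercised, and why a method named do_data would stay silent. It is written against Python 3, where the module is spelled html.parser rather than HTMLParser as in the 2006 posts. The class names TextCollector and DoDataCollector and the SAMPLE string are made up for illustration; only handle_data, feed and close come from the library.

```python
# Python 3 spelling of the import; the posts above use
# "from HTMLParser import HTMLParser" (Python 2).
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical name: gathers non-blank text runs via handle_data."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_data(self, data):
        # The parser calls handle_data for the text found between tags.
        data = data.strip()
        if data:
            self.tokens.append(data)


class DoDataCollector(HTMLParser):
    """Hypothetical name: defines do_data, a method the parser never calls."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def do_data(self, data):
        # Not one of the parser's hooks, so this body is never reached.
        self.tokens.append(data)


# Made-up sample document, for illustration only.
SAMPLE = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"

good = TextCollector()
good.feed(SAMPLE)
good.close()

bad = DoDataCollector()
bad.feed(SAMPLE)
bad.close()

print(good.tokens)  # text runs collected through handle_data
print(bad.tokens)   # stays empty, since do_data is never invoked
```

The text arrives in pieces, one call per run between tags, so a converter that needs whole paragraphs would probably have to join the pieces itself, for example by tracking start and end tags alongside the data.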

Parsing HTML--looking for info/comparison of HTMLParser vs. htmllib modules.

2006-07-07 Thread Kenneth McDonald
I'm writing a program that will parse HTML and (mostly) convert it to MediaWiki format. The two Python modules I'm aware of to do this are HTMLParser and htmllib. However, I'm currently experiencing either real or conceptual difficulty with both, and was wondering if I could get some advice. T