Fredrik Lundh wrote:
> the only difference between the libs (*) is that HTMLParser is a bit
> stricter
*) "the libs" referring to htmllib and HTMLParser, not htmllib and sgmllib.
--
http://mail.python.org/mailman/listinfo/python-list
Kenneth McDonald wrote:
> The problem I'm having with HTMLParser is simple; I don't seem to be
> getting the actual text in the HTML document. I've implemented the
> do_data method of HTMLParser.HTMLParser in my HTMLParser subclass, but
> it never seems to receive any data. Is there another way
from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.TokenList = []

    def handle_data(self, data):
        # Text between tags arrives here; HTMLParser has no do_data hook.
        data = data.strip()
        if data:
            self.TokenList.append(data)
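
For anyone who wants to try this out, here is a minimal sketch of how a subclass like the one above might be driven. The names TextCollector, extract_text and sample are made up for illustration and are not from the original posts. The import fallback assumes a newer Python where the module lives at html.parser; on the Python of this thread the plain HTMLParser import applies.

```python
# Assumption: on Python 3 the module is html.parser; fall back to the
# Python 2 module name used elsewhere in this thread.
try:
    from html.parser import HTMLParser
except ImportError:
    from HTMLParser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper that collects the non-blank text runs of a document."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tokens = []

    def handle_data(self, data):
        # The parser calls this for each run of text between tags.
        data = data.strip()
        if data:
            self.tokens.append(data)


def extract_text(html):
    """Feed a complete HTML string to the parser and return the collected text."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.tokens


sample = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"
print(extract_text(sample))
```

Text only shows up after feed() has been called with the document, so an empty token list usually means either that nothing was fed or that the method is not named handle_data.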
I'm writing a program that will parse HTML and (mostly) convert it to
MediaWiki format. The two Python modules I'm aware of to do this are
HTMLParser and htmllib. However, I'm currently experiencing either real
or conceptual difficulty with both, and was wondering if I could get
some advice.
T