I have been working my way through Chun's book Core Python Applications.

In chapter 9 he has a web crawler program that essentially copies all the files from a web site by finding and downloading the links on that domain.

One of the classes defines a method, parse_links, and I'm having trouble finding documentation for the functions it calls. The code is:

    def parse_links(self):
        'Parse out the links found in downloaded HTML file'
        f = open(self.file, 'r')
        data = f.read()
        f.close()
        parser = HTMLParser(formatter.AbstractFormatter(
                formatter.DumbWriter(cStringIO.StringIO())))
        parser.feed(data)
        parser.close()
        return parser.anchorlist

HTMLParser is from htmllib.

I can't find clear documentation for what the functions on the 'parser =' line do and return. The three modules (htmllib, formatter, and cStringIO) are all imported, but I can't find much information on how they work. What that line actually does and what it produces is completely obscure to me.
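
My working guess, which I would like confirmed or corrected, is that the parser simply records the href of every <a> tag in anchorlist, and that the formatter/writer chain only receives the rendered page text, which ends up in an in-memory buffer that is never read. To test that guess I wrote the sketch below using html.parser from the Python 3 standard library instead of htmllib. It is not the book's code: the class name AnchorCollector, the string-based parse_links function, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical stand-in for htmllib's HTMLParser: it only gathers hrefs."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []  # same attribute name the book's code returns

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lower case.
        if tag == 'a':
            for name, value in attrs:
                # Skip anchors with no href, or with an empty one.
                if name == 'href' and value:
                    self.anchorlist.append(value.strip())


def parse_links(html_text):
    """Return the href values found in a string of HTML (assumed behaviour)."""
    parser = AnchorCollector()
    parser.feed(html_text)
    parser.close()
    return parser.anchorlist


# Made-up sample page: two real links and one named anchor without an href.
sample = ('<html><body>'
          '<a href="/about.html">About</a> '
          '<a name="top">Top</a> '
          '<a href="http://example.com/x">X</a>'
          '</body></html>')
links = parse_links(sample)
print(links)
```

If my guess is right, the book's version would return the same kind of list of link strings for a downloaded file.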

Any help would be appreciated. Any links to clear documentation and examples?