On Jan 6, 2013, at 22:48, Ed Owens <eowens0...@gmx.com> wrote:
> I have been working my way through Chun's book Core Python Applications.
>
> In chapter 9 he has a web crawler program that essentially copies all the
> files from a web site by finding and downloading the links on that domain.
>
> One of the classes has a procedure definition, and I'm having trouble finding
> documentation for the functions. The code is:
>
>     def parse_links(self):
>         'Parse out the links found in downloaded HTML file'
>         f = open(self.file, 'r')
>         data = f.read()
>         f.close()
>         parser = HTMLParser(formatter.AbstractFormatter(
>             formatter.DumbWriter(cStringIO.StringIO())))
>         parser.feed(data)
>         parser.close()
>         return parser.anchorlist
>
> HTMLParser is from htmllib.
>
> I'm having trouble finding clear documentation for what the functions on the
> 'parser =' line do and return. The three modules (htmllib, formatter, and
> cStringIO) are all imported, but I can't find much information on how they
> work or what they do. What this line actually does and what it produces is
> completely obscure to me.
>
> Any help would be appreciated. Any links to clear documentation and examples?
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
Hi Ed, maybe this helps:
http://docs.python.org/2/library/htmllib.html
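
In short, the 'parser =' line builds a chain: cStringIO.StringIO() is an
in-memory file, DumbWriter writes plain text into it, AbstractFormatter drives
the writer, and htmllib's HTMLParser sends the page's text to the formatter.
The formatted text is thrown away; the only thing the method uses is
parser.anchorlist, the list of href values the parser collects as it goes.

htmllib exists only in Python 2, so here is a rough sketch of the same idea
with Python 3's html.parser instead. This is not Chun's code: AnchorCollector
is a made-up class name, and this parse_links takes the HTML as a string
rather than reading self.file.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        # Same attribute name that htmllib's parser exposes.
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.anchorlist.append(value)


def parse_links(html_text):
    """Return the href values found in html_text, in document order."""
    parser = AnchorCollector()
    parser.feed(html_text)
    parser.close()
    return parser.anchorlist
```

No formatter or writer is needed here because nothing tries to render the
page text; the parser only looks at the start tags.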
A