Re: [Tutor] html links

Alan Gauld Tue, 15 May 2007 00:43:48 -0700

"max ." <[EMAIL PROTECTED]> wrote

> does anyone know of a tutorial for finding links in a web site with 
> python.
>
BeautifulSoup has been mentioned, but it's not part of the standard Python library.


Here is an example using the standard library parser. It prints the
text of the second <h1> heading:

html = '''
<html><head><title>Test page</title></head>
<body>
<center>
<h1>Here is the first heading</h1>
</center>
<p>A short paragraph
<h1>A second heading</h1>
<p>A paragraph containing a
<a href="www.python.org">hyperlink to python</a>
</body></html>
'''

from HTMLParser import HTMLParser

class H1Parser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.h1_count = 0
        self.isHeading = False

    def handle_starttag(self,tag,attributes=None):
        if tag == 'h1':
            self.h1_count += 1
            self.isHeading = True

    def handle_endtag(self,tag):
        if tag == 'h1':
            self.isHeading = False

    def handle_data(self,data):
        if self.isHeading and self.h1_count == 2:
            print "Second Header contained: ", data

parser = H1Parser()
parser.feed(html)
parser.close()
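
Since the question was about links, the same technique can be pointed
at <a> tags instead of <h1>. The following is only a rough sketch, not
a complete link finder. It is written for Python 3, where the module is
called html.parser (on Python 2 the import is the one shown above).
The names LinkParser, link_parser and sample are made up for this
illustration.

```python
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser

class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attributes):
        # attributes arrives as a list of (name, value) pairs
        if tag == 'a':
            for name, value in attributes:
                if name == 'href' and value is not None:
                    self.links.append(value)

# A small made-up page fragment, reusing the hyperlink from the example above
sample = '<p>A paragraph containing a <a href="www.python.org">hyperlink to python</a></p>'

link_parser = LinkParser()
link_parser.feed(sample)
link_parser.close()
print(link_parser.links)   # ['www.python.org']
```

The parser only reports what is written in the href attribute, so
relative links come back exactly as they appear in the page.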

> or creating files and asking where to create a file.

I'm not sure what you mean here. Do you mean fetching a file
from a remote server? There is an ftp module (ftplib) if it's from
an FTP site...
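
If fetching over FTP is what you meant, a minimal sketch with ftplib
might look like the following. It assumes Python 3 and a server that
allows anonymous login. The function name fetch_file and its
parameters are invented for this illustration and are not part of
ftplib.

```python
from ftplib import FTP

def fetch_file(host, remote_name, local_name):
    """Download remote_name from host over anonymous FTP into local_name."""
    # The with statement closes the connection when the block ends
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        with open(local_name, 'wb') as out:
            # RETR asks the server to send the file; each chunk
            # received is passed to out.write
            ftp.retrbinary('RETR ' + remote_name, out.write)
```

A call such as fetch_file('ftp.example.com', 'README', 'README.txt')
would then save the remote file locally; the host name here is a
placeholder, not a real server.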


-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld 


_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
