I'm still working through Chun's "Core Python Applications Programming". I got the web crawler (Example 9-2) working after I found a ':' typo. Now I'm trying to convert it into a program that checks for broken links. This is not in the book. The problem I'm having now is knowing whether a link is working.

I've written an example that I hope illustrates my problem:

#!/usr/bin/env python

import urllib2

sites = ('http://www.catb.org', 'http://ons-sa.org', 'www.notasite.org')
for site in sites:
    page = None                    # reset so a failed open isn't masked below
    try:
        page = urllib2.urlopen(site)
        print page.geturl(), "didn't return an error on open"
        # info() may lack a Server header, so use get() with a default
        print 'Reported server is', page.info().get('Server', 'unknown')
    except Exception as e:         # a bare except would hide the reason
        print site, 'generated an error on open:', e
    if page is not None:
        try:
            page.close()
            print site, 'successfully closed'
        except Exception as e:
            print site, 'generated an error on close:', e


Site 1 is alive; the other two are dead. Yet this code only reports an error on site three. Notice that I checked for a redirection (I think that's what geturl() shows) when a site opened, and that didn't help with site two.
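In case it helps show what I mean, here's a stripped-down sketch that prints what site two actually sends back. I'm assuming getcode() is a valid method on whatever urlopen() returns (it appears to be, at least in Python 2.6+), and my guess that the dead domain answers with a registrar's parking page is just that -- a guess:

#!/usr/bin/env python

import urllib2

# Sketch: what does the "dead" site two actually send back?
page = urllib2.urlopen('http://ons-sa.org')
print 'Status code:', page.getcode()  # my guess: a plain 200, even though dead
print 'Final URL:  ', page.geturl()   # any redirect target shows up here
print 'Server:     ', page.info().get('Server', 'unknown')
page.close()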

Is there an unambiguous way to determine whether a link has died -- knowing nothing about the link in advance?
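The closest thing I could come up with is the sketch below, which at least separates HTTP errors from network errors. HTTPError and URLError are the exception names I found in the urllib2 docs, but I still don't see how this would flag a dead domain that answers with a normal-looking page:

#!/usr/bin/env python

import urllib2

def check_link(url):
    """Rough link check: distinguish HTTP errors from network errors.

    This can't tell a live site from a dead domain whose registrar
    serves a parking page -- both come back as plain 200 responses.
    """
    try:
        page = urllib2.urlopen(url)
    except urllib2.HTTPError as e:     # server answered with 4xx/5xx
        return 'HTTP error %d' % e.code
    except urllib2.URLError as e:      # DNS failure, refused connection, ...
        return 'network error: %s' % e.reason
    except ValueError:                 # e.g. no http:// scheme at all
        return 'malformed URL'
    try:
        return 'alive (status %s, final URL %s)' % (page.getcode(),
                                                    page.geturl())
    finally:
        page.close()

for site in ('http://www.catb.org', 'http://ons-sa.org', 'www.notasite.org'):
    print site, '->', check_link(site)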

Ed


