Hi all,
I'm trying to use Python to automatically download and process a (small)
number of Wikipedia articles. However, I keep getting a 403 (Forbidden)
error when using urllib2:
>>> import urllib2
>>> ip = urllib2.urlopen("http://en.wikipedia.org/wiki/Pythonidae")
which gives this:
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
ip = urllib2.urlopen("http://en.wikipedia.org/wiki/Pythonidae")
File "G:\Python25\lib\urllib2.py", line 121, in urlopen
return _opener.open(url, data)
File "G:\Python25\lib\urllib2.py", line 380, in open
response = meth(req, response)
File "G:\Python25\lib\urllib2.py", line 491, in http_response
'http', request, response, code, msg, hdrs)
File "G:\Python25\lib\urllib2.py", line 418, in error
return self._call_chain(*args)
File "G:\Python25\lib\urllib2.py", line 353, in _call_chain
result = func(*args)
File "G:\Python25\lib\urllib2.py", line 499, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
Now, when I use urllib instead of urllib2, something different happens:
>>> import urllib
>>> ip2 = urllib.urlopen("http://en.wikipedia.org/wiki/Pythonidae")
>>> st = ip2.read()
However, st does not contain the hoped-for article - instead it is a page of
HTML and (maybe?) JavaScript, which ends in:
If reporting this error to the Wikimedia System Administrators, please
include the following details:<br/>\n<span style="font-style:
italic">\nRequest: GET http://en.wikipedia.org/wiki/Pythonidae, from
98.195.188.89 via sq27.wikimedia.org (squid/2.6.STABLE13) to
()<br/>\nError: ERR_ACCESS_DENIED, errno [No Error] at Sat, 27 Oct 2007
06:45:00 GMT\n</span>\n</div>\n\n</body>\n</html>\n'
Could anybody tell me what's going on, and what I should be doing
differently?
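In case it helps: I wondered whether the block might be triggered by the
default Python user agent, so I also tried sending an explicit User-Agent
header, roughly like the sketch below. The agent string is just a
placeholder I made up, and I haven't confirmed this is the right fix (the
try/except import is only there so the snippet runs on newer Pythons too):

```python
try:
    from urllib2 import Request, urlopen          # Python 2, as in my session above
except ImportError:
    from urllib.request import Request, urlopen   # fallback for Python 3

def fetch(url, agent="PythonidaeFetcher/0.1 (Tutor list example)"):
    """Fetch a page while sending an explicit User-Agent header,
    in case the default Python agent is what triggers the 403."""
    req = Request(url, headers={"User-Agent": agent})
    return urlopen(req).read()

# page = fetch("http://en.wikipedia.org/wiki/Pythonidae")
```

Is something along these lines the recommended approach, or am I barking up
the wrong tree entirely?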
Thanks for your time
Alex
_______________________________________________
Tutor maillist - [email protected]
http://mail.python.org/mailman/listinfo/tutor