Eduardo A. Bustamante López added the comment:
Hi Senthil,
> I fail to see the bug in here. Robotparser module is for reading and
> parsing the robots.txt file; the module responsible for fetching it
> could be urllib.
You're right, but robotparser's read() itself calls urllib to fetch robots.txt, so the default User-agent urllib sends is the one the remote server sees.
Eduardo A. Bustamante López added the comment:
I forgot to mention that I ran an nc process in parallel to see what data is
being sent: ``nc -l -p ``.
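The same inspection can be scripted without nc: point the parser at a throw-away local socket that records the raw request. (This sketch uses the Python 3 urllib.robotparser API; the thread itself concerns the Python 2 robotparser module.)

```python
import socket
import threading
import urllib.robotparser

captured = []

def serve_once(sock):
    """Record one raw HTTP request, then answer with a tiny robots.txt."""
    conn, _ = sock.accept()
    captured.append(conn.recv(4096).decode("latin-1"))
    body = b"User-agent: *\nDisallow: /private/\n"
    conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n"
                 b"Content-Length: %d\r\n\r\n" % len(body) + body)
    conn.close()

sock = socket.socket()
sock.bind(("127.0.0.1", 0))
sock.listen(1)
port = sock.getsockname()[1]
t = threading.Thread(target=serve_once, args=(sock,))
t.start()

rp = urllib.robotparser.RobotFileParser(f"http://127.0.0.1:{port}/robots.txt")
rp.read()
t.join()

# Show the User-agent header that robotparser sent (e.g. Python-urllib/3.x).
print([line for line in captured[0].splitlines()
       if line.lower().startswith("user-agent")])
```

The captured request carries urllib's hard-wired default agent string, which is exactly what a blacklisting server keys on.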
Eduardo A. Bustamante López added the comment:
I'm not sure what the best approach is here.
1. Avoid changes in the Lib and document a work-around, which involves
installing an opener with the specific User-agent. The drawback is that it
modifies the behaviour of urlopen() globally.
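For illustration, that opener-based work-around looks roughly like this with the Python 3 urllib (the thread itself is about the Python 2 module; ``MyBot/1.0`` is a made-up agent string):

```python
import urllib.request

# An opener whose requests all carry a custom User-agent header.
opener = urllib.request.build_opener()
opener.addheaders = [("User-agent", "MyBot/1.0")]  # hypothetical agent string

# install_opener() replaces the process-wide default opener, so this
# changes what robotparser's read() sends -- and what every other
# urlopen() call in the program sends, which is the drawback above.
urllib.request.install_opener(opener)

print(opener.addheaders)
```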
Eduardo A. Bustamante López added the comment:
I guess a workaround is to do:
robotparser.URLopener.version = 'MyVersion'
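That module-level URLopener only exists in the Python 2 robotparser. With Python 3's urllib.robotparser, a rough equivalent (the class name and agent string here are mine, not stdlib) is to subclass RobotFileParser and override read(), mirroring the stdlib's error handling:

```python
import urllib.error
import urllib.request
import urllib.robotparser

class UserAgentRobotParser(urllib.robotparser.RobotFileParser):
    """RobotFileParser variant that fetches robots.txt with a chosen User-agent."""

    def __init__(self, url="", user_agent="MyBot/1.0"):  # hypothetical default
        super().__init__(url)
        self.user_agent = user_agent

    def read(self):
        request = urllib.request.Request(
            self.url, headers={"User-Agent": self.user_agent})
        try:
            f = urllib.request.urlopen(request)
        except urllib.error.HTTPError as err:
            # Same policy as the stdlib read(): 401/403 -> deny everything,
            # other 4xx -> allow everything.
            if err.code in (401, 403):
                self.disallow_all = True
            elif 400 <= err.code < 500:
                self.allow_all = True
        else:
            self.parse(f.read().decode("utf-8").splitlines())

# Offline demo: feed rules directly instead of fetching them.
parser = UserAgentRobotParser("http://example.com/robots.txt")
parser.parse(["User-agent: *", "Disallow: /private/"])
print(parser.can_fetch("MyBot/1.0", "http://example.com/private/page"))  # False
```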
Changes by Eduardo A. Bustamante López:
Added file: http://bugs.python.org/file27101/myrobotparser.py
___
Python tracker
<http://bugs.python.org/issue15851>
___
New submission from Eduardo A. Bustamante López:
I found that http://en.wikipedia.org/robots.txt returns 403 if the provided
User-agent is in a specific blacklist.
And since robotparser doesn't provide a mechanism to change the default User-agent
used by its opener, it becomes unusable for such sites.
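The failure mode can be reproduced without touching Wikipedia: serve a one-shot 403 from a throw-away local socket and watch the parser give up. (Python 3 urllib.robotparser shown; the Python 2 module records the 403 the same way.)

```python
import socket
import threading
import urllib.robotparser

def serve_403_once(sock):
    """Answer a single request with 403, like a server blacklisting the UA."""
    conn, _ = sock.accept()
    conn.recv(4096)
    conn.sendall(b"HTTP/1.0 403 Forbidden\r\nContent-Length: 0\r\n\r\n")
    conn.close()

sock = socket.socket()
sock.bind(("127.0.0.1", 0))
sock.listen(1)
port = sock.getsockname()[1]
threading.Thread(target=serve_403_once, args=(sock,)).start()

rp = urllib.robotparser.RobotFileParser(f"http://127.0.0.1:{port}/robots.txt")
rp.read()  # the 403 is swallowed and recorded as "disallow everything"

print(rp.disallow_all, rp.can_fetch("*", f"http://127.0.0.1:{port}/any"))
# prints: True False
```

Once the server rejects the default agent string, every can_fetch() answer is False, which is what makes the module unusable against such sites.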