On Thu, Jan 22, 2009 at 6:08 PM, amit sethi <amit.pureene...@gmail.com> wrote:
> Hi, I need help fetching a Wikipedia article. I tried changing my user
> agent, but it did not work. As far as I understand robots.txt, nothing in
> en.wikipedia.org/robots.txt should block a user agent (*, which is what I
> would normally use) from accessing a simple article such as
> "http://en.wikipedia.org/wiki/Sachin_Tendulkar", yet robotparser still
> returns False:
> status=rp.can_fetch("*", "http://en.wikipedia.org/wiki/Sachin_Tendulkar")
> where rp is a robot parser object. Why is that?

Yes, Wikipedia is blocking Python's default user agent. This was done to
block the main internal bot in its early days (it was misbehaving by
fetching each page twice). By the time the bot was allowed again, it had
switched to its own user agent string, and apparently nobody thought it
necessary to unblock the default user agent string.
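
A likely reason that changing your own user agent did not help: robotparser
downloads robots.txt itself, using the default user agent, and as far as I
know it treats a refused request (HTTP 401 or 403) as "everything is
disallowed", so can_fetch() returns False for every URL. A minimal sketch of
one way around this is to download robots.txt yourself with a custom
User-Agent header and hand the text to the parser. It assumes Python 3
module names (urllib.request, urllib.robotparser; older versions used
urllib2 and robotparser). USER_AGENT, build_request, fetch and
parser_from_text are made-up names for illustration, not part of any
library.

```python
import urllib.request
import urllib.robotparser

# Hypothetical identifier: replace it with something that describes your script.
USER_AGENT = "ExampleFetcher/0.1 (contact: you@example.org)"


def build_request(url, user_agent=USER_AGENT):
    """Create a request that sends our own User-Agent instead of urllib's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT):
    """Download url and return the body as text."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return response.read().decode("utf-8", errors="replace")


def parser_from_text(robots_txt):
    """Build a RobotFileParser from robots.txt text that was already downloaded."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp
```

With these helpers, rp = parser_from_text(fetch("http://en.wikipedia.org/robots.txt"))
followed by rp.can_fetch(USER_AGENT, article_url) checks the rules that apply
to your own user agent, and fetch(article_url) retrieves the page with the
same header. I have not checked whether the block is still in place.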

--
André Engels, andreeng...@gmail.com
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
