RERP (Robot Exclusion Rules Parser) is an alternative to Python's standard robotparser module. I was motivated to write it because the standard robotparser doesn't gracefully handle non-ASCII characters, which occur in about 0.1% of robots.txt files. RERP handles non-ASCII and also adds a few other niceties, like the ability to customize the user-agent string sent when fetching a robots.txt file.
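To make the two points above concrete, here is a rough sketch of the problem space using only the standard library (in Python 3 the module lives at urllib.robotparser). It is not RERP's API or implementation. The helper names parse_robots_bytes and fetch_robots, the sample robots.txt, and the choice to decode as UTF-8 with replacement characters are all my own assumptions for illustration.

```python
import urllib.request
import urllib.robotparser


def parse_robots_bytes(raw):
    """Build a standard-library parser from raw robots.txt bytes.

    Assumption: decode as UTF-8 and substitute U+FFFD for any bytes
    that are not valid UTF-8, instead of raising an error.
    """
    text = raw.decode("utf-8", errors="replace")
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def fetch_robots(url, user_agent):
    """Fetch a robots.txt file, sending a caller-chosen User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_robots_bytes(response.read())


# Hypothetical robots.txt whose comment contains a Latin-1 byte (0xE9),
# which is not valid UTF-8 on its own.
SAMPLE = b"# caf\xe9 rules\nUser-agent: *\nDisallow: /private/\n"

parser = parse_robots_bytes(SAMPLE)
print(parser.can_fetch("ExampleBot", "/private/page.html"))
print(parser.can_fetch("ExampleBot", "/public/index.html"))
```

Decoding with replacement keeps the ASCII directives intact even when a stray byte appears elsewhere in the file. fetch_robots is not called here because it needs network access; the URL and user-agent string you pass to it are up to you.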
The code, documentation, background, discussion of the specs, and examples are all here:
http://NikitaTheSpider.com/articles/rerp.html

Enjoy!

--
Philip
http://NikitaTheSpider.com/
Bulk HTML validation, link checking and more