It's a shared Apache 2 server that's set up to put daily log files in my home 
directory. I can't muck with the config files. What I'm trying to do is remove 
the entries due to spiders, robots, and other requests that don't matter to me.  
My Perl script currently looks for the IP addresses used for /robots.txt 
requests and removes the other entries with those IP addresses. But that 
doesn't catch entries from the likes of Yahoo, which tells me that the word 
"slurp" is the thing to look for in the user-agent field. That works.
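
For reference, here's that logic boiled down to a sketch. It assumes the 
standard combined log format; the filename is a placeholder and the real 
script does more housekeeping:

  #!/usr/bin/perl
  use strict;
  use warnings;

  my $log = shift || 'access.log';    # placeholder filename

  # Pass 1: collect every IP that asked for /robots.txt.
  my %bot_ip;
  open my $in, '<', $log or die "open $log: $!";
  while (<$in>) {
      my ($ip, $request) = /^(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)"/ or next;
      $bot_ip{$ip} = 1 if $request =~ m{^\S+ /robots\.txt};
  }
  close $in;

  # Pass 2: print only lines from other IPs whose user-agent
  # (the last quoted field in combined format) lacks "slurp".
  open $in, '<', $log or die "open $log: $!";
  while (<$in>) {
      my ($ip) = /^(\S+)/;
      next if $ip && $bot_ip{$ip};
      next if /"[^"]*slurp[^"]*"\s*$/i;
      print;
  }
  close $in;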

But I find I'm also matching on "bot", "spider", and some others which, I'm 
afraid, will pull out entries I would rather keep because of accidental 
matches: a plain substring search for "bot" hits "Abbott" in a hostname, for 
instance.
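
The best I've come up with so far, sketched below, is to test only the 
user-agent field instead of the whole log line and to anchor the tokens with 
word boundaries. The pattern list is just my guess, not any kind of 
authoritative robot list:

  # Filter STDIN (or named files) to STDOUT; assumes combined log
  # format, where the user-agent is the last quoted field.
  use strict;
  use warnings;

  while (<>) {
      my ($ua) = /"([^"]*)"\s*$/;
      if (!defined $ua) { print; next; }

      # Suffix-anchored "bot" matches "Googlebot/2.1" and "msnbot"
      # but not "Abbott"; tune the list to taste.
      my $is_robot = $ua =~ /bot\b/i
                  || $ua =~ /\bslurp\b/i
                  || $ua =~ /spider/i
                  || $ua =~ /crawler/i;
      print unless $is_robot;
  }

Even so, this still feels like guesswork, hence the questions: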

Are there any lists of common robots on the net?  Are there some regular 
expressions or searches that would help? Are there known IP addresses that are 
safe to discard?

-- 

--> A fair tax is one that you pay but I don't <--

