Do you want them to crawl, but just get throttled? Or no spider at all?
If you really don't want to be crawled, you can alter the robots.txt file
(http://www.robotstxt.org/) for your web server, or add a robots meta
tag such as <meta name="robots" content="nofollow">
(http://www.robotstxt.org/meta.html). You could also use the sitemap
spec (http://sitemaps.org/) to suggest how often you should get crawled.
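For instance, a robots.txt along these lines would slow down well-behaved crawlers and keep them out of a session-heavy area (the paths here are made up for illustration, and note Crawl-delay is a nonstandard extension that not every spider honors):

```
# robots.txt -- illustrative sketch; adjust paths and agents to your site
User-agent: *
# Ask compliant bots to wait 10 seconds between requests (nonstandard,
# but honored by several major crawlers)
Crawl-delay: 10
# Keep bots out of a hypothetical session-backed area entirely
Disallow: /members/
```

Keep in mind robots.txt is purely advisory -- badly behaved bots ignore it, which is where the banning/throttling approaches below come in.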
-- Cole
Quoting phpninja <[EMAIL PROTECTED]>:
One thing just off the top of my head would be a list similar to a
c:\windows\system32\drivers\etc\hosts file, but for your website. You
could write an array, or a db table, of malicious HTTP_USER_AGENT values
and if they match just give them an IP/site ban. A quick Google search
gave me this list: http://www.pgts.com.au/pgtsj/pgtsj0208d.html but I am
sure there are probably more.
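A minimal sketch of that user-agent blacklist idea in PHP (the agent strings and function name here are hypothetical examples, not from the PGTS list -- seed the array from that list or from your own access logs):

```php
<?php
// Hypothetical blacklist entries -- replace with agents from the PGTS
// list or your own server logs.
$badAgents = array('libwww-perl', 'WebZIP', 'Wget');

// Case-insensitive substring match against the blacklist.
function isBannedAgent($userAgent, $badAgents) {
    foreach ($badAgents as $bad) {
        if (stripos($userAgent, $bad) !== false) {
            return true;
        }
    }
    return false;
}

// At the top of each page (or in a shared include):
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (isBannedAgent($agent, $badAgents)) {
    header('HTTP/1.0 403 Forbidden');
    exit('Access denied.');
}
```

A db table works the same way -- just pull the list with a SELECT instead of hard-coding the array -- and lets you add offenders without touching code.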
Another thing could be the mod_throttle module (I think it only supports
the 1.x versions of Apache, though):
http://gunnm.org/~soda/work/oldstuff/vhffs2/vhffs-modthrottle/doc/ . You
can set policies with it, such as limits on concurrent connections.
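A rough idea of what an Apache 1.3 config stanza for it might look like -- the directive name and arguments below are from memory, so verify them against the mod_throttle docs before relying on this:

```apache
# Hypothetical sketch only -- check the mod_throttle documentation for
# the exact directive syntax (Apache 1.3 only).
<IfModule mod_throttle.c>
    # Cap clients at roughly 100 requests per 60-second window
    ThrottlePolicy Request 100 60s
</IfModule>
```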
Regards,
-phpninja
On 2/22/08, Wade Preston Shearer <[EMAIL PROTECTED]> wrote:
Are there any effective methods for throttling bots and spiders from
spidering your site? If your sessions are database-based, how do you
keep them from bringing your server to its knees?
_______________________________________________
UPHPU mailing list
[email protected]
http://uphpu.org/mailman/listinfo/uphpu
IRC: #uphpu on irc.freenode.net