Miles Fidelman wrote:
Hi Folks,

Every once in a while, a crawler comes along and starts indexing our site - and in the process pushes our server's load average through the roof.

Short of blocking the crawlers, can anybody suggest some quick tuning adjustments to make, to reduce load (setting the max. number of servers and/or requests, renicing processes)?

Yes - my next step is to go pore over the manuals - but I expect others have done this enough to be able to point me at a few specific config file lines to change, and specific commands for identifying and renicing processes.

Thanks very much,

Miles Fidelman


http://BadBotBlocker.com is some code I use on client sites to do adaptive 
blocking.

   Warning: code is ugly + requires a good cleanup + packaging for different 
OSes.

Pretty simple.

1) Anyone who follows the /bad-spider/ link gets blocked for 1 hour
   via iptables (all ports + protocols)

2) After 1 hour the iptables block rule is removed

3) Add hidden (display:none) links to /bad-spider/ in every file served,
   so only non-humans ever see this link

4) Add /bad-spider/ to robots.txt as disallowed for everyone,
   so well-behaved crawlers never follow the link
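
For illustration only, here is a rough Python sketch of the trap side of the steps above (the hidden link, the robots.txt rule, and spotting who followed the link anyway). This is not the BadBotBlocker code. The names TRAP_PATH, ROBOTS_TXT, HIDDEN_LINK, robots_allows and trap_hits are made up for the example, and the log parsing assumes Apache's common/combined log format.

```python
import re
from urllib.robotparser import RobotFileParser

TRAP_PATH = "/bad-spider/"  # trap URL named in the post

# Step 4: robots.txt rule telling every crawler to stay away from the trap.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

# Step 3: a link humans never see, to be added to every page served.
# (The exact markup is an assumption; the post only says "display:none".)
HIDDEN_LINK = f'<a href="{TRAP_PATH}" style="display:none" rel="nofollow">.</a>'


def robots_allows(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Assumed log layout: client address first, request line in double quotes.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)'
)


def trap_hits(log_lines):
    """Return the set of client addresses that requested the trap path."""
    hits = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(TRAP_PATH):
            hits.add(match.group("ip"))
    return hits
```

A crawler that obeys robots.txt never requests /bad-spider/, so any address returned by trap_hits() either ignored robots.txt or did not read it. Those are the addresses that get the 1 hour block.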

This simple approach has cut down traffic by 80-90% for some of my clients.

Because the rules only last for 1 hour, no site is blocked forever.
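
As a rough sketch of the block/unblock cycle in steps 1 and 2 (again, not the BadBotBlocker code itself), the bookkeeping could look like the Python below. BlockList, block_command and unblock_command are hypothetical names, the choice of the INPUT chain with a DROP target is an assumption, and the functions only build the iptables argument lists rather than running them.

```python
import ipaddress
import time

BLOCK_SECONDS = 3600  # 1 hour, as described in the post


def _tool_and_addr(ip):
    """Validate ip and pick iptables or ip6tables for it."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on bad input
    tool = "ip6tables" if addr.version == 6 else "iptables"
    return tool, str(addr)


def block_command(ip):
    """Arguments that drop all traffic from ip (all ports + protocols)."""
    tool, addr = _tool_and_addr(ip)
    return [tool, "-I", "INPUT", "-s", addr, "-j", "DROP"]


def unblock_command(ip):
    """Arguments that delete the rule added by block_command(ip)."""
    tool, addr = _tool_and_addr(ip)
    return [tool, "-D", "INPUT", "-s", addr, "-j", "DROP"]


class BlockList:
    """Tracks which addresses are blocked and when each block expires."""

    def __init__(self, ttl=BLOCK_SECONDS):
        self.ttl = ttl
        self.expires = {}  # address -> time the block should be lifted

    def block(self, ip, now=None):
        """Record a block; return the command to run, or None if already blocked."""
        now = time.time() if now is None else now
        if ip in self.expires:
            return None
        command = block_command(ip)  # validates ip before recording it
        self.expires[ip] = now + self.ttl
        return command

    def sweep(self, now=None):
        """Forget expired blocks; return the commands that remove their rules."""
        now = time.time() if now is None else now
        expired = [ip for ip, when in self.expires.items() if when <= now]
        for ip in expired:
            del self.expires[ip]
        return [unblock_command(ip) for ip in expired]
```

Each returned list can be passed to subprocess.run(cmd, check=True) by a process running as root, with sweep() called every minute or so. The expiry table lives only in memory, which fits the reboot behaviour described below: iptables rules are not persistent by default either, so both start empty again.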

Because the rules are adaptive (appear + disappear), there is no maintenance.

After reboots, all rules are lost + process just starts again.

Very simple code.

- David, Skype: davidfavor

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org
