On Sat, Jul 12, 2014 at 5:06 PM, Miles Fidelman <mfidel...@meetinghouse.net> wrote:
> Jeff Trawick wrote:
>
>> On Sat, Jul 12, 2014 at 1:25 PM, Miles Fidelman
>> <mfidel...@meetinghouse.net> wrote:
>>
>>> Hi Folks,
>>>
>>> Every once in a while, a crawler comes along and starts indexing our
>>> site - and in the process pushes our server's load average through
>>> the roof.
>>>
>>> Short of blocking the crawlers, can anybody suggest some quick
>>> tuning adjustments to make, to reduce load (setting the max. number
>>> of servers and/or requests, renicing processes)?
>>
>> Use robots.txt to block access to dynamically generated resources
>> which are expensive to generate and not necessary for search hits?
>>
>> Is it using a lot of concurrent requests, or is the main load issue
>> due to the cost of the requests it is making?
>
> A bit of both.

If you want to limit concurrent requests just from web crawlers, try
something like mod_qos. (See
http://unix.stackexchange.com/questions/37481/throttling-web-crawlers)

If it were me, I'd try to block needless, expensive requests with
robots.txt too. http://www.robotstxt.org/robotstxt.html

> --
> In theory, there is no difference between theory and practice.
> In practice, there is. .... Yogi Berra
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
> For additional commands, e-mail: users-h...@httpd.apache.org

--
Born in Roswell... married an alien...
http://emptyhammock.com/
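On the mod_qos idea, a minimal sketch of what that could look like,
assuming mod_setenvif and mod_qos are loaded - the User-Agent pattern
and the limit of 3 are made-up values you'd tune for your own site:

```apache
# Tag requests from self-identifying crawlers via User-Agent
# (the pattern below is illustrative, not exhaustive)
BrowserMatchNoCase "(googlebot|bingbot|slurp|baiduspider)" IS_CRAWLER

# Allow at most 3 concurrent requests carrying that tag; excess
# concurrent crawler requests are rejected while normal visitors
# are unaffected
QS_EventRequestLimit IS_CRAWLER 3
```

That caps crawler concurrency without touching MaxClients for everyone
else; check the mod_qos docs for the exact rejection behavior.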
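And on the robots.txt side, something along these lines - the
/cgi-bin/ and /search paths are placeholders for whatever dynamic
resources are expensive on your site, and note that Crawl-delay is
nonstandard and honored by some crawlers but not all:

```text
User-agent: *
Disallow: /cgi-bin/
Disallow: /search
Crawl-delay: 10
```

Well-behaved crawlers will skip the disallowed paths entirely, which
is usually where most of the load comes from.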