On Sat, Jul 12, 2014 at 5:06 PM, Miles Fidelman <mfidel...@meetinghouse.net>
wrote:

> Jeff Trawick wrote:
>
>  On Sat, Jul 12, 2014 at 1:25 PM, Miles Fidelman <
>> mfidel...@meetinghouse.net <mailto:mfidel...@meetinghouse.net>> wrote:
>>
>>     Hi Folks,
>>
>>     Every once in a while, a crawler comes along and starts indexing
>>     our site - and in the process pushes our server's load average
>>     through the roof.
>>
>>     Short of blocking the crawlers, can anybody suggest some quick
>>     tuning adjustments to make, to reduce load (setting the max.
>>     number of servers and/or requests, renicing processes)?
>>
>>
>> Use robots.txt to block access to dynamically generated resources which
>> are
>> expensive to generate and not necessary for search hits?
>>
>> Is it using a lot of concurrent requests, or is the main load issue due to
>> the cost of the requests it is making?
>>
>>  a bit of both



If you want to limit concurrent requests just from web crawlers, try
something like mod_qos.  (See
http://unix.stackexchange.com/questions/37481/throttling-web-crawlers)
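For example, something along the lines of the config sketched in that thread (directive names are from the mod_qos documentation; the user-agent substrings and the limits of 700/10 are illustrative, not recommendations):

```
# Tag requests from known crawlers via their User-Agent
# (the substrings below are examples -- extend for the bots you see)
BrowserMatchNoCase "googlebot"  QS_Cond=spider
BrowserMatchNoCase "bingbot"    QS_Cond=spider
BrowserMatchNoCase "slurp"      QS_Cond=spider

# Allow up to 700 concurrent requests overall...
QS_LocRequestLimitMatch      "^.*$"  700
# ...but only 10 concurrent requests from tagged crawlers
QS_CondLocRequestLimitMatch  "^.*$"  10  spider
```

That way ordinary visitors are unaffected and only the tagged crawlers get queued or turned away once they exceed their own, much smaller, concurrency budget.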

If it were me, I'd try to block needless, expensive requests with
robots.txt too.  http://www.robotstxt.org/robotstxt.html
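As a starting point, something like this (the paths are hypothetical placeholders -- substitute whatever dynamically generated resources are expensive on your site):

```
User-agent: *
# Keep crawlers away from expensive dynamic resources (example paths)
Disallow: /cgi-bin/
Disallow: /search

# Non-standard directive: honored by some crawlers (e.g. Bing),
# ignored by others (e.g. Googlebot, which uses Search Console instead)
Crawl-delay: 10
```

Note that robots.txt is advisory; well-behaved crawlers respect it, but misbehaving ones have to be blocked at the server or firewall level.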



> --
> In theory, there is no difference between theory and practice.
> In practice, there is.   .... Yogi Berra
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
> For additional commands, e-mail: users-h...@httpd.apache.org
>
>


-- 
Born in Roswell... married an alien...
http://emptyhammock.com/
