I finally landed on the right approach to fighting the insane robots:
requiring all links to detailed stats to carry a legitimate Referer[sic]
header. There is now a set of mod_rewrite rules in the .htaccess file which
causes any deep link without a Referer to be rejected.
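For anyone curious, a minimal sketch of what such rules can look like; the
"stats/" path pattern is a placeholder, not the actual rule set:

  # Sketch only: "stats/" stands in for the real deep-link paths.
  RewriteEngine On
  # Reject any request for the detailed stats pages that arrives
  # with no Referer header at all.
  RewriteCond %{HTTP_REFERER} ^$
  RewriteRule ^stats/ - [F]

The [F] flag is what produces the cheap 403s mentioned below.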

This allowed me to remove the 25k+ iptables rules blocking mostly /20 ranges
but also nets as big as /11. httpd is now handling the extra 10-30k hits per
hour without any trouble (403s are cheap). The normal load is in the low
hundreds of hits per hour, rarely more than a few dozen hits in any minute.
The bots are now throwing as many as 1500 queries per minute with no serious
impact on performance. There have been no OOM-Killer events since I made this
change.

There is one issue that remains which I have not nailed down: once or twice a
day, the runRuleQArefresh.sh cron job, which runs 21 hours out of 24 on the
half-hour, takes a long time in its early stages and gets terminated by a
signal (exit status 143, i.e. SIGTERM) while running its "ruleqa.cgi -refresh"
step. This is NOT being done by anything that leaves any traces in any logs. I
haven't put much effort into that, since it is a housekeeping job whose full
hourly run is not time-critical. As long as the next hour's run works, there's
no problem.
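A minimal sketch of where the 143 reading comes from, assuming the cron
wrapper checks the exit status of the refresh step (the wrapper logic is
hypothetical, not the actual runRuleQArefresh.sh):

  # Hypothetical wrapper logic: a shell exit status above 128 means
  # "killed by signal (status - 128)", so 143 = 128 + 15 = SIGTERM.
  ruleqa.cgi -refresh
  status=$?
  if [ "$status" -gt 128 ]; then
      sig=$((status - 128))
      logger -t runRuleQArefresh "refresh step killed by signal $sig"
  fi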

The VM has not yet been rebooted for the updates that Infra warned about last week.

-- 
 Bill Cole
 [email protected] or [email protected]
 (AKA @[email protected] and many *@billmail.scconsult.com addresses)
 Not Currently Available For Hire
