Or use an alternative: ufdbGuard.

ufdbGuard is a URL filter for Squid with a much simpler
configuration file than Squid's ACLs and their additional
configuration files.
ufdbGuard is also multithreaded and very fast.

And a tip: if you are really serious about blocking
anything, you should also block 'proxy sites' (i.e. sites
used to circumvent URL filters).
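
(A minimal squidGuard-style sketch of what such a setup might look like;
the paths, category names and keywords here are assumptions to verify
against the ufdbGuard documentation:)

  # ufdbGuard.conf -- categories point at blacklist tables
  logdir "/var/log/ufdbguard"
  dbhome "/var/ufdbguard/blacklists"

  category porn    { domainlist "porn/domains" }
  category proxies { domainlist "proxies/domains" }   # the 'proxy sites' tip above

  acl {
     default {
        pass !porn !proxies any
        redirect "http://www.example.com/blocked.html"
     }
  }

  # squid.conf -- hook ufdbGuard in through Squid's rewriter interface
  url_rewrite_program /usr/local/ufdbguard/bin/ufdbgclient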

-Marcus


Amos Jeffries wrote:
Muhammad Sharfuddin wrote:
On Mon, 2010-03-22 at 08:47 +0100, Marcello Romani wrote:
Muhammad Sharfuddin ha scritto:
On Mon, 2010-03-22 at 19:27 +1300, Amos Jeffries wrote:
Thanks, list, for the help.

Restarting Squid is not a solution: I noticed that only 20 minutes
after restarting, Squid started consuming CPU again.

On Wed, 2010-03-17 at 19:54 +1100, Ivan . wrote:
you might want to check out this thread
http://www.mail-archive.com/squid-users@squid-cache.org/msg56216.html
Neither did I install any package, i.e. I have not checked that.

On Wed, 2010-03-17 at 05:27 -0700, George Herbert wrote:
or install the Google malloc library and recompile Squid to
use it instead of the default glibc malloc.
On Wed, 2010-03-17 at 15:01 +0200, Henrik K wrote:
If the system regex is the issue, wouldn't it be better/simpler to just
compile with PCRE (LDFLAGS="-lpcreposix -lpcre")? It doesn't leak, and as
a bonus it makes your REs faster.
Nor did I recompile Squid, as I have to use the binary/RPM version of Squid
that shipped with the distro I am using.
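
(For anyone who can build from source, a hedged sketch of what Henrik's
suggestion might look like; everything besides the LDFLAGS value is an
assumption, and configure options vary between Squid versions:)

  # link Squid against PCRE's POSIX wrapper instead of the libc regex
  ./configure LDFLAGS="-lpcreposix -lpcre"
  make
  make install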

The issue was resolved by removing the ACL that blocked almost 60K URLs/domains.

Commenting out the following worked:
##acl porn_deny url_regex "/etc/squid/domains.deny"
##http_access deny porn_deny

So how can I deny illegal content/websites?

If those were actually domain names...
They are both URLs and domains.

  * use "dstdomain" type instead of regex.
OK, nice suggestion.
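
(For illustration, the dstdomain form of the problem ACL could look like
this, reusing the file name from this thread; note that dstdomain entries
are plain text, not regex:)

  acl porn_deny dstdomain "/etc/squid/domains.deny"
  http_access deny porn_deny

  # where /etc/squid/domains.deny then holds entries such as:
  #   .example.com     <- leading dot matches example.com and all subdomains
  #   www.badsite.net  <- no leading dot: matches that exact host only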


Optimize the order of ACLs so that most rejections happen as soon as possible, using the fastest match types.
I think it's optimized, as the rule (squeezing the CPU) is the first rule in
squid.conf
That's the exact opposite of "optimizing", as the CPU-consuming rule is _always_ executed. The first rules should be non-CPU-consuming (i.e. non-regex) and should block most of the traffic, leaving the CPU-consuming ones at the bottom, rarely executed.
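
(A sketch of that principle, reusing ACL names from this thread; a cheap
src check rejects most requests before any regex is ever evaluated:)

  # cheap check first: src lookups cost almost nothing
  acl mynet src "/etc/squid/allowed_ipes.txt"
  http_access deny !mynet

  # expensive regex last: only traffic that survived the cheap
  # checks above ever reaches it
  acl porn_deny url_regex "/etc/squid/domains.deny"
  http_access deny porn_deny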

If you don't mind sharing your squid.conf access lines, we can work through optimizing them with you.
I posted squid.conf when I started this thread/topic, but I have no issue
posting it again ;)
I think he meant the list of blocked sites/URLs.
It's 112K after compression; am I allowed to post/attach such a big
file?

The mailing list will drop all attachments.


squid.conf:
acl myFTP port   20  21
acl ftp_ipes src "/etc/squid/ftp_ipes.txt"
http_access allow ftp_ipes myFTP

The optimal form of that line is:

  acl myFTP proto FTP
  http_access allow myFTP ftp_ipes

NP: Checking the protocol is faster than checking a whole list of IPs or a list of ports.

http_access deny myFTP


Since you only have two source IP lists that might possibly be allowed after the regex checks, it's a good idea to start the entire process by blocking the vast range of IPs which are never going to be allowed:

 acl vip src "/etc/squid/vip_ipes.txt"
 acl mynet src "/etc/squid/allowed_ipes.txt"
 http_access deny !vip !mynet


#### this is the acl eating CPU #####
acl porn_deny url_regex "/etc/squid/domains.deny"
http_access deny porn_deny
###############################

acl vip src "/etc/squid/vip_ipes.txt"
http_access allow vip

acl entweb url_regex "/etc/squid/entwebsites.txt"
http_access deny entweb

Applying the same process to entwebsites.txt that was done to the domains.deny file will stop this one from becoming a second CPU waste.
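
(For illustration, after that split the entweb rules might become something
like this; the .dstdomain/.regex file names are hypothetical:)

  acl entwebDomains dstdomain "/etc/squid/entwebsites.dstdomain"
  http_access deny entwebDomains

  acl entweb url_regex -i "/etc/squid/entwebsites.regex"
  http_access deny entweb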


acl mynet src "/etc/squid/allowed_ipes.txt"
http_access allow mynet



This is the basic process for reducing a large list of regex patterns down to an optimal set of ACL tests...


What you can do to start with is separate all the domain-only lines from the real regex patterns:

grep -E '^\^?((https?|ftp)://)?[a-z0-9\.-]+/?\$?$' /etc/squid/domains.deny >dstdomain.deny

grep -v -E '^\^?((https?|ftp)://)?[a-z0-9\.-]+/?\$?$' /etc/squid/domains.deny >url_regex.deny

... check the output of those two files. Don't trust my 2-second pattern creation.

You will also need to strip any "^", "$", "http://" and "/" bits off the dstdomain patterns, and unescape any "\." into a plain ".".
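
(A hedged one-liner for that cleanup step; as with the grep above, check
its output rather than trusting it:)

  # strip leading ^, leading scheme, trailing / and $, then drop regex escapes
  sed -E -e 's#^\^##' -e 's#^(https?|ftp)://##' -e 's#/?\$?$##' -e 's#\\##g' \
      dstdomain.deny > dstdomain.clean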

When that's done, see if there are any domains you can wildcard in the dstdomain list. Loading the result into squid.conf may produce WARNING lines about other duplicates that can also be removed. I'll call the ACL using this file "stopDomains" in the following example.


For the other file, the one where the URL still needs a full pattern match, split it to create another three files (illustrated in the sketch below):

 1) dstdomains where the domain is part of the pattern. I'll call this "regexDomains" in the following example.
 2) the full URL regex patterns for the domains in (1). I'll call this "regexUrls" in the example below.
 3) regex patterns where the domain name does not matter to the match. I'll call that "regexPaths".
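
(Purely illustrative contents for the three files; every pattern below is
made up:)

  # dstdomain.regexDomains -- just the domains the patterns apply to
  .example.com

  # regex.urls -- full-URL patterns restricted to those domains
  ^http://([a-z0-9.-]+\.)?example\.com/(webmail|chat)/

  # regex.paths -- path-only patterns where the domain is irrelevant
  \.torrent$
  /cgi-bin/nph-proxy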


When that's done, change your CPU-expensive lines:

  acl porn_deny url_regex "/etc/squid/domains.deny"
  http_access deny porn_deny

change into these:

# A
  acl stopDomains dstdomain "/etc/squid/dstdomain.deny"
  http_access deny stopDomains

#B
  acl regexDomains dstdomain "/etc/squid/dstdomain.regexDomains"
  acl regexUrls  url_regex -i "/etc/squid/regex.urls"
  http_access deny regexDomains regexUrls

#C
  acl regexPaths  urlpath_regex -i "/etc/squid/regex.paths"
  http_access deny regexPaths


As you can see, a regex is not run unless it really has to be.
At "A" the domains which don't have to use regex at all get blocked very fast with little CPU usage. At "B" the domains get checked and only the ones which might actually match get a regex run on them. At "C" we have no choice, so a regex is done as before. But (a) the list should now be very small and not use much CPU, and (b) most of the blocked domains are already blocked.




Amos
