Or use an alternative: ufdbGuard.

ufdbGuard is a URL filter for Squid with a much simpler
configuration file than Squid's ACLs and their additional
configuration files.
ufdbGuard is also multithreaded and very fast.

And a tip: if you are really serious about blocking
anything, you should also block 'proxy sites' (i.e. sites
used to circumvent URL filters).
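
(A minimal squidGuard-style sketch of what such a setup might look like;
the paths, category names and keywords here are assumptions to verify
against the ufdbGuard documentation:)

  # ufdbGuard.conf -- categories point at blacklist tables
  logdir "/var/log/ufdbguard"
  dbhome "/var/ufdbguard/blacklists"

  category porn    { domainlist "porn/domains" }
  category proxies { domainlist "proxies/domains" }   # the 'proxy sites' tip above

  acl {
     default {
        pass !porn !proxies any
        redirect "http://www.example.com/blocked.html"
     }
  }

  # squid.conf -- hook ufdbGuard in through Squid's rewriter interface
  url_rewrite_program /usr/local/ufdbguard/bin/ufdbgclient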

-Marcus


Amos Jeffries wrote:
Muhammad Sharfuddin wrote:
On Mon, 2010-03-22 at 08:47 +0100, Marcello Romani wrote:
Muhammad Sharfuddin ha scritto:
On Mon, 2010-03-22 at 19:27 +1300, Amos Jeffries wrote:
Thanks, list, for the help.

Restarting Squid is not a solution: I noticed that only 20 minutes
after restarting, Squid started consuming CPU again.

On Wed, 2010-03-17 at 19:54 +1100, Ivan . wrote:
you might want to check out this thread
http://www.mail-archive.com/squid-users@squid-cache.org/msg56216.html
Neither did I install any package, i.e. I have not checked that.

On Wed, 2010-03-17 at 05:27 -0700, George Herbert wrote:
or install the Google malloc library and recompile Squid to
use it instead of the default glibc malloc.
On Wed, 2010-03-17 at 15:01 +0200, Henrik K wrote:
If the system regex is the issue, wouldn't it be better/simpler to just
compile with PCRE (LDFLAGS="-lpcreposix -lpcre")? It doesn't leak, and as
a bonus it makes your REs faster.
Nor did I recompile Squid, as I have to use the binary/RPM version of Squid
that shipped with the distro I am using.
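
(For anyone who can build from source, a hedged sketch of what Henrik's
suggestion might look like; everything besides the LDFLAGS value is an
assumption, and configure options vary between Squid versions:)

  # link Squid against PCRE's POSIX wrapper instead of the libc regex
  ./configure LDFLAGS="-lpcreposix -lpcre"
  make
  make install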

The issue was resolved by removing the ACL that blocked almost 60K URLs/domains.

Commenting out the following worked:
##acl porn_deny url_regex "/etc/squid/domains.deny"
##http_access deny porn_deny

So how can I deny illegal content/websites?

If those were actually domain names...
They are both URLs and domains.

  * use "dstdomain" type instead of regex.
OK, nice suggestion.
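
(For illustration, the dstdomain form of the problem ACL could look like
this, reusing the file name from this thread; note that dstdomain entries
are plain text, not regex:)

  acl porn_deny dstdomain "/etc/squid/domains.deny"
  http_access deny porn_deny

  # where /etc/squid/domains.deny then holds entries such as:
  #   .example.com     <- leading dot matches example.com and all subdomains
  #   www.badsite.net  <- no leading dot: matches that exact host only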


Optimize the order of ACLs so that most rejections happen as soon as possible, using the fastest match types.
I think it's optimized, as the rule (squeezing the CPU) is the first rule in
squid.conf
That's the exact opposite of "optimizing", as the CPU-consuming rule is _always_ executed. The first rules should be non-CPU-consuming (i.e. non-regex) and should block most of the traffic, leaving the CPU-consuming ones at the bottom, rarely executed.
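
(A sketch of that principle, reusing ACL names from this thread; a cheap
src check rejects most requests before any regex is ever evaluated:)

  # cheap check first: src lookups cost almost nothing
  acl mynet src "/etc/squid/allowed_ipes.txt"
  http_access deny !mynet

  # expensive regex last: only traffic that survived the cheap
  # checks above ever reaches it
  acl porn_deny url_regex "/etc/squid/domains.deny"
  http_access deny porn_deny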

If you don't mind sharing your squid.conf access lines, we can work through optimizing them with you.
I posted squid.conf when I started this thread/topic, but I have no issue
posting it again ;)
I think he meant the list of blocked sites/URLs.
It's 112K after compression; am I allowed to post/attach such a big
file?

The mailing list will drop all attachments.


squid.conf:
acl myFTP port   20  21
acl ftp_ipes src "/etc/squid/ftp_ipes.txt"
http_access allow ftp_ipes myFTP

The optimal form of that line is:

  acl myFTP proto FTP
  http_access allow myFTP ftp_ipes

NP: Checking the protocol is faster than checking a whole list of IPs or a list of ports.

http_access deny myFTP


Since you only have two source IP lists that might possibly be allowed after the regex checks, it's a good idea to start the entire process by blocking the vast range of IPs which are never going to be allowed:

 acl vip src "/etc/squid/vip_ipes.txt"
 acl mynet src "/etc/squid/allowed_ipes.txt"
 http_access deny !vip !mynet


#### this is the acl eating CPU #####
acl porn_deny url_regex "/etc/squid/domains.deny"
http_access deny porn_deny
###############################

acl vip src "/etc/squid/vip_ipes.txt"
http_access allow vip

acl entweb url_regex "/etc/squid/entwebsites.txt"
http_access deny entweb

Applying the same process to entwebsites.txt that was done to the domains.deny file will stop this one from becoming a second CPU waste.
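
(For illustration, after that split the entweb rules might become something
like this; the .dstdomain/.regex file names are hypothetical:)

  acl entwebDomains dstdomain "/etc/squid/entwebsites.dstdomain"
  http_access deny entwebDomains

  acl entweb url_regex -i "/etc/squid/entwebsites.regex"
  http_access deny entweb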


acl mynet src "/etc/squid/allowed_ipes.txt"
http_access allow mynet



This is the basic process for reducing a large list of regex patterns down to an optimal set of ACL tests...


What you can do to start with is separate all the domain-only lines from the real regex patterns:

grep -E '^\^?((https?|ftp)://)?[a-z0-9\.-]+/?\$?$' /etc/squid/domains.deny >dstdomain.deny

grep -v -E '^\^?((https?|ftp)://)?[a-z0-9\.-]+/?\$?$' /etc/squid/domains.deny >url_regex.deny

... check the output of those two files. Don't trust my 2-second pattern creation.

You will also need to strip any "^", "$", "http://" and "/" bits off the dstdomain patterns, and unescape any "\." into a plain ".".
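
(A hedged one-liner for that cleanup step; as with the grep above, check
its output rather than trusting it:)

  # strip leading ^, leading scheme, trailing / and $, then drop regex escapes
  sed -E -e 's#^\^##' -e 's#^(https?|ftp)://##' -e 's#/?\$?$##' -e 's#\\##g' \
      dstdomain.deny > dstdomain.clean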

When that's done, see if there are any domains you can wildcard in the dstdomain list. Loading the result into squid.conf may produce WARNING lines about other duplicates that can also be removed. I'll call the ACL using this file "stopDomains" in the following example.


For the other file, the one where the URL still needs a full pattern match, split it to create another three files (illustrated in the sketch below):

 1) dstdomains where the domain is part of the pattern. I'll call this "regexDomains" in the following example.
 2) the full URL regex patterns for the domains in (1). I'll call this "regexUrls" in the example below.
 3) regex patterns where the domain name does not matter to the match. I'll call that "regexPaths".
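
(Purely illustrative contents for the three files; every pattern below is
made up:)

  # dstdomain.regexDomains -- just the domains the patterns apply to
  .example.com

  # regex.urls -- full-URL patterns restricted to those domains
  ^http://([a-z0-9.-]+\.)?example\.com/(webmail|chat)/

  # regex.paths -- path-only patterns where the domain is irrelevant
  \.torrent$
  /cgi-bin/nph-proxy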


When that's done, change your CPU-expensive lines:

  acl porn_deny url_regex "/etc/squid/domains.deny"
  http_access deny porn_deny

change into these:

# A
  acl stopDomains dstdomain "/etc/squid/dstdomain.deny"
  http_access deny stopDomains

#B
  acl regexDomains dstdomain "/etc/squid/dstdomain.regexDomains"
  acl regexUrls  url_regex -i "/etc/squid/regex.urls"
  http_access deny regexDomains regexUrls

#C
  acl regexPaths  urlpath_regex -i "/etc/squid/regex.paths"
  http_access deny regexPaths


As you can see, a regex is not run unless it really has to be.
At "A" the domains which don't have to use regex at all get blocked very fast with little CPU usage. At "B" the domains get checked and only the ones which might actually match get a regex run on them. At "C" we have no choice, so a regex is done as before. But (a) the list should now be very small and not use much CPU, and (b) most of the blocked domains are already blocked.




Amos
