Re[2]: [SAtalk] Sanity checking new uri rules?

Robert Menschel Mon, 17 Nov 2003 21:12:53 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Chris, William,

Monday, November 17, 2003, 10:52:10 AM, Chris wrote:

CS> My 1700 rules CRUSHED busy servers. This is why I sort them now by
CS> order of hits. So people can prune the rules to the heavy hitters if
CS> they wish. It was the only way I could think to make them still
CS> useful for people. Also they can adjust scores for the ones that hit
CS> the most often.

Monday, November 17, 2003 1:23 PM, William wrote:
>> So if I read you correctly, adding 4800 rules essentially triples the
>> cpu time needed to process a given message or collection of messages. 

I couldn't say that -- I have no good way of measuring the time it takes
to do a normal SA evaluation of a single email or of a set of emails.

What I have been able to measure is the time needed for a mass check.
When I run mass-check against my now 50k corpus (that's 50k email
messages), it takes 15-16 minutes to run for a single rule. Adding a
small number of rules doesn't seem to have much impact. However, when I
ran your full set of 4800 rules in one pass, mass check took 1.5 hours.

We can figure this two ways:
* 4800 rules take 75 minutes longer than 1 rule, so each added rule costs
0.0156 minutes = 0.938 seconds across the whole 50k corpus.
* 4800 rules x 50k messages take 90 minutes, so 4800 rules x 1 message
should take about 0.11 seconds. The experience of those who attempted to
apply Chris' full EvilRules set suggests this is not a valid analysis
(1700 rules were too much to add to busy email servers).
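The two estimates above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the 15-minute baseline is taken from the low end of the reported 15-16 minute range, and the function and variable names are illustrative, not part of any SpamAssassin tool.

```python
# Figures taken from the mass-check runs described above.
CORPUS_SIZE = 50_000    # messages in the corpus
RULES = 4800            # rules in the full set
BASELINE_MIN = 15.0     # assumed: low end of the 15-16 minute single-rule run
FULL_RUN_MIN = 90.0     # minutes for the full 4800-rule run (1.5 hours)

def marginal_seconds_per_rule(full_min, baseline_min, rules):
    """Extra wall-clock seconds each added rule costs over the whole corpus."""
    return (full_min - baseline_min) * 60.0 / rules

def seconds_per_message(full_min, corpus_size):
    """Average seconds spent on one message when every rule is loaded."""
    return full_min * 60.0 / corpus_size

per_rule = marginal_seconds_per_rule(FULL_RUN_MIN, BASELINE_MIN, RULES)
per_msg = seconds_per_message(FULL_RUN_MIN, CORPUS_SIZE)
print(f"{per_rule:.3f} s per added rule, over the whole corpus")
print(f"{per_msg:.3f} s per message with all rules loaded")
```

Both figures are averages over a batch run, so neither says much about the load on a live server, which fits the caveat in the second bullet.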

>> Are there ways to improve the performance of the checks?  I ask
>> because these URI rules are tripping on about 50-60% of my current
>> spam - much more than the corresponding source domain blacklist rules.

That's the value of EvilRules. As valuable as they are, your blacklists
only work when the spam is From some consistent address pattern. The URI
rules catch the spammer's domain within the email message, regardless of
who the spam is from.

Performance improvements? Maybe. And I don't know whether any of this
will help -- it'll take experimentation unless the developers have some
answers here.

Possibility 1: combine rules.  If you can combine 10 tests into a single
rule,
> uri rulename /(?:spammer1|spammer2|s3|s4|s5|s6|s7|s8|s9|s10)\.com/i
then you'll have only 480 rules, not 4800. I don't know if this will have
any impact, but maybe...
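One way to try Possibility 1 without hand-editing thousands of lines is to generate the combined rules from a list of domain names. The following is a minimal sketch, assuming every domain ends in .com as in the example above. The helper name, the EVIL_COMBINED prefix, and the placeholder domain names are all hypothetical. Whether the combined rules actually run faster would still have to be measured.

```python
import re

def combine_uri_rules(domains, group_size=10, prefix="EVIL_COMBINED"):
    """Yield one SpamAssassin-style 'uri' rule line per group of domains.

    `domains` holds bare names without the '.com' suffix, e.g. 'spammer1'.
    """
    for start in range(0, len(domains), group_size):
        group = domains[start:start + group_size]
        # Escape each name so characters like '-' are matched literally.
        alternation = "|".join(re.escape(name) for name in group)
        rule_name = f"{prefix}_{start // group_size + 1:03d}"
        yield f"uri {rule_name} /(?:{alternation})\\.com/i"

names = [f"spammer{i}" for i in range(1, 26)]   # 25 placeholder names
rules = list(combine_uri_rules(names))
for line in rules:
    print(line)
```

Each generated rule would still need its own score line, and a hit on a combined rule no longer tells you which domain matched, so per-domain hit counts are lost.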

Possibility 2: bound the rules.  I noted that the URI rule for 16.com
matched a significant amount of ham.  Test for /\bdomain/ instead and
maybe it'll run a trifle faster.
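The effect of the \b anchor on matching can be checked outside SpamAssassin. The sketch below uses Python's re module, whose \b behaves like Perl's for this ASCII case. The two URIs are made up for illustration; one plausible reason for the ham hits (an assumption, not something measured here) is that an unanchored 16.com also matches inside longer names such as 216.com.

```python
import re

unbounded = re.compile(r"16\.com", re.I)     # original style of rule
bounded = re.compile(r"\b16\.com", re.I)     # with a leading word boundary

ham_uri = "http://www.216.com/index.html"    # hypothetical innocent URI
spam_uri = "http://www.16.com/offer"         # hypothetical target URI

for uri in (ham_uri, spam_uri):
    hit_unbounded = bool(unbounded.search(uri))
    hit_bounded = bool(bounded.search(uri))
    print(f"{uri}: unbounded={hit_unbounded} bounded={hit_bounded}")
```

The anchor is mainly a guard against false positives. Any speed gain would need to be confirmed with a mass-check run.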

Those with more experience in this realm might have other ideas.

Bob Menschel

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0

iQA/AwUBP7mPoZebK8E4qh1HEQIGZgCgk/hNJXsKZpmUpOKitW7WY0jNIZEAoN4Z
jYjE0zyHAhElMmiLP659Axd6
=flpz
-----END PGP SIGNATURE-----

