Hello Chris, William,
Monday, November 17, 2003, 10:52:10 AM, Chris wrote:

CS> My 1700 rules CRUSHED busy servers. This is why I sort them now by
CS> order of hits, so people can prune the rules to the heavy hitters if
CS> they wish. It was the only way I could think to make them still
CS> useful for people. Also they can adjust scores for the ones that hit
CS> the most often.

Monday, November 17, 2003 1:23 PM, William wrote:

>> So if I read you correctly, adding 4800 rules essentially triples the
>> CPU time needed to process a given message or collection of messages.

I couldn't say that -- I have no good way of measuring the time it takes
to do a normal SA evaluation of a single email or a set of emails. What I
have been able to measure is the time needed for a mass check.

When I run mass-check against my corpus, now 50k email messages, it takes
15-16 minutes with a single rule. Adding a small number of rules doesn't
seem to have much impact. However, when I ran your full set of 4800 rules
in one pass, mass-check took 1.5 hours. We can figure this two ways:

 * 4800 rules take about 75 minutes longer than 1 rule, so each rule
   costs 75 / 4800 = 0.0156 minutes = 0.938 seconds across the whole
   50k corpus.
 * 4800 rules x 50k messages take 90 minutes (5400 seconds), so 4800
   rules x 1 message should take 5400 / 50,000 = 0.11 seconds.

The experience of those who tried to apply Chris' full EvilRules set
indicates this is not a valid analysis (1700 rules was too much to add to
busy email servers).

>> Are there ways to improve the performance of the checks? I ask
>> because these URI rules are tripping on about 50-60% of my current
>> spam - much more than the corresponding source domain blacklist rules.

That's the value of EvilRules. As valuable as they are, your blacklists
only work when the spam is From some consistent address pattern. The URI
rules catch the spammer's domain within the email message, regardless of
who the spam is from.

Performance improvements? Maybe.
And I don't know whether any of this will help -- it'll take
experimentation unless the developers have some answers here.

Possibility 1: combine rules. If you can combine 10 tests into a single
rule,

    uri rulename /(?:spammer1|spammer2|s3|s4|s5|s6|s7|s8|s9|s10)\.com/i

then you'll have only 480 rules, not 4800. I don't know if this will have
any impact, but maybe...

Possibility 2: bound the rules. I noted that the URI rule for 16.com
matched significant ham. Test for /\bdomain/ instead: the word boundary
keeps a short name like 16.com from matching inside a longer name such as
my16.com, and maybe it'll run a trifle faster.

Those with more experience in this realm might have other ideas.

Bob Menschel

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
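Both possibilities above can be sketched together in a few lines of Python, assuming the rules are generated from a plain list of domains: the domains are grouped ten to a rule, and each alternation is prefixed with `\b`. The domain list, the `EVIL_URI_GROUP_` rule names and the helper functions are hypothetical, and `describe`/`score` lines are not generated. The sanity check uses Python's `re` module, not the Perl engine SpamAssassin runs, so it only illustrates what the patterns match, not how fast they are.

```python
import re
from typing import Iterable, List

# Hypothetical sample data: in practice this would be the full domain list
# (e.g. 4800 entries), one domain per former uri rule.
DOMAINS = ["spammer1.com", "spammer2.com", "16.com", "example-spam.net"]


def domain_pattern(domains: Iterable[str]) -> str:
    """Join a group of domains into one word-bounded, non-capturing alternation.

    Assumes plain host names (letters, digits, hyphens, dots); only the dots
    are escaped, so "." matches a literal dot rather than any character.
    """
    alternatives = "|".join(d.replace(".", r"\.") for d in domains)
    # Leading \b is the "bound the rules" idea: 16.com cannot match inside my16.com
    return r"\b(?:" + alternatives + ")"


def make_rules(domains: List[str], group_size: int = 10,
               prefix: str = "EVIL_URI_GROUP") -> List[str]:
    """Emit one SpamAssassin-style uri line per group of `group_size` domains.

    The rule-name prefix is hypothetical; describe/score lines are omitted.
    """
    rules = []
    for start in range(0, len(domains), group_size):
        chunk = domains[start:start + group_size]
        name = f"{prefix}_{start // group_size + 1:03d}"
        rules.append(f"uri {name} /{domain_pattern(chunk)}/i")
    return rules


if __name__ == "__main__":
    for line in make_rules(DOMAINS, group_size=2):
        print(line)

    # Behaviour check with Python's re, which treats \b and (?:...) the same
    # way Perl does for these patterns; it says nothing about Perl's speed.
    bounded = re.compile(domain_pattern(["16.com"]), re.IGNORECASE)
    print(bool(bounded.search("http://www.16.com/offer")))  # True
    print(bool(bounded.search("http://www.my16.com/")))     # False
```

With 4800 domains and the default `group_size` of 10, `make_rules` returns 480 lines, the figure from Possibility 1. Whether fewer, larger regexes are actually cheaper for SpamAssassin is exactly the open question here; only a mass-check timing run would settle it.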