Hi! I suggested this once before and did not see any response. Many of the rules suggested on this list are good tests against e-mail that contains a large number of occurrences (a high 'count') of a particular 'trick' or 'obfuscation'. BUT these rules have to be scored very LOW, because legitimate mail sometimes contains one or two occurrences of the same text/string.
For example, someone might include a legitimate acronym, such as I.B.M. or I.B.E.W., and this would trigger a rule that checks for a single occurrence of 'period obfuscated text'. But if we were able to check the COUNT of how many times a particular rule was matched, we could easily distinguish runaway use of obfuscation.

Now, if the current rule-checking logic has been optimized to stop after it finds a successful match, then we would need an extra parameter to tell the test to keep going and count all occurrences. Then we would need a parameter on the 'score' line to work with those counts.

Here is a coding example, based on Jennifer's period checker:

    body     LOC_PERIODS count /\s[a-zA-Z]{9}\.[a-zA-Z]{1}[ ,'\?!]/i
    describe LOC_PERIODS Too many words with period spacing
    score    LOC_PERIODS 5:0.5,10:1.2

Meaning, in this case, score 0.5 for a count of 5 or higher, and 1.2 for a count of 10 or higher. As with other scoring lines, you could have up to four space-separated groups of scores.

Note that we do not want to use a straight *multiplier*, as there will be cases where we want no score at all until a certain minimum threshold is reached. In the above example, up to 4 instances of period-spaced words would score nothing at all.

In terms of program logic, the main changes would be:

- Recognizing the 'count' parameter on the rule and accumulating the count, as well as ensuring that testing doesn't stop on the first match.
- On the scoring side, recognizing the 'x:y' pairs as being count related.
- A simple error condition check for:
  - count-style scoring (x:y) on a rule that didn't use the 'count' option.
  - normal-style scoring (x) on a rule that used the 'count' option.

So, how does that grab people? This would be a fundamental change, affecting the basic behaviour of every test except for the 'evals', and even then, with clever coding, it might be applied to those. But I don't think it would be a lot of code. It would probably take longer to document the new usage....
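To make the threshold idea concrete, here is a rough sketch in Python. It is not SpamAssassin code (which is Perl), and the function names parse_count_scores, count_matches and score_for_count are made up for illustration. It only shows the two pieces described above: counting every match instead of stopping at the first, and turning 'x:y' pairs into a score.

```python
import re

def parse_count_scores(spec):
    """Parse a '5:0.5,10:1.2' style spec into sorted (threshold, score) pairs."""
    pairs = []
    for item in spec.split(","):
        threshold, score = item.split(":")
        pairs.append((int(threshold), float(score)))
    return sorted(pairs)

def count_matches(pattern, text):
    """Count every non-overlapping match rather than stopping at the first."""
    return sum(1 for _ in pattern.finditer(text))

def score_for_count(count, pairs):
    """Return the score of the highest threshold reached, or 0.0 if none."""
    score = 0.0
    for threshold, value in pairs:
        if count >= threshold:
            score = value
    return score

# Pattern copied from the LOC_PERIODS example; the /i flag becomes re.IGNORECASE.
LOC_PERIODS = re.compile(r"\s[a-zA-Z]{9}\.[a-zA-Z]{1}[ ,'\?!]", re.IGNORECASE)

# Hypothetical in-memory form of the proposed 'score LOC_PERIODS 5:0.5,10:1.2' line.
PAIRS = parse_count_scores("5:0.5,10:1.2")

def score_body(text):
    """Score a message body against the counted LOC_PERIODS rule."""
    return score_for_count(count_matches(LOC_PERIODS, text), PAIRS)
```

With these pairs, a body with 4 hits scores 0.0, 5 to 9 hits score 0.5, and 10 or more score 1.2. One thing the sketch makes visible: because matches are non-overlapping and this pattern consumes the character after the word, two hits separated by a single space would be counted only once, so a real implementation would need to decide how to handle that.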
:-)

- Charles

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk