I noticed that VERY_SUSP_RECIPS and VERY_SUSP_CC_RECIPS were failing to match in some cases where they should, and matching in some where they shouldn't. The current pattern is:

  /\b([a-z][a-z])[^@]{0,20}(@[-a-z0-9_\.]{0,30}).{0,30}?(?:\1[^@]*\2.{0,20}?){9,}/is

- Sequences such as "[EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], ..." matched. (This should match SUSPICIOUS_RECIPS, but not VERY_SUSP_RECIPS.) I think both occurrences of [^@] should be [^@,] to prevent it from swallowing commas and usernames as part of the hostname, then mistaking a hostname for a username. Possibly it should also exclude parens and angle brackets.

- I added \b before \1 to keep it from finding the repeated two-character sequence anywhere other than at the beginning of the username.

- Long hostnames caused failures. I changed \2.{0,20}? to \2.{0,30}? Obviously that could be better.

- I saw a number of spams with 8 or 9 repetitions, so I'm now using {7,} instead of {9,}. (If/when rules can have variable scores, a possibly worthwhile enhancement would be to make this score proportional to the number of repetitions.)

The result is:

  /\b([a-z][a-z])[^@,]{0,20}(@[-a-z0-9_\.]{0,30}).{0,30}?(?:\b\1[^@,]*\2.{0,30}?){7,}/i

This fixes both problems and works on all my tests, but I'm not 100% confident I haven't broken something. I'm assuming the intent was that "similar usernames" means "similar initial substrings". If not, I've certainly broken something, but as it was, it was matching lists that had no real similarities in the usernames.

Tom
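
For anyone who wants to poke at the two patterns, here is a minimal sketch that runs both against a few made-up recipient lists. It is a port to Python's re module rather than the Perl that SpamAssassin actually uses (/i becomes re.I, /s becomes re.S), and the example.com addresses, the usernames, and the helper names recipients() and hits() are all assumptions for illustration, not anything from the rules file or from real spam.

```python
import re

# Python ports of the two Perl patterns quoted above.
ORIGINAL = re.compile(
    r"\b([a-z][a-z])[^@]{0,20}(@[-a-z0-9_\.]{0,30}).{0,30}?"
    r"(?:\1[^@]*\2.{0,20}?){9,}",
    re.I | re.S,
)
REVISED = re.compile(
    r"\b([a-z][a-z])[^@,]{0,20}(@[-a-z0-9_\.]{0,30}).{0,30}?"
    r"(?:\b\1[^@,]*\2.{0,30}?){7,}",
    re.I,
)


def recipients(names, domain="example.com"):
    """Build a comma-separated recipient list (hypothetical test data)."""
    return ", ".join(f"{name}@{domain}" for name in names)


def hits(header):
    """Report which of the two patterns fires on a header value."""
    return {
        "original": bool(ORIGINAL.search(header)),
        "revised": bool(REVISED.search(header)),
    }


# Ten usernames sharing an initial substring, all at one host.
similar_10 = recipients(f"john{i}" for i in range(1, 11))

# Only eight such usernames: too few for {9,}, enough for {7,}.
similar_8 = recipients(f"john{i}" for i in range(1, 9))

# Twelve usernames with no shared prefix, all at one host.
dissimilar_12 = recipients([
    "alice", "bob", "carol", "dave", "erin", "frank",
    "grace", "heidi", "ivan", "judy", "mallory", "oscar",
])

for label, header in [
    ("similar_10", similar_10),
    ("similar_8", similar_8),
    ("dissimilar_12", dissimilar_12),
]:
    print(label, hits(header))
```

Worked through by hand, both patterns should fire on the ten-address list, only the revised one on the eight-address list, and only the original one on the dissimilar list. In that last case the original can take "ex" from the hostname as \1 and let \2 shrink to a bare "@", which is the hostname-as-username confusion described in the first bullet.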