Many thanks for your help.

On 2020-12-20 15:26, John Hardin wrote:
On Sat, 19 Dec 2020, Alan wrote:

The reason for asking is that I want to use SpamAssassin to flag some things that are suspicious, but only when other conditions are met for specific users. I'd like to have SA insert the rule text, e.g. LOCAL_SOME_RULE, so that I can have an exim filter check for a specific form of To address plus this rule match before removing the message.

You should be able to do that purely in SA; it's a tad more difficult if you want to match the envelope to address rather than the To: header. If you want to reliably match the envelope to address you'd need to have it recorded in a Received header (either the one that your MTA generates or the one that some trusted MTA prior to your MTA generates).
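
As a rough illustration of what recording the envelope recipient in a Received header makes possible, here is a small Python sketch that pulls the address out of a "for" clause. The sample header, the addresses and the function name envelope_to are made up for the example. Real Received formats vary by MTA, and the "for" clause is often left out when a message has more than one recipient, so treat this as a sketch rather than a reliable parser.

```python
import re

# Matches "for user@example.com" or "for <user@example.com>" in a Received header.
FOR_CLAUSE = re.compile(r"\bfor\s+<?([^\s<>;]+@[^\s<>;]+)>?", re.IGNORECASE)

def envelope_to(received_header):
    """Return the address in the 'for' clause, or None if there is none."""
    match = FOR_CLAUSE.search(received_header)
    return match.group(1) if match else None

# Hypothetical header, unfolded onto one line for simplicity.
sample = ("from mail.example.net ([192.0.2.10]) by mx.example.org with esmtp "
          "(Exim 4.94) id 1kr0Ab-0001Xy-Zz for alan+lists@example.org; "
          "Sun, 20 Dec 2020 15:26:00 -0500")

print(envelope_to(sample))
```

The same idea carries over to an SA header rule or an exim filter condition: match on the "for" clause of a trusted Received header rather than on To:, which the sender controls.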

Agreed. Ideally this is something I can stick into a KB article and have afflicted users implement on their own. I'd like to keep system-wide modifications to a minimum. A user's exim filters also move when we transfer an account to another server, so as long as there's a common rule set, not having to adjust SA configuration is a benefit.

Basically what I have now is this:

uri __LCL_SUSPECT_LINK1 /target_pattern_1/i
tflags __LCL_SUSPECT_LINK1 multiple maxhits=5
uri __LCL_SUSPECT_LINK2 /target_pattern_2/i
tflags __LCL_SUSPECT_LINK2 multiple maxhits=5
meta LCL_MANY_SUSPECT_LINKS __LCL_SUSPECT_LINK1 && __LCL_SUSPECT_LINK2 && (__LCL_SUSPECT_LINK1 + __LCL_SUSPECT_LINK2) > 5
score LCL_MANY_SUSPECT_LINKS 0.001
describe LCL_MANY_SUSPECT_LINKS More than 5 links match a suspected spam pattern

As for long sequences of random characters - that's FP-prone. It's difficult to detect *random* in a simple RE. A long string of characters from a given set? Easy. Characteristics of that string? Complicated. A rule like that might hit on legitimate (for values of "legitimate") tracking analysis URIs or caching URIs, unless there is some kind of uncommon pattern to it that you can discern and look for in the RE.

No kidding. I've seen this specific pattern in many a spam message over the years so I suspect it's particularly FP vulnerable. If there were a regex rule for "matches English word" I could nail them with ease. OTOH my regex skills are pretty decent. Finding the two common patterns and checking that at least one of each is there will hopefully exclude messages that consistently use only one form, which should eliminate a range of FPs.
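
To make the "at least one of each, and more than five in total" idea concrete, here is a plain Python model of the intent. It is not how SpamAssassin evaluates rules. The patterns are the same placeholders as in the rules above, and the function name many_suspect_links and the example URLs are invented for illustration.

```python
import re

# Placeholder patterns, standing in for the two real URI forms.
PATTERN_1 = re.compile(r"target_pattern_1", re.IGNORECASE)
PATTERN_2 = re.compile(r"target_pattern_2", re.IGNORECASE)
MAX_HITS = 5   # mirrors maxhits=5 on each sub-rule
THRESHOLD = 5  # flag only when the combined count exceeds this

def many_suspect_links(uris):
    """True if both forms appear and the capped hit counts total more than THRESHOLD."""
    hits_1 = min(sum(1 for u in uris if PATTERN_1.search(u)), MAX_HITS)
    hits_2 = min(sum(1 for u in uris if PATTERN_2.search(u)), MAX_HITS)
    return hits_1 > 0 and hits_2 > 0 and (hits_1 + hits_2) > THRESHOLD

# Eight links of one form only: many hits, but the second form never appears.
one_form = ["http://x.example/target_pattern_1/%d" % i for i in range(8)]
# Four links of the first form plus two of the second: six hits across both forms.
both_forms = one_form[:4] + ["http://x.example/target_pattern_2/a",
                             "http://x.example/target_pattern_2/b"]

print(many_suspect_links(one_form))    # False
print(many_suspect_links(both_forms))  # True
```

A message that only ever uses one link form is not flagged no matter how many links it carries, which is the FP reduction I'm hoping for.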

If I can use the "many suspect links" match along with a few other indicators, including that this particular [expletive] makes the message look like it comes from a mailing list, I think I can kill their spew. I'm seeing upwards of 20 messages per day per user from this source, but they're rotating through junk data center IP addresses and disposable mail server identities daily. This is war.

One more noob question. Can I test a rule without messing with the production environment by using

spamassassin -t --cf='include myrule.cf' path

or should I build a test environment?
