Many thanks for your help.
On 2020-12-20 15:26, John Hardin wrote:
> On Sat, 19 Dec 2020, Alan wrote:
>> The reason for asking is that I want to use SpamAssassin to flag some
>> things that are suspicious but only when other conditions are met for
>> specific users. I'd like to have SA insert the rule text, eg.
>> LOCAL_SOME_RULE so that I can have an exim filter check for a
>> specific form of to address plus this rule match before removing the
>> message.
>
> You should be able to do that purely in SA; it's a tad more difficult
> if you want to match the envelope to address rather than the To:
> header. If you want to reliably match the envelope to address you'd
> need to have it recorded in a Received header (either the one that
> your MTA generates or the one that some trusted MTA prior to your MTA
> generates).
Agreed. Ideally this is something I can put into a KB article and have
afflicted users implement on their own. I'd like to keep system-wide
modifications to a minimum. A user's exim filters also move with the
account when we transfer it to another server, so as long as there's a
common rule set, not having to adjust the SA configuration is a benefit.
Basically what I have now is this:
uri __LCL_SUSPECT_LINK1 /target_pattern_1/i
tflags __LCL_SUSPECT_LINK1 multiple maxhits=5
uri __LCL_SUSPECT_LINK2 /target_pattern_2/i
tflags __LCL_SUSPECT_LINK2 multiple maxhits=5
meta LCL_MANY_SUSPECT_LINKS __LCL_SUSPECT_LINK1 && __LCL_SUSPECT_LINK2 && rules_matching(__LCL_SUSPECT_LINK?) > 5
score LCL_MANY_SUSPECT_LINKS 0.001
describe LCL_MANY_SUSPECT_LINKS More than 5 links match a suspected spam pattern
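One caveat on the meta above, if SpamAssassin's documented meta semantics are read correctly: rules_matching() counts how many rules matching the glob hit, not how many times each one hit, so with only two sub-rules it cannot exceed 2 and the "> 5" test would never be true. Because both sub-rules carry "tflags multiple", each one evaluates to its hit count inside a meta expression, so summing them is one way to express "more than 5 links in total". A sketch that reuses the rule names above and changes only the counting term:

```
# A meta expression has to sit on a single line.
# With "tflags multiple", each sub-rule evaluates to its hit count (capped by maxhits).
meta     LCL_MANY_SUSPECT_LINKS  __LCL_SUSPECT_LINK1 && __LCL_SUSPECT_LINK2 && (__LCL_SUSPECT_LINK1 + __LCL_SUSPECT_LINK2) > 5
score    LCL_MANY_SUSPECT_LINKS  0.001
describe LCL_MANY_SUSPECT_LINKS  More than 5 links match a suspected spam pattern
```

With maxhits=5 on each sub-rule the sum tops out at 10, so the threshold of 5 stays reachable while still requiring at least one hit of each form.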
> As for long sequences of random characters - that's FP-prone. It's
> difficult to detect *random* in a simple RE. A long string of
> characters from a given set, easy. Characteristics about that string?
> complicated. A rule like that might potentially hit on legitimate (for
> values of "legitimate") tracking analysis URIs or caching URIs, unless
> there is some kind of uncommon pattern to it that you can discern and
> look for in the RE.
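The "long string of characters from a given set" case is the easy half. A hedged sketch of such a sub-rule follows; the rule name, the character set and the length of 40 are placeholders chosen for illustration, not values from this thread. As noted above, on its own it would also hit legitimate tracking and caching URIs, so it only makes sense as one input to a meta.

```
# Hypothetical sub-rule: a path or query component of 40+ alphanumerics.
# It says nothing about whether the string is actually random.
# (An unescaped hash sign would start a comment, so none appears in the RE.)
uri  __LCL_LONG_TOKEN  /[\/=][A-Za-z0-9]{40,}(?:[\/&?]|$)/
```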
No kidding. I've seen this specific pattern in many a spam message over
the years so I suspect it's particularly FP-vulnerable. If there were a
regex rule for "matches English word" I could nail them with ease. OTOH
my regex skills are pretty decent. Finding the two common patterns and
checking that at least one of each is present will hopefully exclude
messages that consistently use only one form, eliminating a range of FPs.
If I can use the "many suspect links" match along with a few other
indicators, including that this particular [expletive] makes the message
look like it comes from a mailing list, I think I can kill their spew.
I'm seeing upwards of 20 messages per day per user from this source, but
they're rotating through junk data center IP addresses and disposable
mail server identities daily. This is war.
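Combining the link count with a "looks like a mailing list" indicator could be done with a second meta. A minimal sketch, assuming the list disguise shows up as a List-Id header (an assumption; nothing here says which list headers this sender forges), with hypothetical rule names and an illustrative score:

```
# Hypothetical sub-rule: the message carries a List-Id header.
header   __LCL_HAS_LIST_ID      exists:List-Id
# Fires only when both the link-count meta and the list indicator hit.
meta     LCL_SUSPECT_LIST_SPAM  LCL_MANY_SUSPECT_LINKS && __LCL_HAS_LIST_ID
score    LCL_SUSPECT_LIST_SPAM  0.001
describe LCL_SUSPECT_LIST_SPAM  Many suspect links in a message dressed up as list mail
```

Keeping the score at 0.001 means the rule name still appears in the list of tests that hit, which is what a per-user exim filter would key on, without moving the overall spam score.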
One more noob question. Can I test a rule without messing with the
production environment by using
spamassassin -t --cf='include myrule.cf' path
or should I build a test environment?