On 07/17/2017 12:03 PM, Jesse Norell wrote:
This description:

On Thu, 2017-07-13 at 15:07 +0100, Martin Gregorie wrote:
I'm continuing to get good results from a multi-level approach:

I use two or more subrules with low scores (0.01 or so) that are
combined by an AND relation in a meta-rule that triggers a suitably
spammy score when all subrules get hits.
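The multi-level scheme above can be illustrated with a minimal Python sketch. It is not SpamAssassin code (in SpamAssassin itself the subrules would be `body` rules and the combination a `meta` rule in a .cf file). It only models the scoring logic: each subrule adds a token 0.01, and the AND meta rule adds a spammy score only when every subrule hits. The rule names, patterns, drug names, and the 3.0 meta score are hypothetical placeholders.

```python
import re
from dataclasses import dataclass


@dataclass
class SubRule:
    name: str
    pattern: "re.Pattern[str]"
    score: float = 0.01  # deliberately tiny: one hit alone barely moves the total


def score_message(body: str, subrules: list[SubRule],
                  meta_name: str, meta_score: float) -> tuple[list[str], float]:
    """Evaluate each subrule, then an AND meta rule over all of them."""
    hits = [r.name for r in subrules if r.pattern.search(body)]
    total = sum(r.score for r in subrules if r.name in hits)
    # The meta rule fires only when every subrule hit (logical AND).
    if len(hits) == len(subrules):
        hits.append(meta_name)
        total += meta_score
    return hits, total


# Hypothetical rule names and patterns, for illustration only.
SELLING = SubRule("LOCAL_SELLING_PHRASE",
                  re.compile(r"\bget your rocks off with\b", re.I))
PHARMA = SubRule("LOCAL_RARE_PHARMA",
                 re.compile(r"\b(?:examplamycin|fictivorin)\b", re.I))
RULES = [SELLING, PHARMA]

if __name__ == "__main__":
    for text in ("Get your rocks off with examplamycin today!",
                 "Get your rocks off with our new game."):
        print(score_message(text, RULES, "LOCAL_META_SPAM", 3.0))
```

The second sample hits only the selling-phrase subrule, so it scores a negligible 0.01; that is the point of keeping subrule scores low.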

The subrules are typically automatically assembled lists of words or
phrases - automatically assembled because that makes maintenance
vastly easier. The list contents are typically words and phrases found
in spam, e.g. one list might be selling phrases such as "get your rocks
off with" that are unlikely to appear in personal or legit commercial
mail, and another might be names or slang terms for less common
pharmaceuticals.
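Assembling such lists into rules automatically could look something like the following sketch, which turns plain phrase lists into SpamAssassin `body`, `describe`, `score`, and `meta` lines. It assumes one phrase per list entry, matches case-insensitively, and lets any run of whitespace separate the words of a phrase. Python's `re.escape` output is used as a stand-in for Perl quoting, which holds for plain words but is an assumption for unusual characters; the rule names and list contents are hypothetical, and the generated file should be checked with `spamassassin --lint` before use.

```python
import re


def perl_safe(token: str) -> str:
    """Escape a literal token for use inside a /.../ SpamAssassin regex."""
    # re.escape handles regex metacharacters; '/' also needs escaping because
    # it delimits the pattern in SpamAssassin rule syntax.
    return re.escape(token).replace("/", r"\/")


def list_to_rule(name: str, phrases: list[str], score: float = 0.01) -> str:
    """Build body/describe/score lines for one list-based subrule."""
    alternatives = sorted({
        # Any run of whitespace may separate the words of a phrase.
        r"\s+".join(perl_safe(tok) for tok in p.lower().split())
        for p in phrases
        if p.strip()
    })
    pattern = r"/\b(?:" + "|".join(alternatives) + r")\b/i"
    return "\n".join([
        f"body     {name} {pattern}",
        f"describe {name} Auto-generated from {len(alternatives)} list entries",
        f"score    {name} {score}",
    ])


def meta_rule(name: str, subrules: list[str], score: float) -> str:
    """Combine subrules with logical AND in a SpamAssassin meta rule."""
    return f"meta     {name} ({' && '.join(subrules)})\nscore    {name} {score}"


if __name__ == "__main__":
    # Hypothetical lists; in practice these would be read from maintained files.
    selling = ["get your rocks off with"]
    pharma = ["examplamycin", "fictivorin"]
    print(list_to_rule("LOCAL_SELLING_PHRASE", selling))
    print(list_to_rule("LOCAL_RARE_PHARMA", pharma))
    print(meta_rule("LOCAL_META_SPAM",
                    ["LOCAL_SELLING_PHRASE", "LOCAL_RARE_PHARMA"], 3.0))
```

Regenerating the .cf file from the lists on every change is what keeps maintenance cheap: only the lists are edited by hand.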


and what David Jones has been describing in this thread, identifying
specific combinations of rules (his based on reputation rather than
content), both remind me of the description of Marc Perkel's "evolution
filter", which, from memory, identified sets of rules that are very
indicative of ham/spam. Both David and Martin are reporting good results,
as did Marc - maybe worth looking into implementing in SpamAssassin?

Does masscheck automate meta rule creation (i.e. not just generate
scores)? I don't mean the full "evolution filter" idea, which would have
to run on the endpoint, but something that would benefit everyone via
rule updates.
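As a rough illustration of mining rule combinations from hit data (a naive pair count, not Marc Perkel's actual evolution filter and not something masscheck is known to do), the sketch below takes per-message rule hits labelled ham or spam and reports rule pairs that co-occur in enough spam while never co-occurring in ham. Each such pair is a candidate for a `meta` rule joined with `&&`. The `(is_spam, rule_names)` input shape, the thresholds, and the rule names are assumptions, and on small corpora this will overfit badly.

```python
from collections import Counter
from itertools import combinations


def candidate_pairs(messages, min_spam=2, max_ham_ratio=0.0):
    """Return (pair, spam_ratio, ham_ratio) for spam-indicative rule pairs.

    messages: iterable of (is_spam, rule_names) tuples, one per message.
    """
    spam_pairs, ham_pairs = Counter(), Counter()
    n_spam = n_ham = 0
    for is_spam, rules in messages:
        # Sorting makes ("A", "B") and ("B", "A") count as the same pair.
        pairs = combinations(sorted(set(rules)), 2)
        if is_spam:
            n_spam += 1
            spam_pairs.update(pairs)
        else:
            n_ham += 1
            ham_pairs.update(pairs)

    results = []
    for pair, s in spam_pairs.items():
        ham_ratio = ham_pairs[pair] / n_ham if n_ham else 0.0
        if s >= min_spam and ham_ratio <= max_ham_ratio:
            results.append((pair, s / n_spam, ham_ratio))
    # Highest spam coverage first; ties go to the lower ham ratio.
    results.sort(key=lambda r: (-r[1], r[2]))
    return results


if __name__ == "__main__":
    corpus = [  # hypothetical hits; each rule alone also hits some ham
        (True, {"RULE_A", "RULE_B", "RULE_C"}),
        (True, {"RULE_A", "RULE_B"}),
        (True, {"RULE_B", "RULE_C"}),
        (False, {"RULE_A", "RULE_C"}),
        (False, {"RULE_B"}),
    ]
    for (a, b), spam_ratio, _ in candidate_pairs(corpus):
        print(f"meta LOCAL_{a}_{b} ({a} && {b})  # spam coverage {spam_ratio:.0%}")
```

In the sample, every individual rule hits some ham, yet two of the pairs never do, which is the kind of combination signal David and Martin describe.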



I have been working on rebuilding the SA project's server for the past four months. The first priority was getting the spamassassin.org hidden DNS master active again, which was pretty easy. The second priority was the masscheck processing, which turned out to be pretty time-intensive and may still have an open issue, so SA updates are currently on hold.

From what I can tell, masscheck is only meant to dynamically update the rule scores in 72_scores.cf (manual scores are in 50_scores.cf) and to help validate new rules added by the SA developers. It doesn't create new rules. It can't create new rules based on content, since the masscheck processing is run locally by each user and the email content is never uploaded to the SA server. Only a special log file listing the rules that each ham and spam message hit is sent to the SA server.
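To show what hit data alone (no message content) can support, here is a minimal sketch that computes, for each rule, the percentage of spam and ham messages it hit. The `(is_spam, rule_names)` input shape is assumed rather than parsed from the real masscheck log format, and the SA project's actual score generation is more involved than this.

```python
from collections import Counter


def rule_hit_rates(messages):
    """Per-rule hit percentages in spam and ham, from rule-hit lists only.

    messages: iterable of (is_spam, rule_names); no message content needed.
    Returns {rule: (spam_pct, ham_pct)}.
    """
    spam_hits, ham_hits = Counter(), Counter()
    n_spam = n_ham = 0
    for is_spam, rules in messages:
        if is_spam:
            n_spam += 1
            spam_hits.update(set(rules))  # count each rule once per message
        else:
            n_ham += 1
            ham_hits.update(set(rules))
    rates = {}
    for rule in spam_hits.keys() | ham_hits.keys():
        spam_pct = 100.0 * spam_hits[rule] / n_spam if n_spam else 0.0
        ham_pct = 100.0 * ham_hits[rule] / n_ham if n_ham else 0.0
        rates[rule] = (spam_pct, ham_pct)
    return rates


if __name__ == "__main__":
    hits = [  # hypothetical per-message hit lists
        (True, ["RULE_A", "RULE_B"]),
        (True, ["RULE_A"]),
        (False, ["RULE_B"]),
        (False, []),
    ]
    for rule, (s, h) in sorted(rule_hit_rates(hits).items()):
        print(f"{rule}: spam {s:.1f}%  ham {h:.1f}%")
```

Because only rule names travel, statistics like these can rate and score existing rules, but they cannot suggest new content patterns.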

It would be nice if the SA project included a local tool that extended the masscheck processing to help build content and meta rules. This would create more interest in masschecking and get more people involved. (I also use my masscheck ham/spam to train my Bayes DB; otherwise masschecking may not have been useful enough for me to set it up and understand its value.) I suspect the advanced SA users, like Kevin with his KAM.cf rules, and a few others on this list have something like this that they use to build custom rules in an automated way. Thankfully Kevin publishes his KAM.cf and allows public downloading.
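One piece such a local tool might include is mining candidate words for content rules from a locally sorted corpus. The sketch below assumes ham and spam are already available as decoded plain-text bodies and lists words that appear in at least a minimum number of spam messages and in no ham; the output would need human review before going into a word list for a low-scoring subrule. The tokenizer is deliberately naive, and the sample messages and drug name are made up.

```python
import re
from collections import Counter

# Naive tokenizer: lowercase words of three or more letters.
WORD = re.compile(r"[a-z][a-z'-]{2,}")


def doc_freq(texts):
    """Count how many messages each lowercase word appears in."""
    df = Counter()
    for text in texts:
        df.update(set(WORD.findall(text.lower())))
    return df


def spam_only_tokens(spam_texts, ham_texts, min_spam_docs=2):
    """Words seen in at least min_spam_docs spam messages and in no ham."""
    spam_df, ham_df = doc_freq(spam_texts), doc_freq(ham_texts)
    return sorted(
        (w for w, n in spam_df.items() if n >= min_spam_docs and ham_df[w] == 0),
        key=lambda w: (-spam_df[w], w),  # most widespread first
    )


if __name__ == "__main__":
    spam = ["Cheap examplamycin shipped overnight",
            "Buy examplamycin without prescription",
            "No prescription needed, cheap pills"]
    ham = ["Your order has shipped",
           "Cheap flights for the conference"]
    print(spam_only_tokens(spam, ham))
```

Requiring zero ham hits is strict on purpose; with a larger ham corpus a small tolerance might be more practical, at the cost of more false positives.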

I know that Kevin wants to speed up rule development and SA updates (which could take up to ~40 hours today if they weren't currently on hold) to react faster to new spam, but they will never react to zero-hour spam as fast as other technologies can. The best things you can do are selective greylisting, rate limiting, DCC, Razor, and Pyzor, and hope the RBLs catch up quickly. I also have a local ruleset to which I add content rules for zero-hour spam so that it is shortcircuited as spam. It does a pretty good job on most new spam and phishing, but some still gets through now and then from compromised accounts.

--
David Jones
