On 07/17/2017 12:03 PM, Jesse Norell wrote:
This description:
On Thu, 2017-07-13 at 15:07 +0100, Martin Gregorie wrote:
I'm continuing to get good results from a multi-level approach:
I use two or more subrules with low scores (0.01 or so) that are
combined by an AND relation in a meta-rule that triggers a suitably
spammy score when all subrules get hits.
The subrules are typically automatically assembled lists of words or
phrases - automatically assembled because that makes maintenance vastly
easier. The list contents are typically words and phrases found in
spam, e.g. one list might be selling phrases such as "get you rocks off
with" that are unlikely to appear in personal or legit commercial mail
and another might be names or slang terms for less common
pharmaceuticals.
and what David Jones has been describing in this thread of identifying
specific combinations of rules (his based on reputation vs. content)
both remind me of the description of Marc Perkel's "evolution filter",
which from memory identified sets of rules which are very indicative of
ham/spam. Both David and Martin are reporting good success, as did
Marc - maybe worth looking into implementing in spamassassin?
Does masscheck automate meta rule creation (i.e., not just generate
scores)? That would not be the full "evolution filter" idea, which would
have to run on the endpoint, but it would benefit everyone via rule updates.
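To make the multi-level approach Martin describes above concrete, here is a
minimal Python sketch that assembles two low-scoring body subrules from
phrase lists and ANDs them in a meta rule. The rule names, phrase lists, and
the meta score of 5.0 are all hypothetical placeholders, not anyone's actual
ruleset; only the body/meta/describe/score directives are standard
SpamAssassin configuration.

```python
import re


def perl_escape(phrase):
    """Backslash-escape everything except word characters and spaces."""
    # This also escapes '/' and '#', which would otherwise end the
    # /.../ pattern or start a comment in a SpamAssassin .cf file.
    return re.sub(r"([^\w ])", r"\\\1", phrase)


def body_rule(name, phrases, score=0.01):
    """Return config lines for one low-scoring body subrule."""
    # Sort so the generated file is stable from run to run.
    alternation = "|".join(perl_escape(p) for p in sorted(phrases))
    return [
        f"body {name} /\\b(?:{alternation})\\b/i",
        f"score {name} {score}",
    ]


def meta_rule(name, subrules, score, description):
    """Return config lines for a meta rule that fires only when ALL subrules hit."""
    condition = " && ".join(subrules)
    return [
        f"meta {name} ({condition})",
        f"describe {name} {description}",
        f"score {name} {score}",
    ]


if __name__ == "__main__":
    # Hypothetical phrase lists; in practice these would be assembled
    # automatically from spam samples.
    selling = ["limited time offer", "order now"]
    pharma = ["examplepill", "samplemed"]

    lines = []
    lines += body_rule("LOCAL_SELL", selling)
    lines += body_rule("LOCAL_PHARMA", pharma)
    lines += meta_rule(
        "LOCAL_SELL_PHARMA",
        ["LOCAL_SELL", "LOCAL_PHARMA"],
        5.0,
        "Selling phrase combined with pharma term",
    )
    print("\n".join(lines))
```

The output would go into a local .cf file. Keeping the subrules at 0.01
means neither list can push a message over the threshold on its own; only
the combination carries a spammy score.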
I have been working on rebuilding the SA project's server for the past four
months. The first priority was getting the spamassassin.org hidden DNS
master active again. This was pretty easy. The second priority was the
masscheck processing, which turned out to be pretty time intensive and
could still have an open issue, so SA updates are currently on hold.
From what I can tell, the masscheck is only meant to dynamically update
the rule scores in 72_scores.cf (manual scores are in 50_scores.cf) and
help validate new rules added by the SA developers. It doesn't create
new rules. It can't create new rules based on content, since the
masscheck processing is run locally by each user. The email content is
not uploaded to the SA server. Only a special log file listing the
rules that each ham and spam message hit is sent to the SA server.
It would be nice if there was a local tool that could be part of the SA
project that would extend the masscheck processing and help build
content and meta rules. This would create more interest in masschecking
and get more people involved. (I use my masscheck ham/spam to also
train my Bayes DB or else it may not have been helpful enough for me to
set it up and understand the value of it.) I suspect the advanced users
of SA, like Kevin with his KAM.cf rules and a few others on this list,
have something like this that they use to build custom rules in an
automated way. Thankfully Kevin publishes his KAM.cf and allows public
downloading.
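As a rough illustration of what such a local tool might do, the Python
sketch below counts which pairs of rules hit together in spam but not in
ham, and lists them as candidates for new meta rules. It assumes the
per-message rule hits have already been loaded into (is_spam, rule names)
tuples; that structure, the function name, and the thresholds are my own
assumptions, and it does not parse the real masscheck log format.

```python
from collections import Counter
from itertools import combinations


def candidate_pairs(messages, min_spam_hits=2, max_ham_hits=0):
    """Find rule pairs that co-occur often in spam and rarely in ham.

    messages: iterable of (is_spam, rules), where rules is a set of
    rule names that hit on that message (an assumed, simplified input).
    Returns a list of ((rule_a, rule_b), spam_hits, ham_hits).
    """
    spam_counts = Counter()
    ham_counts = Counter()
    for is_spam, rules in messages:
        target = spam_counts if is_spam else ham_counts
        # Sort first so each pair is always counted under the same key.
        for pair in combinations(sorted(rules), 2):
            target[pair] += 1

    result = [
        (pair, hits, ham_counts[pair])
        for pair, hits in spam_counts.items()
        if hits >= min_spam_hits and ham_counts[pair] <= max_ham_hits
    ]
    # Most frequent spam pairs first, ties broken alphabetically.
    result.sort(key=lambda item: (-item[1], item[0]))
    return result


if __name__ == "__main__":
    # Hypothetical rule hits for four messages.
    sample = [
        (True, {"RULE_A", "RULE_B", "RULE_C"}),
        (True, {"RULE_A", "RULE_B"}),
        (False, {"RULE_A", "RULE_C"}),
        (False, {"RULE_B"}),
    ]
    for pair, spam_hits, ham_hits in candidate_pairs(sample):
        print(f"{pair[0]} && {pair[1]}: spam={spam_hits} ham={ham_hits}")
```

Each candidate would still need a human to review it and a masscheck run to
validate it before it got a real score; the sketch only narrows down which
combinations are worth a look.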
I know that Kevin wants to speed up rule development and SA updates
(which could take up to ~40 hours today if they weren't currently on
hold) to react faster to new spam, but it will never be fast enough to
react to zero-hour spam the way other technologies can. The best thing
you can do is use selective greylisting, rate limiting, DCC, Razor, and
Pyzor, and hope the RBLs catch up quickly. I also keep a local ruleset
to which I add zero-hour spam so that it is shortcircuited as spam based
on content. It does a pretty good job on most new spam and phishing, but
some still gets through now and then from compromised accounts.
--
David Jones