What do you do to push that last 5% or so of missed spam over the
threshold from nonspam to spam?

Things already done:
-> I autoupdate Justin Mason's "sought" ruleset daily
-> I update the core rules on an irregular basis (it averages out to at
least once a week, usually at the same time as I update the local rules
I've set up as my own update channel)
-> I do a modest amount of hand-training Bayes with missed spam. The
major problem there has been getting reports in a useful format. A
"report as spam" button in webmail helps, but I have fewer regular
reporters with ~30K users now than I did with ~300 users four or five
years ago. I'm still searching for ways to make the training that
*does* happen more effective.
-> I use a collection of SARE level 0 and 1 rules bundled as a single
update channel by openprotect.com

System resources are fairly plentiful, but I'm treating that more as
"headroom for more users". Some of the legacy systems I'm tuning in
parallel are also a lot shorter on CPU and/or memory than the cluster
doing most of the work, so bulky third-party rulesets aren't a
particularly good solution - in fact I've had to shuffle the SARE rules
on one system due to OOM problems.

I'm also in the process of doing some analysis on how useful various
rules and rulesets are, so I can decide which ones are just
overhead/overkill (hitting on lots of spam, but the hits just push the
score up from "we can almost certainly delete this" to "<snicker> lookit
the score on that one!").
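
For what it's worth, here is a minimal sketch of the kind of tally I have in mind, not a finished tool. It assumes the per-rule scores for each known-spam message have already been extracted into a dict (how you get them out of your logs or headers is up to you), and it assumes a threshold of 5.0. The function name, the rule names, and the scores in the usage note are all made up for illustration. A hit counts as "decisive" if the message would have dropped below the threshold without that rule, and as "overkill" otherwise.

```python
from collections import Counter

# Assumed spam threshold; adjust to whatever your site actually uses.
SPAM_THRESHOLD = 5.0


def rule_usefulness(messages, threshold=SPAM_THRESHOLD):
    """Tally decisive vs. overkill hits per rule.

    messages: iterable of dicts, one per known-spam message, each mapping
    a rule name to the score that rule contributed (hypothetical input
    format, not something SpamAssassin emits directly).
    Returns two Counters: (decisive, overkill).
    """
    decisive = Counter()
    overkill = Counter()
    for hits in messages:
        total = sum(hits.values())
        if total < threshold:
            # Missed spam: nothing pushed it over, so no rule gets credit.
            continue
        for rule, score in hits.items():
            if total - score < threshold:
                # Without this rule the message would have been missed.
                decisive[rule] += 1
            else:
                # The message was already over the threshold without it.
                overkill[rule] += 1
    return decisive, overkill
```

Calling it on a handful of messages, e.g. rule_usefulness([{"BAYES_HIGH": 3.5, "SOUGHT_HIT": 2.0}, {"SARE_HIT": 1.0}]), gives two Counters you can compare per rule. A rule with lots of overkill hits and few or no decisive ones is a candidate for dropping on the memory-starved boxes.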
-kgd