What do you do to push that last 5% or so of missed spam over the threshold from nonspam to spam?

Things already done:
-> I autoupdate Justin Mason's "sought" ruleset daily
-> I update the core rules on an irregular basis (although it averages out to at least once a week - usually at the same time as I update local rules I channelized)
-> I do a modest amount of hand-training Bayes with missed spam, however the major problem there has been getting reports in a useful format - a "report as spam" button in webmail helps, but I have fewer regular reporters with ~30K users now than I did with ~300 users four or five years ago. I'm still searching for ways to make the training that *does* happen more effective.
-> I use a collection of SARE level 0 and 1 rules bundled as a single update channel by openprotect.com

System resources are pretty open, but I'm thinking of that more as "headroom for more users". Some of the legacy systems I'm tuning in parallel are also a lot shorter on CPU and/or memory than the cluster doing most of the work, so bulky third-party rulesets aren't a particularly good solution - in fact I've had to shuffle the SARE rules on one system due to OOM problems.

I'm also in the process of doing some analysis on how useful various rules and rulesets are, so I can decide which ones are just overhead/overkill (hitting on lots of spam, but the hits just push the score up from "we can almost certainly delete this" to "<snicker> lookit the score on that one!").
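
To make the "overhead/overkill" idea concrete, here is a rough sketch of the kind of tally I have in mind. It is only an illustration under assumptions: the input format (a total score plus a per-rule score map for each known-spam message), the function name rule_usefulness, the threshold of 5.0, and every score in the sample data are made up for the example, and HYPOTHETICAL_RULE is not a real rule. A hit counts as "decisive" when the message is over the threshold with the rule but would fall under it without.

```python
from collections import Counter

def rule_usefulness(messages, threshold=5.0):
    """Tally per-rule hits and decisive hits over known-spam messages.

    messages: iterable of (total_score, {rule_name: rule_score}).
    This input shape is an assumption for the sketch; real data would
    have to be extracted from logs or headers by some other step.
    """
    hits = Counter()
    decisive = Counter()
    for total, rules in messages:
        for name, score in rules.items():
            hits[name] += 1
            # Decisive: flagged as spam now, but not if this rule's
            # score were removed from the total.
            if total >= threshold and total - score < threshold:
                decisive[name] += 1
    return {name: (hits[name], decisive[name]) for name in hits}

# Made-up sample data: rule scores here are illustrative only.
sample = [
    (23.4, {"BAYES_99": 3.5, "URIBL_BLACK": 1.7, "HYPOTHETICAL_RULE": 18.2}),
    (5.6, {"BAYES_99": 3.5, "URIBL_BLACK": 1.7, "HTML_MESSAGE": 0.4}),
    (4.1, {"BAYES_99": 3.5, "HTML_MESSAGE": 0.6}),
]

if __name__ == "__main__":
    result = rule_usefulness(sample)
    # List rules with the fewest decisive hits first: overkill candidates.
    for name, (n_hits, n_decisive) in sorted(result.items(), key=lambda kv: kv[1][1]):
        print(f"{name}: hits={n_hits} decisive={n_decisive}")
```

A rule with many hits but almost no decisive ones is a candidate for "just overhead"; a rule with few hits that are mostly decisive is probably earning its keep.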

-kgd
