Last-5-percent tuning

Kris Deugau Thu, 12 Feb 2009 08:48:13 -0800

What do you do to push that last 5% or so of missed spam over the threshold from nonspam to spam?


Things already done:
-> I autoupdate Justin Mason's "sought" ruleset daily
-> I update the core rules on an irregular basis (although it averages out to at least once a week - usually at the same time as I update local rules I channelized)
-> I do a modest amount of hand-training Bayes with missed spam; however, the major problem there has been getting reports in a useful format - a "report as spam" button in webmail helps, but I have fewer regular reporters with ~30K users now than I did with ~300 users four or five years ago. I'm still searching for ways to make the training that *does* happen more effective.
-> I use a collection of SARE level 0 and 1 rules bundled as a single update channel by openprotect.com

System resources are pretty open, but I'm thinking of that more as "headroom for more users". Some of the legacy systems I'm tuning in parallel are also a lot shorter on CPU and/or memory than the cluster doing most of the work, so bulky third-party rulesets aren't a particularly good solution - in fact I've had to shuffle the SARE rules on one system due to OOM problems.

I'm also in the process of doing some analysis on how useful various rules and rulesets are, so I can decide which ones are just overhead/overkill (hitting on lots of spam, but the hits just push the score up from "we can almost certainly delete this" to "<snicker> lookit the score on that one!").
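
One way that kind of analysis could be sketched, purely as an illustration: for each rule, count how often it hits and how often the hit is "decisive", meaning the message reaches the spam threshold with the rule but would fall below it without. The function name, the input shape (one dict of rule name -> score per spam message), the rule names, the scores, and the 5.0 threshold are all assumptions made up for this example, not anything taken from a real setup.

```python
from collections import Counter


def rule_usefulness(messages, threshold=5.0):
    """Return {rule: (hits, decisive_hits)} for a set of spam messages.

    messages: iterable of dicts mapping rule name -> score for one message
              (hypothetical input format; adapt to however hits are logged).
    A hit is "decisive" when the message total reaches the threshold with
    the rule, but would drop below the threshold if the rule were removed.
    """
    hits = Counter()
    decisive = Counter()
    for rule_scores in messages:
        total = sum(rule_scores.values())
        for rule, score in rule_scores.items():
            hits[rule] += 1
            if total >= threshold and total - score < threshold:
                decisive[rule] += 1
    return {rule: (hits[rule], decisive[rule]) for rule in hits}


# Invented sample data: two spam messages with made-up rule names and scores.
sample = [
    {"RULE_A": 3.5, "RULE_B": 2.0, "RULE_C": 1.0},
    {"RULE_A": 3.5, "RULE_B": 2.0, "RULE_C": 1.0, "RULE_D": 1.5},
]

if __name__ == "__main__":
    # List rules with the most hits first; zero decisive hits suggests overhead.
    results = rule_usefulness(sample)
    for rule, (n_hits, n_decisive) in sorted(
        results.items(), key=lambda item: -item[1][0]
    ):
        print(f"{rule}: {n_hits} hits, {n_decisive} decisive")
```

A rule with many hits but no decisive ones (RULE_C in the sample) would be a candidate for "just overhead". This ignores interactions between rules and says nothing about false positives, so it is only a first pass.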


-kgd
