RE: sa-learn on a wide site HOWTO ?

Aaron Grewell Mon, 11 Jul 2005 11:04:37 -0700

> Forget about this. Most of your users will only report spam, 
> not ham, and they're going to screw up the Bayes database. As a 
> consequence, you'll have more spam, or more FPs.
> 
> You should find another solution or educate your users (but 
> it takes too much time) so they feed the Bayesian filters correctly.
>


I've heard this many times, but my experience thus far hasn't borne it out.
We've got SA w/Bayes running site-wide on our 400-user system and BAYES_99
is consistently our highest-scoring test systemwide, even outscoring the
various SBL and URIBL tests.  That said, the Ham corpus is entirely my own;
I don't bother having my users submit anything but Spam.  This works
surprisingly well, so I guess I have good Ham. :)
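Because the Ham here comes from one mailbox while the Spam comes from
everyone, it may be worth checking that both corpora are large enough
before training.  As far as I know, SpamAssassin won't use Bayes scores
until it has learned at least 200 ham and 200 spam messages (the defaults
for bayes_min_ham_num and bayes_min_spam_num).  The Python sketch that
follows counts messages in two mbox files with the standard mailbox
module; the file names are hypothetical, and the thresholds should be
checked against your own local.cf.

```python
import mailbox
import sys

# Assumed defaults of SpamAssassin's bayes_min_ham_num / bayes_min_spam_num;
# check local.cf in case your site overrides them.
MIN_HAM = 200
MIN_SPAM = 200


def count_messages(mbox_path):
    """Return the number of messages in an existing mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def corpus_report(ham_path, spam_path, min_ham=MIN_HAM, min_spam=MIN_SPAM):
    """Count ham and spam and say whether each meets the Bayes minimum."""
    ham = count_messages(ham_path)
    spam = count_messages(spam_path)
    return {
        "ham": ham,
        "spam": spam,
        "ham_ok": ham >= min_ham,
        "spam_ok": spam >= min_spam,
    }


# Usage (hypothetical file names): python check_corpus.py my-ham.mbox user-spam.mbox
if __name__ == "__main__" and len(sys.argv) == 3:
    print(corpus_report(sys.argv[1], sys.argv[2]))
```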

My method is simple and fairly manual.  I have my users put Spam in an
Exchange Public Folder (substitute a shared IMAP folder if you're using a
more standard e-mail server), and I copy the messages down into a local
MBOX.  Thunderbird is handy for this.  I upload the MBOX file to the SA
server, run sa-learn, and it's done.  Initially I had to do this fairly often, but once I had it
and it's done.  Initially I had to do this fairly often, but once I had it
well trained and enough SARE rules in place it became less of an issue.  I
now run it only every other month or so.  Bayes covers a number of
corner-cases that aren't covered by rules, so it's an important part of my
overall strategy.  It's also handy to train in new spam that hasn't hit the
URIBLs or other rules yet, much easier than writing custom rules.
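The upload-and-train step could be scripted roughly as follows.  This is a
minimal Python sketch, not the exact procedure described above.  It assumes
sa-learn is on the PATH of the account that owns the site-wide Bayes
database (where that database lives depends on bayes_path in your config),
and that the exported file is a single mbox, which is what sa-learn's
--mbox option expects.  The train() helper and its dry_run flag are
hypothetical names used for illustration.

```python
import shutil
import subprocess


def sa_learn_command(mbox_path, kind):
    """Build an sa-learn argument list for one mbox file.

    kind must be "spam" or "ham"; --mbox tells sa-learn the input is a
    single mbox file rather than one message per file.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", mbox_path]


def train(mbox_path, kind, dry_run=True):
    """Print the command, and run it only when dry_run is False."""
    cmd = sa_learn_command(mbox_path, kind)
    print(" ".join(cmd))
    if dry_run:
        return None
    if shutil.which("sa-learn") is None:
        raise FileNotFoundError("sa-learn is not on PATH")
    # check=True raises CalledProcessError if sa-learn exits non-zero.
    return subprocess.run(cmd, check=True)
```

For example, train("user-spam.mbox", "spam", dry_run=False) after each
upload, and the same call with "ham" for your own corpus.  Keeping the
dry run as the default makes it easy to confirm the command before it
touches the shared database.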

Bayes hasn't given any false positives that I'm aware of in the last year,
despite the theoretical skew that ought to be introduced by using everyone's
Spam and only my Ham.  I cannot tell you why, but it works and it works
well.

Aaron Grewell
Network Administrator
University of Washington Bothell
