jdow wrote:
From: "Jim Maul" <[EMAIL PROTECTED]>
Chris Santerre wrote:
> -----Original Message-----
> From: jo3 [mailto:[EMAIL PROTECTED]
> Sent: Monday, January 09, 2006 2:28 PM
> To: users@spamassassin.apache.org
> Subject: rules better than bayes?
>
>
> Hi,
>
> This is an observation, please take it in the spirit in which it is
> intended, it is not meant to be flame bait.
>
> After using spamassassin for six solid months, it seems to me that the
> bayes process (sa-learn [--spam | --ham]) has only very limited success
> in learning about new spam. Regardless of how many spams and hams are
> submitted, the effectiveness never goes above the default level which,
> in our case here, is somewhere around 2 out of 3 spams correctly
> identified. By the same token, after adding the "third party" rule,
> airmax.cf, the effectiveness went up to 99 out of 100 spams correctly
> identified.
I have long said that IMHO, I do not think bayes is worth it. Left
unattended, it isn't as good. A simple rule can take out a lot of
spam. Some may say rule writing is more complicated than training
bayes. Maybe. Not so much the rule writing, but the figuring out what
to look for and testing for FPs.
I do not run Bayes for our company. Obviously I'm partial to
URIBL.com and SARE rules ;) I get about 98% of spam caught, and
few FPs.
This is going to sound like tooting our own horn, but so be it.
Before SARE, Bayes was cool. After SARE, I see no need.
I always feel I have to point out the flip side to this just to offer
another opinion. While I certainly don't have a NEED for bayes at our
facility, I do run it, complete with autolearn. We have very low
volume (5k msgs/day) but it works so well I rarely ever have to think
about it. For us, 96% of the time bayes alone is enough to say
whether a message is ham/spam. Add all the other tests on top of this
(uribl, razor, a few sare) and there's easily a 20 point difference
between ham and spam.
Jim, can you back that up with a run of the SARE version of sa_stats.pl?
I'd love to see your record with that setup for the highest and lowest
ranking BAYES scores.
{^_^}
I don't have any sa-stats.pl on my system, and I recall some confusion
with different scripts named the same thing, so I'm not sure. If you can
provide me with a location to obtain the sa-stats.pl script you are
talking about, I'll try to give it a run when I get some time. I'm
running 2.64 through qmail-scanner if it matters.
-Jim