Re: True spam getting really low Bayesian points

maillist Thu, 25 Jan 2007 08:36:41 -0800

maillist wrote:

Kim Christensen wrote:
Hey list,
I've recently started training our Bayesian filter with spam/ham from my
personal mailbox, to prepare for live usage on our customer accounts.

% sa-learn --dump magic
...
0.000          0        340          0  non-token data: nspam
0.000          0        475          0  non-token data: nham
0.000          0      53404          0  non-token data: ntokens
...

So far so good, and spamd is actually using the Bayesian db when
examining incoming mail. However, I find that a few of the legit ham mails
(not a majority) get unusually high Bayesian points, while some of the real
spam (which SA does score as spam) often gets Bayesian points < 1.
Now, I'm sure I haven't trained the database with wrong messages. Is it
a good idea to continue feeding sa-learn with example spam and ham until
it reaches a few thousand messages, before relying on the results?

I would think my current amount is sufficient, but I guess something's
wrong with this picture :-)


Best regards
Run spamassassin --test-mode on the messages that are scoring high and low. See if they are actually running through any BAYES_* tests. I'm not 100% sure, but I think that by default the Bayes tests do not even start until you have 500 trained messages each of spam and ham.
You can of course get around this by setting bayes_min_ham_num and bayes_min_spam_num in your local.cf file.
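To make the "did any BAYES_* test fire?" check repeatable, one could save the test-mode report for each message (for example `spamassassin --test-mode < message.eml > report.txt`) and scan it. A minimal Python sketch follows, assuming only that Bayes rule names show up in the report text as `BAYES_` followed by digits; the function name `bayes_hits` is hypothetical, not part of SpamAssassin.

```python
import re

# Assumption: Bayes rules appear in the report as BAYES_<digits>,
# e.g. BAYES_00 or BAYES_99. Matching is case-sensitive, so the word
# "Bayes" in rule descriptions is not picked up.
BAYES_RULE = re.compile(r"\bBAYES_\d+\b")


def bayes_hits(report: str) -> list:
    """Return the distinct BAYES_* rule names in the report, in order of first appearance."""
    # dict.fromkeys drops duplicates while keeping insertion order,
    # since a rule may be listed both in X-Spam-Status and in the score table.
    return list(dict.fromkeys(BAYES_RULE.findall(report)))
```

An empty result for a message suggests Bayes was not consulted for it at all, which points at thresholds or configuration rather than bad training.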
-=Aubrey=-

The default for 3.x is 200 messages of each, not 500.  Sorry dude.
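To compare the trained counts against those minimums, one could parse the `sa-learn --dump magic` output. A minimal sketch, assuming the counter value sits in the third column of each "non-token data:" line, as in the dump quoted earlier in this thread; `parse_magic` and `bayes_active` are hypothetical helper names, and the 200/200 defaults are taken from this message.

```python
import re

# Assumed line shape, as in the dump quoted in this thread:
#   0.000          0        340          0  non-token data: nspam
# Capture group 1 is the counter value (third column),
# group 2 is the counter name after "non-token data:".
MAGIC_LINE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$")


def parse_magic(dump: str) -> dict:
    """Map each non-token counter name (nspam, nham, ntokens, ...) to its integer value."""
    counts = {}
    for line in dump.splitlines():
        m = MAGIC_LINE.match(line)
        if m:  # lines such as "..." simply do not match and are skipped
            counts[m.group(2)] = int(m.group(1))
    return counts


def bayes_active(counts: dict,
                 bayes_min_spam_num: int = 200,
                 bayes_min_ham_num: int = 200) -> bool:
    """True if both trained counts reach the minimums (defaults: 200 each, per this thread)."""
    return (counts.get("nspam", 0) >= bayes_min_spam_num
            and counts.get("nham", 0) >= bayes_min_ham_num)
```

With the counts posted above (340 spam, 475 ham), both minimums are met, so by message count alone Bayes should already be in use; the parameter names mirror the local.cf settings so that a changed threshold can be passed in directly.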

-=Aubrey=-
