At 03:58 PM 8/4/2004, Nicki Messerschmidt wrote:
I'm seeking information about bayesian filters. I'm using spamassassin
on our mail server with auto learning.
Now I was asked whether a very uneven ham/spam ratio of 1:10 harms the
filtering done by the bayesian database.

Does anyone have more information and/or experience on this subject?

First, IMO, it's a *complete* misconception that a "perfect" bayes database should be trained with a 1:1 ratio. That's nonsense; discard it from your mind at once. Bayes is a statistical system, and statistical systems work best when given REALISTIC input. Thus the "perfect" ratio isn't 1:1, it's whatever your real-world ham:spam ratio is. And I don't know about your network, but on mine, inbound spam outnumbers inbound ham by quite a lot.


And in general bayes is pretty resilient to gross deviations from the "perfect" ratio. My training ratio is coming in at about 1:26 (ham:spam). My real-world inbound ratio seems to be about 1:10 or so, so I'm training even more heavily toward spam than my real traffic would suggest. I'm not having any problems so far.
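
To see why a lopsided ratio isn't automatically harmful, here is a minimal sketch in Python. It assumes a simplified Graham/Robinson-style per-token estimate that divides each token count by the number of messages trained in that class. The function name and all the numbers are made up for illustration, and this is not SpamAssassin's actual code.

```python
def token_spam_prob(spam_hits, ham_hits, nspam, nham):
    """Estimate how spammy a token is from per-class frequencies.

    spam_hits / ham_hits: trained messages of each class containing the token.
    nspam / nham: total trained messages of each class.
    (Hypothetical helper; assumes the token was seen at least once.)
    """
    spam_freq = spam_hits / nspam  # fraction of spam containing the token
    ham_freq = ham_hits / nham     # fraction of ham containing the token
    return spam_freq / (spam_freq + ham_freq)


# The same token (in 20% of spam, 2% of ham) at two training ratios.
balanced = token_spam_prob(spam_hits=200, ham_hits=20, nspam=1000, nham=1000)
skewed = token_spam_prob(spam_hits=5200, ham_hits=20, nspam=26000, nham=1000)

print(balanced, skewed)
```

Because each count is normalized by its own class size, the 1:1 and 1:26 cases give the same estimate for this token. What matters is that each class has enough messages for its frequencies to be meaningful, not that the two classes are the same size.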

The only problem you might run into is when you're severely undertraining ham and overtraining spam: bayes poisoning might then start pushing nonspam emails higher in the BAYES_ ranks.
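
A rough sketch of that failure mode, in Python. It assumes a Robinson-style smoothed token estimate, f = (s*x + n*p) / (s + n), as a stand-in; the function name, the defaults s=1.0 and x=0.5, and the counts are all assumptions for illustration, not SpamAssassin's real code or settings.

```python
def smoothed_spam_prob(spam_hits, ham_hits, nspam, nham, s=1.0, x=0.5):
    """Robinson-style smoothed spam probability for one token (hypothetical helper).

    s: weight given to the prior; x: prior probability for an unseen token.
    """
    spam_freq = spam_hits / nspam
    ham_freq = ham_hits / nham
    total = spam_freq + ham_freq
    p = spam_freq / total if total > 0 else x  # raw per-class estimate
    n = spam_hits + ham_hits                   # times the token was seen at all
    return (s * x + n * p) / (s + n)


# An ordinary word that spammers stuff into 5% of their mail (poisoning)
# and that also appears in about 5% of legitimate mail.
well_trained = smoothed_spam_prob(spam_hits=1300, ham_hits=130, nspam=26000, nham=2600)

# Same word, but only 10 ham messages were ever trained and none contained it.
undertrained = smoothed_spam_prob(spam_hits=1300, ham_hits=0, nspam=26000, nham=10)

print(well_trained, undertrained)
```

With enough ham trained, the word comes out neutral. With almost no ham, the filter has only ever seen the word in spam, so it looks strongly spammy, and legitimate mail that uses it gets pushed up the BAYES_ scale.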




