From: "Jim C. Nasby" <[EMAIL PROTECTED]>

On Tue, Feb 07, 2006 at 03:16:57PM -0500, Matt Kettler wrote:
> My current training ratio is about 7:1 spam:nonspam, but in the past it's
> been as bad as 20:1. Both of those are very far off from equal amounts, but
> the imbalance has never caused me any problems.
>
> From my sa-learn --dump magic output as of today:
> 0.000          0     995764          0  non-token data: nspam
> 0.000          0     145377          0  non-token data: nham

Interesting... it appears I actually need to do a better job of training
spam!
sa-learn --dump magic|grep am
0.000          0      98757          0  non-token data: nspam
0.000          0     255134          0  non-token data: nham
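(That works out to roughly 1 spam for every 2.6 ham here, versus about 6.8
spam per ham in the figures quoted above, so the imbalance runs the opposite
way.)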

I just changed bayes_auto_learn_threshold_spam to 5.0; we'll see what that
does...
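
For anyone tuning the same thing, the relevant local.cf knobs look roughly
like this (12.0 and 0.1 are the stock defaults as far as I recall, so 5.0 is
a fairly aggressive drop):

# sketch of the Bayes autolearn settings in local.cf
bayes_auto_learn 1
# messages scoring above this generally get autolearned as spam
bayes_auto_learn_threshold_spam 5.0
# messages scoring below this generally get autolearned as ham
bayes_auto_learn_threshold_nonspam 0.1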

If you have the option, manually train the spam for a while. If the threshold
is set too low for autolearning spam, you will find yourself with a mangled
database that has a high percentage of actual ham learned as spam. That is
not a good thing. You might actually lower the ham threshold as well; it
looks like you might be at risk of learning spam as ham. (And in fact you may
have done this already to a high degree.)
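
If it helps, manual training is just sa-learn pointed at hand-sorted mail;
something along these lines (the paths are placeholders, adjust for your own
folders):

# feed a directory of hand-sorted spam messages to the Bayes db
sa-learn --spam /path/to/sorted-spam/
# feed an mbox of hand-verified ham
sa-learn --ham --mbox /path/to/sorted-ham.mbox
# then recheck the counts
sa-learn --dump magic | grep am

Doing that for a while is a lot safer than letting a low threshold autolearn
borderline mail.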

{^_^}
