> There are times when the Bayes database begins to misbehave, scoring
> significant ham with BAYES_99 or significant spam with BAYES_00.
> Whenever that happens, for whatever reason, wipe the database and
> retrain (a good reason to keep 2-3k spam and 2-3k ham around, for a
> quick retrain).
>

Hi,
my Bayes database looks like this:
0.000          0          2          0  non-token data: bayes db version
0.000          0       4588          0  non-token data: nspam
0.000          0      15006          0  non-token data: nham
0.000          0     148621          0  non-token data: ntokens
0.000          0 1088644104          0  non-token data: oldest atime
0.000          0 1089366749          0  non-token data: newest atime
0.000          0 1089366089          0  non-token data: last journal sync atime
0.000          0 1089335321          0  non-token data: last expiry atime
0.000          0     691200          0  non-token data: last expire atime delta
0.000          0       7297          0  non-token data: last expire reduction count
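
For anyone who wants to read these figures more easily, here is a minimal
Python sketch that pulls the counters out of a dump like the one above and
derives the spam/ham balance and the age window of the tokens. It assumes the
layout shown (the value in the third column, followed by a "non-token data:"
label) and that the atime values are Unix timestamps; the names parse_magic and
summarize are made up for this illustration and are not part of SpamAssassin.

```python
from datetime import datetime, timezone

# Sample rows copied from the dump above (assumed layout: value in column 3).
DUMP = """\
0.000          0          2          0  non-token data: bayes db version
0.000          0       4588          0  non-token data: nspam
0.000          0      15006          0  non-token data: nham
0.000          0     148621          0  non-token data: ntokens
0.000          0 1088644104          0  non-token data: oldest atime
0.000          0 1089366749          0  non-token data: newest atime
"""


def parse_magic(text):
    """Map each 'non-token data' label to the integer in the third column."""
    values = {}
    for line in text.splitlines():
        head, sep, label = line.partition("non-token data:")
        fields = head.split()
        if sep and len(fields) >= 3:
            values[label.strip()] = int(fields[2])
    return values


def summarize(values):
    """Derive the spam/ham balance and the token age window (hypothetical helper)."""
    total = values["nspam"] + values["nham"]
    span_seconds = values["newest atime"] - values["oldest atime"]
    return {
        "spam_share": values["nspam"] / total,
        "ham_per_spam": values["nham"] / values["nspam"],
        "token_window_days": span_seconds / 86400,
        # Assumes the atime fields are seconds since the Unix epoch.
        "oldest": datetime.fromtimestamp(values["oldest atime"], tz=timezone.utc),
    }


if __name__ == "__main__":
    for key, value in summarize(parse_magic(DUMP)).items():
        print(f"{key}: {value}")
```

With the numbers above this gives a spam share of roughly 23% (a bit over three
ham messages learned per spam) and a token window of a little more than eight
days.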

For a long time I ran it with SA's autolearn only; recently I started training
it manually as well. I train it only on false positives and false negatives
(mistake-based learning). It seems to work well, classifying both spam and ham
correctly. Is my whole approach wrong?

Also, given the ham and spam counts above, and considering that sa-learn's man
page says there is no significant improvement above 5,000 messages, how much
further should I let it grow?

However, in my experience, with a large number of SA rules in place it would
not be a problem to empty it, since the rules will most likely still identify
the spam. All I would have to do is keep training at the same frequency as now
(i.e., it doesn't really matter if the spam and ham I have already 'learned'
manually are lost; my workload stays the same!). It's a strange approach, but
it works for me (I handle about 4,000 messages per day, of which about 40% are
spam).

I would appreciate any comments.
Regards,
Alty


