Bayes issues

Alt Thomy 13 Jul 2004 13:20:26 -0000

Hi,
my Bayes database looks like this:
0.000          0          2          0  non-token data: bayes db version
0.000          0       4588          0  non-token data: nspam
0.000          0      15006          0  non-token data: nham
0.000          0     148621          0  non-token data: ntokens
0.000          0 1088644104          0  non-token data: oldest atime
0.000          0 1089366749          0  non-token data: newest atime
0.000          0 1089366089          0  non-token data: last journal sync atime
0.000          0 1089335321          0  non-token data: last expiry atime
0.000          0     691200          0  non-token data: last expire atime delta
0.000          0       7297          0  non-token data: last expire reduction count
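
To make the raw counters above easier to read, here is a minimal Python sketch that decodes them. It assumes the dump keeps the layout shown above (four numeric columns followed by a "non-token data:" label, one entry per line) and that the atime values are Unix epoch seconds. The names parse_bayes_dump and utc are made up for illustration and are not part of SpamAssassin.

```python
from datetime import datetime, timezone

# Values copied from the dump above, with the wrapped lines rejoined.
DUMP = """\
0.000          0          2          0  non-token data: bayes db version
0.000          0       4588          0  non-token data: nspam
0.000          0      15006          0  non-token data: nham
0.000          0     148621          0  non-token data: ntokens
0.000          0 1088644104          0  non-token data: oldest atime
0.000          0 1089366749          0  non-token data: newest atime
0.000          0 1089366089          0  non-token data: last journal sync atime
0.000          0 1089335321          0  non-token data: last expiry atime
0.000          0     691200          0  non-token data: last expire atime delta
0.000          0       7297          0  non-token data: last expire reduction count
"""

PREFIX = "non-token data: "


def parse_bayes_dump(text):
    """Return {label: integer value} for every 'non-token data' line."""
    values = {}
    for line in text.splitlines():
        # Assumed layout: four numeric columns, then the free-text label.
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[4].startswith(PREFIX):
            label = fields[4][len(PREFIX):].strip()
            values[label] = int(fields[2])  # the third column holds the value
    return values


def utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


stats = parse_bayes_dump(DUMP)
learned = stats["nspam"] + stats["nham"]

print("messages learned:", learned)
print("spam share of learned messages: {:.1%}".format(stats["nspam"] / learned))
print("oldest token atime:", utc(stats["oldest atime"]).isoformat())
print("newest token atime:", utc(stats["newest atime"]).isoformat())
print("expire atime delta (days):", stats["last expire atime delta"] / 86400)
```

With these numbers the expire atime delta of 691200 seconds works out to exactly 8 days, and the oldest atime falls on 1 July 2004 (UTC).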


For a long time I used it only with SA's autolearn, and recently I started
training it manually. I train it only with false positives and false
negatives (mistake-based learning). It seems to work fine, classifying spam
and ham messages properly. Is my whole approach incorrect?

Also, given the ham and spam counts above, and considering that sa-learn's
man page says there is no significant improvement above about 5,000
messages, how much more should I let it grow?

However, in my experience, with a large number of SA rules in use, emptying
the database would not be a problem, as the rules will most probably identify
the spam. All I would have to do is keep training at the same frequency as I
do now (i.e. it doesn't really matter if the spams and hams I have already
'learned' manually are lost - my work remains the same!). It's a strange
approach, but it works for me (I get about 4,000 messages per day, of which
about 40% is spam).

I would appreciate any comments.
Regards,
Alty
