Last night I built two mailboxes of nearly equal size, one of spam and one of non-spam, each with about 1600 messages.
I was extra careful to make sure that my non-spam folder contained a number of E-mails that might at first glance look like spam but really aren't, such as various automated notifications. That was a very tedious process, but then checking my spam folder several times a day was getting tedious as well.
I then moved the old database file aside and retrained. I've let it run that way for about 12 hours, and I'm happy to report that only 3 spams have gotten through and I've found only one false positive so far.
That's much more acceptable. I'll keep training on errors as I have been, so hopefully it will only get better from here.
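For anyone reading this in the archive who hasn't met the term, a train-on-errors loop can be sketched roughly as below. This is a toy illustration and not SpamBayes code: the ToyClassifier class, its scoring, and the train_on_errors helper are all made up for the example, and the real classifier combines word evidence quite differently. The only point is the control flow: a message is added to the training data only when the current classifier gets it wrong.

```python
import math
from collections import Counter


class ToyClassifier:
    """Tiny word-presence classifier (hypothetical; NOT the SpamBayes algorithm)."""

    def __init__(self):
        # Per label: how many trained messages contained each word.
        self.words = {"spam": Counter(), "ham": Counter()}
        # Per label: how many messages have been trained.
        self.msgs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.msgs[label] += 1
        self.words[label].update(set(text.lower().split()))

    def classify(self, text):
        scores = {}
        for label in ("spam", "ham"):
            n = self.msgs[label]
            score = 0.0
            for word in set(text.lower().split()):
                # Smoothed fraction of this label's messages containing the word.
                score += math.log((self.words[label][word] + 1) / (n + 2))
            scores[label] = score
        # Ties (including a completely untrained classifier) fall to "ham".
        return "spam" if scores["spam"] > scores["ham"] else "ham"


def train_on_errors(clf, messages):
    """Train only on misclassified messages; return how many mistakes were seen."""
    errors = 0
    for text, label in messages:
        if clf.classify(text) != label:
            clf.train(text, label)
            errors += 1
    return errors


messages = [
    ("cheap pills buy now", "spam"),
    ("build notification: nightly job finished", "ham"),
    ("buy cheap pills today", "spam"),
]
clf = ToyClassifier()
first_pass = train_on_errors(clf, messages)
second_pass = train_on_errors(clf, messages)
print(first_pass, second_pass)
```

In this sketch the second pass trains on nothing, because everything is already classified correctly. That is the appeal of the regime: the database only grows when the classifier makes a mistake.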
Thanks!
Tony Meyer wrote:
I would say that retraining was the best bet, yes. Wiping (or moving aside) the existing databases and then following a train-on-errors regime would probably work best (unless you want to use the tte.py script, which would probably provide even better results).
There's a lot of information about training at:
http://entrian.com/sbwiki/TrainingIdeas
-- Greg Gulik http://www.gulik.org/greg/ greg @ gulik.org
