What I did last night was build two nearly equal-sized mailboxes, one of spam and one of non-spam, each with about 1600 messages in it.


I was extra careful to make sure my non-spam folder contained a number of e-mails that might look like spam at first glance but really aren't, such as various automated notifications. That was a very tedious process, but checking my spam folder several times a day had become tedious as well.

I then moved the old database file aside and retrained. I've let it run that way for about 12 hours, and I'm happy to report that only 3 spams have gotten through and I've found only one false positive so far.

That's much more acceptable. I'll keep training on errors as I have been, so hopefully it only gets better from here.

Thanks!


Tony Meyer wrote:
I would say that retraining was the best bet, yes.  Wiping (or moving aside)
the existing databases and then following a train-on-errors regime would
probably work best (unless you want to use the tte.py script, which would
probably provide even better results).
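Tony's suggestion boils down to a simple loop. Below is a minimal sketch in Python, assuming a toy naive-Bayes-style scorer: `ToyBayes`, `tokenize`, and the 0.20/0.90 ham/spam cutoffs are hypothetical stand-ins, not the SpamBayes API. Train-on-errors trains only on messages that are misclassified or land in the "unsure" range. The `train_to_exhaustion` variant simply repeats those passes until one trains nothing; that this matches what tte.py actually does is an assumption, so check the script itself.

```python
import math
import re
from collections import Counter

# Hypothetical cutoffs (assumption): scores between them count as "unsure".
HAM_CUTOFF, SPAM_CUTOFF = 0.20, 0.90


def tokenize(text):
    """Crude illustrative tokenizer: the set of lowercase words in the text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class ToyBayes:
    """Tiny stand-in for a real SpamBayes database (illustrative only)."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}  # token -> #messages seen with it
        self.nmsgs = {True: 0, False: 0}                   # messages trained per class

    def train(self, tokens, is_spam):
        self.counts[is_spam].update(tokens)
        self.nmsgs[is_spam] += 1

    def spamprob(self, tokens):
        """Naive-Bayes-style score over the tokens present, with add-one smoothing."""
        logodds = math.log((self.nmsgs[True] + 1) / (self.nmsgs[False] + 1))
        for tok in tokens:
            p_spam = (self.counts[True][tok] + 1) / (self.nmsgs[True] + 2)
            p_ham = (self.counts[False][tok] + 1) / (self.nmsgs[False] + 2)
            logodds += math.log(p_spam / p_ham)
        logodds = max(-50.0, min(50.0, logodds))  # avoid overflow in exp()
        return 1 / (1 + math.exp(-logodds))


def is_error(clf, tokens, is_spam):
    """True if the message is misclassified or falls in the unsure range."""
    p = clf.spamprob(tokens)
    return p < SPAM_CUTOFF if is_spam else p > HAM_CUTOFF


def train_on_errors(clf, messages):
    """One pass: train only on messages the classifier gets wrong or is unsure about."""
    trained = 0
    for text, is_spam in messages:
        tokens = tokenize(text)
        if is_error(clf, tokens, is_spam):
            clf.train(tokens, is_spam)
            trained += 1
    return trained


def train_to_exhaustion(clf, messages, max_passes=10):
    """Repeat train-on-errors passes until one trains nothing; return passes used."""
    for n in range(1, max_passes + 1):
        if train_on_errors(clf, messages) == 0:
            return n
    return max_passes


# Usage: an interleaved (hypothetical) ham/spam corpus, starting from an empty database.
corpus = [
    ("cheap pills buy now", True),
    ("meeting notes for tuesday", False),
    ("buy cheap watches now", True),
    ("your build finished successfully", False),
]
clf = ToyBayes()
passes = train_to_exhaustion(clf, corpus)
print(passes, clf.nmsgs)  # → 2 {True: 2, False: 2}
```

Starting from an empty database mirrors moving the old one aside: every message is initially "unsure", so the first pass trains on all four, and the second pass finds nothing left to fix.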

There's a lot of information about training at:

  http://entrian.com/sbwiki/TrainingIdeas


-- Greg Gulik http://www.gulik.org/greg/ greg @ gulik.org

_______________________________________________
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
