[email protected] wrote:
Keith> How many hams and spams have you trained on?
Keith> -Quite a few, around 350 spam mails, hams around 4500.

This is way out-of-balance.  Typically SpamBayes works best with roughly
equal numbers of ham and spam.

While I agree that this is out of balance, SpamBayes seriously needs to get its act together. It should stop allowing users to train on imbalanced data or on messages that were already classified correctly, and it should let users reset the database periodically. (The POP3 proxy server in particular needs a feature for doing a complete reset of the database from within the UI itself.)

The rule of thumb I follow is: Train on only one spam in ham and one ham in unsure. Skip training on messages I plan on filtering using my e-mail client (i.e., there is no point in training on messages I'm going to whitelist in the first place). Once I reach about 300 of each type, I reset the database and start over.
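
For anyone who prefers to see that rule of thumb written down as logic, here is a rough Python sketch. It is only one reading of the rule: I am assuming "spam in ham" means a spam that was delivered as ham, and "ham in unsure" means a ham that was classified as unsure. TrainingLog, should_train, record, and needs_reset are made-up names for illustration and are not part of SpamBayes.

```python
from dataclasses import dataclass

# "About 300 of each type" from the rule of thumb; the exact number is a judgment call.
RESET_THRESHOLD = 300

# Assumed reading of the rule: train only on these (actual, classified) pairs.
TRAINABLE_MISTAKES = {("spam", "ham"), ("ham", "unsure")}


@dataclass
class TrainingLog:
    """Hypothetical bookkeeping for manual training; not a SpamBayes class."""
    ham: int = 0
    spam: int = 0

    def should_train(self, actual: str, classified: str, whitelisted: bool = False) -> bool:
        # Messages handled by a mail-client filter or whitelist are never trained on.
        if whitelisted:
            return False
        return (actual, classified) in TRAINABLE_MISTAKES

    def record(self, actual: str) -> None:
        # Count a message that was actually used for training.
        if actual == "spam":
            self.spam += 1
        elif actual == "ham":
            self.ham += 1
        else:
            raise ValueError(f"unknown message type: {actual!r}")

    def needs_reset(self) -> bool:
        # Start over once both counts have reached the threshold.
        return self.ham >= RESET_THRESHOLD and self.spam >= RESET_THRESHOLD
```

The actual database reset is left out on purpose, since how you do that depends on which SpamBayes front end you run.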

My problem is that 99.9% of my incoming mail is spam, so there is an imbalance by default. I am forced to delete unsures because the imbalance is so great. IMO, 'unsure' is an inappropriate word choice for the category. It makes many users feel they need to tell SpamBayes what is ham and what is spam. This, in turn, creates the imbalances they then experience.

When was the last update to SpamBayes?  Time for a new version!

--
Thomas Hruska
CubicleSoft President
Ph: 517-803-4197

*NEW* MyTaskFocus 1.1
Get on task.  Stay on task.

http://www.CubicleSoft.com/MyTaskFocus/

_______________________________________________
[email protected]
Info/Unsubscribe: http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html