Question regarding the Ham vs Spam ratio

Because I belong to several active mailing lists, I receive many messages daily and get far more ham than spam. I have been reviewing the messages as they arrive, checking off the spam, and then clicking Train.
I just checked for new messages and none had arrived, so I clicked to return to the Home page and saw the messages copied below:

--------------------------------
POP3 conversations this session: 1.
Emails classified this session: 0 spam, 0 ham, 0 unsure.
Total emails trained: Spam: 15 Ham: 211
More statistics...
Warning: you have much more ham than spam - SpamBayes works best with approximately even numbers of ham and spam.
------------------------------

The "warning" is what I am writing about. I read that there should be a more equal ratio of spam to ham, but how are we to create that ratio when email continues to come in at a skewed ratio (in my case 15:211)?

I could unsubscribe from lists, and then the amounts of ham and spam would be more equal, but obviously that's not practical.

I could review all emails one by one, unclick the good ones, and re-click them as "defer", thus creating a 1:1 ratio of ham to spam for training. But that would leave a lot of messages to keep reviewing each time and would take a PITA amount of clicking, as there's no global option to mark everything as defer and then select the spam and an equal number of ham for training purposes.

I could go to the Proxy folder and manually delete some of the stored items in the ham cache, but I suspect that would not alter the database's findings.

How is one to create the proper ham/spam ratio for training when the incoming ham greatly outnumbers the spam? I must have missed something obvious. Suggestions?
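
To make the 1:1 idea concrete, here is a minimal sketch in plain Python of the selection I have in mind. It is not SpamBayes code and does not touch the SpamBayes database; the helper name `pick_balanced_training_set` and the placeholder message lists are made up for illustration, using my 15:211 counts.

```python
import random

def pick_balanced_training_set(spam, ham, seed=None):
    """Return (spam_subset, ham_subset) of equal size.

    Hypothetical helper: keeps every message from the smaller class and
    randomly samples the same number from the larger class.
    """
    rng = random.Random(seed)
    n = min(len(spam), len(ham))
    return rng.sample(spam, n), rng.sample(ham, n)

# Placeholder message IDs standing in for my 15 spam and 211 ham.
spam = [f"spam-{i}" for i in range(15)]
ham = [f"ham-{i}" for i in range(211)]

train_spam, train_ham = pick_balanced_training_set(spam, ham, seed=0)
deferred = len(ham) - len(train_ham)  # ham that would be marked "defer"

print(len(train_spam), len(train_ham), deferred)
```

With my counts this would train on 15 spam and 15 ham and leave 196 ham deferred, which is exactly the amount of manual clicking I would like to avoid.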
