I've been meaning to ask this for a long time. I run Spambayes on a Fedora Core 5 Linux machine, among others. It works extremely well. After two days of training, it had pretty much figured out all my mailing-list traffic, which is about 98% of my mail, and it always classifies that as ham. It catches about 90-95% of my spam. On a typical day I get 600-800 messages. Every day, I go through the review process in the Spambayes web interface and check the unsures, which are nearly all spam, and classify them properly. Over time, the result is that I've built up a huge imbalance of trained messages: nearly 1000 trained spam vs. 150 trained ham.
So, how do I regain balance? Should I just train on a group of mails that have already been correctly classified as ham, say an equal number from each of my mailing lists, to get things back in balance? Somehow, that seems counterintuitive to me, but I can't think of any other way. Spambayes just works too well...

--
Claude Jones
Brunswick, MD, USA
