> On the plus side, I am noticing a significant difference this time
> around. Trained on just 20 messages so far and it is definitely working
> a lot better than my previous approach of training on everything
> (60,000+ messages - and took almost 300 messages to reach the same point
> I'm at now). Still have a ways to go before I know for certain.
> Training one message at a time is going to take a while.

I've been lurking on the list for ages, and have finally gotten a chance to try out spambayes (moved to Thunderbird after getting fed up with Apple Mail). I have to echo Thomas' comments: Spambayes should train properly when confronted with common user behavior in the mailreader (i.e., she tells spambayes when unsures are spam, and when spam is ham, but usually not when unsures are ham).
I am probably recapitulating some old suggestions (or this may even be how SB already works), but it occurs to me that you can deal with the problem of database growth by cutting back the word counts regularly: when the spam or ham count of any word exceeds some number, divide every word count in the database by 2, then zap the middle-of-the-road noise words to get the total word count down to some reasonable number. Wouldn't this also deal with evolving spam signatures in a natural manner?

PS: It isn't immediately obvious from the web-based interface how to zap your database, or exactly what the save+quit button really does.
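
To make the count-halving suggestion above concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not how SpamBayes actually stores or scores tokens: the function name `decay_and_prune`, the plain dict of `word -> (spam_count, ham_count)`, the two thresholds, and the simple spam/(spam+ham) ratio used to decide which words are "middle-of-the-road" are all hypothetical. I also read "total word count" as the number of distinct words kept.

```python
def decay_and_prune(counts, max_count=1000, max_words=50000):
    """Halve all counts when any one gets too big, then drop neutral words.

    counts maps word -> (spam_count, ham_count); a new dict is returned.
    All names and thresholds here are illustrative assumptions.
    """
    # Step 1: if any single spam or ham count exceeds the cap, halve them all.
    if any(max(s, h) > max_count for s, h in counts.values()):
        counts = {w: (s // 2, h // 2) for w, (s, h) in counts.items()}

    # Words whose counts have decayed to zero carry no information any more.
    counts = {w: (s, h) for w, (s, h) in counts.items() if s or h}

    # Step 2: if the table is still too large, keep the least neutral words.
    def distance_from_neutral(item):
        _, (s, h) = item
        # 0.0 means the word is seen equally in spam and ham (pure noise);
        # 0.5 means it has only ever been seen in one of the two.
        return abs(s / (s + h) - 0.5)

    if len(counts) > max_words:
        ranked = sorted(counts.items(), key=distance_from_neutral, reverse=True)
        counts = dict(ranked[:max_words])
    return counts
```

With `{"viagra": (12, 0), "meeting": (0, 8), "the": (6, 6), "rare": (1, 0)}`, a cap of 10 and room for 2 words, every count is halved, "rare" decays to zero and disappears, and "the" is pruned as noise, leaving only "viagra" and "meeting". Old evidence fades a little on each halving, which is the sense in which this might track evolving spam.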
