Hi,

> > *-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> >     [score: 0.0000]
>
> This indicates a mistrained database, which means you have trained too
> many spams or spam-like messages (commercial messages) as ham.
>
> Proper training of spams should help. Just keep your spam (and optionally
> ham) corpora for retraining in case you would drop the database.
>
> I also recommend abstaining from training commercial mail (notices from
> e-shops, companies you've done business with, etc.) as ham, unless it
> generates a BAYES_999 score and you want it lower. I often train such
> mail as spam so it gives an uncertain BAYES_50 result.
Is there any way to distinguish a legitimate newsletter from a spam
newsletter? In other words, if I train emails from Forbes or the Washington
Post as ham, and then train similar newsletter emails from other, more
suspect providers as spam, will Bayes still be able to distinguish Forbes
and WP as ham?

The problem is that if I avoid training newsletters or bulk email
altogether, spam newsletters still only hit BAYES_50. I'm actually in a
situation now where Forbes and WP newsletters are being marked as spam, so
I'm considering retraining, but I'm wondering what approach/best practices
I should be following.

# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      97002          0  non-token data: nspam
0.000          0      90173          0  non-token data: nham
0.000          0   11581565          0  non-token data: ntokens
0.000          0 1054224948          0  non-token data: oldest atime
0.000          0 1676433889          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1648164856          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
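For reference, the retraining workflow I have in mind is roughly the
following. The mbox file names are just placeholders for my own corpora;
the sa-learn options themselves (--backup, --forget, --ham, --spam,
--mbox, --dump magic) are standard.

# Back up the Bayes database first, so it can be restored if the
# retraining makes things worse (restore with: sa-learn --restore FILE).
sa-learn --backup > bayes-backup.txt

# Forget the wrongly learned messages, then relearn them with the
# correct classification.
sa-learn --forget --mbox misclassified.mbox
sa-learn --ham   --mbox legit-newsletters.mbox    # Forbes, WP, etc.
sa-learn --spam  --mbox spam-newsletters.mbox

# Check the nspam/nham counters afterwards.
sa-learn --dump magic

Does that look like a sane approach, or is forgetting the misclassified
messages before relearning unnecessary?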