Hi,

> > * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> > *      [score: 0.0000]
>
> This indicates a mistrained database, which means you have trained too
> many spams or spam-like messages (commercial messages) as ham.
>
> Proper training of spam should help. Just keep your spam (and optionally
> ham) corpora for retraining in case you ever drop the database.
>
> I also recommend abstaining from training commercial mail (notices from
> e-shops, companies you have done business with, etc.) as ham, unless they
> generate a BAYES_999 score and you want it lower.  I often train them as
> spam so that they give an uncertain BAYES_50 result.
>

Is there any way for Bayes to distinguish a legitimate newsletter from a
spam newsletter?

In other words, if I train emails from Forbes or Washington Post as ham,
then train similar newsletter emails from other, more suspect providers as
spam, will Bayes still be able to recognize Forbes and WP as ham?

The problem is that if I avoid training newsletters or bulk email
altogether, then spam newsletters will still only hit BAYES_50.

I'm actually in a situation now where Forbes and WP newsletters are being
marked as spam, so I'm considering retraining, but I'm wondering what
approach or best practices I should follow.

 # sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      97002          0  non-token data: nspam
0.000          0      90173          0  non-token data: nham
0.000          0   11581565          0  non-token data: ntokens
0.000          0 1054224948          0  non-token data: oldest atime
0.000          0 1676433889          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1648164856          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
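
In case it helps anyone compare numbers, here is a minimal Python sketch that
pulls the nspam and nham counts out of a dump like the one above and reports
the share of trained messages that were spam. It assumes the value sits in the
third column, which is how the output above is laid out; parse_magic is just a
name I made up for illustration, and the DUMP string is copied from the first
lines of my output.

```python
import re

# First lines of the `sa-learn --dump magic` output pasted above.
DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      97002          0  non-token data: nspam
0.000          0      90173          0  non-token data: nham
0.000          0   11581565          0  non-token data: ntokens
"""

# Assumed layout: four whitespace-separated columns, then a
# "non-token data: <label>" field; the third column holds the value.
LINE_RE = re.compile(r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s+(.+?)\s*$")


def parse_magic(text):
    """Map each 'non-token data' label to its integer value."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


magic = parse_magic(DUMP)
spam_share = magic["nspam"] / (magic["nspam"] + magic["nham"])
print(f"nspam={magic['nspam']} nham={magic['nham']} spam share={spam_share:.1%}")
```

This only shows how balanced the training counts are; it says nothing about
whether individual messages were trained into the right class.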
