Re: Bayes underperforming, HTML entities?

Matus UHLAR - fantomas Thu, 08 Nov 2018 10:34:48 -0800

On 07.11.18 12:33, Amir Caspi wrote:

In the past couple of weeks I've gotten a number of clearly-spam messages
that slipped past SA, and the only reason was because they were getting
low Bayes scores (BAYES_50 or even down to BAYES_00 or BAYES_05).  I do my
Bayes training manually on both ham and spam so there should not be any
mis-categorizations...  and things worked fine until a few weeks ago, so I
don't know what's going on now.


I've had a similar experience after running SA in some places.

Do you use autolearn? There are a few rules that detect ham (they score
negatively), many of them based on the default whitelists and DNS whitelists.
Many of the mails they match come from grey-area companies: not necessarily
spam, but training their mail as ham can lower the detection rate of real spam.

Here's the magic dump:

-bash-3.2$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0     253112          0  non-token data: nspam
0.000          0     106767          0  non-token data: nham
0.000          0     150434          0  non-token data: ntokens


I found this number of tokens low, and have increased it.

bayes_expiry_max_db_size        262144

could help in the long run.
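
To illustrate why that token count looks low, here is a minimal Python sketch
that parses the first lines of the dump quoted above and compares ntokens with
the expiry limit. It assumes the stock bayes_expiry_max_db_size is 150000 (as
far as I know that is the default; check your own configuration), and
parse_magic() is a hypothetical helper, not part of SpamAssassin.

```python
import re

# Lines copied from the `sa-learn --dump magic` output quoted in this thread.
DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0     253112          0  non-token data: nspam
0.000          0     106767          0  non-token data: nham
0.000          0     150434          0  non-token data: ntokens
"""

def parse_magic(text):
    """Return {label: value} for the 'non-token data' lines of the dump."""
    fields = {}
    pattern = re.compile(
        r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$"
    )
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            # Third column holds the value, the trailing text is the label.
            fields[m.group(2)] = int(m.group(1))
    return fields

magic = parse_magic(DUMP)

ASSUMED_DEFAULT_LIMIT = 150000  # assumption: stock bayes_expiry_max_db_size
SUGGESTED_LIMIT = 262144        # value proposed in this message

at_default_limit = magic["ntokens"] >= ASSUMED_DEFAULT_LIMIT
headroom = SUGGESTED_LIMIT - magic["ntokens"]

print("ntokens:", magic["ntokens"])
print("at or above assumed default limit:", at_default_limit)
print("headroom with suggested limit:", headroom)
```

If the assumption about the default holds, the database is sitting right at
its cap, so expiry keeps discarding tokens; the larger limit leaves room for
the database to grow.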

0.000          0 1536087614          0  non-token data: oldest atime
0.000          0 1541617125          0  non-token data: newest atime
0.000          0 1541614751          0  non-token data: last journal sync atime
0.000          0 1541614749          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0       1173          0  non-token data: last expire reduction count
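
The atime fields are easier to read once converted. A minimal Python sketch,
assuming the values are Unix epoch seconds and that the "atime delta" is also
in seconds (the variable names are mine, for illustration only):

```python
from datetime import datetime, timezone

# Values copied from the dump quoted above.
oldest_atime = 1536087614
newest_atime = 1541617125
last_expire_atime_delta = 5529600

SECONDS_PER_DAY = 86400

# Assumption: atimes are Unix epoch seconds; convert them to UTC timestamps.
oldest = datetime.fromtimestamp(oldest_atime, tz=timezone.utc)
newest = datetime.fromtimestamp(newest_atime, tz=timezone.utc)

# How much history the token database currently spans, in days.
span_days = (newest_atime - oldest_atime) / SECONDS_PER_DAY

# The cutoff used by the last expiry run, in days.
delta_days = last_expire_atime_delta / SECONDS_PER_DAY

print("oldest token atime:", oldest.isoformat())
print("newest token atime:", newest.isoformat())
print("span in days:", round(span_days, 2))
print("last expire delta in days:", delta_days)
```

Under those assumptions the last expiry delta works out to 64 days, and the
database holds roughly the same span of token history.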


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.

Support bacteria - they're the only culture some people have.
