Hi

I ran the first message through my SA and found multiple hits on the
__SCC_SHORT_WORDS rule, which resulted in hits on these meta rules:

        *  1.0 SCC_10_SHORT_WORD_LINES 10 lines with many short words
        *  1.0 SCC_5_SHORT_WORD_LINES 5 lines with many short words
        *  1.0 SCC_20_SHORT_WORD_LINES 20 lines with many short words

Do you regularly run sa-update on that box?
On my system, Bayes hit BAYES_50.

Furthermore, I saw two hits on KAM rules [1].

There were also several DNSBL lookup hits (but it's possible that these
listings are younger than the message you received). On my SA, the lookups
from IVM [2] (not free) hit very well (on IPs and URIs). There were also
hits from Razor2.
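
The atime values in the "sa-learn --dump magic" output quoted below are
Unix epoch seconds, which are hard to read at a glance. A minimal sketch
for converting them, assuming GNU date is available (the helper names
epoch_to_utc and seconds_to_days are made up for illustration and are not
part of SpamAssassin):

```shell
#!/bin/sh
# Convert a Unix epoch value (as printed by `sa-learn --dump magic`) to UTC.
# Assumption: GNU date. On BSD date, `date -u -r "$1"` would be needed instead.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Convert a duration in seconds to whole days (integer division).
seconds_to_days() {
    echo $(( $1 / 86400 ))
}

epoch_to_utc 1536087614    # oldest atime from the quoted dump
epoch_to_utc 1541617125    # newest atime from the quoted dump
seconds_to_days 5529600    # last expire atime delta from the quoted dump
```

If the oldest atime turns out to be only a couple of months old, that would
suggest expiry is discarding older tokens, which may be worth keeping in
mind when judging whether the database needs a rebuild.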


[1] https://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf
[2] https://www.invaluement.com/

On 07.11.18 at 20:33, Amir Caspi wrote:
> Hi all,
> 
>       In the past couple of weeks I've gotten a number of clearly-spam 
> messages that slipped past SA, and the only reason was because they were 
> getting low Bayes scores (BAYES_50 or even down to BAYES_00 or BAYES_05).  I 
> do my Bayes training manually on both ham and spam so there should not be any 
> mis-categorizations... and things worked fine until a few weeks ago, so I 
> don't know what's going on now.
> 
> Here's the magic dump:
> 
> -bash-3.2$ sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0     253112          0  non-token data: nspam
> 0.000          0     106767          0  non-token data: nham
> 0.000          0     150434          0  non-token data: ntokens
> 0.000          0 1536087614          0  non-token data: oldest atime
> 0.000          0 1541617125          0  non-token data: newest atime
> 0.000          0 1541614751          0  non-token data: last journal sync atime
> 0.000          0 1541614749          0  non-token data: last expiry atime
> 0.000          0    5529600          0  non-token data: last expire atime delta
> 0.000          0       1173          0  non-token data: last expire reduction count
> 
> 
> I don't see any obvious problem but I'm not an expert at interpreting these...
> 
> Do I need to completely trash and rebuild my DB, or am I missing something 
> obvious?
> 
> In many cases, it would appear that these spams have either very little 
> (real) text (besides the usual attempt at Bayes poisoning) and/or are using 
> HTML-entity encoding to try to bypass Bayes.  Here are a couple of spamples:
> 
> https://pastebin.com/peiXZivJ
> https://pastebin.com/3h3r7r7j
> 
> Does SA decode HTML entities as part of normalize_charset?  If not ... can 
> this be added?
> 
> I'm using SA 3.4.1 (working on upgrading to 3.4.2 but have not had time to 
> build it yet).
> 
> Thanks!
> 
> --- Amir
> 
