Hi, I checked the first message on my SA and found multiple hits on the __SCC_SHORT_WORDS rule, which resulted in hits on the metas:

* 1.0 SCC_10_SHORT_WORD_LINES 10 lines with many short words
* 1.0 SCC_5_SHORT_WORD_LINES 5 lines with many short words
* 1.0 SCC_20_SHORT_WORD_LINES 20 lines with many short words

Do you regularly run sa-update on that box?

My Bayes hit BAYES_50.

Furthermore, I saw two hits on KAM rules [1], as well as several DNSBL lookup hits (but it's possible that these listings are newer than the message you received). On my SA, the lookups from IVM [2] (not free) hit very nicely, on both IPs and URIs. There were also hits from Razor2.

[1] https://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf
[2] https://www.invaluement.com/

On 07.11.18 at 20:33, Amir Caspi wrote:
> Hi all,
>
> In the past couple of weeks I've gotten a number of clearly-spam
> messages that slipped past SA, and the only reason was because they were
> getting low Bayes scores (BAYES_50 or even down to BAYES_00 or BAYES_05). I
> do my Bayes training manually on both ham and spam so there should not be any
> mis-categorizations... and things worked fine until a few weeks ago, so I
> don't know what's going on now.
>
> Here's the magic dump:
>
> -bash-3.2$ sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0     253112          0  non-token data: nspam
> 0.000          0     106767          0  non-token data: nham
> 0.000          0     150434          0  non-token data: ntokens
> 0.000          0 1536087614          0  non-token data: oldest atime
> 0.000          0 1541617125          0  non-token data: newest atime
> 0.000          0 1541614751          0  non-token data: last journal sync atime
> 0.000          0 1541614749          0  non-token data: last expiry atime
> 0.000          0    5529600          0  non-token data: last expire atime delta
> 0.000          0       1173          0  non-token data: last expire reduction count
>
> I don't see any obvious problem but I'm not an expert at interpreting these...
>
> Do I need to completely trash and rebuild my DB, or am I missing something
> obvious?
>
> In many cases, it would appear that these spams have either very little
> (real) text (besides the usual attempt at Bayes poisoning) and/or are using
> HTML-entity encoding to try to bypass Bayes. Here are a couple of spamples:
>
> https://pastebin.com/peiXZivJ
> https://pastebin.com/3h3r7r7j
>
> Does SA decode HTML entities as part of normalize_charset? If not ... can
> this be added?
>
> I'm using SA 3.4.1 (working on upgrading to 3.4.2 but have not had time to
> build it yet).
>
> Thanks!
>
> --- Amir
>
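For reading the magic dump quoted above: in these "non-token data" lines the value sits in the third column, and the *atime* fields are Unix epoch seconds. Below is a minimal sketch, assuming the whitespace-separated layout shown in the quote; the helper names `parse_magic` and `as_utc` are hypothetical and not part of SpamAssassin. It turns the timestamps into dates and computes the spam:ham ratio.

```python
from datetime import datetime, timezone

# Magic-dump lines copied from the quoted message.
DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0     253112          0  non-token data: nspam
0.000          0     106767          0  non-token data: nham
0.000          0     150434          0  non-token data: ntokens
0.000          0 1536087614          0  non-token data: oldest atime
0.000          0 1541617125          0  non-token data: newest atime
0.000          0 1541614751          0  non-token data: last journal sync atime
0.000          0 1541614749          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0       1173          0  non-token data: last expire reduction count
"""

PREFIX = "non-token data: "


def parse_magic(text):
    """Hypothetical helper: map each 'non-token data' label to its value (3rd column)."""
    values = {}
    for line in text.splitlines():
        # Split into at most 5 fields so the label keeps its internal spaces.
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[4].startswith(PREFIX):
            values[fields[4][len(PREFIX):]] = int(fields[2])
    return values


def as_utc(epoch):
    """Hypothetical helper: format epoch seconds as a UTC date string."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


magic = parse_magic(DUMP)
ratio = round(magic["nspam"] / magic["nham"], 2)
expire_days = magic["last expire atime delta"] // 86400

print("spam:ham ratio:", ratio)
print("oldest atime:  ", as_utc(magic["oldest atime"]))
print("newest atime:  ", as_utc(magic["newest atime"]))
print("expire delta:  ", expire_days, "days")
```

On the quoted numbers this works out to about 2.37 trained spams per ham, token access times spanning 2018-09-04 to 2018-11-07 (UTC), and a 64-day expire delta.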
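On the HTML-entity question: an entity-encoded word only produces the same Bayes tokens as the plain word if the entities are decoded before tokenizing. The toy Python illustration below makes that point with the standard library's `html.unescape`. The sample string and the `tokens` function are hypothetical, and the sketch says nothing about what SpamAssassin's own HTML parser or `normalize_charset` actually does.

```python
import html
import re

# Hypothetical entity-obfuscated body text (not taken from the pastebin samples).
obfuscated = "Claim your &#70;&#114;&#101;&#101; &#x67;&#x69;&#x66;&#x74; &amp; save"


def tokens(text):
    """Toy tokenizer: lowercase runs of ASCII letters (NOT SpamAssassin's tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


# Without decoding, the entity-encoded words never appear as tokens.
raw_tokens = tokens(obfuscated)
# After decoding, "Free" and "gift" become ordinary tokens again.
decoded_tokens = tokens(html.unescape(obfuscated))

print(raw_tokens)
print(decoded_tokens)
```

Without decoding, the hidden words contribute only junk fragments such as `x` and `amp`, so a Bayes classifier never sees the tokens it was trained on.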