On 21 Nov 2017, at 16:01 (-0500), Jerry Malcolm wrote:

> I have the Bayesian filter working, with a simple way to train it.  I have sent over 5000 training messages to it over the past 6-8 months.

That's good, but only if the training includes a mix of ham and spam. If you only tell the Bayes filter about one or the other, it will work poorly. Also, while opinions differ widely on the "autolearn" functionality, I find it keeps my Bayes DB pretty accurate, with bayes_auto_learn_threshold_nonspam set to -0.1 (the default is 0.1) and bayes_auto_learn_threshold_spam set to 8 (the default is 12).
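To make the effect of those two settings concrete, here is a deliberately simplified Python model of the autolearn threshold test. It assumes the message score is compared against bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam and nothing else; the real AutoLearnThreshold plugin applies extra conditions (for example, it leaves some rules, such as the Bayes rules themselves, out of the score it uses for learning). The function name is hypothetical, and the exact boundary comparisons (< versus <=) are an assumption.

```python
def autolearn_decision(score, nonspam_threshold=0.1, spam_threshold=12.0):
    """Hypothetical, simplified model of SA's autolearn threshold test.

    Returns "ham", "spam", or None (the message is not auto-learned).
    The defaults mirror SA's stock bayes_auto_learn_threshold_nonspam
    (0.1) and bayes_auto_learn_threshold_spam (12.0).
    """
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None  # the middle band between the thresholds is never learned


# Compare the stock defaults with the settings described above.
for score in (0.0, 9.0):
    print(score,
          autolearn_decision(score),
          autolearn_decision(score, nonspam_threshold=-0.1, spam_threshold=8.0))
```

Under this model, moving the ham threshold to -0.1 stops a message scoring 0.0 from being learned as ham, while dropping the spam threshold to 8 means a message scoring 9.0 is now learned as spam: more caution on the ham side, more aggressiveness on the spam side.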

On the larger issue of whether SpamAssassin is an "out-of-the-box" total solution for spam control, the simple answer is "no." No such thing exists or CAN exist. I've worked with about a dozen mail systems handling mail for about a hundred distinct domains (I'm old), and while they have had much in common, each has needed its own tweaking to optimize. Where I've run multi-tenant systems, seemingly similar customers have required divergent filtering: e.g. multiple small businesses of similar scale in the same metropolitan area have each needed domain-specific filtering that their neighbors using the same infrastructure and services provider couldn't tolerate. The FUSSP (Final Ultimate Solution to the Spam Problem) is a myth. (See https://www.rhyolite.com/anti-spam/you-might-be.html for signs of FUSSP delusions.)

Beyond the fact that similar domains and even similar individuals can have starkly different anti-spam needs, SpamAssassin has a blind spot: its contributors generally practice layered defense in such a way that a large fraction of the spam targeting them is never even shown to SpamAssassin. As a result, SA will not catch some of the most blatant and even dangerous spam, because on those systems it is easily stopped earlier by safe MTA (or pre-MTA, e.g. postscreen) anti-spam tactics, or even by network-layer tools like the Spamhaus DROP and EDROP lists. If you do not use mechanisms that end the majority of SMTP sessions before the DATA phase, you will need to be especially careful to configure SA correctly and to customize it for your site.
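A network-layer check of the kind mentioned above boils down to testing the client address against a list of CIDR blocks before any SMTP conversation happens. The Python sketch below assumes a DROP-style text format of one "CIDR ; SBL-reference" entry per line, with lines starting with ";" as comments. The sample data is made up from documentation address ranges, not real listings, and the function names are hypothetical; in production this job normally belongs in a firewall rule set rather than in Python.

```python
import ipaddress


def parse_drop_list(text):
    """Parse DROP-style lines such as '192.0.2.0/24 ; SBL000001'.

    Everything after the first ';' is treated as a comment, so both
    full-line comments and the trailing SBL reference are discarded.
    """
    networks = []
    for line in text.splitlines():
        entry = line.split(";", 1)[0].strip()
        if entry:
            networks.append(ipaddress.ip_network(entry))
    return networks


def is_dropped(client_ip, networks):
    """True if the connecting client's address falls in any listed block."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


# Hypothetical sample data using documentation ranges, not real listings.
SAMPLE_DROP = """\
; Hypothetical sample list
192.0.2.0/24 ; SBL000001
198.51.100.128/25 ; SBL000002
"""

nets = parse_drop_list(SAMPLE_DROP)
for ip in ("198.51.100.200", "198.51.100.20", "203.0.113.9"):
    print(ip, "drop" if is_dropped(ip, nets) else "allow")
```

A connection refused at this layer never produces a message, which is exactly why such spam rarely reaches SA on systems that use these tools.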

One area that needs particular care when configuring SA is network classification: the trusted_networks and internal_networks settings. If you do not set those correctly, you can end up never hitting any of the DNSBL rules, or hitting them improperly, because SA isn't working with the right Received header for a particular DNSBL. A related and increasingly common (dunno why) cause of never hitting DNSBL rules is a form of firewall/router NAT, sometimes called "Secure NAT," in which inbound connections have their source IPs replaced with the IP of the device doing the NAT. This typically kills any ability of an MTA or a filter like SA to use DNSBLs or any other anti-spam tactic that requires knowing the client IP (or the client IP of the last external-client transport hop).
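Why network classification matters can be seen in a simplified Python model: walk the client IPs recorded in the Received headers from newest to oldest, skip hosts inside your own networks, and treat the first outside address as the last external client, the address a DNSBL lookup should target. This illustrates the idea only and is not SA's actual Received-header parsing; the function names, the dnsbl.example.org zone and all addresses are hypothetical. The query-name construction (reversed IPv4 octets followed by the list's zone) is the usual DNSBL convention.

```python
import ipaddress


def last_external_relay(relay_ips, internal_networks):
    """Return the first relay IP (newest hop first) outside internal_networks.

    relay_ips: client IPs taken from Received headers, newest hop first.
    internal_networks: CIDR strings, a stand-in for SA's internal_networks.
    Returns None if every hop looks internal.
    """
    nets = [ipaddress.ip_network(n) for n in internal_networks]
    for ip in relay_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in nets):
            return ip
    return None


def dnsbl_query_name(ipv4, zone):
    """Build an IPv4 DNSBL query name: reversed octets, then the zone."""
    return ".".join(reversed(ipv4.split("."))) + "." + zone


# Hypothetical chain: our MX handing off to a filter host, the outside
# client that connected to our MX, then an earlier hop that client reported.
relays = ["10.0.0.5", "198.51.100.23", "192.0.2.77"]

correct = last_external_relay(relays, ["10.0.0.0/8"])
print(correct, dnsbl_query_name(correct, "dnsbl.example.org"))

# Misconfigured: no internal networks, so our own host gets looked up.
print(last_external_relay(relays, []))

# "Secure NAT": every connection appears to come from the NAT device.
print(last_external_relay(["10.0.0.5", "10.0.0.1"], ["10.0.0.0/8"]))
```

With internal_networks left empty, the sketch picks 10.0.0.5, the wrong address to look up. Behind a "Secure NAT" device every hop looks internal, so no external client IP is left to check at all.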

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole
