Re: Scoring Philosophy?

Jerry Malcolm Tue, 21 Nov 2017 15:11:31 -0800

On 11/21/2017 3:52 PM, Bowie Bailey wrote:

On 11/21/2017 4:01 PM, Jerry Malcolm wrote:
I have been using SpamAssassin in my hosting environment for several years. It catches thousands of spam messages (thank you...). But my concern is that it doesn't catch a couple of hundred messages per day. I have the Bayesian filter working, with a simple way to train it. I have sent over 5000 training messages to it over the past 6-8 months. I have set up a non-forwarding caching DNS, and the blacklist tests are working.
My question is with the scoring. I understand the general theory of adding up 'votes' by all of the spam tests to determine if it's indeed spam. But it appears that no one test, no matter how certain it is, has enough power to qualify the message as spam. The Bayesian filter can say it's 80-100% certain it's spam. But some other test decides it's not and even sometimes has a negative number that subtracts the Bayesian score from the total. But my biggest problem is that even if it's scored as coming from a BL URL, but if Bayesian doesn't also say it's spam, then it's apparently still not spam. I spend a couple of hours every day trying to tell the Bayesian filter about today's new strains of spam that it hasn't yet seen.
Am I missing something obvious? Is this just the way it works, and I should expect to have to run a couple of hundred missed spams through the Bayesian filter each day? My threshold score was originally set to 5.0. I don't even remember where that came from. I dropped it to 4.0 a couple of years ago, and that's where it is now. But (see example output below) when BL says it's spam and adds 2.5, then Bayesian says it's 40-60% spam and adds 0.8, and it's got a small font and gets another 0.5, and all other tests are neutral... it's now 3.8 and STILL not spam with a threshold of 4.0.
Can someone tell me if this is by design and/or if my configuration should be adjusted? I realize I can easily drop the threshold to 1.0 or 2.0. But that would probably just shift the problem to tons of false-positives which obviously is not a good solution.
The general philosophy is that no one rule should be able to mark a message as spam on its own. However, quite a few of us have bumped the scores beyond that point for rules that we trust. Most commonly the Bayes_99 or Bayes_99+Bayes_999 can be made to score 5+ points if your database is well trained. You can also keep an eye on negative scoring rules that seem to hit too frequently on spams and bump the scores for those.
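As a rough sketch of what bumping those scores might look like in local.cf: the values below are illustrative assumptions, not figures given in this thread, and BAYES_999 is only defined in recent SpamAssassin releases.

```
# local.cf -- illustrative values only; tune against your own mail flow.
# With both rules hitting, Bayes alone contributes 5.0 points,
# which is enough to cross a 4.0 or 5.0 threshold by itself.
score BAYES_99   4.0
score BAYES_999  1.0
```

Running "spamassassin --lint" after editing is a quick way to check that the file still parses. Raising these scores only makes sense if the Bayes database is well trained, as noted above.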
You can also add third-party rules. I believe the only actively-maintained set at the moment is from KAM.
http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf
http://www.pccc.com/downloads/SpamAssassin/contrib/nonKAMrules.cf

I have both of these in use on my server with good results.
You can use trusted blacklists to block spam at the MTA before it even gets to SA. I use zen.spamhaus.org for this on my server.
You didn't specify the version of SA you are running, but you should make sure you are upgraded to the newest version (3.4.1) for best results.
The amount of spam that gets through will be dependent on your mailflow. I usually see 1 or 2 missed spams in my inbox per day. Today, the zen blacklist blocked 33 messages to my inbox and SA marked another 20 as spam while delivering 95 clean messages. I think there might have been one spam that made it to my inbox. I have the threshold set at 4.0 for my mailbox.

Hi Bowie,

Thanks for the quick response. I'm a bit concerned about going in and playing around with scores, etc., considering my minimal knowledge of the overall process at this point. I'll download and install the KAM rules. And I'll need to start logging my percentages of clean/spam/total per account and determine if my results are typical. But I'm just curious what others are using for their installation. Is running SA out-of-the-box simply not done? Is it expected that all serious users will add 3rd party rule plugins and adjust scores? I'm currently being inundated with solar, skin tag removal, 3D organ printing, Shark Tank, and Russian brides spam messages. Seems like some of these could be pretty obvious rules. No matter what I do with Bayesian training, they are still getting through. Just out of curiosity, what rule(s), if any, should be expected to catch spam emails with subjects such as these? Or, said differently... are these specific subject lines being caught by other users, and if so, what rules are catching them? Is there some way I can quickly add a rule that says if the subject contains 'Skin Tags', score it to 5 (without getting into coding rule plugins, etc.)? Or is there some place that is creating these rules for 'pretty obvious' subject lines as the new strains of spam appear?
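For reference, the kind of quick subject rule asked about here can be written as a plain header rule in local.cf, with no plugin code. This is a minimal sketch: the rule name LOCAL_SUBJ_SKIN_TAGS and the regular expression are made up for illustration, and the score of 5 is the figure from the question, not a tested recommendation.

```
# local.cf -- hypothetical rule name and pattern, for illustration only.
# Matches "skin tag" or "skin tags" anywhere in the Subject, case-insensitively.
header   LOCAL_SUBJ_SKIN_TAGS  Subject =~ /\bskin tags?\b/i
describe LOCAL_SUBJ_SKIN_TAGS  Subject mentions skin tags
score    LOCAL_SUBJ_SKIN_TAGS  5.0
```

Running "spamassassin --lint" afterwards checks the syntax. A single rule scoring 5 will mark a message as spam on its own at a 4.0 threshold, so a legitimate message with that phrase in its subject would be flagged as well.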


I still feel like I'm missing something obvious....

Thanks again.

Jerry

