On 11/21/2017 3:52 PM, Bowie Bailey wrote:
On 11/21/2017 4:01 PM, Jerry Malcolm wrote:
I have been using SpamAssassin in my hosting environment for several years.  It catches thousands of spam messages (thank you...). But my concern is that it doesn't catch a couple of hundred messages per day.  I have the Bayesian filter working, with a simple way to train it.  I have sent over 5000 training messages to it over the past 6-8 months. I have set up a non-forwarding caching DNS, and the black list tests are working.

My question is with the scoring.  I understand the general theory of adding up 'votes' by all of the spam tests to determine if it's indeed spam.  But it appears that no one test, no matter how certain it is, has enough power to qualify the message as spam.  The Bayesian filter can say it's 80-100% certain it's spam, but some other test decides it's not, and sometimes even has a negative score that cancels out the Bayesian score in the total.  My biggest problem is that even if it's scored as coming from a BL URL, if Bayesian doesn't also say it's spam, then it's apparently still not spam.  I spend a couple of hours every day trying to tell the Bayesian filter about today's new strains of spam that it hasn't yet seen.

Am I missing something obvious?  Is this just the way it works, and I should expect to have to run a couple of hundred missed spams through the Bayesian filter each day?  My threshold score was originally set to 5.0.  I don't even remember where that came from.  I dropped it to 4.0 a couple of years ago, and that's where it is now.  But (see example output below) when BL says it's spam and adds 2.5, then Bayesian says it's 40-60% spam and adds 0.8, and it's got a small font and gets another 0.5, and all other tests are neutral... it's now 3.8 and STILL not spam with a threshold of 4.0.

Can someone tell me if this is by design and/or if my configuration should be adjusted?  I realize I can easily drop the threshold to 1.0 or 2.0.  But that would probably just shift the problem to tons of false positives, which obviously is not a good solution.

The general philosophy is that no one rule should be able to mark a message as spam on its own.  However, quite a few of us have bumped the scores beyond that point for rules that we trust.  Most commonly, BAYES_99 or BAYES_99+BAYES_999 can be made to score 5+ points if your database is well trained.  You can also keep an eye on negative-scoring rules that seem to hit too frequently on spam and bump the scores for those.
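
As a rough illustration of the score bump described above, here is a minimal sketch of what the overrides might look like in local.cf. The specific values are assumptions chosen only so that the two rules add up to 5 points; they are not recommended settings, and the location of local.cf (often /etc/mail/spamassassin/) depends on the installation.

```
# local.cf -- illustrative values only, not recommendations.
# Assumes the Bayes database is well trained, as noted above.

# Threshold mentioned in this thread.
required_score 4.0

# With these two overrides, a message that hits both rules gets
# 4.0 + 1.0 = 5.0 points from Bayes alone, which is over the threshold.
score BAYES_99  4.0
score BAYES_999 1.0
```

Running `spamassassin --lint` after editing should catch syntax mistakes, and spamd (if used) needs a restart before the new scores take effect.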

You can also add third-party rules.  I believe the only actively-maintained set at the moment is from KAM.

http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf
http://www.pccc.com/downloads/SpamAssassin/contrib/nonKAMrules.cf

I have both of these in use on my server with good results.

You can use trusted blacklists to block spam at the MTA before it even gets to SA.  I use zen.spamhaus.org for this on my server.

You didn't specify the version of SA you are running, but you should make sure you are upgraded to the newest version (3.4.1) for best results.

The amount of spam that gets through will be dependent on your mailflow.  I usually see 1 or 2 missed spams in my inbox per day. Today, the zen blacklist blocked 33 messages to my inbox and SA marked another 20 as spam while delivering 95 clean messages.  I think there might have been one spam that made it to my inbox.  I have the threshold set at 4.0 for my mailbox.

Hi Bowie,
Thanks for the quick response.  I'm a bit concerned about going in and playing around with scores, etc., considering my minimal knowledge of the overall process at this point.  I'll download and install the KAM rules.  And I'll need to start logging my percentages of clean/spam/total per account and determine if my results are typical.

But I'm just curious what others are using for their installation.  Is running SA out-of-the-box simply not done?  Is it expected that all serious users will add 3rd party rule plugins and adjust scores?

I'm currently being inundated with solar, skin tag removal, 3D organ printing, Shark Tank, and Russian brides spam messages.  Seems like some of these could be pretty obvious rules.  No matter what I do with Bayesian training, they are still getting through.  Just out of curiosity, what rule(s), if any, should be expected to catch spam emails with subjects such as these?  Or, said differently... are these specific subject lines being caught by other users, and if so, what rules are catching them?

Is there some way I can quickly add a rule that says if the subject contains 'Skin Tags', score it to 5 (without getting into coding rule plugins, etc.)?  Or is there some place that is creating these rules for 'pretty obvious' subject lines as the new strains of spam appear?
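
For reference, the kind of subject rule asked about here can be written as a plain header rule in local.cf, with no plugin code involved. The following is only a sketch: the rule name LOCAL_SUBJ_SKIN_TAGS is made up for illustration, and the score of 5.0 is simply the value from the question. A lower score may be safer, since legitimate mail can mention the same words.

```
# local.cf -- hypothetical rule name; score taken from the question above.

# Case-insensitive match on "skin tag" or "skin tags" in the Subject header.
header   LOCAL_SUBJ_SKIN_TAGS  Subject =~ /\bskin tags?\b/i
describe LOCAL_SUBJ_SKIN_TAGS  Subject mentions skin tags
score    LOCAL_SUBJ_SKIN_TAGS  5.0
```

The same three-line pattern (header, describe, score) can be repeated with a different rule name and regular expression for each of the other subject lines.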

I still feel like I'm missing something obvious....

Thanks again.

Jerry


