On 11/21/2017 05:11 PM, Jerry Malcolm wrote:
On 11/21/2017 3:52 PM, Bowie Bailey wrote:
On 11/21/2017 4:01 PM, Jerry Malcolm wrote:
I have been using SpamAssassin in my hosting environment for several
years. It catches thousands of spam messages (thank you...). But my
concern is that it doesn't catch a couple of hundred messages per
day. I have the Bayesian filter working, with a simple way to train
it. I have sent over 5000 training messages to it over the past 6-8
months. I have set up a non-forwarding caching DNS, and the black
list tests are working.
My question is with the scoring. I understand the general theory of
adding up 'votes' by all of the spam tests to determine if it's
indeed spam. But it appears that no one test, no matter how certain
it is, has enough power to qualify the message as spam. The Bayesian
filter can say it's 80-100% certain it's spam. But some other test
decides it's not and even sometimes has a negative number that
subtracts the Bayesian score from the total. But my biggest problem
is that even if a message is scored as coming from a BL URL, if
Bayesian doesn't also say it's spam, it's apparently still not spam. I
spend a couple of hours every day trying to tell the Bayesian filter
about today's new strains of spam that it hasn't yet seen.
Am I missing something obvious? Is this just the way it works, and I
should expect to have to run a couple of hundred missed spams through
the Bayesian filter each day? My threshold score was originally set
to 5.0. I don't even remember where that came from. I dropped it to
4.0 a couple of years ago, and that's where it is now. But (see
example output below) when BL says it's spam and adds 2.5, then
Bayesian says it's 40-60% spam and adds 0.8, and it's got a small
font and gets another 0.5, and all other tests are neutral... it's
now 3.8 and STILL not spam with a threshold of 4.0.
Can someone tell me if this is by design and/or if my configuration
should be adjusted? I realize I can easily drop the threshold to 1.0
or 2.0. But that would probably just shift the problem to tons of
false-positives which obviously is not a good solution.
The general philosophy is that no one rule should be able to mark a
message as spam on its own. However, quite a few of us have bumped
the scores beyond that point for rules that we trust. Most commonly,
BAYES_99 alone, or BAYES_99 plus BAYES_999, can be made to score 5+ points if
your database is well trained. You can also keep an eye on negative
scoring rules that seem to hit too frequently on spams and bump the
scores for those.
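
As a rough sketch of what bumping those scores looks like, lines like the following could go in a local file such as /etc/mail/spamassassin/local.cf. The values are illustrative assumptions, not recommendations. BAYES_999 fires in addition to BAYES_99 on the highest-probability messages, so the two scores add together.

```
# Illustrative values only; tune them to your own mail flow.
# BAYES_999 hits on top of BAYES_99, so together these add 5.0.
score BAYES_99   4.0
score BAYES_999  1.0
```

Running `spamassassin --lint` after editing is a quick way to catch syntax errors before reloading.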
You can also add third-party rules. I believe the only
actively-maintained set at the moment is from KAM.
http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf
http://www.pccc.com/downloads/SpamAssassin/contrib/nonKAMrules.cf
I have both of these in use on my server with good results.
You can use trusted blacklists to block spam at the MTA before it even
gets to SA. I use zen.spamhaus.org for this on my server.
You didn't specify the version of SA you are running, but you should
make sure you are upgraded to the newest version (3.4.1) for best
results.
The amount of spam that gets through will be dependent on your
mailflow. I usually see 1 or 2 missed spams in my inbox per day.
Today, the zen blacklist blocked 33 messages to my inbox and SA marked
another 20 as spam while delivering 95 clean messages. I think there
might have been one spam that made it to my inbox. I have the
threshold set at 4.0 for my mailbox.
Hi Bowie,
Thanks for the quick response. I'm a bit concerned about going in and
playing around with scores, etc., considering my minimal knowledge of the
overall process at this point. I'll download and install the KAM
rules. And I'll need to start logging my percentages of
clean/spam/total per account and determine if my results are typical.
But I'm just curious what others are using for their installation. Is
running SA out-of-the-box simply not done? Is it expected that all
serious users will add 3rd party rule plugins and adjust scores?
Mail filtering is very dependent on your location, expected languages,
and recipients. SpamAssassin is pretty generic/conservative out of the
box to fit in most environments safely. You need to tune it a bit for
your mail flow.
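
One way to tune for the pattern described earlier in the thread (a blacklist hit plus a lukewarm Bayes result) is a meta rule that adds a little extra when both fire together. This is only a sketch: the rule name LOCAL_URIBL_AND_BAYES and its score are made up for illustration, and URIBL_BLACK is assumed to be the blacklist rule that is hitting. Substitute whichever rule names show up in your own reports.

```
# Hypothetical meta rule: fires only when both sub-rules hit.
# Its score is added on top of the sub-rules' own scores.
meta     LOCAL_URIBL_AND_BAYES  (URIBL_BLACK && BAYES_50)
describe LOCAL_URIBL_AND_BAYES  URI blacklist hit plus mid-range Bayes
score    LOCAL_URIBL_AND_BAYES  1.0
```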
I'm currently being inundated with solar, skin tag removal, 3D organ
printing, Shark Tank, and Russian brides spam messages. Seems like some
of these could be pretty obvious rules. No matter what I do with
Bayesian training, they are still getting through. Just out of
curiosity, what rule(s), if any, should be expected to catch spam emails
with subjects such as these? Or, said differently.... are these specific
subject lines being caught by other users, and if so, what rules are
catching them? Is there some way I can quickly add a rule that says if
the subject contains 'Skin Tags', score it to 5 (without getting into
coding rule plugins, etc)? Or is there some place that is creating
these rules for 'pretty obvious' subject lines as the new strains of
spam appear?
Absolutely. Start with downloading the KAM.cf into your
/etc/mail/spamassassin once a day. Then look at the KAM.cf to find
examples on how to make custom rules. It's pretty easy:
https://wiki.apache.org/spamassassin/WritingRules
For example, this can be put in /etc/mail/spamassassin/99_my_rules.cf:
header SUBJ_INVOICE Subject =~ /Invoice/i
describe SUBJ_INVOICE Subject contains invoice
score SUBJ_INVOICE 0.2
Then you restart or reload whatever is launching SA (spamd, Amavis,
Mimedefang, MailScanner, etc.) to load the new rule. If you are simply
using it in a mail client like Thunderbird or Apple Mail, I don't think
you have to restart anything since it launches SpamAssassin probably for
each email. I am not that familiar with using SA in a mail client.
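
Applied to the 'Skin Tags' question, the same pattern might look like the sketch below. The rule name LOCAL_SUBJ_SKIN_TAGS is hypothetical, and the score of 5.0 is simply the value asked about, not a recommendation.

```
# Hypothetical rule name; the regex matches "skin tag" or "skin tags"
# as whole words, case-insensitively, anywhere in the Subject header.
header   LOCAL_SUBJ_SKIN_TAGS  Subject =~ /\bskin tags?\b/i
describe LOCAL_SUBJ_SKIN_TAGS  Subject mentions skin tags
score    LOCAL_SUBJ_SKIN_TAGS  5.0
```

With a threshold of 4.0, a score of 5.0 means this one rule marks the message as spam on its own, so any legitimate mail with that phrase in the subject would be caught too. A lower score that only tips borderline messages over the threshold is the more cautious choice.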
I still feel like I'm missing something obvious....
Thanks again.
Jerry
On 11/21/2017 3:52 PM, Bowie Bailey wrote:
You can use trusted blacklists to block spam at the MTA before it even
gets to SA. I use zen.spamhaus.org for this on my server.
Yes. If you are running a mail filtering server with an MTA like
Postfix, you should put as much filtering as possible in the MTA. For
example, Postfix has postscreen that is very easy to enable and does
wonders without any tuning. Postscreen also has the ability to do
weighted RBLs so you can combine the score of multiple RBLs into a
threshold. Normally any single RBL hit will block an email, but this is
too risky. With weighted RBLs, you can set up 20+ RBLs to work together
for better overall accuracy.
Greylisting is very effective too if your users can handle the delay on
new senders. I rolled out greylisting slowly where most users didn't
even know it and now it's helping a lot. You have to exclude Google's
mail servers from greylisting.
--
David Jones