On 24 Mar 2016, at 13:50, Yves Goergen wrote:
Hello,
I'm getting more and more spam every day and SpamAssassin can't handle
it. Most of it looks very similar but it isn't filtered out.
Have you tried creating local rules for it?
I can't share the rules I've created for *some* of these families of
malware-connected spam, but because the worst of them (spreading
ransomware) are produced programmatically in bulk, they have very strong
similarities that make multiline 'rawbody' rules helpful as well as
case-sensitive header checks looking for idiosyncratic combinations of
uncommon minor details.
That's vague on purpose because: spammers are known to change behavior
based on posts here and on other, even notionally "private", anti-spam
lists; these particular spam genera have morphed over time and so need
to be treated as moving targets with regular rule adjustments and
additions; and the specific best ruleset I've created for these were
done in an environment where they are legally not mine to share,
especially in a place where I know spammers look for ways to evade
filters, making those rules obsolete faster.
I can't speak to the ClamAV issue because I don't use the extra sigs and
have come to expect very little of ClamAV. Maybe ask on a ClamAV list?
What other solutions are there to improve the detection rate of
SpamAssassin? My current spam-to-useful ratio in some mailboxes is
somewhere around 10:1.
That implies that you are probably underutilizing spam-control measures
in your MTA. I manage a diverse set of mail systems running multiple
MTAs and in all cases the most effective anti-spam measure against ALL
spam is delaying the initial greeting banner, which is a mandatory
option for an MTA to be fit for use exposed to the modern Internet. Later
in the message you say you use Exim, which I believe has such a feature,
but I am not sure of that. The ideal delay to use is a matter of debate
because apparently the subtleties of how the delay is done matter, but
5 seconds is usually a reasonable delay to catch most spambots and you
don't start to really impair valid mail due to delays until you go above
15s.
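To make the banner-delay idea concrete, here is a minimal Python sketch
of the server side. It is not how Exim or any other MTA implements it;
the function names, the hostname "mx.example.org" and the wording of the
554 reply are assumptions made up for illustration. The server holds
back its 220 banner and treats any client that sends data during the
wait as a spambot.

```python
import select
import socket


def client_talked_early(conn: socket.socket, delay_seconds: float) -> bool:
    """Hold the banner for delay_seconds; report whether the client spoke first."""
    # select() returns early if the client sends data (or hangs up) before
    # the delay has elapsed; a well-behaved SMTP client stays silent.
    readable, _, _ = select.select([conn], [], [], delay_seconds)
    return bool(readable)


def greet(conn: socket.socket, hostname: str = "mx.example.org",
          delay_seconds: float = 5.0) -> bool:
    """Send a delayed 220 banner, or a 554 refusal to a pre-greeting talker."""
    if client_talked_early(conn, delay_seconds):
        # Hypothetical reply text; real MTAs word this differently.
        reply = b"554 protocol error: spoke before greeting\r\n"
        accepted = False
    else:
        reply = f"220 {hostname} ESMTP\r\n".encode("ascii")
        accepted = True
    try:
        conn.sendall(reply)
    except OSError:
        # The client has already gone away; nothing more to do.
        return False
    return accepted
```

A real MTA would do this inside its accept loop without blocking other
connections; the sketch only shows the decision itself.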
Close behind a greeting delay, the use of high-accuracy DNSBLs is
indispensable: I use Spamhaus Zen (as well as their DROP+EDROP lists in
the network layer to simply never see the listed nets),
ix.dnsbl.manitu.net, and psbl.surriel.com. Note that you CANNOT safely
use many of these in the same ways on outbound mail submitted by your
own users and inbound mail for local delivery. The same is true of many
of the following measures as well. If you are not strictly segregating
initial submission to a suitably configured port 587 MSA for
authenticated users so that port 25 SMTP is only inbound mail from
relative strangers, your spam control will be harder to do safely or
well. Your own authenticated users MIGHT send spam, but some of the
tactics that work best before letting SpamAssassin see a message are
essentially detection of machines that *should* only be sending mail
through an authenticating MSA, not directly to a remote MTA unfamiliar
with them.
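For readers who have not looked at how a DNSBL lookup such as the ones
named above works on the wire, here is a rough Python sketch. The client
IPv4 address is reversed octet by octet, the list's zone is appended,
and the resulting name is resolved; an answer in 127.0.0.0/8 means
"listed". The function names are invented for this example, and treating
127.255.255.x answers as error codes rather than listings is an
assumption that should be checked against each list's documentation.

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 client in a DNSBL zone."""
    addr = ipaddress.ip_address(client_ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(client_ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """True if the zone returns a 127.0.0.0/8 listing answer for the client."""
    try:
        answer = resolve(dnsbl_query_name(client_ip, zone))
    except OSError:
        # NXDOMAIN (not listed) and lookup failures both end up here.
        return False
    # Assumption: 127.255.255.x answers signal a query problem, not a listing.
    return answer.startswith("127.") and not answer.startswith("127.255.255.")
```

In practice the MTA does this lookup itself at connect or RCPT time; the
resolve parameter exists only so the logic can be exercised without
touching the network.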
I'm not entirely familiar with the other options Exim offers for
rejecting spam, but right behind the banner delay and DNSBLs for me are
refusing mail from hosts that HELO/EHLO badly. Systems differ in what
they can do in that area, but where I use this most aggressively
(Postfix systems) I reject mail from hosts that HELO in strictly invalid
ways or in idiosyncratically wrong or spammer-associated ways: remote
systems claiming one of my names or IP addresses, using a .local name,
most unqualified names (with a whitelist for special cases), IP
literals, a variety of valid names whose owners have said no machine
anywhere would ever HELO with the name (e.g. "mail.com"), and various
"generic address" patterns where the hostname is derived from the
client IP.
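Some of the HELO checks described above can be sketched in Python as
follows. This is an illustration only, not the actual Postfix
configuration: every name, address and set below is a hypothetical
placeholder, the "generic address" test is a crude heuristic (all four
octets of the client IP appear as numbers in the name), and address
literals other than the server's own are simply passed through to
whatever other checks exist.

```python
import ipaddress
import re
from typing import Optional

# Hypothetical local data; a real site would fill these in.
MY_NAMES = {"mx.example.org"}
MY_ADDRESSES = {"192.0.2.25"}
NEVER_USED_IN_HELO = {"mail.com"}
UNQUALIFIED_WHITELIST = set()


def looks_generic(name: str, client_ip: str) -> bool:
    """Crude test: every octet of an IPv4 client address appears in the name."""
    addr = ipaddress.ip_address(client_ip)
    if addr.version != 4:
        return False
    octets = {int(o) for o in str(addr).split(".")}
    numbers = {int(n) for n in re.findall(r"\d+", name)}
    return octets <= numbers


def helo_rejection_reason(helo: str, client_ip: str) -> Optional[str]:
    """Return a reason to reject the HELO/EHLO argument, or None to accept."""
    name = helo.strip().lower().rstrip(".")
    if name in MY_NAMES or name.strip("[]") in MY_ADDRESSES:
        return "claims to be this server"
    if name.startswith("["):
        return None  # other address literals are not judged in this sketch
    if name in NEVER_USED_IN_HELO:
        return "name its owner never uses in HELO"
    if name.endswith(".local"):
        return ".local name"
    if "." not in name and name not in UNQUALIFIED_WHITELIST:
        return "unqualified name"
    if looks_generic(name, client_ip):
        return "name derived from client IP"
    return None
```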
Behind that, rejecting mail from sending IPs with no PTR records is
almost entirely safe on the modern net, and it is even getting safer (as
more people use it) to require the PTR names to resolve back to the IP
of the client machine. On systems where I can, I only check for an
existing PTR, but on systems where only the stronger check is available,
the rejections of valid mail have been declining over the last few years
and the legit systems who keep that problem for more than a few days are
quite rare.
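The difference between the two PTR checks can be shown with a short
Python sketch. It uses the system resolver through the standard socket
module, handles IPv4 only, and the function name ptr_check is invented
here; an MTA does the equivalent internally. The weaker check only asks
whether a PTR record exists, the stronger one also requires the PTR name
to resolve back to the client address.

```python
import socket


def ptr_check(client_ip: str,
              reverse=socket.gethostbyaddr,
              forward=socket.gethostbyname_ex):
    """Return (has_ptr, forward_confirmed) for an IPv4 client address."""
    try:
        ptr_name, _aliases, _addrs = reverse(client_ip)
    except OSError:
        # No PTR record (or the lookup failed): both checks fail.
        return (False, False)
    try:
        _name, _aliases, addresses = forward(ptr_name)
    except OSError:
        # A PTR exists but its name does not resolve.
        return (True, False)
    # The stronger check: the PTR name must map back to the client IP.
    return (True, client_ip in addresses)
```

A policy that rejects on has_ptr being False corresponds to the weaker
check; rejecting on forward_confirmed being False is the stronger one.
Note that the system resolver may also consult local sources such as a
hosts file, which a real MTA check would not.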
As a result, the mail systems I run reject mail at RCPT time and in some
cases at connect time from 50-90% of all of their SMTP connections. So
only 10-50% of potential mail is even seen by SpamAssassin or any other
message content filter. This makes it feasible to do more expensive
filtering in SA (such as AWL or TxRep, Bayes, complex local rules, and
URIBLs) because SA is spared from seeing the bulk of the worst stuff.
That's close to the point of abandoning e-mail and reverting to
telephone and snailmail. The rate of spam phone calls is a lot lower,
and that's not considering the filter.
Examples of the subjects from the recent days:
FW: Order RF#391032
Document2
FW: Payment Receipt
Sixt Invoice: 6502444876 from 24.03.2016
Attached document(s)
FW: Payment Details - [223434]
Image9876411149045.pdf
Voicemail from 07730881627 <07730881627> 00:00:24
FW: Order Status #022412
FW: Payment #092161
FW: Confirmation #388194
All of the messages have attachments, but I can't block all
attachments completely.
But you may be able to block some. For example, my favorite tool for
hooking SA into Postfix and Sendmail is MIMEDefang, a milter (which I
think rules out use with Exim). In it, a few lines of Perl, which could
probably be converted to a set of SA rules and meta-rules rather simply,
reject mail if it contains any of about a dozen Windows filetypes or
particular names that are directly executable (.exe, .com, etc.) or have
been widespread malware vectors and have no business in mail from
strangers (.chm, winmail.dat, .js, etc.). Checking the relevant MIME
headers using a 'full' type rule should allow you to exclude some types.
Obviously PDFs and MS Office docs are a headache because they are both
chronic malware vectors AND mailed around all the time innocently, but
blocking .js files (recently quite popular as a vector) isn't so bad: if
people want to share JavaScript code they should use other means. Too
many MUAs today have failed to learn from MS's blunders and essentially
will execute scripts received in mail and referenced by HTML in that
mail. Not most desktop MUAs, but webmail (which IS a MUA) is often quite
sloppy.
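As an illustration of the filename-based blocking described above (not
the MIMEDefang Perl itself, and not an SA rule), here is a minimal
Python sketch using the standard email package. The blocked lists
contain only the examples named in this message, and the function name
is hypothetical. It looks only at declared attachment filenames, so it
will not see a .js file packed inside an archive.

```python
import email
from email import policy

# Only the examples mentioned in the text; a real list would be longer.
BLOCKED_EXTENSIONS = (".exe", ".com", ".chm", ".js")
BLOCKED_NAMES = {"winmail.dat"}


def blocked_attachments(raw_message: bytes) -> list:
    """Return the filenames in a message that match the block lists."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    hits = []
    for part in msg.walk():
        # get_filename() reads the Content-Disposition filename and falls
        # back to the Content-Type name parameter.
        filename = part.get_filename()
        if not filename:
            continue
        lowered = filename.strip().lower()
        if lowered in BLOCKED_NAMES or lowered.endswith(BLOCKED_EXTENSIONS):
            hits.append(filename)
    return hits
```

A non-empty result would be grounds for rejecting the message at SMTP
time, which is what the MIMEDefang filter does.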
If you are not training and using SA's Bayes component you are crippling
SA. It needs some adjustment (e.g. making the ham autolearn threshold
slightly negative; for most sites, reducing the spam autolearn
threshold also helps) and it also needs some initial and routine
human-driven training: have a means for users to submit spam that they
recognize as spam but SA didn't and, if you don't reject spam but rather
tag it or deliver it to a spam folder, a means for them to submit ham
that SA wrongly flagged as well. Depending on the details of your delivered mailstore
and how users use it, it may be possible to identify how they handle
spam and how they handle ham, and train on that basis. In rare cases
with just the right sort of users you might even be able to train THEM
to handle spam and ham in specific ways so that you can automate finding
it and feeding it to the Bayes learner.
If you are not using sa-update daily, start doing so now. Rules get
added, changed, and score-adjusted whenever the project has enough fresh
ham & spam input to trust their automated tools for retuning the rules
to the current nature of ham & spam. This is a huge improvement over the
practice of tweaking the scores of the core public ruleset yourself.
Finally: use one of the SA site-specific sender reputation tools: AWL
or its successor TxRep. I confess that I have not yet converted any
systems from AWL to the better TxRep, but the same recommendation
applies to both: enable one or the other and after a week or two,
especially with a well-trained Bayes DB, you may be able to drop your
spam threshold by a whole point safely.
Does grey-listing still work today?
Reportedly, yes, if it is done correctly. Unfortunately, the original
simple concept has proven to have a number of edge & corner cases that
can require you to set up things in a complex mail system that you
otherwise would not need to, such as a reliable database with shared
access if you have more than one host acting as an MX. I don't use it
because I've never been desperate enough to make mail routinely delayed
at that scale.
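To show what the "original simple concept" amounts to, here is a
deliberately naive Python sketch of greylisting. The class name, the
300-second minimum delay and the choice of (client IP, sender,
recipient) as the key are assumptions for illustration. The first
attempt for a given triplet is deferred, and a retry is accepted once
the minimum delay has passed.

```python
import time


class Greylist:
    """Naive in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay: float = 300.0, clock=time.time):
        self.min_delay = min_delay  # seconds a new triplet must wait
        self.clock = clock          # injectable so the logic can be tested
        self.first_seen = {}

    def check(self, client_ip: str, sender: str, recipient: str) -> str:
        """Return 'defer' (temporary 4xx failure) or 'accept'."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Remember when this triplet was first seen; keep the old time on retry.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "defer"
```

The in-memory dictionary is exactly where the edge cases start: with
more than one MX host the state has to live in a shared, reliable
database, and senders that retry from a pool of different IPs never
match their first attempt.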
Is there an easy way to enable it in either SpamAssassin or Exim? I
don't want to fiddle around with databases and such for days in a
running system.
Simple answer: SA definitely not because SA isn't a greylisting tool. I
would *GUESS* that Exim can't do it without substantial effort because
soundly-implemented greylisting is a subtle mechanism that almost never
is directly embedded in an MTA but rather is hooked in externally, and
just that process of getting the integration right can be a chore.