On 24 Mar 2016, at 13:50, Yves Goergen wrote:

> Hello,
>
> I'm getting more and more spam every day and SpamAssassin can't handle it. Most of it looks very similar but it isn't filtered out.

Have you tried creating local rules for it?

I can't share the rules I've created for *some* of these families of malware-connected spam. However, the worst of them (the ones spreading ransomware) are produced programmatically in bulk, so they have very strong similarities. Those similarities make multiline 'rawbody' rules helpful, and the same goes for case-sensitive header checks that look for idiosyncratic combinations of uncommon minor details.

That's vague on purpose, for three reasons. First, spammers are known to change behavior based on posts here and on other, even notionally "private", anti-spam lists. Second, these particular spam genera have morphed over time, so they need to be treated as moving targets with regular rule adjustments and additions. Third, the best specific ruleset I've created for them was built in an environment where it is legally not mine to share. That matters especially in a place where I know spammers look for ways to evade filters, because publishing the rules would make them obsolete faster.
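The following sketch only shows the general shape of such a rule pair. It is NOT the set of rules described above. It is a minimal Python example in which a case-sensitive subject check is ANDed with a multiline raw-body regex, the same way an SA meta rule combines two sub-rules. Both patterns are hypothetical and are loosely modeled on the "FW: Payment #092161" style subjects quoted in the question. The body handling assumes a simple single-part message and does not reproduce how SA presents rawbody text.

```python
import re
from email import message_from_string
from typing import Tuple

# Hypothetical patterns for illustration only.
# Case-sensitive on purpose: "FW:" in exactly this capitalization, one of a few
# transactional words, then an optional "RF" and a six-digit reference number.
SUBJECT_RE = re.compile(r"^FW: (?:Order|Payment|Confirmation|Order Status) (?:RF)?#\d{6}$")

# Multiline raw-body pattern: a bare greeting line immediately followed by a line
# pointing at the attachment. re.MULTILINE lets ^ anchor at each line start, and
# the explicit \r?\n lets the match span the line break.
BODY_RE = re.compile(
    r"^(?:Hi|Hello|Dear \w+),?\r?\n\s*Please (?:see|find) the attached",
    re.MULTILINE,
)


def split_raw(raw: str) -> Tuple[str, str]:
    """Split a raw message at the first blank line into (headers, undecoded body)."""
    parts = re.split(r"\r?\n\r?\n", raw, maxsplit=1)
    return parts[0], (parts[1] if len(parts) > 1 else "")


def meta_hit(raw: str) -> bool:
    """Fire only when both sub-checks match, like an SA meta rule "A && B"."""
    subject = message_from_string(raw).get("Subject", "")
    _, body = split_raw(raw)
    return bool(SUBJECT_RE.search(subject)) and bool(BODY_RE.search(body))
```

In SA itself this would take three rules: a `header` rule that stays case-sensitive because it has no /i modifier, a `rawbody` rule, and a `meta` rule that combines them. The Python version simply makes the matching logic easy to test.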

I can't speak to the ClamAV issue because I don't use the extra sigs and have come to expect very little of ClamAV. Maybe ask on a ClamAV list?

> What other solutions are there to improve the detection rate of SpamAssassin? My current spam-to-useful ratio in some mailboxes is somewhere around 10:1.

That implies that you are probably underutilizing spam-control measures in your MTA. I manage a diverse set of mail systems running multiple MTAs. In all cases, the most effective anti-spam measure against ALL spam is delaying the initial greeting banner. That option is mandatory for an MTA to be fit for use when exposed to the modern Internet. Later in the message you say you use Exim. I believe Exim has such a feature, but I am not sure of that. The ideal delay is a matter of debate, because apparently the subtleties of how the delay is done matter. Still, 5 seconds is usually a reasonable delay to catch most spambots, and delays don't really start to impair valid mail until you go above 15s.
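Below is a minimal Python sketch of one common way to exploit the delay, under stated assumptions. The server holds back the 220 banner for a few seconds. Any client that sends data before the banner is treated as a spambot, because RFC 5321 clients are required to wait for the greeting. The 5-second value follows the text above. The hostname and reply wording are placeholders, and the SMTP dialogue itself is omitted. In practice you would use your MTA's own delay setting rather than a Python front end.

```python
import asyncio

GREETING_DELAY = 5.0  # seconds; the figure suggested above, tune to taste


async def talked_first(reader: asyncio.StreamReader, delay: float) -> bool:
    """Hold the banner for `delay` seconds and report whether the client sent
    anything in that window. RFC 5321 clients must wait for the greeting,
    so a client that talks first is almost certainly a spambot."""
    try:
        data = await asyncio.wait_for(reader.read(1), timeout=delay)
    except asyncio.TimeoutError:
        return False  # stayed silent for the whole delay, like a real MTA
    return data != b""  # b"" means the client hung up instead of talking


async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    """Toy front end: reject early talkers, otherwise send the banner.
    The hostname is a placeholder and the real SMTP dialogue is omitted."""
    if await talked_first(reader, GREETING_DELAY):
        writer.write(b"554 Protocol violation: client talked before greeting\r\n")
    else:
        writer.write(b"220 mx.example.net ESMTP\r\n")
        # ... a real server would continue the SMTP dialogue here ...
    await writer.drain()
    writer.close()
    await writer.wait_closed()
```

To try it against a throwaway listener, pass `handle_client` to `asyncio.start_server()`.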

Close behind a greeting delay, the use of high-accuracy DNSBLs is indispensable. I use Spamhaus Zen, ix.dnsbl.manitu.net, and psbl.surriel.com. I also apply Spamhaus's DROP+EDROP lists in the network layer so that I simply never see the listed nets. Note that you CANNOT safely use many of these in the same way on outbound mail submitted by your own users as on inbound mail for local delivery, and the same is true of many of the measures below. You should strictly segregate initial submission to a suitably configured port 587 MSA for authenticated users, so that port 25 SMTP carries only inbound mail from relative strangers. If you don't, your spam control will be harder to do safely or well. Your own authenticated users MIGHT send spam. However, some of the tactics that work best before SpamAssassin sees a message amount to detecting machines that *should* only be sending mail through an authenticating MSA, not directly to a remote MTA unfamiliar with them.
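A DNSBL check is just a DNS lookup. The minimal Python sketch below shows the conventional IPv4 query name: the octets are reversed and the list's zone is appended. It also restricts the check to unauthenticated port 25 traffic, following the segregation point above. It assumes that listings come back as 127.0.0.x answers, and it uses the system resolver via `socket.gethostbyname`. Check each list's documentation for its actual return codes and usage terms.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """IPv4 DNSBL convention: reverse the octets and append the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Listed if the query name resolves to a 127.0.0.x answer; a failed lookup
    (e.g. NXDOMAIN) counts as not listed. Other answers, such as error codes
    some lists return outside 127.0.0.x, are also treated as not listed."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        return False
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/24")


def dnsbl_applies(port: int, authenticated: bool) -> bool:
    """Only check strangers on port 25; your own users submit via the 587 MSA."""
    return port == 25 and not authenticated


# Usage (needs network access and a resolver the list allows):
#   if dnsbl_applies(25, False) and is_listed("192.0.2.1", "psbl.surriel.com"): ...
```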

I'm not entirely familiar with the other options Exim offers for rejecting spam. For me, though, the next most useful measure after the banner delay and DNSBLs is refusing mail from hosts that HELO/EHLO badly. Systems differ in what they can do in that area. Where I use this most aggressively (on Postfix systems), I reject mail from hosts that HELO in strictly invalid ways or in idiosyncratically wrong or spammer-associated ways. These include:

- remote systems claiming one of my names or IP addresses;
- .local names;
- most unqualified names (with a whitelist for special cases);
- IP literals;
- a variety of valid names whose owners have said no machine anywhere would ever HELO with them (e.g. "mail.com");
- various "generic address" patterns, where the hostname is derived from the client IP.
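The HELO tests listed above can be combined into a single function. The sketch below is a minimal Python illustration, not Postfix's implementation. The site names, addresses, and whitelists are hypothetical placeholders. The "generic address" test only looks for the client's IPv4 octets inside the name, forwards or reversed, joined by '-', '.' or '_'.

```python
import ipaddress
import re
from typing import Optional

# Hypothetical site data; substitute your own.
MY_NAMES = {"mx.example.net", "example.net"}
MY_IPS = {"192.0.2.25"}
UNQUALIFIED_OK = {"legacy-scanner"}   # whitelist for special cases
NEVER_HELO = {"mail.com"}             # valid names no real machine HELOs with


def _ip_derived(name: str, client_ip: str) -> bool:
    """True if the client's IPv4 octets appear in the name, forwards or
    reversed, joined by '-', '.' or '_' (e.g. 198-51-100-7.dsl.example.org)."""
    ip = ipaddress.ip_address(client_ip)
    if ip.version != 4:
        return False
    octets = str(ip).split(".")
    for seq in (octets, octets[::-1]):
        # Digit look-arounds stop "1-2-3-4" from matching inside "11-2-3-45".
        pattern = r"(?<!\d)" + r"[-._]".join(seq) + r"(?!\d)"
        if re.search(pattern, name):
            return True
    return False


def helo_reject_reason(helo: str, client_ip: str) -> Optional[str]:
    """Return why a HELO/EHLO argument should be rejected, or None to accept."""
    name = helo.strip().rstrip(".").lower()
    bare = name.strip("[]")
    if name in MY_NAMES or bare in MY_IPS:
        return "claims to be us"
    try:
        ipaddress.ip_address(bare)
        return "IP literal"
    except ValueError:
        pass
    if name.endswith(".local"):
        return ".local name"
    if "." not in name and name not in UNQUALIFIED_OK:
        return "unqualified name"
    if name in NEVER_HELO:
        return "name nobody legitimately HELOs with"
    if _ip_derived(name, client_ip):
        return "generic address-derived name"
    return None
```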

Next, rejecting mail from sending IPs with no PTR record is almost entirely safe on the modern net. It is even getting safer, as more people do it, to require the PTR names to resolve back to the IP of the client machine. On systems where I can, I only check for an existing PTR. On systems where only the stronger check is available, rejections of valid mail have been declining over the last few years, and legitimate systems that keep that problem for more than a few days are quite rare.
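Here is a minimal Python sketch of both PTR checks. It assumes that the system resolver (`socket.gethostbyaddr`, `socket.getaddrinfo`) is good enough for illustration. The resolver functions are injectable, so the logic can be tested without DNS. One simplification to note: a real MTA should tempfail rather than reject when a lookup merely times out, but this sketch treats every lookup failure as "no record".

```python
import socket
from typing import Callable, List


def ptr_names(ip: str) -> List[str]:
    """PTR name for ip via the system resolver, or [] if there is none."""
    try:
        name, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror / gaierror: no PTR or lookup failure
        return []
    return [name]


def forward_addrs(name: str) -> List[str]:
    """A/AAAA addresses for name, or [] if it doesn't resolve."""
    try:
        infos = socket.getaddrinfo(name, None)
    except OSError:
        return []
    return sorted({info[4][0] for info in infos})


def client_passes(ip: str, strict: bool = False,
                  reverse: Callable[[str], List[str]] = ptr_names,
                  forward: Callable[[str], List[str]] = forward_addrs) -> bool:
    """Weak check: any PTR exists. Strict check: a PTR name maps back to ip."""
    names = reverse(ip)
    if not names:
        return False  # no PTR at all: the "almost entirely safe" rejection
    if not strict:
        return True
    # Forward-confirmed reverse DNS: some PTR name must resolve to the client IP.
    return any(ip in forward(n) for n in names)
```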

As a result, the mail systems I run reject mail from 50-90% of all their SMTP connections, at RCPT time and in some cases at connect time. So only 10-50% of potential mail is even seen by SpamAssassin or any other message content filter. This makes it feasible to do more expensive filtering in SA (such as AWL or TxRep, Bayes, complex local rules, and URIBLs), because SA is spared from seeing the bulk of the worst stuff.

> That's close to the point of abandoning e-mail and reverting to telephone and snailmail. The rate of spam phone calls is a lot lower, and that's not considering the filter.

> Examples of the subjects from the recent days:
>
>    FW: Order RF#391032
>    Document2
>    FW: Payment Receipt
>    Sixt Invoice: 6502444876 from 24.03.2016
>    Attached document(s)
>    FW: Payment Details - [223434]
>    Image9876411149045.pdf
>    Voicemail from 07730881627 <07730881627> 00:00:24
>    FW: Order Status #022412
>    FW: Payment #092161
>    FW: Confirmation #388194
>
> All of the messages have attachments, but I can't block all attachments completely.

But you may be able to block some. For example, my favorite tool for hooking SA into Postfix and Sendmail is MIMEDefang. It is a milter, which I think rules out its use with Exim. In MIMEDefang, a few lines of Perl reject mail if it contains any of about a dozen Windows file types or particular names. Some of these are directly executable (.exe, .com, etc.). Others have been widespread malware vectors and have no business in mail from strangers (.chm, winmail.dat, .js, etc.). Those Perl lines could probably be converted rather simply into a set of SA rules and meta-rules. Checking the relevant MIME headers with a 'full' type rule should let you exclude some types.

Obviously PDFs and MS Office docs are a headache, because they are both chronic malware vectors AND mailed around innocently all the time. Blocking .js files (recently quite popular as a vector) isn't so bad, though: if people want to share JavaScript code, they should use other means. Too many MUAs today have failed to learn from MS's blunders. They will essentially execute scripts that arrive in mail and are referenced by HTML in that mail. That's not true of most desktop MUAs, but webmail (which IS an MUA) is often quite sloppy.
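Here is a rough Python equivalent of that kind of filename check, using only the standard `email` package. The extension list is an illustrative assumption, not the actual MIMEDefang filter described above. The code inspects only the declared filenames of MIME parts, not the contents of archives such as .zip files.

```python
from email import message_from_bytes, policy
from typing import List

# Illustrative list only (not the actual MIMEDefang filter mentioned above):
# directly executable Windows types plus common malware vectors.
BLOCKED_EXTENSIONS = (".exe", ".com", ".scr", ".pif", ".bat", ".cmd",
                      ".vbs", ".js", ".jse", ".wsf", ".hta", ".chm")
BLOCKED_NAMES = {"winmail.dat"}


def blocked_attachments(raw: bytes) -> List[str]:
    """Return the filenames of any MIME parts whose names are on the blocklists."""
    msg = message_from_bytes(raw, policy=policy.default)
    hits = []
    for part in msg.walk():
        # get_filename() reads Content-Disposition filename=, falling back
        # to the Content-Type name= parameter.
        filename = part.get_filename()
        if not filename:
            continue
        lowered = filename.lower()
        if lowered in BLOCKED_NAMES or lowered.endswith(BLOCKED_EXTENSIONS):
            hits.append(filename)
    return hits
```

A non-empty result would mean rejecting the message (or adding a high-scoring hit) before the more expensive content checks run.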

If you are not training and using SA's Bayes component, you are crippling SA. Bayes needs some adjustment. For example, make the ham autolearn threshold slightly negative; for most sites, reducing the spam autolearn threshold also helps. It also needs some initial and routine human-driven training. Give users a means to submit spam that they recognize as spam but SA didn't. If you don't reject spam but instead tag it or deliver it to a spam folder, also give them a means to submit those mistakes. Depending on the details of your delivered mailstore and how users use it, you may be able to identify how they handle spam and how they handle ham, and train on that basis. In rare cases, with just the right sort of users, you might even be able to train THEM to handle spam and ham in specific ways. That lets you automate finding it and feeding it to the Bayes learner.
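To make the threshold advice concrete, here is a deliberately simplified Python model of the autolearn decision. As far as I know, SA's defaults are `bayes_auto_learn_threshold_nonspam 0.1` and `bayes_auto_learn_threshold_spam 12.0`. The -0.5 and 9.0 used below are hypothetical examples of "slightly negative" and "reduced", not recommendations. SA's real autolearn logic has additional conditions; for example, the score it uses for learning leaves out some rules' contributions. Treat this as the core idea only.

```python
from typing import Optional

# Hypothetical site values (SA's defaults are, as far as I know, 0.1 and 12.0).
HAM_THRESHOLD = -0.5   # "slightly negative"
SPAM_THRESHOLD = 9.0   # "reduced"


def autolearn_as(score: float,
                 ham_threshold: float = HAM_THRESHOLD,
                 spam_threshold: float = SPAM_THRESHOLD) -> Optional[str]:
    """Simplified autolearn decision: learn clearly-ham and clearly-spam mail,
    and leave the ambiguous middle band to human-driven training."""
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None
```

A message scoring 0.05 would be learned as ham under the 0.1 default but not under a negative threshold. That is the point: marginal mail stays out of the ham corpus unless a human puts it there.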

If you are not running sa-update daily, start doing so now. Rules get added, changed, and score-adjusted whenever the project has enough fresh ham & spam input to trust its automated tools for retuning the rules to the current nature of ham & spam. This is a huge improvement over the practice of tweaking the scores of the core public ruleset yourself.

Finally, use one of the SA site-specific sender reputation tools: AWL or its successor TxRep. I confess that I have not yet converted any systems from AWL to the better TxRep, but the same recommendation applies to both. Enable one or the other, and after a week or two, especially with a well-trained Bayes DB, you may be able to safely lower your spam threshold by a whole point.
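The SA documentation for AWL describes its adjustment as pulling a message's score toward the sender's long-term mean: finalscore = score + (mean - score) * factor, with `auto_whitelist_factor` defaulting to 0.5. Below is a minimal Python sketch of just that arithmetic. It assumes the sender's history is a plain list of past scores. AWL actually keys history on the sender address plus part of the originating IP, and TxRep tracks more reputation keys.

```python
from typing import Sequence


def reputation_adjusted(score: float, sender_history: Sequence[float],
                        factor: float = 0.5) -> float:
    """AWL-style regression toward the sender's long-term mean:
    final = score + (mean - score) * factor. With no history, score is unchanged."""
    if not sender_history:
        return score
    mean = sum(sender_history) / len(sender_history)
    return score + (mean - score) * factor
```

For example, a 4.0-point message from a correspondent whose mail has averaged -3 ends up at 0.5, while a first-time sender gets no adjustment at all. This pull toward known-good history is what makes a lower spam threshold tolerable once the reputation data has built up.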

> Does grey-listing still work today?

Reportedly, yes, if it is done correctly. Unfortunately, the original simple concept has turned out to have a number of edge & corner cases. Handling them in a complex mail system can require things you otherwise would not need, such as a reliable database with shared access if more than one host acts as an MX. I don't use it, because I've never been desperate enough to routinely delay mail at that scale.
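To give a sense of what "done correctly" involves, here is a minimal in-memory Python sketch of the classic triplet scheme. The first attempt for each (client IP, sender, recipient) triplet is temporarily rejected. A retry is accepted if it arrives after a minimum delay but within a retry window. The 5-minute and 4-hour values are hypothetical. The sketch deliberately leaves out the hard parts mentioned above: persistent storage shared by every MX, expiry of old entries, and large senders that retry from a different IP each time. That last edge case is why many implementations key on a network prefix rather than the exact address.

```python
import time
from typing import Callable, Dict, Set, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)


class Greylist:
    """In-memory greylist for illustration only. A real deployment needs
    persistent storage shared by every MX host, plus expiry of old entries."""

    def __init__(self, min_delay: float = 300.0,
                 retry_window: float = 4 * 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.min_delay = min_delay        # hypothetical: 5 min before a retry counts
        self.retry_window = retry_window  # hypothetical: later retries start over
        self.clock = clock
        self.first_seen: Dict[Triplet, float] = {}
        self.passed: Set[Triplet] = set()

    def check(self, client_ip: str, sender: str, recipient: str) -> str:
        """Return "accept" or "tempfail" (an SMTP 4xx reply at RCPT time)."""
        key = (client_ip, sender.lower(), recipient.lower())
        if key in self.passed:
            return "accept"
        now = self.clock()
        first = self.first_seen.get(key)
        if first is None or now - first > self.retry_window:
            self.first_seen[key] = now  # new or stale triplet: start the clock
            return "tempfail"
        if now - first < self.min_delay:
            return "tempfail"           # retried too soon: still looks like a bot
        self.passed.add(key)            # a patient retry: remember and accept
        return "accept"
```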

> Is there an easy way to enable it in either SpamAssassin or Exim? I don't want to fiddle around with databases and such for days in a running system.

Simple answer: in SA, definitely not, because SA isn't a greylisting tool. I would *GUESS* that Exim can't do it without substantial effort. Soundly implemented greylisting is a subtle mechanism that is almost never embedded directly in an MTA. It is usually hooked in externally instead, and just getting that integration right can be a chore.
