On 9/15/2019 10:53 PM, Bert Van de Poel wrote:
> Dear fellow Spamassassin users,
>
> I'm contacting you as a member of ULYSSIS. ULYSSIS is a student
> non-profit organisation at the University of Leuven trying to make
> computers and technology more approachable and available to students.
> As part of this objective, we run a hosting service within our
> university's network for student organisations, student unions and
> individuals at our university.
>
> We've battled with spam from time to time, since we seem to attract a
> lot of exotic languages which are rather well able to circumvent
> commonly used methods. This has had us resort to some custom rulesets
> to battle against mostly targeted French and SEO spam often coming
> from very respectable servers and very normal addresses.
>
> Now because SEO spam specifically has been adapting quite well to any
> rule we think of (finding alternative ways of saying the same thing
> time and time again), I was hoping to write a rule that basically
> boiled down to "give some spam score to emails that contain the word
> SEO 3 or more times" to push those already being detected by other
> rules over the edge. To be clear, this will be a low score rule, I'm
> aware that ham can perfectly well contain that word 3 times, just like
> this email for example. Now while investigating I started wondering
> how to tackle that some spam will just have a plain text body, while
> others will also feature HTML, which means that suddenly the amount
> may double/halve. Beyond that it seems quite hacky to use a regex that
> boils down to something like /\bSEO\b.*\bSEO\b.*\bSEO\b/i instead of
> something that is properly aware of the count of certain words.
>
> Since I sort of expected Spamassassin to have a solution for both the
> text/text+html and the counting problems, I asked around on IRC but
> was pointed here. So uhm, any suggestions or pointers are more than
> welcome. Not too sure if any more information is required, but feel
> free to ask questions or correct my presumptions if necessary.
>

Bert, off the cuff, SA pretty readily handles things like this. What we
normally ask for is a sample of an email, with all headers, showing the
problem. Put it up on pastebin.com, since it's likely to be blocked if
you email it.

You likely want a rule that looks for SEO, with a tflags line that sets
multiple and maxhits. You can look at
http://www.mcgrail.com/downloads/KAM.cf for examples.

Regards,
KAM

-- 
Kevin A. McGrail
kmcgr...@apache.org

Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171
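
As a rough illustration of the multiple/maxhits suggestion in the reply, a rule set along the following lines may be a starting point. The rule names __LOCAL_SEO_HITS and LOCAL_SEO_3PLUS, the description text, and the 0.5 score are placeholders that do not come from this thread. The pattern follows the tflags options described in the SpamAssassin configuration documentation, not any particular rule in KAM.cf.

```
# Hypothetical rule names and score; adjust to local conventions.

# Sub-rule (the "__" prefix keeps it unscored): match each standalone "SEO".
body      __LOCAL_SEO_HITS  /\bSEO\b/i

# "multiple" lets the rule hit more than once per message;
# "maxhits=3" stops counting at 3, since the meta rule needs no more.
tflags    __LOCAL_SEO_HITS  multiple maxhits=3

# Fires once the sub-rule has hit at least 3 times.
meta      LOCAL_SEO_3PLUS   (__LOCAL_SEO_HITS >= 3)
describe  LOCAL_SEO_3PLUS   Body mentions SEO three or more times
score     LOCAL_SEO_3PLUS   0.5
```

Running spamassassin --lint after adding the rules should catch syntax errors. Whether the plain-text and HTML parts of one message are counted separately is worth checking against a real sample before settling on the threshold.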