On Fri, 2012-10-05 at 10:17 -0700, Cathryn Mataga wrote:
> Thanks for the comments. I'll see if I can cook something up here.
> Someone asked to see the
> actual messages.
>
> I collected 4 of these messages and put them at this link.
>
> http://www.mataga.net/mataga/spam.txt
>
Here's another version. This successfully recognises all four of your
examples and doesn't fire on any of my other spam test messages:

    describe MG_TWOLETTER_OBFUSCATION Two letter obfuscation (X:X X :X))
    header   MG_TWOLETTER_OBFUSCATION
             Subject =~ /[A-Z][:%~;^][A-Z]\s{0,1}[:%~;^][A-Z0-9]/
    score    MG_TWOLETTER_OBFUSCATION 5.0

This rather longer regexp was wrapping when pasted into this reply, so I
split the line at 'Subject' for clarity.
Martin
PS: this may be a well-known trick, but I haven't seen it mentioned
here: current versions of grep will execute Perl regexes if you use the
-P option, so rather than writing a rule using a new /regex/ you
can rapidly debug it first using grep to execute it and command line
editing to modify it:
grep -P 'regex' corpus/testmessages*
noting that grep doesn't use Perl delimiters ('/') round the regex. Then
when the regex is more or less working you can write the rule and hammer
it some more.
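
As a rough sketch of that workflow, here is the regex from the rule above
run through grep against a throwaway corpus. The two test messages and
their Subject lines are made up for illustration (they are not the
messages from the linked spam.txt), and the '^Subject:.*' prefix is my
own addition so that grep only looks at the Subject header, as the rule
does. It assumes GNU grep built with PCRE support; other greps may not
accept -P.

```shell
#!/bin/sh
# Hypothetical two-message corpus in a temporary directory.
corpus=$(mktemp -d)
printf 'Subject: V:I :A special offer\n\nbody text\n' > "$corpus/testmessage1"
printf 'Subject: Meeting at 10:30 tomorrow\n\nbody text\n' > "$corpus/testmessage2"

# The regex from the rule, without the Perl '/' delimiters.
regex='[A-Z][:%~;^][A-Z]\s{0,1}[:%~;^][A-Z0-9]'

# -l lists only the names of files containing a match.
matches=$(grep -l -P "^Subject:.*$regex" "$corpus"/testmessage*)
echo "$matches"

rm -r "$corpus"
```

Only the first message should be listed. Dropping -l prints the matching
lines themselves, which is handier while you are still editing the regex.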