Re: Words with embedded symbols

Martin Gregorie Fri, 05 Oct 2012 17:29:54 -0700

On Fri, 2012-10-05 at 10:17 -0700, Cathryn Mataga wrote:
> Thanks for the comments. I'll see if I can cook something up here.
> Someone asked to see the actual messages.
> 
> I collected 4 of these messages and put them at this link.
> 
> http://www.mataga.net/mataga/spam.txt
>
Here's another version. This successfully recognises all four of your
examples and doesn't fire on any of my other spam test messages:


describe MG_TWOLETTER_OBFUSCATION Two letter obfuscation (X:X X :X))
header   MG_TWOLETTER_OBFUSCATION 
         Subject =~ /[A-Z][:%~;^][A-Z]\s{0,1}[:%~;^][A-Z0-9]/
score    MG_TWOLETTER_OBFUSCATION 5.0

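This rather longer regexp was wrapping when pasted into this reply, so I
split the line at 'Subject' for clarity. Join the 'header' and 'Subject'
lines back into a single line when you put the rule in a rules file.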


Martin


PS: this may be a well-known trick, but I haven't seen it mentioned
here: current versions of GNU grep will run Perl-compatible regexes if
you use the -P option, so before writing a rule around a new /regex/ you
can rapidly debug the regex with grep, using command line editing to
modify it:

        grep -P 'regex' corpus/testmessages*

noting that grep doesn't use Perl delimiters ('/') round the regex. Then
when the regex is more or less working you can write the rule and hammer
it some more.
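
As a rough sketch of that workflow, here is the regex from the rule above
run through grep -P against a small scratch file. The four Subject lines
are made up for illustration (they are not the messages from the spam.txt
link), and the temporary file stands in for a real test corpus:

```shell
#!/bin/sh
# Hypothetical sample Subject lines, invented for this sketch.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
Subject: C:I :ALIS at low prices
Subject: V%I~AGRA for less
Subject: Re: Words with embedded symbols
Subject: Meeting moved to 10:30
EOF

# The regex from the rule, without the Perl '/' delimiters.
regex='[A-Z][:%~;^][A-Z]\s{0,1}[:%~;^][A-Z0-9]'

# -P selects Perl-compatible regexes; -c counts the matching lines.
matches=$(grep -P "$regex" "$tmpfile")
count=$(grep -cP "$regex" "$tmpfile")

echo "$matches"
echo "matched lines: $count"

rm -f "$tmpfile"
```

Only the first two sample lines should be printed. Bear in mind that grep
tests the whole line, including the 'Subject: ' prefix, whereas the
SpamAssassin header rule only sees the header's value, and that -P needs
a grep built with PCRE support.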
