On 23/10/2010 2:28 PM, Lawrence @ Rogers wrote:
Hello all,

I noticed recently that our users are getting spam with subjects similar to the following:

SehxpyNaturalRedheaddFayeReaganHasHerFirstLesbianExperienceWithBrunet

SpamAssassin seems to be having a hard time determining whether it is spam or not because it appears as one long word.

In all cases, the subject contains no spaces (to prevent detection I would think) and is longer than 62 characters (not sure why they do this, but it is true in every sample I've seen so far).

I would like to create a rule to pick up on this, but I'm having a bit of difficulty with the regex for the rule. This is what I've come up with so far:

header CR_SUBJECT_SPAMMY    Subject =~ /.{62}/
describe CR_SUBJECT_SPAMMY Subject looks spammy (contains a lot of characters, and no spaces)
score CR_SUBJECT_SPAMMY     2.5

I just need to modify the regex to check that the Subject contains no spaces.

I've done some research, and the longest non-coined word in a major dictionary is 30 characters long, meaning that if it were used twice in a subject, the total length would still be only 60 characters. There may be some FPs if the sender used formatting such as commas, but the chance of someone using that word twice and then adding formatting without any spaces is probably extremely remote.

Any assistance or advice would be greatly appreciated.

Regards,

Lawrence Williams
LCWSoft


This is the rule I've come up with now:

# Matches a new technique used by spammers in the Subject line
# Running a bunch of pornographic words together (with no spaces) to evade
# spam filters
# This rule tests for a Subject made up only of numbers, letters, or common
# formatting characters (comma, period, plus)
# The string must be at least 42 characters long and contain no spaces

header CR_SUBJECT_SPAMMY    Subject =~ /^[0-9a-zA-Z,.+]{42,}$/
describe CR_SUBJECT_SPAMMY Subject looks spammy (contains a lot of characters, and no spaces)
score CR_SUBJECT_SPAMMY     3.5
tflags CR_SUBJECT_SPAMMY noautolearn
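
For anyone who wants to sanity-check the pattern before deploying it, here is a minimal Python sketch. It assumes Python's `re` module treats these simple constructs the same way SpamAssassin's Perl regexes do, and the "normal" subject is a made-up example, not taken from real mail. It compares the first attempt (`/.{62}/`) with the anchored rule above.

```python
import re

# Same pattern as the CR_SUBJECT_SPAMMY rule above.
RULE = re.compile(r"^[0-9a-zA-Z,.+]{42,}$")

# First attempt from earlier in the thread: any 62 characters, spaces included.
FIRST_ATTEMPT = re.compile(r".{62}")


def is_spammy(subject: str) -> bool:
    """Return True if the whole subject is one long run of allowed characters."""
    return RULE.search(subject) is not None


# Sample subject quoted earlier in this thread.
sample = "SehxpyNaturalRedheaddFayeReaganHasHerFirstLesbianExperienceWithBrunet"

# Hypothetical legitimate subject (an assumption for illustration only).
normal = "Re: Minutes from the quarterly planning meeting, plus action items for next week"

if __name__ == "__main__":
    for subject in (sample, normal):
        print(len(subject), is_spammy(subject),
              FIRST_ATTEMPT.search(subject) is not None)
```

The anchors `^` and `$` are what enforce the no-spaces condition: because a space is not in the character class, any subject containing one cannot match from start to end. The unanchored `/.{62}/` would fire on any long subject, spaces or not.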
