Hello.

Following discussions on this list about obfuscating words to avoid spam detection, and not being a ninja, I'd like some feedback on the possible efficacy or pitfalls of rules like the following.

As noted in other discussions, words with scrambled letters between the first and last letter can be caught by checking the permutations of the letters:

/\ba(?:sse|ses|ess)s\b/i <- finds permutations of "asses"
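To get a feel for how fast this approach grows, here is a minimal Python sketch that builds the permutation alternation for any word. The helper names permutation_pattern and variant_count are made up for illustration, and the sketch assumes the word contains only plain ASCII letters (nothing that needs regex escaping).

```python
import re
from itertools import permutations

def permutation_pattern(word):
    """Regex source matching `word` with its interior letters in any order."""
    first, middle, last = word[0], word[1:-1], word[-1]
    # A set drops the duplicate orderings produced by repeated letters.
    variants = sorted({"".join(p) for p in permutations(middle)})
    return r"\b" + first + "(?:" + "|".join(variants) + ")" + last + r"\b"

def variant_count(word):
    """Number of distinct interior orderings the alternation has to list."""
    return len({"".join(p) for p in permutations(word[1:-1])})

print(permutation_pattern("asses"))
print(variant_count("asses"), variant_count("exploited"))
```

For "asses" the alternation has 3 branches; for "exploited", with 7 distinct interior letters, it already has 5040.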

However, this quickly gets unwieldy when building a regex that checks all the permutations of more than 5 letters. Couldn't one instead use a regex that simply looks for the letters used, with a negative look-ahead assertion to eliminate other real words of the same length? The words to exclude are found by first running the expression through a dictionary of words and phrases. For example, here is a rule for the word "exploited" after a run through a dictionary of 617709 words and phrases:


    /\b(?!exploited|elliptoid|epitoxoid)e[xploite]{7}d\b/i
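The dictionary pass could be sketched roughly like this in Python. This is only an illustration under stated assumptions: build_rule is a hypothetical helper, the dictionary is any iterable of words (here a tiny stand-in list, not the 617709-entry one), and words are assumed to be plain ASCII letters.

```python
import re

def build_rule(word, dictionary):
    """Build a loose letter-set rule for `word`, excluding dictionary collisions."""
    word = word.lower()
    first, middle, last = word[0], word[1:-1], word[-1]
    # dict.fromkeys keeps first-seen order while dropping repeated letters.
    letters = "".join(dict.fromkeys(middle))
    body = "%s[%s]{%d}%s" % (first, letters, len(middle), last)
    loose = re.compile(body, re.IGNORECASE)
    # Any other dictionary word the loose pattern matches must be excluded.
    collisions = sorted({w.lower() for w in dictionary if loose.fullmatch(w)} - {word})
    excluded = "|".join([word] + collisions)
    return r"\b(?!%s)%s\b" % (excluded, body)

# Tiny stand-in dictionary, for illustration only.
words = ["exploited", "elliptoid", "epitoxoid", "explained", "exported"]
rule_source = build_rule("exploited", words)
rule = re.compile(rule_source, re.IGNORECASE)
print(rule_source)
```

With that stand-in list the sketch reproduces the rule above. The correctly spelled word is put first in the look-ahead, so the rule fires only on scrambled variants.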

or perhaps additional rules for added letters ("expploited", etc.):

    /\b(?!epileptoid)e[xploite]{8}d\b/i
    /\be[xploite]{9}d\b/i

or combined:

  /\b(?!exploited|elliptoid|epitoxoid|epileptoid)e[xploite]{7,9}d\b/i
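A quick way to sanity-check the combined rule is to run it over a few sample subjects. The sketch below uses Python's re module, which accepts this pattern unchanged; the sample subjects and the hits helper are invented for illustration.

```python
import re

COMBINED = re.compile(
    r"\b(?!exploited|elliptoid|epitoxoid|epileptoid)e[xploite]{7,9}d\b",
    re.IGNORECASE,
)

def hits(pattern, subjects):
    """Return the subjects the pattern fires on."""
    return [s for s in subjects if pattern.search(s)]

subjects = [
    "Your account was exploited",    # plain spelling, excluded by the look-ahead
    "Your account was explioted",    # scrambled interior
    "Your account was expploited",   # one added letter
    "An elliptoid curve",            # dictionary collision, excluded
    "An epileptoid seizure",         # longer dictionary collision, excluded
]

matched = hits(COMBINED, subjects)
print(matched)
```

Only the scrambled and the padded spellings come back as hits; the plain word and the dictionary collisions are skipped.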

Usually the obfuscated word still resembles the word whose meaning the spammer wants to convey. I doubt the spammer wants to use the word "elliptoid", and anyway, the idea is to use these as non-scoring rules that feed meta rules.

    (OBFU_EXPLOIT + RULE1 + RULE2 + RULE3) > 1   etc.

or whatever. Thoughts?  Other samples:

  subject =~ /\b(?!cartoon|croatan|carroon)c[arto]{5}n\b/i
  subject =~ /\b(?!downloadable)d[ownladb]{10}e\b/i
  subject =~ /\b(?!dripping)d[ripn]{6}g\b/i
  subject =~ /\b(?!ejaculating|enunciating)e[jacultin]{9}g\b/i

Of course, you could add "0" and "1" to the character set if the word contained an "o" or an "l", and the like.
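As a rough sketch of that last idea, the character set could be widened mechanically. The widen_class helper and its default look-alike table (only "o" to "0" and "l" to "1") are assumptions for illustration; other substitutions would have to be added by hand.

```python
import re

def widen_class(letters, lookalikes=None):
    """Return a character class for `letters` plus their look-alike digits."""
    # Assumed table: only the two substitutions mentioned above.
    table = lookalikes if lookalikes is not None else {"o": "0", "l": "1"}
    # dict.fromkeys drops duplicates while keeping first-seen order.
    extra = "".join(dict.fromkeys(table[c] for c in letters if c in table))
    return "[" + letters + extra + "]"

char_class = widen_class("xploite")
widened = re.compile(
    r"\b(?!exploited|elliptoid|epitoxoid)e" + char_class + r"{7}d\b",
    re.IGNORECASE,
)
print(char_class)
```

Here the class becomes [xploite10], so spellings such as "expl0ited" or "exp1oited" are caught while the plain word is still excluded.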

-- Mike
