I see the logic you are adopting, but unfortunately it doesn't quite pan
out.

Take the 4th example you provided. Here you acknowledge that while
enunciating is not an anagram of ejaculating, it is still a possible
outcome from your set.
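
A quick way to see this is to run the rule against a few strings. This is
only a sketch in Python rather than an SA rule: the pattern is copied from
the fourth quoted example, while the names with_exclusions, class_only and
samples are mine, and the last two sample strings are made up.

```python
import re

# Fourth quoted rule, with and without its negative lookahead.
with_exclusions = re.compile(
    r"\b(?!ejaculating|enunciating)e[jacultin]{9}g\b", re.IGNORECASE
)
class_only = re.compile(r"\be[jacultin]{9}g\b", re.IGNORECASE)

samples = [
    "ejaculating",          # the target word itself
    "enunciating",          # a real word, not an anagram of the target
    "ejacluating",          # a genuine letter shuffle of the target
    "e" + "a" * 9 + "g",    # junk that is nowhere near an anagram
]

for word in samples:
    # Show what the character class accepts and what the lookahead removes.
    print(word, bool(class_only.search(word)), bool(with_exclusions.search(word)))
```

The character class accepts all four strings. The lookahead only removes
the two words that were listed by hand, so anything not thought of in
advance still gets through.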

Mathematically, the problem is this:

Writing out the anagrams explicitly for a set of 5 distinct letters gives:

5*4*3*2*1 = 120 variations

Working out all the variations matched by the strategy you list below
gives:

5*5*5*5*5 = 3125 variations

While most of these 3125 variations are unlikely to be words, you need
to check them all, not only in English but in other languages, to ensure
that there are no FPs. Assuming only 1% of the choices work as words,
there are still about 30 words to list, which isn't actually that great
an improvement given the amount of set-up time it costs you. The more
letters you add, the more obvious the problem becomes.
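
To make the arithmetic concrete, here is a small Python sketch that
enumerates both sets. The letter set "abcde" is made up for illustration;
any five distinct letters give the same counts.

```python
from itertools import permutations, product

letters = "abcde"  # hypothetical 5-letter set with no repeated letters

# Every true anagram: each letter used exactly once.
anagrams = {"".join(p) for p in permutations(letters)}

# Everything a character class like [abcde]{5} accepts: any letter in any slot.
class_matches = {"".join(p) for p in product(letters, repeat=len(letters))}

# Strings the class accepts that are not anagrams at all.
extras = class_matches - anagrams

print(len(anagrams))       # 120
print(len(class_matches))  # 3125
print(len(extras))         # 3005
```

Every anagram is in the larger set, but so are 3005 strings that are not,
and those are the ones that have to be checked by hand.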

The simplest way to think about a regular expression is as a flow
chart without variables. If you can draw a flow chart that takes each
character in sequence as its input, then it can be made into a regular
expression. For an anagram that isn't practical: with no variables, the
chart has to remember which letters have already been used purely by
where it is, so it needs a separate box for every combination of used
letters. In effect, an anagram is a one-way function (anyone care to
speculate on its use as a crypto one-way function?). For more info on
finite state machines see
http://en.wikipedia.org/wiki/Finite_state_machine
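
As a rough illustration of the flow chart point, the Python sketch below
checks anagrams the easy way, with per-letter counts (the "variables" a
flow chart lacks), and then counts how many distinct "letters still
unused" situations a chart without variables would need a box for. The
function names is_anagram and count_states are mine, not part of any
library, and the state count assumes the simplest possible design of one
box per combination of remaining letters.

```python
from collections import Counter

def is_anagram(candidate: str, target: str) -> bool:
    """Return True if candidate uses exactly the letters of target."""
    return Counter(candidate.lower()) == Counter(target.lower())

def count_states(word: str) -> int:
    """Count the distinct 'letters still unused' states for word."""
    start = tuple(sorted(Counter(word.lower()).items()))
    seen = {start}
    frontier = [start]
    while frontier:
        state = frontier.pop()
        for letter, left in state:
            if left == 0:
                continue  # this letter is used up, no move possible
            # Consume one copy of this letter and record the new state.
            nxt = tuple((l, n - 1 if l == letter else n) for l, n in state)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)

print(is_anagram("drrrrrrg", "dripping"))  # False
print(count_states("abcde"))               # 32
print(count_states("dripping"))            # 144
```

With variables the check is a one-liner. Without them, the number of
boxes doubles for every extra distinct letter, before you even start
turning the chart into a pattern.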

I'll refrain from discussing this any further on the list because I
don't want to point out anything of further use to spammers, so if
anyone wants to talk about this more, mail me privately, and we can hit
reply all (if I know who you are :) )

R

-----Original Message-----
From: Mike Grau [mailto:[EMAIL PROTECTED] 
Sent: 28 February 2005 17:55
To: [email protected]
Subject: Re: Rule advice please

> <SNIP>
> 
>    subject =~ /\b(?!cartoon|croatan|carroon)c[arto]{5}n\b/i
>    subject =~ /\b(?!downloadable)d[ownladb]{10}e\b/i
>    subject =~ /\b(?!dripping)d[ripn]{6}g\b/i
>    subject =~ /\b(?!ejaculating|enunciating)e[jacultin]{9}g\b/i
> 
> You can't use rules like this. The pattern "caaaaan" matches your 
> first example. Similarly "drrrrrrg" matches the third line.
> 

Yes, but, using meta rules for scoring and assuming we're not talking
about binary data, if I don't want

           (HOT && DRIPPING && WOMEN)

should I want

           (HOT && DRRRRRRG && WOMEN)

?

I wouldn't score this high enough to reject the message by itself, but
when combined with all the other SA rules, might it not be an indicator
worthy of some scoring?


-- Mike

