I see the logic you are adopting, but unfortunately it doesn't quite pan out.
Take the 4th example you provided. There you acknowledge that while "enunciating" is not an anagram of "ejaculating", it is still a possible outcome from your set.

Mathematically, the problem is this. For a set of 5 distinct letters, writing the anagrams out explicitly gives 5*4*3*2*1 = 120 variations. Working out all the variations allowed by the character-class strategy you list below gives 5*5*5*5*5 = 3125 variations. While most of these 3125 variations are unlikely to be words, you need to check them all, not only in English but in other languages, to make sure there are no false positives (FPs). Even assuming only 1% of them are real words, that still leaves roughly 30 words to list as exclusions, which isn't actually that great an improvement given the setup time it costs you. The more letters you add, the more obvious the problem becomes.

The simplest way to think about a regular expression is as a flow chart without variables: if you can draw a flow chart that takes each character in sequence as its input, then it can be made into a regular expression. For an anagram it can't be, short of listing every permutation, because the chart would have to keep count of which letters it has already used.

In effect, an anagram is a one-way function (anyone care to speculate on its use as a cryptographic one-way function?). For more info on finite state machines, see http://en.wikipedia.org/wiki/Finite_state_machine

I'll refrain from discussing this any further on the list, because I don't want to point out anything else of use to spammers. If anyone wants to talk about this more, mail me privately and we can hit reply-all (if I know who you are :) ).

R

-----Original Message-----
From: Mike Grau [mailto:[EMAIL PROTECTED]
Sent: 28 February 2005 17:55
To: [email protected]
Subject: Re: Rule advice please

> <SNIP>
>
> subject =~ /\b(?!cartoon|croatan|carroon)c[arto]{5}n\b/i
> subject =~ /\b(?!downloadable)d[ownladb]{10}e\b/i
> subject =~ /\b(?!dripping)d[ripn]{6}g\b/i
> subject =~ /\b(?!ejaculating|enunciating)e[jacultin]{9}g\b/i
>
> You can't use rules like this. The pattern "caaaaan" matches your
> first example.
> Similarly "drrrrrrg" matches the third line.
>

Yes, but using meta rules for scoring, and assuming we're not talking about binary data: if I don't want (HOT && DRIPPING && WOMEN), should I want (HOT && DRRRRRRG && WOMEN)? I wouldn't score this high enough to reject the message by itself, but combined with all the other SA rules, might it not be an indicator worthy of some scoring?

--
Mike
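A small Python sketch of the two points argued in this thread, offered only as an illustration. SpamAssassin rules are Perl regexes, but the constructs used in the quoted rules (\b, the (?!...) lookahead, {n}, case-insensitivity) behave the same way in Python's re module. It checks the 5-letter arithmetic (120 orderings versus 3125 class matches) and runs two of the quoted rules against a few strings. The names RULES, class_vs_permutations and is_anagram_of are hypothetical, made up for this example, and the test strings ("ctraoon", "dirppnig") are invented scrambles, not taken from real spam.

```python
import math
import re
from collections import Counter

# Two of the rules quoted in the thread, rewritten for Python's re module.
# \b, the (?!...) lookahead, {n} and case-insensitivity behave the same
# way here as in the original Perl-style patterns.
RULES = {
    "cartoon": re.compile(r"\b(?!cartoon|croatan|carroon)c[arto]{5}n\b", re.I),
    "dripping": re.compile(r"\b(?!dripping)d[ripn]{6}g\b", re.I),
}


def class_vs_permutations(n):
    """For n distinct letters filling n slots, return a pair:
    (orderings of those letters, strings a class like [abcde]{n} accepts)."""
    return math.factorial(n), n ** n


def is_anagram_of(candidate, word):
    """Hypothetical helper: True if candidate uses exactly word's letters,
    each the same number of times."""
    return Counter(candidate.lower()) == Counter(word.lower())


if __name__ == "__main__":
    perms, matches = class_vs_permutations(5)
    print(perms, matches)  # 120 3125

    # Made-up test strings: two junk strings and two genuine scrambles.
    samples = ["caaaaan", "drrrrrrg", "ctraoon", "dirppnig"]
    for word, rule in RULES.items():
        # Also try the real word, which the lookahead should exclude.
        for subject in samples + [word]:
            if rule.search(subject):
                print(f"{word} rule matches {subject}: "
                      f"anagram={is_anagram_of(subject, word)}")
```

The character class accepts "caaaaan" and "drrrrrrg" just as readily as the real scrambles, while an exact letter-count comparison rejects them. Counting letters is easy in ordinary code but is exactly the kind of memory a plain regular expression lacks. Whether that over-matching is tolerable as a low-scoring input to a meta rule, as Mike suggests, is a scoring judgment this sketch doesn't settle.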
