On Fri, 10 Jul 2009, Daniel Schaefer wrote:

Gerry Maddock wrote:
> >  McDonald, Dan wrote:
> >
> >  body DRUG_SITE /www(\.|\ ) *(med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)[0-9]{2}(\.|\ ) )*(net|com)/
> >
> > You should avoid the use of *, as it allows spammers to consume all
> > of your memory and CPU. Limit it using the {} syntax. You also
> > should tell Perl to not keep the results of your () with (?:\.|\ )
> > instead of (\.|\ ). And with single characters, the [ab] syntax is
> > faster to process than (?:a|b).

 Perhaps you could attach an example showing exactly what you're stating
 for this rule?

This is my new rule. I think this is what he means:

body DRUG_SITE /www[\.\ ] *(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)[0-9]{2}[\.\ *(?:net|com)/

You missed some of the suggestions.

Try this:

body DRUG_SITE /\bwww[.\s]{1,3}(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)\d{2}[.\s]{1,3}(?:net|com)\b/

Also, if the spammers start registering domain names with three or more digits, this rule will start missing them. Something like \d{2,5} in place of \d{2} would be better.
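
To see the difference between \d{2} and \d{2,5}, here is a rough sketch in Python. It assumes Python's re module is a fair stand-in for the Perl regex engine that SpamAssassin uses, which should hold for the constructs in this rule (\b, \s, \d, {m,n} and (?:...)). The sample strings are made up for illustration, and the duplicate "ba" alternative is dropped because it never changes what matches.

```python
import re

# Assumption: Python's re approximates Perl regex behaviour for these constructs.
WORDS = "med|meds|gen|pill|shop|via|cu|co|ba|da|bu"

# The rule as suggested, with exactly two digits.
two_digits = re.compile(
    r"\bwww[.\s]{1,3}(?:" + WORDS + r")\d{2}[.\s]{1,3}(?:net|com)\b"
)

# The same rule with the digit count relaxed to two through five.
two_to_five = re.compile(
    r"\bwww[.\s]{1,3}(?:" + WORDS + r")\d{2,5}[.\s]{1,3}(?:net|com)\b"
)

# Hypothetical message fragments, not taken from real spam.
samples = [
    "visit www.meds42.com today",
    "visit www . pill07 . net today",
    "visit www.shop123.com today",
    "visit www.example.com today",
]

for text in samples:
    hit_2 = bool(two_digits.search(text))
    hit_2_5 = bool(two_to_five.search(text))
    print(f"{text!r}: two digits={hit_2}, two to five digits={hit_2_5}")
```

The first two samples are caught by both patterns, the three-digit sample only by the \d{2,5} version, and the ordinary hostname by neither. The bounded {1,3} and {2,5} quantifiers keep the amount of backtracking limited, which is the point of avoiding a bare *.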

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
