On Fri, 10 Jul 2009, Daniel Schaefer wrote:
Gerry Maddock wrote:
> > McDonald, Dan wrote:
> >
> > body DRUG_SITE /www(\.|\ )*(med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)[0-9]{2}(\.|\ )*(net|com)/
>
> You should avoid the use of *, as it allows spammers to consume all
> of your memory and CPU. Limit it using the {} syntax. You should also
> tell Perl not to keep the results of your () by using (?:\.|\ )
> instead of (\.|\ ). And with single characters, the [ab] syntax is
> faster to process than (?:a|b).
Perhaps you could attach an example showing exactly what you're stating
for this rule?
This is my new rule. I think this is what he means:
body DRUG_SITE /www[\.\ ]*(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)[0-9]{2}[\.\ ]*(?:net|com)/
You missed some of the suggestions.
Try this:
body DRUG_SITE /\bwww[.\s]{1,3}(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)\d{2}[.\s]{1,3}(?:net|com)\b/
Also, if the spammers start registering domain names with three or more
digits, this rule will start missing them. Something like \d{2,5} would
be better.
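
For anyone who wants to try the pattern outside SpamAssassin, here is a
rough sketch in Python. It assumes Python's re module behaves like Perl
for the constructs used here (\b, \s, \d, (?:...), {m,n}), which I believe
holds for this pattern. The sample strings and the names STRICT, WIDER and
matches() are made up for illustration; they are not from any real spam
corpus or from SpamAssassin itself.

```python
import re

# Alternation copied from the rule in this thread.
WORDS = r"(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)"

# STRICT: the rule as posted, exactly two digits.
STRICT = re.compile(r"\bwww[.\s]{1,3}" + WORDS + r"\d{2}[.\s]{1,3}(?:net|com)\b")

# WIDER: same rule with \d{2,5}, as suggested for longer digit runs.
WIDER = re.compile(r"\bwww[.\s]{1,3}" + WORDS + r"\d{2,5}[.\s]{1,3}(?:net|com)\b")


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


# Hypothetical message bodies, invented for this sketch.
samples = [
    "visit www.meds12.com today",             # plain two-digit domain
    "visit www . pill77 . net now",           # obfuscated with spaces
    "visit www.shop123.com",                  # three digits
    "see www.example.com",                    # unrelated domain
    "www" + " " * 50 + "med12.com",           # separator run longer than {1,3}
]

for body in samples:
    print(repr(body[:30]), matches(STRICT, body), matches(WIDER, body))
```

The bounded {1,3} is what keeps a long run of separators from matching at
all, and the three-digit sample shows the case where \d{2} misses but
\d{2,5} still hits.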
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/