On Fri, 2009-06-19 at 09:45 -0700, John Hardin wrote:
> On Fri, 2009-06-19 at 09:24 -0700, John Hardin wrote:
> > On Fri, 2009-06-19 at 16:21 +0200, Paweł Tęcza wrote:
> > >
> > > >> body   AE_MEDS35  /w{2,4}\s{0,4}meds\d{1,4}\s{0,4}(?:net|com|org)/
> > >
> > > I've just noticed the "missing" 'i' switch in your rule's regexp. Is it
> > > a bug or a feature? :)
> > 
> > That depends. If the URIs are always lowercase in the spams, making the
> > RE case-insensitive doesn't help and may hurt.

Hi John,

I've only seen lowercase URIs, but I'd rather use case-insensitive
rules. I simply don't want to get a lot of spam just because a spammer
read this thread and changed one letter :)
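To compare the two variants outside SpamAssassin, here is a minimal Python sketch. Python's `re.IGNORECASE` plays the role of Perl's trailing `i` modifier (`/.../i`), and the sample strings are made up for illustration.

```python
import re

# Rule body from the thread, without SpamAssassin's surrounding /.../ delimiters.
PATTERN = r"w{2,4}\s{0,4}meds\d{1,4}\s{0,4}(?:net|com|org)"

case_sensitive = re.compile(PATTERN)
# re.IGNORECASE is Python's counterpart of Perl's /i modifier.
case_insensitive = re.compile(PATTERN, re.IGNORECASE)

samples = ["(www meds88 net)", "(WWW Meds88 NET)"]

for text in samples:
    print(f"{text!r}: sensitive={bool(case_sensitive.search(text))}, "
          f"insensitive={bool(case_insensitive.search(text))}")
```

Only the case-insensitive version catches the mixed-case sample; whether that extra coverage is worth it is the trade-off John describes.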

> > > BTW, \s+ would probably be better than \s{0,4}. The same goes for
> > > w{2,4} and \d{1,4}.
> > 
> > No, it's not. In SA, unbounded matches are hazardous and should be
> > avoided. {0,20} is safer than * and {1,20} is safer than +.
> > 
> > This is not a general rule, it only applies where the text being scanned
> > is from an untrusted (and possibly actively hostile) source.
> > 
> > Another improvement: add word boundaries at the beginning and end:
> > 
> >   /\bw{2,4}\s{0,10}meds\d{1,4}\s{0,10}(?:net|com|org)\b/
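Both suggestions can be tried in a small Python sketch (the test strings are invented). The first part only illustrates the cap a bounded quantifier puts on how much text one match attempt may consume; it does not benchmark anything. The second part shows `\b` keeping the rule from firing inside longer tokens.

```python
import re

# Bounded vs. unbounded: the bounded rule gives up after 4 whitespace chars,
# while the unbounded one will scan across a gap of any length.
bounded = re.compile(r"w{2,4}\s{0,4}meds")
unbounded = re.compile(r"w+\s*meds")

long_gap = "www" + " " * 1000 + "meds"
print("bounded:  ", bool(bounded.search(long_gap)))    # False
print("unbounded:", bool(unbounded.search(long_gap)))  # True

# Word boundaries: \b stops the rule from matching inside a longer token.
no_boundary = re.compile(r"w{2,4}\s{0,10}meds\d{1,4}\s{0,10}(?:net|com|org)")
with_boundary = re.compile(r"\bw{2,4}\s{0,10}meds\d{1,4}\s{0,10}(?:net|com|org)\b")

for text in ["visit www meds88 net today", "awww meds88 network"]:
    print(f"{text!r}: no_boundary={bool(no_boundary.search(text))}, "
          f"with_boundary={bool(with_boundary.search(text))}")
```

Without `\b`, the second string matches on the substring "www meds88 net"; with `\b`, neither its start ("awww") nor its end ("network") sits on a word boundary.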

Thanks a lot for your tips! It's next valuable lesson for me today :)

> > If the parentheses in the original example are actually in the message,
> > including them will help too. Are they actually in the message?

Yes, I can see the parentheses in all the spam messages I've received.
But spammers may well drop them soon, of course.
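A minimal Python sketch of that trade-off: escaping the parentheses as `\(` and `\)` turns them into literal characters the text must contain. The rules below are illustrative variants of the word-boundary rule from this thread, and the "variant" string is a hypothetical future spam format.

```python
import re

# Word-boundary rule from the thread, wrapped in escaped (literal) parentheses.
with_parens = re.compile(
    r"\(\s?w{2,4}\s{0,10}meds\d{1,4}\s{0,10}(?:net|com|org)\s?\)"
)
# The same rule without the parentheses, relying on \b instead.
without_parens = re.compile(r"\bw{2,4}\s{0,10}meds\d{1,4}\s{0,10}(?:net|com|org)\b")

current_spam = "Order now (www meds88 net)"
variant = "Order now www meds88 net"  # hypothetical variant without parentheses

for text in (current_spam, variant):
    print(f"{text!r}: with_parens={bool(with_parens.search(text))}, "
          f"without_parens={bool(without_parens.search(text))}")
```

The parenthesized rule is more specific today but stops matching once the parentheses disappear, while the `\b` version keeps matching both forms.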

> D'oh, /me checks pastebins from first message...
> 
> Also, body rules match cleaned-up text with runs of spaces collapsed, so
> you don't need to use + or {1,...}
> 
> Try this:
> 
>    /\(\s?w{2,4}\smeds\d{1,4}\s(?:net|com|org)\s?\)/

Yes, I noticed it when I was testing my own rule:

[1438] dbg: rules: ran body rule LOCAL_BODY_WWW_MEDSXX_NET ======> got hit: "(www meds88 net)"
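The effect John describes, that body rules run against text whose whitespace runs have been collapsed, can be approximated in Python. The `collapse_whitespace` helper is a hypothetical stand-in that only squeezes whitespace runs into a single space; SpamAssassin's real body rendering does more than this. After collapsing, the single `\s` in John's last regex is enough, and the matched text is the same "(www meds88 net)" seen in the debug output.

```python
import re

# John's final regex from the thread, unchanged.
RULE = re.compile(r"\(\s?w{2,4}\smeds\d{1,4}\s(?:net|com|org)\s?\)")


def collapse_whitespace(text: str) -> str:
    """Hypothetical approximation of SpamAssassin's body cleanup:
    squeeze every run of whitespace (spaces, tabs, newlines) into one space."""
    return re.sub(r"\s+", " ", text).strip()


raw = "Cheap pills at (www   meds88\n\tnet) !"

print(RULE.search(raw))  # None: the raw text has multi-character gaps
cleaned = collapse_whitespace(raw)
match = RULE.search(cleaned)
print(match.group(0))    # (www meds88 net)
```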

My best regards,

Pawel

