On Fri, 2009-06-19 at 09:45 -0700, John Hardin wrote:
> On Fri, 2009-06-19 at 09:24 -0700, John Hardin wrote:
> > On Fri, 2009-06-19 at 16:21 +0200, Paweł Tęcza wrote:
> > >
> > > >> body AE_MEDS35 /w{2,4}\s{0,4}meds\d{1,4}\s{0,4}(?:net|com|org)/
> > >
> > > I've just noticed a "missing" 'i' switch in your rule regexp. Is it a bug
> > > or a feature? :)
> >
> > That depends. If the URIs are always lowercase in the spams, making the
> > RE case-insensitive doesn't help and may hurt.
Hi John,

I have only seen lowercase URIs, but I'd still rather use case-insensitive rules. I simply don't want to start getting a lot of spam just because a spammer read this thread and changed one letter :)

> > > BTW, \s+ would probably be better than \s{0,4}. Similarly with w{2,4} and
> > > \d{1,4}.
> >
> > No, it's not. In SA, unbounded matches are hazardous and should be
> > avoided. {0,20} is safer than * and {1,20} is safer than +.
> >
> > This is not a general rule; it only applies where the text being scanned
> > is from an untrusted (and possibly actively hostile) source.
> >
> > Another improvement: add word boundaries at the beginning and end:
> >
> > /\bw{2,4}\s{0,10}meds\d{1,4}\s{0,10}(?:net|com|org)\b/

Thanks a lot for your tips! That's another valuable lesson for me today :)

> > If the parentheses in the original example are actually in the message,
> > including them will help too. Are they actually in the message?

Yes, I can see the parentheses in all the spam messages I have received. But of course the spammers may drop them soon.

> D'oh, /me checks pastebins from first message...
>
> Also, body rules match cleaned-up text with runs of spaces collapsed, so
> you don't need to use + or {1,...}
>
> Try this:
>
> /\(\s?w{2,4}\smeds\d{1,4}\s(?:net|com|org)\s?\)/

Yes, I noticed that when I was testing my own rule:

[1438] dbg: rules: ran body rule LOCAL_BODY_WWW_MEDSXX_NET ======> got hit: "(www meds88 net)"

My best regards,

Pawel
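For anyone who wants to compare the three patterns from this thread outside SpamAssassin, here is a rough Python sketch (not SA itself). It runs them against the string from the debug output above plus a few made-up variants. It assumes Python's `re` treats these particular constructs (`{m,n}`, `\s`, `\d`, `\b`, `(?:...)`) the same way Perl does, and `collapse_spaces` is a hypothetical stand-in for SA's body cleanup, which does more than just collapse whitespace.

```python
import re

# The three rule regexes from this thread, copied from SA's /.../ syntax.
RULES = {
    "original": re.compile(r"w{2,4}\s{0,4}meds\d{1,4}\s{0,4}(?:net|com|org)"),
    "bounded": re.compile(r"\bw{2,4}\s{0,10}meds\d{1,4}\s{0,10}(?:net|com|org)\b"),
    "paren": re.compile(r"\(\s?w{2,4}\smeds\d{1,4}\s(?:net|com|org)\s?\)"),
}
# Case-insensitive variant of the parenthesized rule, like Perl's /.../i.
RULES["paren_i"] = re.compile(RULES["paren"].pattern, re.IGNORECASE)


def collapse_spaces(text):
    """Hypothetical simplification of SA body rendering: whitespace runs -> one space."""
    return re.sub(r"\s+", " ", text)


def check(text):
    """Return which rules would hit on the collapsed text."""
    body = collapse_spaces(text)
    return {name: bool(rx.search(body)) for name, rx in RULES.items()}


if __name__ == "__main__":
    samples = [
        "(www meds88 net)",        # string from the debug output in this thread
        "( www   meds88   net )",  # made-up: extra spaces, collapsed before matching
        "(WWW MEDS88 NET)",        # made-up: uppercase variant
        "swww meds88 netx",        # made-up: embedded in longer words
    ]
    for s in samples:
        print(repr(s), check(s))
```

The uppercase sample only hits the `paren_i` rule, which is the "changed one letter" case, and the last sample hits the original rule but not the word-bounded one, which shows what the `\b` anchors buy you.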