On Sat, 2009-10-17 at 17:37 -0400, "Alex" wrote:
> > > rawbody __CCM_UNSUB
> > > /"https?:..visitor\.constantcontact.com\/[^<>]{60,200}>SafeUnsubscribe</
> >
> > Ouch! Rawbody, that hurts.
>
> Do you mean that it's much more resource-intensive than a regular
> "body" check?

You can't use body rules here -- the difference between rawbody and body
is that HTML tags(!) and line breaks are removed before matching for
body rules. See the M::SA::Conf docs.

What I mean is that URIDetail will be faster than the equivalent rawbody
rule. All URIs have already been parsed out, along with some details.
This holds especially true with large-ish text parts.

> When is it necessary (or possible) to use it over the
> URIDetail substitute you mentioned?

Possible, always. Necessary only if some vital parts you need to match
on are not provided by URIDetail. But that should be kind of, err,
obvious, no?

> rawbody DDN_SPAM_3 /\/.{5}\-.{4}\-.{3}\/.{5}\-.{4}\-.{3}\-1\.jpg"
> border=0\>\<\/a\>\<br\>/

Argh. Neither a dash nor angle brackets need to be escaped. It just
makes reading the RE harder.

Speaking of which... If you want to use the slash in your RE, use the
general m// with a different delimiter. The // is just a shorthand for
m//, with its purpose defeated by introducing fences of escaped slashes.
(No, I am not getting tired of repeating this advice. Never.)

  m~^http://[^/]{1,5}/~

is equivalent to

  /^http:\/\/[^\/]{1,5}\//

though only one of them is easily readable. ;)

> Is there a way to easily measure the overhead of a particular rule?

Other than common sense and voodoo? No. :)

It depends on the RE (Is it properly anchored? Does it backtrack?) and
the specific test case you are evaluating, including the size of the
message's text parts.

More specifically, as an example, this is a common source of false
security bugs filed, triggered by a self-DoS RE and a pathological
edge-case message. The RE rule is fine processing hundreds of thousands
of mails without any noticeable impact -- until that one legit,
horribly broken HTML mail comes along, bringing the server to its
knees by backtracking the hell out of the poor RE engine.

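To make the backtracking point a bit more tangible, here is a minimal
sketch. It uses Python's re module rather than Perl, purely because it
is also a backtracking engine and easy to time; the two patterns and the
test string are made up for illustration and are not taken from any real
SpamAssassin rule. Absolute timings will differ from what Perl does with
a real message, so read it as a demonstration of the effect only.

```python
import re
import time

# Hypothetical patterns, not taken from any real SpamAssassin rule.
# NESTED has a quantifier inside a quantifier; on an input that almost
# matches, the engine tries every way of splitting the run of "a"s.
NESTED = re.compile(r"^(a+)+$")
# FLAT accepts exactly the same strings but fails in linear time.
FLAT = re.compile(r"^a+$")


def time_match(pattern, text):
    """Return (matched, seconds) for a single match attempt."""
    start = time.perf_counter()
    matched = pattern.search(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    for n in (10, 14, 18, 22):
        # The trailing "!" makes the match fail only at the very end.
        edge_case = "a" * n + "!"
        _, slow = time_match(NESTED, edge_case)
        _, fast = time_match(FLAT, edge_case)
        print(f"n={n:2d}  nested={slow:.6f}s  flat={fast:.6f}s")
```

On input that matches, both patterns return quickly. Only the near-miss
input separates them, and the time for the nested pattern roughly
doubles with every extra character. That is why such a rule can look
harmless for a long time before one odd message trips it.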
-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}