On Sat, 2009-10-17 at 17:37 -0400, "Alex" wrote:
> > > rawbody  __CCM_UNSUB 
> > > /"https?:..visitor\.constantcontact.com\/[^<>]{60,200}>SafeUnsubscribe</
> >
> > Ouch!  Rawbody, that hurts.
> 
> Do you mean that it's much more resource-intensive than a regular
> "body" check?

You can't use body rules here. The difference between rawbody and body
is that, for body rules, HTML tags(!) and line breaks are removed before
matching. See the Mail::SpamAssassin::Conf (M::SA::Conf) docs.

What I mean is that URIDetail will be faster than the equivalent
rawbody rule. All URIs have already been parsed out, along with some
details. This holds especially true with large-ish text parts.

> When is it necessary (or possible) to use it over the
> URIDetail substitute you mentioned?

Possible always. Necessary only if some vital parts you need to match
on are not provided by URIDetail. But that should be kind of, err,
obvious, no?


> rawbody    DDN_SPAM_3   /\/.{5}\-.{4}\-.{3}\/.{5}\-.{4}\-.{3}\-1\.jpg" 
> border=0\>\<\/a\>\<br\>/

Argh. Neither a dash nor angle brackets need to be escaped. It just
makes reading the RE harder. Speaking of which...

If you want to use the slash in your RE, use the general m// with a
different delimiter. The // is just a shorthand for m//, with its purpose
defeated by introducing fences of backslashes. (No, I am not getting
tired of repeating this advice. Never.)

m~^http://[^/]{1,5}/~ is equivalent to /^http:\/\/[^\/]{1,5}\//, though
only one of them is easily readable. ;)
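
A quick way to convince yourself that the escapes change nothing is to
run both spellings against the same strings. The sketch below uses
Python's re module as a stand-in for Perl, since both treat a backslash
before punctuation as "this literal character". The sample strings and
the helper name are made up for illustration.

```python
import re

# Pattern text as it has to be written between /.../ (slashes escaped).
slash_delimited = r"^http:\/\/[^\/]{1,5}\/"
# Same pattern as it can be written inside m~...~ (nothing to escape).
tilde_delimited = r"^http://[^/]{1,5}/"

# Needlessly escaped dash and angle brackets vs. the plain spelling.
escaped_tail = r"\-1\.jpg\" border=0\>\<\/a\>\<br\>"
plain_tail = r"-1\.jpg\" border=0></a><br>"

# Hypothetical sample strings, not taken from real mail.
uri_samples = [
    "http://a.b/x",         # short host: should match
    "http://abcde/",        # five-character host: should match
    "http://example.com/",  # host longer than five characters
    "ftp://a.b/",           # wrong scheme
]
html_sample = '<img src="/x-1.jpg" border=0></a><br>'


def verdicts(pattern, subjects):
    """Return a match/no-match flag per subject for one pattern."""
    return [bool(re.search(pattern, s)) for s in subjects]


uri_slash = verdicts(slash_delimited, uri_samples)
uri_tilde = verdicts(tilde_delimited, uri_samples)
tail_escaped = verdicts(escaped_tail, [html_sample])
tail_plain = verdicts(plain_tail, [html_sample])

print(uri_slash == uri_tilde, tail_escaped == tail_plain)
```

Both comparisons should come out equal. The only thing the extra
backslashes buy you is a harder time reading the rule.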


> Is there a way to easily measure the overhead of a particular rule?

Other than common sense and voodoo?  No. :)

It depends on the RE (Is it properly anchored? Does it backtrack?) and
the specific test case you are evaluating, including the message's
text-parts size.

As an example, this is a common source of bogus security bug reports:
a self-DoS RE triggered by a pathological edge-case message. The RE
rule is fine processing hundreds of thousands of mails without any
noticeable impact -- until that one legit, horribly broken HTML mail
comes along, bringing the server to its knees by backtracking the hell
out of the poor RE engine.
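
To get a feel for the effect, here is a minimal sketch, again in Python
rather than Perl. The nested-quantifier pattern is a deliberately
pathological textbook case, not taken from any real rule set, and the
helper name and input sizes are my own choices. It times a failing match
as the input grows and compares it with a flat pattern that accepts the
same strings.

```python
import re
import time

# Deliberately pathological: a quantifier nested inside a quantifier.
NESTED = re.compile(r"^(a+)+$")
# Accepts exactly the same strings, without the nesting.
FLAT = re.compile(r"^a+$")


def seconds_to_fail(regex, n):
    """Time one failing match against n 'a' characters plus a 'b'."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = regex.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing 'b' can never match
    return elapsed


# Small sizes on purpose: the nested pattern gets slow very quickly.
sizes = (8, 12, 16, 20)
nested_times = {n: seconds_to_fail(NESTED, n) for n in sizes}
flat_times = {n: seconds_to_fail(FLAT, n) for n in sizes}

for n in sizes:
    print(f"n={n:2d}  nested={nested_times[n]:.6f}s  flat={flat_times[n]:.6f}s")
```

The nested pattern's time should climb steeply with each step in size
while the flat one stays negligible. Absolute numbers depend on the
machine and the engine, so treat this as an illustration of the shape of
the problem, not as a way to benchmark an actual rule.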


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
