>-----Original Message-----
>From: Robert Menschel [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, August 11, 2004 1:04 AM
>To: [EMAIL PROTECTED]
>Subject: Header rule performance
>
>
>One of the longer SARE rules in our header rule set is
>
>header    SARE_HEAD_SPAM           ALL =~ /(?:Error-path|Rot|X-(?:BounceTrace|Camp...|ClientHost|cross|Contact|CS-IP|E(?:[Mm]ail)?|Encoding-Version|ENVID|EXP32-SerialNo|Find|[Ii][Mm]?|INFO_.Z|JLH|L-C|LIDCode|Mailid|MailingID|Message-Info|Misc_ID|mlcipher|mlmsgid|mpm|ms|ntc|PMG-.+|POPFile-Link|Rec|RMD-Text|SP-Track-ID|srk|Text-Classification|TID|T2-Posting-ID|Tnz-Problem-Type|Trans|Vig|WCMailID|yd)):/
>describe  SARE_HEAD_SPAM           Message headers used which identify spam
>score     SARE_HEAD_SPAM           2.222
>#stype    SARE_HEAD_SPAM           spamp 
>#hist     SARE_HEAD_SPAM           June 5 2004: Added X-T2-Posting-ID
>#hist     SARE_HEAD_SPAM           Aug 10 2004: Added several more headers
>#counts   SARE_HEAD_SPAM           3260s/0h of 58338 corpus (33610s/24728h RM) 08/07/04
>#counts   SARE_HEAD_SPAM           2143s/1h of 32586 corpus (9341s/23245h JH) 06/10/04
>#counts   SARE_HEAD_SPAM           731s/3h of 17050 corpus (14617s/2433h MY) 08/08/04
>
>An alternative form of this same rule would be:
>header    __SARE_HEAD_SPAM_01      exists:Error-path
>header    __SARE_HEAD_SPAM_02      exists:Rot
>  ...
>header    __SARE_HEAD_SPAM_xx      exists:X-T2-Posting-I
>  ...
>meta      SARE_HEAD_SPAM           __SARE_HEAD_SPAM_01 || __SARE_HEAD_SPAM_02 || ...
>
>I suspect that the "exists" version would be more efficient, use less
>resources.  I guess this because I believe SA identifies the headers as
>or shortly after it first reads the email, and each "exists" test is
>simply a boolean "have we seen it?", while the regex at the top requires
>a full scan of the headers to see if any of them match.
>
>Can anyone confirm this?  If so, I'll rework SARE_HEAD_SPAM into an
>"exists" format for the release expected shortly.
>
>Bob Menschel

I believe this is true; exists: is much faster. But doesn't exists: test only
for the header name, and not the contents of that header? I'm trying to
remember this rule, but it looks like it also checks the contents of the
header, which I don't think exists: does.

But I could be wrong. 
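
To make the difference concrete, here is a rough Python sketch. It is not
SpamAssassin's code, only an illustration under my own assumptions: the
header list is a shortened, hypothetical subset of the rule above, names are
compared case-sensitively, and the function names are made up. The regex form
scans the whole header text and, being unanchored, can also hit a header name
that merely appears inside another header's contents. The exists-style form
collects the names once and then does a set lookup.

```python
import re

# Hypothetical, shortened subset of the header names in SARE_HEAD_SPAM.
SPAM_HEADER_NAMES = {"Error-path", "Rot", "X-Mailid", "X-T2-Posting-ID"}

# Regex form: looks for "Name:" anywhere in the raw header text.
SPAM_HEADER_RE = re.compile(r"(?:Error-path|Rot|X-(?:Mailid|T2-Posting-ID)):")


def parse_header_names(raw_headers):
    """Collect header names once, as a parser might while reading the mail."""
    names = set()
    for line in raw_headers.splitlines():
        # Folded continuation lines start with whitespace and add no new name.
        if line and not line[0].isspace() and ":" in line:
            names.add(line.split(":", 1)[0])
    return names


def regex_hit(raw_headers):
    """Regex-style test: scan the full header text for any listed name."""
    return SPAM_HEADER_RE.search(raw_headers) is not None


def exists_hit(names):
    """exists-style test: is any listed name among the parsed header names?"""
    return not SPAM_HEADER_NAMES.isdisjoint(names)


if __name__ == "__main__":
    real = "From: a@example.com\nX-Mailid: 12345\nSubject: hi\n"
    mention = "From: a@example.com\nSubject: about X-Mailid: headers\n"
    for raw in (real, mention):
        print(regex_hit(raw), exists_hit(parse_header_names(raw)))
```

In this sketch both tests fire on the first message, but only the regex fires
on the second one, where "X-Mailid:" shows up inside the Subject contents.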

--Chris (This week I'm addicted to Paris Web radio! Ecoute!)
