>-----Original Message-----
>From: Robert Menschel [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, August 11, 2004 1:04 AM
>To: [EMAIL PROTECTED]
>Subject: Header rule performance
>
>
>One of the longer SARE rules in our header rule set is
>
>header SARE_HEAD_SPAM ALL =~ /(?:Error-path|Rot|X-(?:BounceTrace|Camp...|ClientHost|cross|Contact|CS-IP|E(?:[Mm]ail)?|Encoding-Version|ENVID|EXP32-SerialNo|Find|[Ii][Mm]?|INFO_.Z|JLH|L-C|LIDCode|Mailid|MailingID|Message-Info|Misc_ID|mlcipher|mlmsgid|mpm|ms|ntc|PMG-.+|POPFile-Link|Rec|RMD-Text|SP-Track-ID|srk|Text-Classification|TID|T2-Posting-ID|Tnz-Problem-Type|Trans|Vig|WCMailID|yd)):/
>describe SARE_HEAD_SPAM Message headers used which identify spam
>score SARE_HEAD_SPAM 2.222
>#stype SARE_HEAD_SPAM spamp
>#hist SARE_HEAD_SPAM June 5 2004: Added X-T2-Posting-ID
>#hist SARE_HEAD_SPAM Aug 10 2004: Added several more headers
>#counts SARE_HEAD_SPAM 3260s/0h of 58338 corpus (33610s/24728h RM) 08/07/04
>#counts SARE_HEAD_SPAM 2143s/1h of 32586 corpus (9341s/23245h JH) 06/10/04
>#counts SARE_HEAD_SPAM 731s/3h of 17050 corpus (14617s/2433h MY) 08/08/04
>
>An alternative form of this same rule would be:
>header __SARE_HEAD_SPAM_01 exists:Error-path
>header __SARE_HEAD_SPAM_02 exists:Rot
> ...
>header __SARE_HEAD_SPAM_xx exists:X-T2-Posting-I
> ...
>meta SARE_HEAD_SPAM __SARE_HEAD_SPAM_01 || __SARE_HEAD_SPAM_02 || ...
>
>I suspect that the "exists" version would be more efficient, use less
>resources. I guess this because I believe SA identifies the headers as
>or shortly after it first reads the email, and each "exists" test is
>simply a boolean "have we seen it?", while the regex at the top requires
>a full scan of the headers to see if any of them match.
>
>Can anyone confirm this? If so, I'll rework SARE_HEAD_SPAM into an
>"exists" format for the release expected shortly.
>
>Bob Menschel
I believe this is true; exists: is much faster. But doesn't exists: test only for the header name, and not the contents of that header? I'm trying to remember this rule, and it looks like it also checks the contents within the header, which I don't think exists: does. But I could be wrong.

--Chris
(This week I'm addicted to Paris Web radio! Ecoute!)
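
On the name-versus-contents question, here is a rough Python sketch of the two approaches. It is not SpamAssassin's code. The pattern is an abbreviated stand-in for the SARE_HEAD_SPAM regex, and `parse_headers`, `regex_hit`, `exists_hit`, and the sample messages are all made up for illustration. The case-insensitive name comparison is also an assumption. The point it tries to show: an unanchored regex run over the whole header text can fire on "X-TID:" appearing inside a Subject value, while a name lookup only fires when a header with that name is actually present.

```python
import re

# Abbreviated, hypothetical stand-in for the SARE_HEAD_SPAM pattern.
PATTERN = re.compile(r"(?:Error-path|Rot|X-(?:Mailid|TID|ms)):")

# The same header names, listed one by one as an "exists"-style rule set would.
NAMES = ["Error-path", "Rot", "X-Mailid", "X-TID", "X-ms"]


def parse_headers(raw):
    """Split a raw header block into (name, value) pairs.

    Folded continuation lines are skipped to keep the sketch short.
    """
    pairs = []
    for line in raw.splitlines():
        if line[:1] in (" ", "\t") or ":" not in line:
            continue
        name, value = line.split(":", 1)
        pairs.append((name.strip(), value.strip()))
    return pairs


def regex_hit(raw):
    """Regex form: scan the full header text, names and values alike."""
    return PATTERN.search(raw) is not None


def exists_hit(raw, names):
    """Exists form: look only at header names (compared case-insensitively)."""
    seen = {name.lower() for name, _ in parse_headers(raw)}
    return any(n.lower() in seen for n in names)


# Made-up sample messages.
spam = "From: a@example.com\nX-Mailid: 12345\nSubject: hi\n"
ham = "From: b@example.com\nSubject: Re: X-TID: what is this header?\n"

for label, msg in (("spam", spam), ("ham", ham)):
    print(label, "regex:", regex_hit(msg), "exists:", exists_hit(msg, NAMES))
```

In this sketch both forms hit the spam sample, but only the regex form hits the ham sample, because the header name shows up inside the Subject text. So the two forms are not strictly equivalent, although in the direction of the regex being the looser one.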
