On Thu, 04 Dec 2003 11:43:30 -0800, Greg Webster <[EMAIL PROTECTED]> writes:
> Seems like it would be much better to simplify and shorten these rules
> with better regexp.
>
> Samples:
> rawbody BigEvilList_22
> /\b(?:agnitum\.com|ahamembership\.com|aicpa-eca\.org|aicpa\.org|aih01\.com|ai\.hitbox\.com|AIRMARCH\.COM|AIRSHADE\.COM|ajc\.com|akss\.org|albuminfo\.org|alertquotes\.com|alfy\.com)\b/i
> describe BigEvilList_22 Generated BigEvilList_22

If the rules look like this:

    (abc|aef|agh)

then you should get better performance by factoring the 'a' out of the
expression:

    a(bc|ef|gh)

This lets the matcher bail out quickly when the text at the current
position doesn't start with an 'a', instead of trying every alternative.
There might be an optimization in the regex engine that detects this
automatically, but doing it manually won't hurt.

Additional factoring may also be a win:

    hotbox|hoturls|hotgyrls|hotlemons|hotstocks|honestmerchangs|happymerchants

    -->

    h(ot(box|urls|gyrls|lemons|stocks)|onestmerchangs|appymerchants)

Factor out the 'h' so that the matcher can do a prefix-reject quickly, and
then factor out the 'ot' so that it won't check 'hox' against each of
'hotbox' .. 'hotstocks' in turn.

Scott

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
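
The prefix factoring described above can be automated. What follows is only a minimal sketch, written in Python rather than the Perl that SpamAssassin uses, and it assumes the alternatives are plain literal strings such as domain names. The helper names build_trie, trie_to_regex, and factor_alternation are hypothetical and come from no existing tool. The sketch builds a prefix tree from the alternatives and emits a regex in which every shared prefix appears once; it makes no claim about what the Perl regex engine already optimizes on its own.

```python
import re


def build_trie(words):
    """Build a nested-dict prefix tree; the key "" marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie


def trie_to_regex(node):
    """Emit a regex for the subtree, writing each shared prefix only once."""
    if set(node) == {""}:
        # Only the end-of-word marker: nothing more to match here.
        return ""
    branches = []
    optional = False
    for ch in sorted(node):
        if ch == "":
            # A word ends here while longer words continue past this point.
            optional = True
            continue
        branches.append(re.escape(ch) + trie_to_regex(node[ch]))
    if len(branches) == 1 and not optional:
        # A single continuation needs no grouping.
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    return body + "?" if optional else body


def factor_alternation(words):
    """Return a prefix-factored pattern matching exactly the given words."""
    return trie_to_regex(build_trie(words))


# Hypothetical usage with the alternatives from the example above.
WORDS = ["hotbox", "hoturls", "hotgyrls", "hotlemons", "hotstocks",
         "honestmerchangs", "happymerchants"]
PATTERN = factor_alternation(WORDS)
```

The result uses non-capturing groups, as the original rule does, and could be pasted between the \b anchors of a generated rule. Whether it is actually faster should be measured against the unfactored rule on real mail, not assumed.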