On Thu, 04 Dec 2003 11:43:30 -0800, Greg Webster <[EMAIL PROTECTED]> writes:

> Seems like it would be much better to simplify and shorten these rules
> with better regexp.
> 
> Samples:

> rawbody BigEvilList_22 /\b(?:agnitum\.com|ahamembership\.com|aicpa-eca\.org|aicpa\.org|aih01\.com|ai\.hitbox\.com|AIRMARCH\.COM|AIRSHADE\.COM|ajc\.com|akss\.org|albuminfo\.org|alertquotes\.com|alfy\.com)\b/i
> describe BigEvilList_22 Generated BigEvilList_22

If the rules look like this (abc|aef|agh), then you should get better
performance by factoring the 'a' out of the expression: a(bc|ef|gh).
This lets the match bail out fast if the string doesn't start with an
'a'. There might be an optimization in the regex engine that detects
this automatically, but doing it manually won't hurt.
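
As a quick sanity check that the factored form accepts exactly the same
strings, here is a minimal sketch. It uses Python's re module as a
stand-in for the Perl engine that SpamAssassin runs on, and the sample
strings are made up for illustration. It only checks equivalence; it
says nothing about how much faster either form is.

```python
import re

# Unfactored and manually factored forms of the same alternation.
flat = re.compile(r"(?:abc|aef|agh)")
factored = re.compile(r"a(?:bc|ef|gh)")

# Hypothetical inputs: some should match, some should not.
samples = ["abc", "aef", "agh", "xbc", "ab", "zzaghzz", ""]

# Record, for each sample, whether each pattern finds a match.
results = [(s, bool(flat.search(s)), bool(factored.search(s))) for s in samples]

# The two patterns should agree on every sample.
agree = all(a == b for _, a, b in results)
```

Non-capturing groups (?:...) are used here to mirror the quoted rule;
plain (...) groups would match the same strings.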

Also doing additional factoring may be a win:

  hotbox|hoturls|hotgyrls|hotlemons|hotstocks|honestmerchangs|happymerchants

-->

  h(ot(box|urls|gyrls|lemons|stocks)|onestmerchangs|appymerchants)

Factor out the 'h' so that it can do a prefix-reject quickly, and then
factor out the 'ot' so that a string like 'hox' is rejected once instead
of being compared against each of 'hotbox' .. 'hotstocks'.
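
For long generated lists, this factoring could be done mechanically
rather than by hand. The following is only a sketch of one way to do it,
by building a prefix tree (trie) of the words and turning it back into a
nested pattern. The helper names build_trie and trie_to_regex are made
up for this example, and Python's re module again stands in for Perl.

```python
import re

def build_trie(words):
    """Build a nested dict keyed by character; "" marks end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-word marker
    return trie

def trie_to_regex(node):
    """Turn a trie node into a pattern with shared prefixes factored out."""
    # Only the end marker is left: nothing more to match.
    if set(node) == {""}:
        return ""
    # A word ends here but longer words continue, so the rest is optional.
    optional = "" in node
    alts = [re.escape(ch) + trie_to_regex(child)
            for ch, child in sorted(node.items()) if ch != ""]
    # A single mandatory branch needs no grouping.
    if len(alts) == 1 and not optional:
        return alts[0]
    body = "(?:" + "|".join(alts) + ")"
    return body + "?" if optional else body

words = ["hotbox", "hoturls", "hotgyrls", "hotlemons", "hotstocks",
         "honestmerchangs", "happymerchants"]
pattern = trie_to_regex(build_trie(words))
compiled = re.compile(pattern)
```

On the word list above, the generated pattern factors out the 'h', then
the 'o', then the 't', which is one level deeper than the hand-factored
version. Whether that extra nesting actually helps would need to be
measured against the real engine and real mail.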


Scott


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
