On Tue, Jan 22, 2008 at 05:24:00PM +0000, Justin Mason wrote:
>
>Jim Maul writes:
>> Justin Mason wrote:
>> > John D. Hardin writes:
>> >> On Tue, 22 Jan 2008, George Georgalis wrote:
>> >>
>> >>> On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote:
>> >>>
>> >>>> Neither am I. Another thing to consider is the fraction of defined
>> >>>> rules that actually hit and affect the score is rather small. The
>> >>>> greatest optimization would be to not test REs you know will fail;  
>> >>>> but how do you do *that*?
>> >>> thanks for all the followups on my inquiry. I'm glad the topic is/was
>> >>> considered and it looks like there is some room for development, but
>> >>> I now realize it is not as simple as I thought it might have been.
>> >>> In answer to above question, maybe the tests need their own scoring?
>> >>> e.g. fast tests with big spam scores get a higher test score than
>> >>> slow tests with low spam scores.
>> >>>
>> >>> maybe if there was some way to establish a hierarchy at startup
>> >>> which groups rule processing into nodes. some nodes finish
>> >>> quickly, some have dependencies, some are negative, etc.
>> >> Loren mentioned to me in a private email: "common subexpressions".
>> >>
>> >> It would be theoretically possible to analyze all the rules in a given
>> >> set (e.g. body rules) to extract common subexpressions and develop a
>> >> processing/pruning tree based on that. You'd probably gain some
>> >> performance scanning messages, but at the cost of how much
>> >> startup/compiling time?
>> > 
>> > I experimented with this concept in my sa-compile work, but I could
>> > achieve any speedup on real-world mixed spam/ham datasets.
>> > 
>> > Feel free to give it a try though ;)
>> > 
>> > --j.
>> > 
>> > 
>> 
>> You do mean *couldn't* achieve any speedup, correct?
>
>yep
>

Just wanted to point out, this topic came up when the site DNS
cache service started to fail due to excessive DNSBL queries. My
slowdown was due to multiple timeouts and/or delays, probably
related to "answering joe-job rbldns backscatter" -- that's the
reason I was looking for an early exit on scans in progress.
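
To make the early-exit idea concrete, here is a rough sketch in
Python of the "score the tests themselves" suggestion quoted above:
run rules in order of score per unit cost, and stop once the
remaining negative rules could no longer pull the total back under
the threshold. This is not SpamAssassin code or its API. The Rule
class, the cost figures, the rule names and the 5.0 threshold are
all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str
    score: float                  # points added on a hit (may be negative)
    cost: float                   # hypothetical relative cost estimate, > 0
    test: Callable[[str], bool]   # returns True when the rule hits


def scan(message: str, rules: List[Rule],
         threshold: float = 5.0) -> Tuple[float, List[str]]:
    """Run rules cheapest-and-heaviest first; stop once the verdict is fixed."""
    # Rank by absolute score per unit cost, so cheap decisive rules run first.
    ordered = sorted(rules, key=lambda r: abs(r.score) / r.cost, reverse=True)
    # The most that rules which have not run yet could still subtract.
    remaining_negative = sum(r.score for r in ordered if r.score < 0)
    total = 0.0
    ran: List[str] = []
    for rule in ordered:
        # Even if every remaining negative rule hit, we stay over threshold.
        if total + remaining_negative >= threshold:
            break
        ran.append(rule.name)
        if rule.score < 0:
            remaining_negative -= rule.score
        if rule.test(message):
            total += rule.score
    return total, ran


# Made-up rules; slow_dnsbl stands in for an expensive network lookup.
RULES = [
    Rule("slow_dnsbl", 1.0, 50.0, lambda m: True),
    Rule("whitelist", -2.0, 1.0, lambda m: "trusted" in m),
    Rule("cheap_big", 6.0, 1.0, lambda m: "viagra" in m),
    Rule("free_offer", 3.0, 1.0, lambda m: "free" in m),
]

spam_total, spam_ran = scan("buy viagra free", RULES)
ham_total, ham_ran = scan("hello trusted friend", RULES)
```

With these made-up rules the spam message is decided after the two
cheap rules and the expensive lookup is never started, while the ham
message still runs everything. Whether this helps on real mail
depends on how often the cheap rules alone settle the verdict, which
the sketch does not measure.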

// George



-- 
George Georgalis, information system scientist <IXOYE><
