Re: Rule Design Benchmark/Resource Question

Matt Kettler 31 Mar 2005 22:16:47 -0000

Rocky Olsen wrote:

>Before I pull my hair out doing bench/resource tests, I was wondering if
>anyone out there knew whether there is much of a speed/resource usage
>difference between the following ways of writing the same rule.
>
>
>Method A:
>body   rule_a          /(?:feh|meh|bleh)/i
>
>vs.
>
>Method B:
>
>body   __rule_a        /(?:feh)/i
>body   __rule_b        /(?:meh)/i
>body   __rule_c        /(?:bleh)/i
>
>meta   rule_d          (__rule_a || __rule_b || __rule_c)
>
>
>There probably isn't much difference with just 3 rules, but I'm thinking
>more along the lines of large (500+) lists, and it isn't limited to just body
>stuff. So if anyone has some real-world benchmarking/experience with which is
>preferred, or if the developers know which is faster for SA, I would love
>the input.
>  
>


To start with, make Perl's regex debugger your friend:

$ perl -Mre=debug -e "/(?:feh|meh|bleh)/i"
size 11 Got 92 bytes for offset annotations.

$ perl -Mre=debug -e  "/(?:feh)/i"
Freeing REx: `","'
Compiling REx `(?:feh)'
size 3 Got 28 bytes for offset annotations.

(repeat for /(?:meh)/i and /(?:bleh)/i)

However, this only covers part of the story: the cost of the regex
itself. Here the combined regex compiles to size 11, versus size 3 for
each single-word regex (9 in total). It says nothing about the per-rule
overhead in SA.
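To get a rough feel for the per-rule side of the cost, here is a minimal Python sketch. Python's re engine stands in for Perl's, so absolute numbers won't transfer to SA. It runs Method A as one scan and Method B as three separate scans followed by an OR, assuming each sub-rule is evaluated on its own the way independent rules plus a meta rule would be. The per-rule dict is a hypothetical, much simplified stand-in for engine bookkeeping, not SpamAssassin code.

```python
import re
import timeit

# Method A from the question: one rule, one regex.
COMBINED = re.compile(r"(?:feh|meh|bleh)", re.IGNORECASE)

# Method B: three sub-rules, each its own regex.
SEPARATE = {
    "__rule_a": re.compile(r"feh", re.IGNORECASE),
    "__rule_b": re.compile(r"meh", re.IGNORECASE),
    "__rule_c": re.compile(r"bleh", re.IGNORECASE),
}


def method_a(text):
    """Single scan of the text with the combined alternation."""
    return COMBINED.search(text) is not None


def method_b(text):
    """Run every sub-rule on its own, record each result, then OR them.

    The hits dict is a hypothetical stand-in for per-rule bookkeeping;
    it is not how SpamAssassin stores rule results.
    """
    hits = {name: rx.search(text) is not None for name, rx in SEPARATE.items()}
    # The "meta" rule: true if any sub-rule hit.
    return hits["__rule_a"] or hits["__rule_b"] or hits["__rule_c"]


if __name__ == "__main__":
    # Worst case for both: the only match sits at the very end of the body.
    body = "lorem ipsum dolor " * 2000 + "BLEH"
    for fn in (method_a, method_b):
        secs = timeit.timeit(lambda: fn(body), number=200)
        print(f"{fn.__name__}: {secs:.3f}s for 200 runs")
```

Scaling SEPARATE up to a few hundred entries is an easy way to see how per-rule cost adds up in this toy model; only a benchmark against SA itself says what happens there.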

In general I'd favor the combined approach, unless for some reason your
combined rule is considerably larger than the sum of its parts. Bigevil
ran much better once Chris S did some combining and common subexpression
elimination.
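For the 500+ word lists mentioned in the question, one simple way to do the combining mechanically is to load the words into a trie and emit a single alternation with shared leading characters factored out, so /feh|fee/ becomes /fe(?:e|h)/. This is a minimal Python sketch of that prefix-factoring idea, assuming plain words only (re.escape handles metacharacters, but arbitrary regex fragments are not supported). It is not the tooling used for Bigevil, and build_trie, trie_to_regex and combine_words are hypothetical helper names.

```python
import re

END = ""  # hypothetical end-of-word marker key in the trie


def build_trie(words):
    """Nested dicts: one level per character, END marks a complete word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return trie


def trie_to_regex(node):
    """Emit a regex for one trie node, sharing common prefixes."""
    branches = []
    word_ends_here = False
    for ch in sorted(node):
        if ch == END:
            word_ends_here = True
            continue
        branches.append(re.escape(ch) + trie_to_regex(node[ch]))
    if not branches:
        return ""  # leaf: nothing more to match
    if len(branches) == 1 and not word_ends_here:
        return branches[0]  # single path: plain concatenation, no group
    group = "(?:" + "|".join(branches) + ")"
    # If a word also ends at this node, the rest is optional.
    return group + "?" if word_ends_here else group


def combine_words(words):
    """Compile a case-insensitive alternation of all non-empty words."""
    cleaned = sorted({w.lower() for w in words if w})
    if not cleaned:
        raise ValueError("need at least one non-empty word")
    return re.compile(trie_to_regex(build_trie(cleaned)), re.IGNORECASE)


if __name__ == "__main__":
    rx = combine_words(["feh", "meh", "bleh", "fee"])
    print(rx.pattern)
```

Each shared prefix is tested once instead of once per word, which is the kind of redundancy combining and common subexpression elimination aim to remove; whether it pays off in Perl for a given list is worth checking with re=debug and a benchmark.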




Also, I'd suggest dropping the (?:) from the single-text matches. It
does nothing useful and doesn't change how the regex is evaluated for a
simple single text match; all it does is waste 4 bytes of disk space
per rule.

body __RULE_A   /feh/i

instead of:
body __RULE_A   /(?:feh)/i

I leave comparing the two using re=debug as an exercise for the student.
Also compare to /(feh)/i and /(feh)\1/i to see how backtracking works.
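For anyone without Perl handy, here is a rough Python sketch of the same comparison. It uses Python's re module as a stand-in for Perl's engine, so it shows matching behavior only: Python has no counterpart to re=debug's execution trace, so the backtracking itself still has to be watched in Perl.

```python
import re

# Python's re stands in for Perl here; same patterns, same /i flag.
PLAIN = re.compile(r"feh", re.IGNORECASE)
GROUPED = re.compile(r"(?:feh)", re.IGNORECASE)   # non-capturing group
CAPTURE = re.compile(r"(feh)", re.IGNORECASE)     # capturing group
BACKREF = re.compile(r"(feh)\1", re.IGNORECASE)   # "feh" twice in a row

samples = ["feh", "FEH", "oh feh!", "fe h", "meh", "fehfeh"]

# /feh/ and /(?:feh)/ match exactly the same spans on every sample...
for s in samples:
    a, b = PLAIN.search(s), GROUPED.search(s)
    assert (a.span() if a else None) == (b.span() if b else None)

# ...the group captures nothing; it only makes the pattern 4 chars longer.
print(PLAIN.groups, GROUPED.groups)               # 0 0
print(len(GROUPED.pattern) - len(PLAIN.pattern))  # 4

# A capturing group remembers what it matched, and \1 must repeat it.
print(CAPTURE.search("oh FEH!").group(1))         # FEH
print(bool(BACKREF.search("fehfeh")))             # True
print(bool(BACKREF.search("feh feh")))            # False
```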
