Rocky Olsen wrote:
>Before i pull my hair out doing bench/resource test, i was wondering if
>anyone out there knew if there was much of a speed/resource usage
>difference between the following way of writing the same rule.
>
>Method A:
>body rule_a /(?:feh|meh|bleh)/i
>
>vs.
>
>Method B:
>
>body __rule_a /(?:feh)/i
>body __rule_b /(?:meh)/i
>body __rule_c /(?:bleh)/i
>
>meta rule_d (__rule_a || __rule_b || __rule_c)
>
>There probably isn't much difference using just 3 rules, but i'm thinking
>more along the lines of large(500+) lists and it isn't limited to just body
>stuff. So if anyone has some realworld benching/experience with what is
>preferred or if the developers know which is faster for SA, i would love
>the input.

To start with, use perl's regex debugger as your friend:

    $ perl -Mre=debug -e "/(?:feh|meh|bleh)/i"
    size 11 Got 92 bytes for offset annotations.

    $ perl -Mre=debug -e "/(?:feh)/i"
    Freeing REx: `","'
    Compiling REx `(?:feh)'
    size 3 Got 28 bytes for offset annotations.
    (repeat 2 times)

However, this only deals with part of the story: the cost of the regex
itself. It does not deal with the per-rule overhead in SA.

In general I'd favor the combined approach, unless for some reason your
combined rule is considerably larger than the sum of its parts. Bigevil ran
much better once Chris S did some combining and common subexpression
elimination.

Also, I'd suggest eliminating the (?:) for the single-text matches. It does
nothing of use, and doesn't change how the regex is evaluated for a simple
single-text match. All it does is waste 4 bytes of disk space per rule.

    body __RULE_A /feh/i

instead of:

    body __RULE_A /(?:feh)/i

I leave comparing the two using re=debug as an exercise for the student.
Also compare to /(feh)/i and /(feh)\1/i to see how backtracking works.
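
For illustration, here is a rough sketch of what "combining and common subexpression elimination" could look like for the three patterns from the question. The rule names are made up for this example, and this is not taken from Bigevil or any shipped ruleset. Both rules match the same text; whether the factored form actually compiles smaller or runs faster is something to check with re=debug rather than assume.

```
# Hypothetical rule names; the patterns are the ones from the question.

# Combined form: a single body rule instead of three sub-rules plus a meta.
body  RULE_COMBINED  /(?:feh|meh|bleh)/i

# Same matches with the shared "eh" suffix factored out of the alternation
# (common subexpression elimination). Use one form or the other, not both.
body  RULE_FACTORED  /(?:f|m|bl)eh/i
```

Here the (?:) grouping is doing real work, since it limits the alternation to the prefixes, unlike the single-text case above where it can simply be dropped.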