Re: body speedups using new features in perl 5.9.x

David Landgren Wed, 12 Jul 2006 09:13:41 -0700

Bowie Bailey wrote:

[EMAIL PROTECTED] wrote:

While I doubt it'd have quite the performance gains that A-C can
offer, Regexp::Assemble certainly sounds like something worth
trying...the coderef trick, in particular, is very nifty.

Forgot to mention in the other thread I just replied to, if you've
downloaded the package, look at eg/ircwatcher for a slightly mindless
demo of the tracked mode. If you have a copy of O'Reilly's _Perl Hacks_,
a much more fleshed-out demo appears in there.

It can work well.  After reading about it here, I tried it on one of
my programs that compares about 1600 words and phrases against a
document.  My scan time dropped by 30%.  This doesn't count the time
taken to assemble the regex (about .27 seconds), but since this
program runs as a daemon and only has to do the assembly once, it
wasn't relevant to me.


Here's some background that people may find interesting.

I have a Postfix access map that is an assembly of currently 4145
patterns that correspond to residential broadband DNS names.


Patterns like

        \d+-\d+-\d+-\d+\.netabc\.com\.br
        a\d+[abc]\d+\.neo\.lrun\.com
        \d+\.\d+\.\d+\.\d+\.adsl\.abc\.tiscali\.dk

to match DNS names like

        host217-34-41-132.in-addr.btopenworld.com
        dsl-200-67-157-162.prodigy.net.mx
        host80-39.pool212171.interbusiness.it
        bgp01069788bgs.vnburn01.mi.comcast.net
        cpe-68-112-253-235.ma.charter.com
        adsl-68-73-64-222.dsl.klmzmi.ameritech.net

At first this was to catch spam; now I'm happy that an unexpected
side-effect is that it discards connections during virus storms. I never
even accept the DATA, much less overload my AV scanner.

Anyway, when I started out, I noticed the performance of the Postfix
server dropping through the floor. So I wound up writing
Regexp::Assemble. Now, instead of going through a list of patterns, it
goes through one. (I had to recompile pcre and up the LINK_SIZE #define
so that pcre could compile the pattern).
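
For anyone who hasn't used the module, here is a minimal sketch of the
idea. It is not the script that builds my access map: it just feeds the
three sample patterns above to Regexp::Assemble (new, add and re) and
tries the result on a few host names. The host names and the ^...$
anchoring are made up for the demo.

```perl
use strict;
use warnings;
use Regexp::Assemble;

# The three sample patterns quoted earlier in this message.
my @patterns = (
    '\d+-\d+-\d+-\d+\.netabc\.com\.br',
    'a\d+[abc]\d+\.neo\.lrun\.com',
    '\d+\.\d+\.\d+\.\d+\.adsl\.abc\.tiscali\.dk',
);

# Merge all the patterns into a single regular expression.
my $ra = Regexp::Assemble->new;
$ra->add($_) for @patterns;
my $assembled = $ra->re;    # a compiled qr// object

# Hypothetical host names, invented for this illustration.
my @hosts = (
    '200-1-2-3.netabc.com.br',
    'a12b34.neo.lrun.com',
    'mail.example.org',
);

for my $host (@hosts) {
    my $verdict = $host =~ /^$assembled$/ ? 'residential' : 'other';
    printf "%-28s %s\n", $host, $verdict;
}
```

The point is that each host name is tested against one pattern,
however many patterns went into it.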


Running a test on a small (1000) sample of host names speaks eloquently:

% perl5.9.4 racmp host.1k
assembled 4145 patterns in 3.83324813842773 seconds
R::A: good = 971, bad = 29 in 0.0148990154266357 seconds
list: good = 971, bad = 29 in 5.72843599319458 seconds
 A_C: good = 971, bad = 29 in 8.56000709533691 seconds
RA  len: 87491
A_C len: 174644

That is, the assembled approach runs in roughly 1/380th of the time of
looping through the list of REs. A_C is even worse, but given that the
pattern is over twice as long, and chock full of metacharacters, I
suppose this shouldn't come as a surprise, but it does seem odd. I'll
check back with Yves and see if my methodology is sane there.
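
If you want to try this sort of comparison yourself, something along
these lines would do. To be clear, this is not the racmp script, only a
rough sketch of the R::A and list cases: the patterns and host names are
tiny stand-ins, count_hits is a helper invented for the example, and I
am assuming "good" means the host matched. With so little data the
timings mean nothing; substitute your own lists.

```perl
use strict;
use warnings;
use Time::HiRes ();
use Regexp::Assemble;

# Stand-in data: replace with a real pattern list and host sample.
my @patterns = (
    '\d+-\d+-\d+-\d+\.netabc\.com\.br',
    'a\d+[abc]\d+\.neo\.lrun\.com',
    '\d+\.\d+\.\d+\.\d+\.adsl\.abc\.tiscali\.dk',
);
my @hosts = (
    '200-1-2-3.netabc.com.br',
    '1.2.3.4.adsl.abc.tiscali.dk',
    'mail.example.org',
    'www.example.com',
);

# Approach 1: one assembled pattern.
my $ra = Regexp::Assemble->new;
$ra->add($_) for @patterns;
my $assembled = $ra->re;
my $single    = qr/^$assembled$/;

# Approach 2: a list of individually compiled patterns.
my @list = map { qr/^$_$/ } @patterns;

# Hypothetical helper: run a matcher over every host, count hits and misses.
sub count_hits {
    my ($matcher) = @_;
    my ($good, $bad) = (0, 0);
    for my $host (@hosts) {
        if ($matcher->($host)) { $good++ } else { $bad++ }
    }
    return ($good, $bad);
}

my $t0 = Time::HiRes::time();
my ($ra_good, $ra_bad) = count_hits(sub { $_[0] =~ $single });
my $ra_secs = Time::HiRes::time() - $t0;

$t0 = Time::HiRes::time();
my ($list_good, $list_bad) = count_hits(sub {
    my ($host) = @_;
    for my $re (@list) {
        return 1 if $host =~ $re;    # stop at the first pattern that matches
    }
    return 0;
});
my $list_secs = Time::HiRes::time() - $t0;

print "R::A: good = $ra_good, bad = $ra_bad in $ra_secs seconds\n";
print "list: good = $list_good, bad = $list_bad in $list_secs seconds\n";
```

Both approaches should report the same good and bad counts; if they
don't, something is wrong with the test before the timings are worth
looking at.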


David
--

Much of the propaganda that passes for news in our own society is given
to immobilising and pacifying people and diverting them from the idea
that they can confront power. -- John Pilger
