Re: [Templates] New text.replace virtual method for v2.15

Josh Rosenbaum Thu, 02 Feb 2006 17:21:26 -0800

Andy Wardley wrote:

What a great team effort!  :-)


------------------------------------------------------------------

Hey, who knows how long it'll be until the next version. :P We want toget this one right! ;) I suppose it probably wouldn't be as bad to get

another version going after having touched this version. The hard part

is picking up the project after not touching it for a while. (Moretext from me follows below.)

------------------------------------------------------------------

It's based on Paul's most recent example, re-written to use a closure toavoid the duplication, and adding Sergey's handling of escaped characters.It passes all the tests in Sergey's test suite (which have also beenadded to t/vmethod/replace.t). Not bad for 23 lines of code!
    'replace' => sub {
        my ($text, $pattern, $replace, $global) = @_;
        $text    = '' unless defined $text;
        $pattern = '' unless defined $pattern;
        $replace = '' unless defined $replace;
        $global  = 1  unless defined $global;
        my $expand = sub {
            my ($chunk, $start, $end) = @_;
            $chunk =~ s{ \\(\\|\$) | \$ (\d+) }{
                $1 ? $1
: ($2 > $#$start || $2 == 0) ? '': substr($text, $start->[$2], $end->[$2] - $start->[$2]);
            }exg;
            $chunk;
        };
        if ($global) {
            $text =~ s{$pattern}{ &$expand($replace, [EMAIL PROTECTED], [EMAIL 
PROTECTED]) }eg;
}else {
            $text =~ s{$pattern}{ &$expand($replace, , [EMAIL PROTECTED], 
[EMAIL PROTECTED]) }e;
        }
        return $text;
    },


------------------------------------------------------------------

Hmm, looks good. I like it. I'm not too fond of the sub being usedmainly because it's going to be called over and over again, possibly alot. (I know it's readability vs. speed, but in this case I'd ratherhave speed.) Also, this doesn't check to see if we even need backrefs,which is a minimal cost which gives HUGE gains when there are nobackrefs. (#2 below is still faster or on par with #1 with this checkand removing the sub. Although I would guess that a megabyte sizedreplace might change that, but I consider that a rare case.)

I'm attaching a new test file you can run along with a file ofbenchmarks. This new and improved test file runs through 5 or so testsby me, and all the tests by Sergey and benches each replace function.There are three replace functions in there:

1) This standard one. It's called replace_andy

2) replace_andy_with_patch - My patch without the sub, and usingbackref check.3) Another replace function I worked out called 'replace_josh_new2'.This one seems to do pretty well. Of course it has it's trade-offs aswell. It uses symbolic refs and "no strict 'refs'" for a bit. It alsocould be memory costly, because it actually stores the matches into ahash. In these particular tests it seems to do the best overall,though. I'd suggest #2 mostly to not have to worry about memory problems.


Here's the "overall totals" output of my last benchmark:
Totals:
                           Rate replace_andy
replace_andy            13670/s           --
replace_andy_with_patch 19290/s          41%
replace_josh_new2       24466/s          79%

I'd take this "totals" bench with a grain of salt, though, and insteadcheck out the individual results which are much more interesting.


-- Josh


Running test1 'alskjdfkas dfklasdj fa sdkfjalsdf afjkla...' =~ 
s{(sfdklasjf)}{$1 josh $1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  4 wallclock secs ( 4.06 usr +  0.00 sys =  4.06 CPU) @ 
24630.54/s (n=100000)
replace_andy_with_patch:  4 wallclock secs ( 3.70 usr +  0.00 sys =  3.70 CPU) 
@ 27027.03/s (n=100000)
replace_josh_new2:  3 wallclock secs ( 2.98 usr +  0.01 sys =  2.99 CPU) @ 
33444.82/s (n=100000)
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            24631/s           --                     -9%            
  -26%
replace_andy_with_patch 27027/s          10%                      --            
  -19%
replace_josh_new2       33445/s          36%                     24%            
    --


Running test2 'alskjdfkas dfklasdj fa sdkfjalsdf afjkla...' =~ s{(sfdklasjf)}{a 
josh b}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  3 wallclock secs ( 2.50 usr +  0.00 sys =  2.50 CPU) @ 
40000.00/s (n=100000)
replace_andy_with_patch:  0 wallclock secs ( 0.61 usr +  0.00 sys =  0.61 CPU) 
@ 163934.43/s (n=100000)
replace_josh_new2: -1 wallclock secs ( 0.65 usr +  0.00 sys =  0.65 CPU) @ 
153846.15/s (n=100000)
                            Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy             40000/s           --                    -76%           
   -74%
replace_andy_with_patch 163934/s         310%                      --           
     7%
replace_josh_new2       153846/s         285%                     -6%           
     --


Running test3 'alskjdfkas sdfjklsdfkjsdf ksjf sdlfjafkj...' =~ s{([fl])}{this 
is going to be a longer replace string to test "$1" if there is a difference in 
longer replace}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy: 50 wallclock secs (49.26 usr +  0.02 sys = 49.28 CPU) @ 2029.22/s 
(n=100000)
replace_andy_with_patch: 52 wallclock secs (51.00 usr +  0.05 sys = 51.05 CPU) 
@ 1958.86/s (n=100000)
replace_josh_new2: 39 wallclock secs (38.74 usr +  0.06 sys = 38.80 CPU) @ 
2577.32/s (n=100000)
                          Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            2029/s           --                      4%             
 -21%
replace_andy_with_patch 1959/s          -3%                      --             
 -24%
replace_josh_new2       2577/s          27%                     32%             
   --


Running test4 'alskjdfkas sdfjklsdfkjsdf ksjf sdlfjafkj...' =~ s{([fl])}{this 
is going to be a longer replace string to test if there is a difference in 
longer replace}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy: 34 wallclock secs (33.34 usr +  0.01 sys = 33.35 CPU) @ 2998.50/s 
(n=100000)
replace_andy_with_patch:  2 wallclock secs ( 1.96 usr +  0.00 sys =  1.96 CPU) 
@ 51020.41/s (n=100000)
replace_josh_new2:  2 wallclock secs ( 1.97 usr +  0.00 sys =  1.97 CPU) @ 
50761.42/s (n=100000)
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy             2999/s           --                    -94%            
  -94%
replace_andy_with_patch 51020/s        1602%                      --            
    1%
replace_josh_new2       50761/s        1593%                     -1%            
    --


Running test5 'Total: $5.55' =~ s{$5.55}{\$6.55}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  1 wallclock secs ( 0.78 usr +  0.00 sys =  0.78 CPU) @ 
128205.13/s (n=100000)
replace_andy_with_patch:  1 wallclock secs ( 0.60 usr +  0.00 sys =  0.60 CPU) 
@ 166666.67/s (n=100000)
replace_josh_new2:  1 wallclock secs ( 0.64 usr +  0.00 sys =  0.64 CPU) @ 
156250.00/s (n=100000)
                            Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            128205/s           --                    -23%           
   -18%
replace_andy_with_patch 166667/s          30%                      --           
     7%
replace_josh_new2       156250/s          22%                     -6%           
     --


Running test6 'Total: $5.55' =~ s{$5.55}{\$6.55}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  1 wallclock secs ( 0.82 usr +  0.00 sys =  0.82 CPU) @ 
121951.22/s (n=100000)
replace_andy_with_patch:  1 wallclock secs ( 0.59 usr +  0.00 sys =  0.59 CPU) 
@ 169491.53/s (n=100000)
replace_josh_new2:  0 wallclock secs ( 0.64 usr + -0.01 sys =  0.63 CPU) @ 
158730.16/s (n=100000)
                            Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            121951/s           --                    -28%           
   -23%
replace_andy_with_patch 169492/s          39%                      --           
     7%
replace_josh_new2       158730/s          30%                     -6%           
     --


Running test7 'Total: $5.55' =~ s{$5.55}{\\$6.55}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy: -1 wallclock secs ( 0.83 usr +  0.00 sys =  0.83 CPU) @ 
120481.93/s (n=100000)
replace_andy_with_patch: -1 wallclock secs ( 0.57 usr +  0.00 sys =  0.57 CPU) 
@ 175438.60/s (n=100000)
replace_josh_new2:  1 wallclock secs ( 0.92 usr +  0.00 sys =  0.92 CPU) @ 
108695.65/s (n=100000)
                            Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            120482/s           --                    -31%           
    11%
replace_andy_with_patch 175439/s          46%                      --           
    61%
replace_josh_new2       108696/s         -10%                    -38%           
     --


Running Sergey Test 0. 'foo bar' =~ s{foo}{bar}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  3 wallclock secs ( 2.07 usr +  0.00 sys =  2.07 CPU) @ 
48309.18/s (n=100000)
replace_andy_with_patch:  0 wallclock secs ( 0.39 usr +  0.00 sys =  0.39 CPU) 
@ 256410.26/s (n=100000)
            (warning: too few iterations for a reliable count)
replace_josh_new2:  1 wallclock secs ( 0.40 usr +  0.00 sys =  0.40 CPU) @ 
250000.00/s (n=100000)
                            Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy             48309/s           --                    -81%           
   -81%
replace_andy_with_patch 256410/s         431%                      --           
     3%
replace_josh_new2       250000/s         417%                     -2%           
     --


Running Sergey Test 1. 'foo bar foo' =~ s{(?i)FOO}{zoo}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  3 wallclock secs ( 2.77 usr +  0.00 sys =  2.77 CPU) @ 
36101.08/s (n=100000)
replace_andy_with_patch:  0 wallclock secs ( 0.40 usr +  0.00 sys =  0.40 CPU) 
@ 250000.00/s (n=100000)
replace_josh_new2:  1 wallclock secs ( 0.42 usr +  0.00 sys =  0.42 CPU) @ 
238095.24/s (n=100000)
                            Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy             36101/s           --                    -86%           
   -85%
replace_andy_with_patch 250000/s         592%                      --           
     5%
replace_josh_new2       238095/s         560%                     -5%           
     --


Running Sergey Test 2. 'foo $1 bar' =~ s{(foo)(.*)(bar)}{$1$2$3}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  4 wallclock secs ( 4.65 usr +  0.00 sys =  4.65 CPU) @ 
21505.38/s (n=100000)
replace_andy_with_patch:  3 wallclock secs ( 4.18 usr +  0.00 sys =  4.18 CPU) 
@ 23923.44/s (n=100000)
replace_josh_new2:  4 wallclock secs ( 3.45 usr +  0.01 sys =  3.46 CPU) @ 
28901.73/s (n=100000)
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            21505/s           --                    -10%            
  -26%
replace_andy_with_patch 23923/s          11%                      --            
  -17%
replace_josh_new2       28902/s          34%                     21%            
    --


Running Sergey Test 3. 'foo $1 bar' =~ s{(foo)(.*)(bar)}{$3$2$1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  4 wallclock secs ( 4.67 usr +  0.00 sys =  4.67 CPU) @ 
21413.28/s (n=100000)
replace_andy_with_patch:  5 wallclock secs ( 4.17 usr +  0.00 sys =  4.17 CPU) 
@ 23980.82/s (n=100000)
replace_josh_new2:  3 wallclock secs ( 3.45 usr +  0.00 sys =  3.45 CPU) @ 
28985.51/s (n=100000)
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            21413/s           --                    -11%            
  -26%
replace_andy_with_patch 23981/s          12%                      --            
  -17%
replace_josh_new2       28986/s          35%                     21%            
    --


Running Sergey Test 4. 'foo $200bar foobar' =~ s{(f)(o+)}{zoo}
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  3 wallclock secs ( 2.51 usr +  0.00 sys =  2.51 CPU) @ 
39840.64/s (n=100000)
replace_andy_with_patch:  1 wallclock secs ( 0.50 usr +  0.00 sys =  0.50 CPU) 
@ 200000.00/s (n=100000)
replace_josh_new2:  0 wallclock secs ( 0.51 usr +  0.00 sys =  0.51 CPU) @ 
196078.43/s (n=100000)
                            Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy             39841/s           --                    -80%           
   -80%
replace_andy_with_patch 200000/s         402%                      --           
     2%
replace_josh_new2       196078/s         392%                     -2%           
     --


Running Sergey Test 5. 'foo $200bar foobar' =~ s{(f)(o+)}{zoo}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  2 wallclock secs ( 3.75 usr +  0.00 sys =  3.75 CPU) @ 
26666.67/s (n=100000)
replace_andy_with_patch:  1 wallclock secs ( 0.64 usr +  0.00 sys =  0.64 CPU) 
@ 156250.00/s (n=100000)
replace_josh_new2:  1 wallclock secs ( 0.67 usr +  0.00 sys =  0.67 CPU) @ 
149253.73/s (n=100000)
Use of uninitialized value in concatenation (.) or string at (eval 94) line 1.
Use of uninitialized value in concatenation (.) or string at (eval 94) line 1.
Use of uninitialized value in concatenation (.) or string at (eval 94) line 1.
Use of uninitialized value in concatenation (.) or string at (eval 94) line 1.
                            Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy             26667/s           --                    -83%           
   -82%
replace_andy_with_patch 156250/s         486%                      --           
     5%
replace_josh_new2       149254/s         460%                     -4%           
     --


Running Sergey Test 6. 'foo bar foobar' =~ s{(o)|([ar])}{$2!}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy: 16 wallclock secs (16.13 usr +  0.00 sys = 16.13 CPU) @ 6199.63/s 
(n=100000)
replace_andy_with_patch: 16 wallclock secs (15.63 usr +  0.03 sys = 15.66 CPU) 
@ 6385.70/s (n=100000)
replace_josh_new2:  8 wallclock secs ( 9.45 usr +  0.01 sys =  9.46 CPU) @ 
10570.82/s (n=100000)
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy             6200/s           --                     -3%            
  -41%
replace_andy_with_patch  6386/s           3%                      --            
  -40%
replace_josh_new2       10571/s          71%                     66%            
    --


Running Sergey Test 7. 'foo fgoo foooo bar' =~ 
s{((?:f([^o]*)(o+)\s)+)}{1=$1;2=$2;3=$3;}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  5 wallclock secs ( 5.16 usr +  0.00 sys =  5.16 CPU) @ 
19379.84/s (n=100000)
replace_andy_with_patch:  4 wallclock secs ( 4.73 usr +  0.00 sys =  4.73 CPU) 
@ 21141.65/s (n=100000)
replace_josh_new2:  3 wallclock secs ( 4.06 usr +  0.00 sys =  4.06 CPU) @ 
24630.54/s (n=100000)
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            19380/s           --                     -8%            
  -21%
replace_andy_with_patch 21142/s           9%                      --            
  -14%
replace_josh_new2       24631/s          27%                     17%            
    --


Running Sergey Test 8. 'foo bar' =~ s{(f)(o+)}{$2$1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  4 wallclock secs ( 4.03 usr +  0.00 sys =  4.03 CPU) @ 
24813.90/s (n=100000)
replace_andy_with_patch:  4 wallclock secs ( 3.48 usr +  0.01 sys =  3.49 CPU) 
@ 28653.30/s (n=100000)
replace_josh_new2:  3 wallclock secs ( 2.94 usr +  0.00 sys =  2.94 CPU) @ 
34013.61/s (n=100000)
Use of uninitialized value in concatenation (.) or string at (eval 115) line 1.
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            24814/s           --                    -13%            
  -27%
replace_andy_with_patch 28653/s          15%                      --            
  -16%
replace_josh_new2       34014/s          37%                     19%            
    --


Running Sergey Test 9. 'foo bar' =~ s{(f)(o+)}{$20$1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  4 wallclock secs ( 3.77 usr +  0.00 sys =  3.77 CPU) @ 
26525.20/s (n=100000)
replace_andy_with_patch:  3 wallclock secs ( 3.32 usr +  0.00 sys =  3.32 CPU) 
@ 30120.48/s (n=100000)
replace_josh_new2:  3 wallclock secs ( 2.96 usr +  0.00 sys =  2.96 CPU) @ 
33783.78/s (n=100000)
Use of uninitialized value in concatenation (.) or string at (eval 122) line 1.
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            26525/s           --                    -12%            
  -21%
replace_andy_with_patch 30120/s          14%                      --            
  -11%
replace_josh_new2       33784/s          27%                     12%            
    --


Running Sergey Test 10. 'foo bar' =~ s{(f)(o+)}{$2$10}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  2 wallclock secs ( 3.79 usr +  0.01 sys =  3.80 CPU) @ 
26315.79/s (n=100000)
replace_andy_with_patch:  2 wallclock secs ( 3.29 usr +  0.00 sys =  3.29 CPU) 
@ 30395.14/s (n=100000)
replace_josh_new2:  3 wallclock secs ( 3.01 usr +  0.00 sys =  3.01 CPU) @ 
33222.59/s (n=100000)
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            26316/s           --                    -13%            
  -21%
replace_andy_with_patch 30395/s          16%                      --            
   -9%
replace_josh_new2       33223/s          26%                      9%            
    --


Running Sergey Test 11. 'foo bar' =~ s{(f)(o+)}{\$2$1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  4 wallclock secs ( 3.70 usr +  0.00 sys =  3.70 CPU) @ 
27027.03/s (n=100000)
replace_andy_with_patch:  4 wallclock secs ( 3.38 usr +  0.01 sys =  3.39 CPU) 
@ 29498.53/s (n=100000)
replace_josh_new2:  2 wallclock secs ( 2.71 usr +  0.02 sys =  2.73 CPU) @ 
36630.04/s (n=100000)
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            27027/s           --                     -8%            
  -26%
replace_andy_with_patch 29499/s           9%                      --            
  -19%
replace_josh_new2       36630/s          36%                     24%            
    --


Running Sergey Test 12. 'foo bar' =~ s{(f)(o+)}{x$1\\y$2}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  3 wallclock secs ( 4.35 usr +  0.01 sys =  4.36 CPU) @ 
22935.78/s (n=100000)
replace_andy_with_patch:  4 wallclock secs ( 3.75 usr +  0.00 sys =  3.75 CPU) 
@ 26666.67/s (n=100000)
replace_josh_new2:  3 wallclock secs ( 3.27 usr +  0.00 sys =  3.27 CPU) @ 
30581.04/s (n=100000)
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            22936/s           --                    -14%            
  -25%
replace_andy_with_patch 26667/s          16%                      --            
  -13%
replace_josh_new2       30581/s          33%                     15%            
    --


Running Sergey Test 13. 'foo bar' =~ s{(f)(o+)}{$2\\$1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  5 wallclock secs ( 4.16 usr +  0.01 sys =  4.17 CPU) @ 
23980.82/s (n=100000)
replace_andy_with_patch:  4 wallclock secs ( 3.68 usr +  0.00 sys =  3.68 CPU) 
@ 27173.91/s (n=100000)
replace_josh_new2:  3 wallclock secs ( 3.15 usr +  0.00 sys =  3.15 CPU) @ 
31746.03/s (n=100000)
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            23981/s           --                    -12%            
  -24%
replace_andy_with_patch 27174/s          13%                      --            
  -14%
replace_josh_new2       31746/s          32%                     17%            
    --


Running Sergey Test 14. 'foo bar' =~ s{(f)(o+)}{$2\\\$1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch, 
replace_josh_new2...
replace_andy:  4 wallclock secs ( 3.78 usr +  0.00 sys =  3.78 CPU) @ 
26455.03/s (n=100000)
replace_andy_with_patch:  3 wallclock secs ( 3.38 usr +  0.00 sys =  3.38 CPU) 
@ 29585.80/s (n=100000)
replace_josh_new2:  3 wallclock secs ( 2.83 usr +  0.00 sys =  2.83 CPU) @ 
35335.69/s (n=100000)
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            26455/s           --                    -11%            
  -25%
replace_andy_with_patch 29586/s          12%                      --            
  -16%
replace_josh_new2       35336/s          34%                     19%            
    --


Totals:
                           Rate replace_andy replace_andy_with_patch 
replace_josh_new2
replace_andy            13670/s           --                    -29%            
  -44%
replace_andy_with_patch 19290/s          41%                      --            
  -21%
replace_josh_new2       24466/s          79%                     27%            
    --

use strict;
use warnings;
use Benchmark qw[timethese];

my @all_times;

my %tests = (
             test1 => {
                       text => "alskjdfkas dfklasdj fa sdkfjalsdf afjklasdfja 
faksldfja sfdklasjf aslfkaj dfklajsdfa sdfjklsdfkjsdf ksjf sdlfjafkjsdflksjfd",
                       pattern => '(sfdklasjf)',
                       replace => '$1 josh $1',
                       global => 1,
                      },

             test2 => {
                       text => "alskjdfkas dfklasdj fa sdkfjalsdf afjklasdfja 
faksldfja sfdklasjf aslfkaj dfklajsdfa sdfjklsdfkjsdf ksjf sdlfjafkjsdflksjfd",
                       pattern => '(sfdklasjf)',
                       replace => 'a josh b',
                       global => 1,
                      },

             test3 => {
                       text => "alskjdfkas sdfjklsdfkjsdf ksjf 
sdlfjafkjsdflksjfd",
                       pattern => '([fl])',
                       replace => 'this is going to be a longer replace string 
to test "$1" if there is a difference in longer replace',
                       global => 1,
                      },


             test4 => { # no backref version
                       text => "alskjdfkas sdfjklsdfkjsdf ksjf 
sdlfjafkjsdflksjfd",
                       pattern => '([fl])',
                       replace => 'this is going to be a longer replace string 
to test if there is a difference in longer replace',
                       global => 1,
                      },

             test5 => { # no backref version
                       text => 'Total: $5.55',
                       pattern => '$5.55',
                       replace => '\$6.55',
                       global => 1,
                      },

             test6 => { # no backref version
                       text => 'Total: $5.55',
                       pattern => '$5.55',
                       replace => '\\$6.55',
                       global => 1,
                      },

             test7 => { # no backref version
                       text => 'Total: $5.55',
                       pattern => '$5.55',
                       replace => '\\\$6.55',
                       global => 1,
                      },

            );



my $serg_tests = [
                  {
                   text => 'foo bar',
                   pattern => 'foo',
                   replace => 'bar',
                  },
                  {
                   text => 'foo bar foo',
                   pattern => '(?i)FOO',
                   replace => 'zoo',
                   global  => 1,
                  },
                  {
                   text => 'foo $1 bar',
                   pattern => '(foo)(.*)(bar)',
                   replace => '$1$2$3',
                  },
                  {
                   text => 'foo $1 bar',
                   pattern => '(foo)(.*)(bar)',
                   replace => '$3$2$1',
                  },
                  {
                   text => 'foo $200bar foobar',
                   pattern => '(f)(o+)',
                   replace => 'zoo',
                   global  => 0,
                  },
                  {
                   text => 'foo $200bar foobar',
                   pattern => '(f)(o+)',
                   replace => 'zoo',
                  },
                  {
                   text => 'foo bar foobar',
                   pattern => '(o)|([ar])',
                   replace => '$2!',
                  },
                  {
                   text => 'foo fgoo foooo bar',
                   pattern => '((?:f([^o]*)(o+)\s)+)',
                   replace => '1=$1;2=$2;3=$3;',
                  },
                  {
                   text => 'foo bar',
                   pattern => '(f)(o+)',
                   replace => '$2$1',
                  },
                  {
                   text => 'foo bar',
                   pattern => '(f)(o+)',
                   replace => '$20$1',
                  },
                  {
                   text => 'foo bar',
                   pattern => '(f)(o+)',
                   replace => '$2$10',
                  },
                  {
                   text => 'foo bar',
                   pattern => '(f)(o+)',
                   replace => '\$2$1',
                  },
                  {
                   text => 'foo bar',
                   pattern => '(f)(o+)',
                   replace => 'x$1\\\\y$2', # this is '$2\\$1'
                  },
                  {
                   text => 'foo bar',
                   pattern => '(f)(o+)',
                   replace => '$2\\\\$1', # this is '$2\\$1'
                  },
                  {
                   text => 'foo bar',
                   pattern => '(f)(o+)',
                   replace => '$2\\\\\\$1', # this is '$2\\\$1'
                  },
                 ];


foreach my $testname(sort keys %tests){
  my $hash = $tests{$testname};
  my $global = $hash->{global};
  $global  = 1  unless defined $global;
  $global = $global ? 'g' : '';
  my $text = $hash->{text};
  if (length $text > 40){
    $text = substr($text, 0, 40) . '...';
  }
  print "\n\nRunning $testname '$text' =~ 
s{$hash->{pattern}}{$hash->{replace}}$global\n";
  do_timings($hash);
}

my $i = 0;
foreach my $hash(@$serg_tests){
  my $global = $hash->{global};
  $global  = 1  unless defined $global;
  $global = $global ? 'g' : '';
  my $text = $hash->{text};
  if (length $text > 40){
    $text = substr($text, 0, 40) . '...';
  }
  print "\n\nRunning Sergey Test $i. '$text' =~ 
s{$hash->{pattern}}{$hash->{replace}}$global\n";
  do_timings($hash, );
  $i++;
}


print "\n\nTotals:\n";
my $total_time = shift @all_times;
while (my $x = shift @all_times){
  foreach my $k(keys %$total_time){
    $total_time->{$k} = Benchmark::timesum($total_time->{$k}, $x->{$k});
  }
}

cmpthese_custom ($total_time);

exit; # all done!


sub do_timings{
  my $current_hash = shift;
  my $text1 = $current_hash->{text};
  my $pattern1 = $current_hash->{pattern};
  my $replace1 = $current_hash->{replace};
  my $global1 = $current_hash->{global};

  my $g  = $global1;
  $g = 1  unless defined $g;
  $g = $g ? 'g' : '';

  my $actual = $text1;
  # don't evaluate pattern, cause we're not going to do that in TT.
  eval '$actual =~ s/$pattern1/' . "$replace1/$g;";


  my %tests = (
               'replace_andy' => sub {
                 my ($text, $pattern, $replace, $global) = ($text1, $pattern1, 
$replace1, $global1);
                 $text    = '' unless defined $text;
                 $pattern = '' unless defined $pattern;
                 $replace = '' unless defined $replace;
                 $global  = 1  unless defined $global;
                 my $expand = sub {
                   my ($chunk, $start, $end) = @_;
                   $chunk =~ s{ \\(\\|\$) | \$ (\d+) }{
                     $1 ? $1
                       : ($2 > $#$start || $2 == 0) ? '' 
                         : substr($text, $start->[$2], $end->[$2] - 
$start->[$2]);
                   }exg;
                   $chunk;
                 };
                 if ($global) {
                   $text =~ s{$pattern}{ &$expand($replace, [EMAIL PROTECTED], 
[EMAIL PROTECTED]) }eg;
                 } else {
                   $text =~ s{$pattern}{ &$expand($replace, , [EMAIL 
PROTECTED], [EMAIL PROTECTED]) }e;
                 }

                 if ($text ne $actual) {
                   die "Unequal replace_andy results!\n|$text|\n|$actual|";
                 }

                 return $text;
               },


               'replace_andy_with_patch' => sub {
                 my ($text, $pattern, $replace, $global) = ($text1, $pattern1, 
$replace1, $global1);
                 $replace = '' unless defined $replace;
                 return $text unless defined $text and defined $pattern;

                 $global  = 1  unless defined $global;

                 my $replace_has_backref;
                 while ($replace =~ m/(\\*) \$\d+/gx) {
                   if (length($1 || '') % 2 == 0) {
                     $replace_has_backref = 1;
                     last;
                   }
                 }

                 if (!$replace_has_backref) {
                   if ($global) {
                     $text =~ s/$pattern/$replace/g;
                   } else {
                     $text =~ s/$pattern/$replace/;
                   }

                   if ($text ne $actual) {
                     die "replace_andy_with_patch results!\n|$text|\n|$actual|";
                   }

                   return $text;
                 }

                 if ($global) {
                   $text =~ s{$pattern}{
                     my ($chunk, $start, $end) = ($replace, [EMAIL PROTECTED], 
[EMAIL PROTECTED]);
                     $chunk =~ s{ \\(\\|\$) | \$ (\d+) }{
                       $1 ? $1
                         : ($2 > $#$start || $2 == 0) ? '' 
                           : substr($text, $start->[$2], $end->[$2] - 
$start->[$2]);
                     }exg;
                     $chunk;
                   }eg;
                 } else {
                   $text =~ s{$pattern}{
                     my ($chunk, $start, $end) = ($replace, [EMAIL PROTECTED], 
[EMAIL PROTECTED]);
                     $chunk =~ s{ \\(\\|\$) | \$ (\d+) }{
                       $1 ? $1
                         : ($2 > $#$start || $2 == 0) ? '' 
                           : substr($text, $start->[$2], $end->[$2] - 
$start->[$2]);
                     }exg;
                     $chunk;
                   }e;
                 }

                 if ($text ne $actual) {
                   die "replace_andy_with_patch results!\n|$text|\n|$actual|";
                 }

                 return $text;
               },


               # this one seems to work pretty good, but requires a no strict 
'refs' section.
               replace_josh_new2 => sub{
                 my ($str, $search, $replace, $global) = ($text1, $pattern1, 
$replace1, $global1);
                 $replace = '' unless defined $replace;
                 return $str unless defined $str and defined $search;

                 $global  = 1  unless defined $global;

                 my %ids_to_store;
                 while ($replace =~ m/(\\*) \$(\d+)/gx) { # allow backslashing.
                   if (length($1 || '') % 2 == 0) {
                     $ids_to_store{$2} = 1;
                   }
                 }
                 if (scalar(keys %ids_to_store) == 0) {
                   if ($global) {
                     $str =~ s/$search/$replace/g;
                   } else {
                     $str =~ s/$search/$replace/;
                   }
                   if ($str ne $actual) {
                     die "Unequal replace_josh_new2 
results!\n|$str|\n|$actual|";
                   }

                   return $str;
                 }

                 no strict 'refs';

                 if ($global) {
                   $str =~ s/$search/
                     foreach (keys %ids_to_store) {
                       $ids_to_store{$_} = ${$_}; # use symbolic ref
                     }

                     my $template = $replace;

                     $template =~ s! \\(\\|\$) | \$ (\d+) !
                       $1 ? $1 : ($ids_to_store{$2} || '');
                     !exg;

                     $template;
                   /ge;
                 } else {
                   $str =~ s/$search/
                     foreach (keys %ids_to_store) {
                       $ids_to_store{$_} = ${$_}; # use symbolic ref
                     }

                     my $template = $replace;
                     $template =~ s! \\(\\|\$) | \$ (\d+) !
                       $1 ? $1 : ($ids_to_store{$2} || '');
                     !exg;

                     $template;
                   /e;
                 }

                 use strict 'refs';


                 if ($str ne $actual) {
                   die "Unequal replace_josh_new2 results!\n|$str|\n|$actual|";
                 }

                 return $str;
               },

              );

  my $results = timethese(100_000, \%tests );

  cmpthese_custom ($results);
  push @all_times, $results;
}









###############################################################
# No need to read below here
###############################################################


### This is from Benchmark.pm. I wanted cmpthese to sort things by key name
### rather than slowest to fastest, which makes it hard to read.
our %_Usage;
sub usage { 
    my $calling_sub = (caller(1))[3];
    $calling_sub =~ s/^Benchmark:://;
    return $_Usage{$calling_sub} || '';
}
$_Usage{cmpthese} = <<'USAGE';
usage: cmpthese($count, { Name1 => 'code1', ... });        or
       cmpthese($count, { Name1 => sub { code1 }, ... });  or
       cmpthese($result, $style);
USAGE

sub cmpthese_custom{
    my ($results, $style);

    if( ref $_[0] ) {
        ($results, $style) = @_;
      }
    else {
        my($count, $code) = @_[0,1];
        $style = $_[2] if defined $_[2];

        die usage unless ref $code eq 'HASH';

        $results = timethese($count, $code, ($style || "none"));
    }

    $style = "" unless defined $style;

    # Flatten in to an array of arrays with the name as the first field
    my @vals = map{ [ $_, @{$results->{$_}} ] } keys %$results;

    for (@vals) {
      # The epsilon fudge here is to prevent div by 0.  Since clock
      # resolutions are much larger, it's below the noise floor.
      my $rate = $_->[6] / (( $style eq 'nop' ? $_->[4] + $_->[5]
                              : $_->[2] + $_->[3]) + 0.000000000000001 );
      $_->[7] = $rate;
    }

    # Sort by rate
    @vals = sort { $a->[0] cmp $b->[0] } @vals;

    # If more than half of the rates are greater than one...
    my $display_as_rate = @vals ? ($vals[$#vals>>1]->[7] > 1) : 0;

    my @rows;
    my @col_widths;

    my @top_row = ( 
        '', 
                   $display_as_rate ? 'Rate' : 's/iter', 
                   map { $_->[0] } @vals 
    );

    push @rows, [EMAIL PROTECTED];
    @col_widths = map { length( $_ ) } @top_row;

    # Build the data rows
    # We leave the last column in even though it never has any data.  Perhaps
    # it should go away.  Also, perhaps a style for a single column of
    # percentages might be nice.
    for my $row_val ( @vals ) {
      my @row;

        # Column 0 = test name
      push @row, $row_val->[0];
      $col_widths[0] = length( $row_val->[0] )
            if length( $row_val->[0] ) > $col_widths[0];

        # Column 1 = performance
      my $row_rate = $row_val->[7];

      # We assume that we'll never get a 0 rate.
      my $rate = $display_as_rate ? $row_rate : 1 / $row_rate;

      # Only give a few decimal places before switching to sci. notation,
      # since the results aren't usually that accurate anyway.
      my $format = 
           $rate >= 100 ? 
                    "%0.0f" : 
                         $rate >= 10 ?
                                  "%0.1f" :
                                       $rate >= 1 ?
                                                "%0.2f" :
                                                     $rate >= 0.1 ?
                                                              "%0.3f" :
                                                                       "%0.2e";

      $format .= "/s"
            if $display_as_rate;

      my $formatted_rate = sprintf( $format, $rate );
      push @row, $formatted_rate;
      $col_widths[1] = length( $formatted_rate )
            if length( $formatted_rate ) > $col_widths[1];

        # Columns 2..N = performance ratios
      my $skip_rest = 0;
      for ( my $col_num = 0 ; $col_num < @vals ; ++$col_num ) {
            my $col_val = $vals[$col_num];
                my $out;
            if ( $skip_rest ) {
              $out = '';
            }
            elsif ( $col_val->[0] eq $row_val->[0] ) {
              $out = "--";
              # $skip_rest = 1;
            }
                else {
                  my $col_rate = $col_val->[7];
                  $out = sprintf( "%.0f%%", 100*$row_rate/$col_rate - 100 );
                      }
                push @row, $out;
                $col_widths[$col_num+2] = length( $out )
                  if length( $out ) > $col_widths[$col_num+2];

                # A little wierdness to set the first column width properly
                $col_widths[$col_num+2] = length( $col_val->[0] )
                  if length( $col_val->[0] ) > $col_widths[$col_num+2];
          }
      push @rows, [EMAIL PROTECTED];
    }

    return [EMAIL PROTECTED] if $style eq "none";

    # Equalize column widths in the chart as much as possible without
    # exceeding 80 characters.  This does not use or affect cols 0 or 1.
    my @sorted_width_refs = 
       sort { $$a <=> $$b } map { \$_ } @col_widths[2..$#col_widths];
    my $max_width = ${$sorted_width_refs[-1]};

    my $total = @col_widths - 1 ;
for ( @col_widths ) { $total += $_ }

STRETCHER:
while ( $total < 80 ) {
  my $min_width = ${$sorted_width_refs[0]};
last
     if $min_width == $max_width;
for ( @sorted_width_refs ) {
      last 
        if $$_ > $min_width;
          ++$$_;
          ++$total;
          last STRETCHER
            if $total >= 80;
    }
    }

    # Dump the output
    my $format = join( ' ', map { "%${_}s" } @col_widths ) . "\n";
    substr( $format, 1, 0 ) = '-';
for ( @rows ) {
  printf $format, @$_;
}

    return [EMAIL PROTECTED] ;
}

Re: [Templates] New text.replace virtual method for v2.15

Reply via email to