Andy Wardley wrote:
What a great team effort! :-)
------------------------------------------------------------------
Hey, who knows how long it'll be until the next version. :P We want to
get this one right! ;) I suppose it probably wouldn't be as bad to get
another version going after having touched this version. The hard part
is picking up the project after not touching it for a while. (More
text from me follows below.)
------------------------------------------------------------------
It's based on Paul's most recent example, re-written to use a closure to
avoid the duplication, and adding Sergey's handling of escaped characters.
It passes all the tests in Sergey's test suite (which have also been
added to t/vmethod/replace.t). Not bad for 23 lines of code!
'replace' => sub {
my ($text, $pattern, $replace, $global) = @_;
$text = '' unless defined $text;
$pattern = '' unless defined $pattern;
$replace = '' unless defined $replace;
$global = 1 unless defined $global;
my $expand = sub {
my ($chunk, $start, $end) = @_;
$chunk =~ s{ \\(\\|\$) | \$ (\d+) }{
$1 ? $1
: ($2 > $#$start || $2 == 0) ? ''
: substr($text, $start->[$2], $end->[$2] - $start->[$2]);
}exg;
$chunk;
};
if ($global) {
$text =~ s{$pattern}{ &$expand($replace, [EMAIL PROTECTED], [EMAIL
PROTECTED]) }eg;
}
else {
$text =~ s{$pattern}{ &$expand($replace, , [EMAIL PROTECTED],
[EMAIL PROTECTED]) }e;
}
return $text;
},
------------------------------------------------------------------
Hmm, looks good. I like it. I'm not too fond of the sub being used
mainly because it's going to be called over and over again, possibly a
lot. (I know it's readability vs. speed, but in this case I'd rather
have speed.) Also, this doesn't check to see if we even need backrefs,
which is a minimal cost which gives HUGE gains when there are no
backrefs. (#2 below is still faster or on par with #1 with this check
and removing the sub. Although I would guess that a megabyte sized
replace might change that, but I consider that a rare case.)
I'm attaching a new test file you can run along with a file of
benchmarks. This new and improved test file runs through 5 or so tests
by me, and all the tests by Sergey and benches each replace function.
There are three replace functions in there:
1) This standard one. It's called replace_andy
2) replace_andy_with_patch - My patch without the sub, and using
backref check.
3) Another replace function I worked out called 'replace_josh_new2'.
This one seems to do pretty well. Of course it has it's trade-offs as
well. It uses symbolic refs and "no strict 'refs'" for a bit. It also
could be memory costly, because it actually stores the matches into a
hash. In these particular tests it seems to do the best overall,
though. I'd suggest #2 mostly to not have to worry about memory problems.
Here's the "overall totals" output of my last benchmark:
Totals:
Rate replace_andy
replace_andy 13670/s --
replace_andy_with_patch 19290/s 41%
replace_josh_new2 24466/s 79%
I'd take this "totals" bench with a grain of salt, though, and instead
check out the individual results which are much more interesting.
-- Josh
Running test1 'alskjdfkas dfklasdj fa sdkfjalsdf afjkla...' =~
s{(sfdklasjf)}{$1 josh $1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 4 wallclock secs ( 4.06 usr + 0.00 sys = 4.06 CPU) @
24630.54/s (n=100000)
replace_andy_with_patch: 4 wallclock secs ( 3.70 usr + 0.00 sys = 3.70 CPU)
@ 27027.03/s (n=100000)
replace_josh_new2: 3 wallclock secs ( 2.98 usr + 0.01 sys = 2.99 CPU) @
33444.82/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 24631/s -- -9%
-26%
replace_andy_with_patch 27027/s 10% --
-19%
replace_josh_new2 33445/s 36% 24%
--
Running test2 'alskjdfkas dfklasdj fa sdkfjalsdf afjkla...' =~ s{(sfdklasjf)}{a
josh b}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 3 wallclock secs ( 2.50 usr + 0.00 sys = 2.50 CPU) @
40000.00/s (n=100000)
replace_andy_with_patch: 0 wallclock secs ( 0.61 usr + 0.00 sys = 0.61 CPU)
@ 163934.43/s (n=100000)
replace_josh_new2: -1 wallclock secs ( 0.65 usr + 0.00 sys = 0.65 CPU) @
153846.15/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 40000/s -- -76%
-74%
replace_andy_with_patch 163934/s 310% --
7%
replace_josh_new2 153846/s 285% -6%
--
Running test3 'alskjdfkas sdfjklsdfkjsdf ksjf sdlfjafkj...' =~ s{([fl])}{this
is going to be a longer replace string to test "$1" if there is a difference in
longer replace}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 50 wallclock secs (49.26 usr + 0.02 sys = 49.28 CPU) @ 2029.22/s
(n=100000)
replace_andy_with_patch: 52 wallclock secs (51.00 usr + 0.05 sys = 51.05 CPU)
@ 1958.86/s (n=100000)
replace_josh_new2: 39 wallclock secs (38.74 usr + 0.06 sys = 38.80 CPU) @
2577.32/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 2029/s -- 4%
-21%
replace_andy_with_patch 1959/s -3% --
-24%
replace_josh_new2 2577/s 27% 32%
--
Running test4 'alskjdfkas sdfjklsdfkjsdf ksjf sdlfjafkj...' =~ s{([fl])}{this
is going to be a longer replace string to test if there is a difference in
longer replace}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 34 wallclock secs (33.34 usr + 0.01 sys = 33.35 CPU) @ 2998.50/s
(n=100000)
replace_andy_with_patch: 2 wallclock secs ( 1.96 usr + 0.00 sys = 1.96 CPU)
@ 51020.41/s (n=100000)
replace_josh_new2: 2 wallclock secs ( 1.97 usr + 0.00 sys = 1.97 CPU) @
50761.42/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 2999/s -- -94%
-94%
replace_andy_with_patch 51020/s 1602% --
1%
replace_josh_new2 50761/s 1593% -1%
--
Running test5 'Total: $5.55' =~ s{$5.55}{\$6.55}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 1 wallclock secs ( 0.78 usr + 0.00 sys = 0.78 CPU) @
128205.13/s (n=100000)
replace_andy_with_patch: 1 wallclock secs ( 0.60 usr + 0.00 sys = 0.60 CPU)
@ 166666.67/s (n=100000)
replace_josh_new2: 1 wallclock secs ( 0.64 usr + 0.00 sys = 0.64 CPU) @
156250.00/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 128205/s -- -23%
-18%
replace_andy_with_patch 166667/s 30% --
7%
replace_josh_new2 156250/s 22% -6%
--
Running test6 'Total: $5.55' =~ s{$5.55}{\$6.55}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 1 wallclock secs ( 0.82 usr + 0.00 sys = 0.82 CPU) @
121951.22/s (n=100000)
replace_andy_with_patch: 1 wallclock secs ( 0.59 usr + 0.00 sys = 0.59 CPU)
@ 169491.53/s (n=100000)
replace_josh_new2: 0 wallclock secs ( 0.64 usr + -0.01 sys = 0.63 CPU) @
158730.16/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 121951/s -- -28%
-23%
replace_andy_with_patch 169492/s 39% --
7%
replace_josh_new2 158730/s 30% -6%
--
Running test7 'Total: $5.55' =~ s{$5.55}{\\$6.55}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: -1 wallclock secs ( 0.83 usr + 0.00 sys = 0.83 CPU) @
120481.93/s (n=100000)
replace_andy_with_patch: -1 wallclock secs ( 0.57 usr + 0.00 sys = 0.57 CPU)
@ 175438.60/s (n=100000)
replace_josh_new2: 1 wallclock secs ( 0.92 usr + 0.00 sys = 0.92 CPU) @
108695.65/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 120482/s -- -31%
11%
replace_andy_with_patch 175439/s 46% --
61%
replace_josh_new2 108696/s -10% -38%
--
Running Sergey Test 0. 'foo bar' =~ s{foo}{bar}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 3 wallclock secs ( 2.07 usr + 0.00 sys = 2.07 CPU) @
48309.18/s (n=100000)
replace_andy_with_patch: 0 wallclock secs ( 0.39 usr + 0.00 sys = 0.39 CPU)
@ 256410.26/s (n=100000)
(warning: too few iterations for a reliable count)
replace_josh_new2: 1 wallclock secs ( 0.40 usr + 0.00 sys = 0.40 CPU) @
250000.00/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 48309/s -- -81%
-81%
replace_andy_with_patch 256410/s 431% --
3%
replace_josh_new2 250000/s 417% -2%
--
Running Sergey Test 1. 'foo bar foo' =~ s{(?i)FOO}{zoo}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 3 wallclock secs ( 2.77 usr + 0.00 sys = 2.77 CPU) @
36101.08/s (n=100000)
replace_andy_with_patch: 0 wallclock secs ( 0.40 usr + 0.00 sys = 0.40 CPU)
@ 250000.00/s (n=100000)
replace_josh_new2: 1 wallclock secs ( 0.42 usr + 0.00 sys = 0.42 CPU) @
238095.24/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 36101/s -- -86%
-85%
replace_andy_with_patch 250000/s 592% --
5%
replace_josh_new2 238095/s 560% -5%
--
Running Sergey Test 2. 'foo $1 bar' =~ s{(foo)(.*)(bar)}{$1$2$3}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 4 wallclock secs ( 4.65 usr + 0.00 sys = 4.65 CPU) @
21505.38/s (n=100000)
replace_andy_with_patch: 3 wallclock secs ( 4.18 usr + 0.00 sys = 4.18 CPU)
@ 23923.44/s (n=100000)
replace_josh_new2: 4 wallclock secs ( 3.45 usr + 0.01 sys = 3.46 CPU) @
28901.73/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 21505/s -- -10%
-26%
replace_andy_with_patch 23923/s 11% --
-17%
replace_josh_new2 28902/s 34% 21%
--
Running Sergey Test 3. 'foo $1 bar' =~ s{(foo)(.*)(bar)}{$3$2$1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 4 wallclock secs ( 4.67 usr + 0.00 sys = 4.67 CPU) @
21413.28/s (n=100000)
replace_andy_with_patch: 5 wallclock secs ( 4.17 usr + 0.00 sys = 4.17 CPU)
@ 23980.82/s (n=100000)
replace_josh_new2: 3 wallclock secs ( 3.45 usr + 0.00 sys = 3.45 CPU) @
28985.51/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 21413/s -- -11%
-26%
replace_andy_with_patch 23981/s 12% --
-17%
replace_josh_new2 28986/s 35% 21%
--
Running Sergey Test 4. 'foo $200bar foobar' =~ s{(f)(o+)}{zoo}
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 3 wallclock secs ( 2.51 usr + 0.00 sys = 2.51 CPU) @
39840.64/s (n=100000)
replace_andy_with_patch: 1 wallclock secs ( 0.50 usr + 0.00 sys = 0.50 CPU)
@ 200000.00/s (n=100000)
replace_josh_new2: 0 wallclock secs ( 0.51 usr + 0.00 sys = 0.51 CPU) @
196078.43/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 39841/s -- -80%
-80%
replace_andy_with_patch 200000/s 402% --
2%
replace_josh_new2 196078/s 392% -2%
--
Running Sergey Test 5. 'foo $200bar foobar' =~ s{(f)(o+)}{zoo}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 2 wallclock secs ( 3.75 usr + 0.00 sys = 3.75 CPU) @
26666.67/s (n=100000)
replace_andy_with_patch: 1 wallclock secs ( 0.64 usr + 0.00 sys = 0.64 CPU)
@ 156250.00/s (n=100000)
replace_josh_new2: 1 wallclock secs ( 0.67 usr + 0.00 sys = 0.67 CPU) @
149253.73/s (n=100000)
Use of uninitialized value in concatenation (.) or string at (eval 94) line 1.
Use of uninitialized value in concatenation (.) or string at (eval 94) line 1.
Use of uninitialized value in concatenation (.) or string at (eval 94) line 1.
Use of uninitialized value in concatenation (.) or string at (eval 94) line 1.
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 26667/s -- -83%
-82%
replace_andy_with_patch 156250/s 486% --
5%
replace_josh_new2 149254/s 460% -4%
--
Running Sergey Test 6. 'foo bar foobar' =~ s{(o)|([ar])}{$2!}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 16 wallclock secs (16.13 usr + 0.00 sys = 16.13 CPU) @ 6199.63/s
(n=100000)
replace_andy_with_patch: 16 wallclock secs (15.63 usr + 0.03 sys = 15.66 CPU)
@ 6385.70/s (n=100000)
replace_josh_new2: 8 wallclock secs ( 9.45 usr + 0.01 sys = 9.46 CPU) @
10570.82/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 6200/s -- -3%
-41%
replace_andy_with_patch 6386/s 3% --
-40%
replace_josh_new2 10571/s 71% 66%
--
Running Sergey Test 7. 'foo fgoo foooo bar' =~
s{((?:f([^o]*)(o+)\s)+)}{1=$1;2=$2;3=$3;}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 5 wallclock secs ( 5.16 usr + 0.00 sys = 5.16 CPU) @
19379.84/s (n=100000)
replace_andy_with_patch: 4 wallclock secs ( 4.73 usr + 0.00 sys = 4.73 CPU)
@ 21141.65/s (n=100000)
replace_josh_new2: 3 wallclock secs ( 4.06 usr + 0.00 sys = 4.06 CPU) @
24630.54/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 19380/s -- -8%
-21%
replace_andy_with_patch 21142/s 9% --
-14%
replace_josh_new2 24631/s 27% 17%
--
Running Sergey Test 8. 'foo bar' =~ s{(f)(o+)}{$2$1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 4 wallclock secs ( 4.03 usr + 0.00 sys = 4.03 CPU) @
24813.90/s (n=100000)
replace_andy_with_patch: 4 wallclock secs ( 3.48 usr + 0.01 sys = 3.49 CPU)
@ 28653.30/s (n=100000)
replace_josh_new2: 3 wallclock secs ( 2.94 usr + 0.00 sys = 2.94 CPU) @
34013.61/s (n=100000)
Use of uninitialized value in concatenation (.) or string at (eval 115) line 1.
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 24814/s -- -13%
-27%
replace_andy_with_patch 28653/s 15% --
-16%
replace_josh_new2 34014/s 37% 19%
--
Running Sergey Test 9. 'foo bar' =~ s{(f)(o+)}{$20$1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 4 wallclock secs ( 3.77 usr + 0.00 sys = 3.77 CPU) @
26525.20/s (n=100000)
replace_andy_with_patch: 3 wallclock secs ( 3.32 usr + 0.00 sys = 3.32 CPU)
@ 30120.48/s (n=100000)
replace_josh_new2: 3 wallclock secs ( 2.96 usr + 0.00 sys = 2.96 CPU) @
33783.78/s (n=100000)
Use of uninitialized value in concatenation (.) or string at (eval 122) line 1.
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 26525/s -- -12%
-21%
replace_andy_with_patch 30120/s 14% --
-11%
replace_josh_new2 33784/s 27% 12%
--
Running Sergey Test 10. 'foo bar' =~ s{(f)(o+)}{$2$10}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 2 wallclock secs ( 3.79 usr + 0.01 sys = 3.80 CPU) @
26315.79/s (n=100000)
replace_andy_with_patch: 2 wallclock secs ( 3.29 usr + 0.00 sys = 3.29 CPU)
@ 30395.14/s (n=100000)
replace_josh_new2: 3 wallclock secs ( 3.01 usr + 0.00 sys = 3.01 CPU) @
33222.59/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 26316/s -- -13%
-21%
replace_andy_with_patch 30395/s 16% --
-9%
replace_josh_new2 33223/s 26% 9%
--
Running Sergey Test 11. 'foo bar' =~ s{(f)(o+)}{\$2$1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 4 wallclock secs ( 3.70 usr + 0.00 sys = 3.70 CPU) @
27027.03/s (n=100000)
replace_andy_with_patch: 4 wallclock secs ( 3.38 usr + 0.01 sys = 3.39 CPU)
@ 29498.53/s (n=100000)
replace_josh_new2: 2 wallclock secs ( 2.71 usr + 0.02 sys = 2.73 CPU) @
36630.04/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 27027/s -- -8%
-26%
replace_andy_with_patch 29499/s 9% --
-19%
replace_josh_new2 36630/s 36% 24%
--
Running Sergey Test 12. 'foo bar' =~ s{(f)(o+)}{x$1\\y$2}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 3 wallclock secs ( 4.35 usr + 0.01 sys = 4.36 CPU) @
22935.78/s (n=100000)
replace_andy_with_patch: 4 wallclock secs ( 3.75 usr + 0.00 sys = 3.75 CPU)
@ 26666.67/s (n=100000)
replace_josh_new2: 3 wallclock secs ( 3.27 usr + 0.00 sys = 3.27 CPU) @
30581.04/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 22936/s -- -14%
-25%
replace_andy_with_patch 26667/s 16% --
-13%
replace_josh_new2 30581/s 33% 15%
--
Running Sergey Test 13. 'foo bar' =~ s{(f)(o+)}{$2\\$1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 5 wallclock secs ( 4.16 usr + 0.01 sys = 4.17 CPU) @
23980.82/s (n=100000)
replace_andy_with_patch: 4 wallclock secs ( 3.68 usr + 0.00 sys = 3.68 CPU)
@ 27173.91/s (n=100000)
replace_josh_new2: 3 wallclock secs ( 3.15 usr + 0.00 sys = 3.15 CPU) @
31746.03/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 23981/s -- -12%
-24%
replace_andy_with_patch 27174/s 13% --
-14%
replace_josh_new2 31746/s 32% 17%
--
Running Sergey Test 14. 'foo bar' =~ s{(f)(o+)}{$2\\\$1}g
Benchmark: timing 100000 iterations of replace_andy, replace_andy_with_patch,
replace_josh_new2...
replace_andy: 4 wallclock secs ( 3.78 usr + 0.00 sys = 3.78 CPU) @
26455.03/s (n=100000)
replace_andy_with_patch: 3 wallclock secs ( 3.38 usr + 0.00 sys = 3.38 CPU)
@ 29585.80/s (n=100000)
replace_josh_new2: 3 wallclock secs ( 2.83 usr + 0.00 sys = 2.83 CPU) @
35335.69/s (n=100000)
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 26455/s -- -11%
-25%
replace_andy_with_patch 29586/s 12% --
-16%
replace_josh_new2 35336/s 34% 19%
--
Totals:
Rate replace_andy replace_andy_with_patch
replace_josh_new2
replace_andy 13670/s -- -29%
-44%
replace_andy_with_patch 19290/s 41% --
-21%
replace_josh_new2 24466/s 79% 27%
--
use strict;
use warnings;
use Benchmark qw[timethese];
my @all_times;
my %tests = (
test1 => {
text => "alskjdfkas dfklasdj fa sdkfjalsdf afjklasdfja
faksldfja sfdklasjf aslfkaj dfklajsdfa sdfjklsdfkjsdf ksjf sdlfjafkjsdflksjfd",
pattern => '(sfdklasjf)',
replace => '$1 josh $1',
global => 1,
},
test2 => {
text => "alskjdfkas dfklasdj fa sdkfjalsdf afjklasdfja
faksldfja sfdklasjf aslfkaj dfklajsdfa sdfjklsdfkjsdf ksjf sdlfjafkjsdflksjfd",
pattern => '(sfdklasjf)',
replace => 'a josh b',
global => 1,
},
test3 => {
text => "alskjdfkas sdfjklsdfkjsdf ksjf
sdlfjafkjsdflksjfd",
pattern => '([fl])',
replace => 'this is going to be a longer replace string
to test "$1" if there is a difference in longer replace',
global => 1,
},
test4 => { # no backref version
text => "alskjdfkas sdfjklsdfkjsdf ksjf
sdlfjafkjsdflksjfd",
pattern => '([fl])',
replace => 'this is going to be a longer replace string
to test if there is a difference in longer replace',
global => 1,
},
test5 => { # no backref version
text => 'Total: $5.55',
pattern => '$5.55',
replace => '\$6.55',
global => 1,
},
test6 => { # no backref version
text => 'Total: $5.55',
pattern => '$5.55',
replace => '\\$6.55',
global => 1,
},
test7 => { # no backref version
text => 'Total: $5.55',
pattern => '$5.55',
replace => '\\\$6.55',
global => 1,
},
);
my $serg_tests = [
{
text => 'foo bar',
pattern => 'foo',
replace => 'bar',
},
{
text => 'foo bar foo',
pattern => '(?i)FOO',
replace => 'zoo',
global => 1,
},
{
text => 'foo $1 bar',
pattern => '(foo)(.*)(bar)',
replace => '$1$2$3',
},
{
text => 'foo $1 bar',
pattern => '(foo)(.*)(bar)',
replace => '$3$2$1',
},
{
text => 'foo $200bar foobar',
pattern => '(f)(o+)',
replace => 'zoo',
global => 0,
},
{
text => 'foo $200bar foobar',
pattern => '(f)(o+)',
replace => 'zoo',
},
{
text => 'foo bar foobar',
pattern => '(o)|([ar])',
replace => '$2!',
},
{
text => 'foo fgoo foooo bar',
pattern => '((?:f([^o]*)(o+)\s)+)',
replace => '1=$1;2=$2;3=$3;',
},
{
text => 'foo bar',
pattern => '(f)(o+)',
replace => '$2$1',
},
{
text => 'foo bar',
pattern => '(f)(o+)',
replace => '$20$1',
},
{
text => 'foo bar',
pattern => '(f)(o+)',
replace => '$2$10',
},
{
text => 'foo bar',
pattern => '(f)(o+)',
replace => '\$2$1',
},
{
text => 'foo bar',
pattern => '(f)(o+)',
replace => 'x$1\\\\y$2', # this is '$2\\$1'
},
{
text => 'foo bar',
pattern => '(f)(o+)',
replace => '$2\\\\$1', # this is '$2\\$1'
},
{
text => 'foo bar',
pattern => '(f)(o+)',
replace => '$2\\\\\\$1', # this is '$2\\\$1'
},
];
foreach my $testname(sort keys %tests){
my $hash = $tests{$testname};
my $global = $hash->{global};
$global = 1 unless defined $global;
$global = $global ? 'g' : '';
my $text = $hash->{text};
if (length $text > 40){
$text = substr($text, 0, 40) . '...';
}
print "\n\nRunning $testname '$text' =~
s{$hash->{pattern}}{$hash->{replace}}$global\n";
do_timings($hash);
}
my $i = 0;
foreach my $hash(@$serg_tests){
my $global = $hash->{global};
$global = 1 unless defined $global;
$global = $global ? 'g' : '';
my $text = $hash->{text};
if (length $text > 40){
$text = substr($text, 0, 40) . '...';
}
print "\n\nRunning Sergey Test $i. '$text' =~
s{$hash->{pattern}}{$hash->{replace}}$global\n";
do_timings($hash, );
$i++;
}
print "\n\nTotals:\n";
my $total_time = shift @all_times;
while (my $x = shift @all_times){
foreach my $k(keys %$total_time){
$total_time->{$k} = Benchmark::timesum($total_time->{$k}, $x->{$k});
}
}
cmpthese_custom ($total_time);
exit; # all done!
sub do_timings{
my $current_hash = shift;
my $text1 = $current_hash->{text};
my $pattern1 = $current_hash->{pattern};
my $replace1 = $current_hash->{replace};
my $global1 = $current_hash->{global};
my $g = $global1;
$g = 1 unless defined $g;
$g = $g ? 'g' : '';
my $actual = $text1;
# don't evaluate pattern, cause we're not going to do that in TT.
eval '$actual =~ s/$pattern1/' . "$replace1/$g;";
my %tests = (
'replace_andy' => sub {
my ($text, $pattern, $replace, $global) = ($text1, $pattern1,
$replace1, $global1);
$text = '' unless defined $text;
$pattern = '' unless defined $pattern;
$replace = '' unless defined $replace;
$global = 1 unless defined $global;
my $expand = sub {
my ($chunk, $start, $end) = @_;
$chunk =~ s{ \\(\\|\$) | \$ (\d+) }{
$1 ? $1
: ($2 > $#$start || $2 == 0) ? ''
: substr($text, $start->[$2], $end->[$2] -
$start->[$2]);
}exg;
$chunk;
};
if ($global) {
$text =~ s{$pattern}{ &$expand($replace, [EMAIL PROTECTED],
[EMAIL PROTECTED]) }eg;
} else {
$text =~ s{$pattern}{ &$expand($replace, , [EMAIL
PROTECTED], [EMAIL PROTECTED]) }e;
}
if ($text ne $actual) {
die "Unequal replace_andy results!\n|$text|\n|$actual|";
}
return $text;
},
'replace_andy_with_patch' => sub {
my ($text, $pattern, $replace, $global) = ($text1, $pattern1,
$replace1, $global1);
$replace = '' unless defined $replace;
return $text unless defined $text and defined $pattern;
$global = 1 unless defined $global;
my $replace_has_backref;
while ($replace =~ m/(\\*) \$\d+/gx) {
if (length($1 || '') % 2 == 0) {
$replace_has_backref = 1;
last;
}
}
if (!$replace_has_backref) {
if ($global) {
$text =~ s/$pattern/$replace/g;
} else {
$text =~ s/$pattern/$replace/;
}
if ($text ne $actual) {
die "replace_andy_with_patch results!\n|$text|\n|$actual|";
}
return $text;
}
if ($global) {
$text =~ s{$pattern}{
my ($chunk, $start, $end) = ($replace, [EMAIL PROTECTED],
[EMAIL PROTECTED]);
$chunk =~ s{ \\(\\|\$) | \$ (\d+) }{
$1 ? $1
: ($2 > $#$start || $2 == 0) ? ''
: substr($text, $start->[$2], $end->[$2] -
$start->[$2]);
}exg;
$chunk;
}eg;
} else {
$text =~ s{$pattern}{
my ($chunk, $start, $end) = ($replace, [EMAIL PROTECTED],
[EMAIL PROTECTED]);
$chunk =~ s{ \\(\\|\$) | \$ (\d+) }{
$1 ? $1
: ($2 > $#$start || $2 == 0) ? ''
: substr($text, $start->[$2], $end->[$2] -
$start->[$2]);
}exg;
$chunk;
}e;
}
if ($text ne $actual) {
die "replace_andy_with_patch results!\n|$text|\n|$actual|";
}
return $text;
},
# this one seems to work pretty good, but requires a no strict
'refs' section.
replace_josh_new2 => sub{
my ($str, $search, $replace, $global) = ($text1, $pattern1,
$replace1, $global1);
$replace = '' unless defined $replace;
return $str unless defined $str and defined $search;
$global = 1 unless defined $global;
my %ids_to_store;
while ($replace =~ m/(\\*) \$(\d+)/gx) { # allow backslashing.
if (length($1 || '') % 2 == 0) {
$ids_to_store{$2} = 1;
}
}
if (scalar(keys %ids_to_store) == 0) {
if ($global) {
$str =~ s/$search/$replace/g;
} else {
$str =~ s/$search/$replace/;
}
if ($str ne $actual) {
die "Unequal replace_josh_new2
results!\n|$str|\n|$actual|";
}
return $str;
}
no strict 'refs';
if ($global) {
$str =~ s/$search/
foreach (keys %ids_to_store) {
$ids_to_store{$_} = ${$_}; # use symbolic ref
}
my $template = $replace;
$template =~ s! \\(\\|\$) | \$ (\d+) !
$1 ? $1 : ($ids_to_store{$2} || '');
!exg;
$template;
/ge;
} else {
$str =~ s/$search/
foreach (keys %ids_to_store) {
$ids_to_store{$_} = ${$_}; # use symbolic ref
}
my $template = $replace;
$template =~ s! \\(\\|\$) | \$ (\d+) !
$1 ? $1 : ($ids_to_store{$2} || '');
!exg;
$template;
/e;
}
use strict 'refs';
if ($str ne $actual) {
die "Unequal replace_josh_new2 results!\n|$str|\n|$actual|";
}
return $str;
},
);
my $results = timethese(100_000, \%tests );
cmpthese_custom ($results);
push @all_times, $results;
}
###############################################################
# No need to read below here
###############################################################
### This is from Benchmark.pm. I wanted cmpthese to sort things by key name
### rather than slowest to fastest, which makes it hard to read.
our %_Usage;
sub usage {
my $calling_sub = (caller(1))[3];
$calling_sub =~ s/^Benchmark:://;
return $_Usage{$calling_sub} || '';
}
$_Usage{cmpthese} = <<'USAGE';
usage: cmpthese($count, { Name1 => 'code1', ... }); or
cmpthese($count, { Name1 => sub { code1 }, ... }); or
cmpthese($result, $style);
USAGE
sub cmpthese_custom{
my ($results, $style);
if( ref $_[0] ) {
($results, $style) = @_;
}
else {
my($count, $code) = @_[0,1];
$style = $_[2] if defined $_[2];
die usage unless ref $code eq 'HASH';
$results = timethese($count, $code, ($style || "none"));
}
$style = "" unless defined $style;
# Flatten in to an array of arrays with the name as the first field
my @vals = map{ [ $_, @{$results->{$_}} ] } keys %$results;
for (@vals) {
# The epsilon fudge here is to prevent div by 0. Since clock
# resolutions are much larger, it's below the noise floor.
my $rate = $_->[6] / (( $style eq 'nop' ? $_->[4] + $_->[5]
: $_->[2] + $_->[3]) + 0.000000000000001 );
$_->[7] = $rate;
}
# Sort by rate
@vals = sort { $a->[0] cmp $b->[0] } @vals;
# If more than half of the rates are greater than one...
my $display_as_rate = @vals ? ($vals[$#vals>>1]->[7] > 1) : 0;
my @rows;
my @col_widths;
my @top_row = (
'',
$display_as_rate ? 'Rate' : 's/iter',
map { $_->[0] } @vals
);
push @rows, [EMAIL PROTECTED];
@col_widths = map { length( $_ ) } @top_row;
# Build the data rows
# We leave the last column in even though it never has any data. Perhaps
# it should go away. Also, perhaps a style for a single column of
# percentages might be nice.
for my $row_val ( @vals ) {
my @row;
# Column 0 = test name
push @row, $row_val->[0];
$col_widths[0] = length( $row_val->[0] )
if length( $row_val->[0] ) > $col_widths[0];
# Column 1 = performance
my $row_rate = $row_val->[7];
# We assume that we'll never get a 0 rate.
my $rate = $display_as_rate ? $row_rate : 1 / $row_rate;
# Only give a few decimal places before switching to sci. notation,
# since the results aren't usually that accurate anyway.
my $format =
$rate >= 100 ?
"%0.0f" :
$rate >= 10 ?
"%0.1f" :
$rate >= 1 ?
"%0.2f" :
$rate >= 0.1 ?
"%0.3f" :
"%0.2e";
$format .= "/s"
if $display_as_rate;
my $formatted_rate = sprintf( $format, $rate );
push @row, $formatted_rate;
$col_widths[1] = length( $formatted_rate )
if length( $formatted_rate ) > $col_widths[1];
# Columns 2..N = performance ratios
my $skip_rest = 0;
for ( my $col_num = 0 ; $col_num < @vals ; ++$col_num ) {
my $col_val = $vals[$col_num];
my $out;
if ( $skip_rest ) {
$out = '';
}
elsif ( $col_val->[0] eq $row_val->[0] ) {
$out = "--";
# $skip_rest = 1;
}
else {
my $col_rate = $col_val->[7];
$out = sprintf( "%.0f%%", 100*$row_rate/$col_rate - 100 );
}
push @row, $out;
$col_widths[$col_num+2] = length( $out )
if length( $out ) > $col_widths[$col_num+2];
# A little wierdness to set the first column width properly
$col_widths[$col_num+2] = length( $col_val->[0] )
if length( $col_val->[0] ) > $col_widths[$col_num+2];
}
push @rows, [EMAIL PROTECTED];
}
return [EMAIL PROTECTED] if $style eq "none";
# Equalize column widths in the chart as much as possible without
# exceeding 80 characters. This does not use or affect cols 0 or 1.
my @sorted_width_refs =
sort { $$a <=> $$b } map { \$_ } @col_widths[2..$#col_widths];
my $max_width = ${$sorted_width_refs[-1]};
my $total = @col_widths - 1 ;
for ( @col_widths ) { $total += $_ }
STRETCHER:
while ( $total < 80 ) {
my $min_width = ${$sorted_width_refs[0]};
last
if $min_width == $max_width;
for ( @sorted_width_refs ) {
last
if $$_ > $min_width;
++$$_;
++$total;
last STRETCHER
if $total >= 80;
}
}
# Dump the output
my $format = join( ' ', map { "%${_}s" } @col_widths ) . "\n";
substr( $format, 1, 0 ) = '-';
for ( @rows ) {
printf $format, @$_;
}
return [EMAIL PROTECTED] ;
}