[EMAIL PROTECTED] wrote:
Here is another code snippet that you might consider. It has the unfortunate
item of needing to be copied twice for global. But it does handle $10. It
also handles $10 more similarly to Perl. The problem is that if I have $10
in my string - but I have less than 10 patterns, then I will end up with
"mymatch0" rather than "" should $1 contain mymatch. Also, it should be
decided about what to do with $1 variables in the replacement portion should
there not be any matches. The version you had given leaves them there -
though I don't think that is desired behavior.
Any way - here is a quick code snippet - that I think I may use in some of my
applications.
sub vmethod_replace {
}
Paul
I like Paul's code as it's cleaner, quicker, and seems to work nicer.
Also, as he mentioned there are some problems in the previous code
like $10 not getting removed when there isn't a 10th back reference.
(Instead $1 is replaced looking like: "{$1 replace}0")
I just spent a decent amount of time benchmarking/testing and creating
code for this replace function. I've attached the file that I've been
using to test some of my own replace functions I've come up with in
addition to these other ones. (I've also included benchmarks up top.)
Anyone who spots any bugs in the file, feel free to throw them out, as
I'm pretty tired now and imagine there might be a couple here and there.
All of these functions are much slower than the normal search/replace.
Our replace function should almost definitely check for a backref in
the replace string FIRST, and if there isn't one, then use standard
search/replace. If the user does not need need a backref, we shouldn't
use the new replace code. (You can play with this in the test script,
and see how I did the check in the various functions.) There is a
fairly minor performance hit for the backref check. The benchmarks
will show you this. (There could be more test cases for sure, but I've
already spent more time on this than I should have.)
Let me know if there are any questions or comments.
-- Josh
# The only thing you need to do with this file is change $current_test below,
and
# CHECK_FOR_BACK_REFS_IN_REPLACE as well.
# You can also add more tests, but will need to make sure to also add a section
# in the replace_norm area. (Take heed of the note in that section, though.
Adding
# tests will slow it down quite a bit the lower down they are in the if
conditional.)
####### (test1) - This has backrefs in the replace.
####### CHECK_FOR_BACK_REFS_IN_REPLACE = 0
# Benchmark: timing 100000 iterations of replace_josh, replace_josh_new,
replace_josh_new2, replace_norm, replace_substr, vmethod_replace...
# replace_josh: 3 wallclock secs ( 2.26 usr + 0.00 sys = 2.26 CPU) @
44247.79/s (n=100000)
# replace_josh_new: 3 wallclock secs ( 3.50 usr + 0.00 sys = 3.50 CPU) @
28571.43/s (n=100000)
# replace_josh_new2: 3 wallclock secs ( 2.42 usr + 0.00 sys = 2.42 CPU) @
41322.31/s (n=100000)
# replace_norm: 1 wallclock secs ( 0.82 usr + 0.00 sys = 0.82 CPU) @
121951.22/s (n=100000)
# replace_substr: 3 wallclock secs ( 3.17 usr + 0.00 sys = 3.17 CPU) @
31545.74/s (n=100000)
# vmethod_replace: 3 wallclock secs ( 3.09 usr + 0.00 sys = 3.09 CPU) @
32362.46/s (n=100000)
####### (test1) - This has backrefs in the replace.
####### CHECK_FOR_BACK_REFS_IN_REPLACE = 1
####### This checking will make replaces that don't require
backrefs much faster.
# Benchmark: timing 100000 iterations of replace_josh, replace_josh_new,
replace_josh_new2, replace_norm, replace_substr, vmethod_replace...
# replace_josh: 3 wallclock secs ( 2.45 usr + 0.00 sys = 2.45 CPU) @
40816.33/s (n=100000)
# replace_josh_new: 3 wallclock secs ( 3.56 usr + 0.00 sys = 3.56 CPU) @
28089.89/s (n=100000)
# replace_josh_new2: 3 wallclock secs ( 2.55 usr + 0.00 sys = 2.55 CPU) @
39215.69/s (n=100000)
# replace_norm: 1 wallclock secs ( 0.80 usr + 0.00 sys = 0.80 CPU) @
125000.00/s (n=100000)
# replace_substr: 3 wallclock secs ( 3.32 usr + 0.00 sys = 3.32 CPU) @
30120.48/s (n=100000)
# vmethod_replace: 4 wallclock secs ( 3.13 usr + 0.00 sys = 3.13 CPU) @
31948.88/s (n=100000)
####### (test2) - This test has no backrefs in the replace.
####### CHECK_FOR_BACK_REFS_IN_REPLACE = 0
# Benchmark: timing 100000 iterations of replace_josh, replace_josh_new,
replace_josh_new2, replace_norm, replace_substr, vmethod_replace...
# replace_josh: 2 wallclock secs ( 1.43 usr + 0.00 sys = 1.43 CPU) @
69930.07/s (n=100000)
# replace_josh_new: 2 wallclock secs ( 1.93 usr + 0.00 sys = 1.93 CPU) @
51813.47/s (n=100000)
# replace_josh_new2: 1 wallclock secs ( 0.71 usr + 0.00 sys = 0.71 CPU) @
140845.07/s (n=100000)
# replace_norm: 1 wallclock secs ( 0.51 usr + 0.00 sys = 0.51 CPU) @
196078.43/s (n=100000)
# replace_substr: 3 wallclock secs ( 2.80 usr + 0.01 sys = 2.81 CPU) @
35587.19/s (n=100000)
# vmethod_replace: 1 wallclock secs ( 1.53 usr + 0.01 sys = 1.54 CPU) @
64935.06/s (n=100000)
####### (test2) - This test has no backrefs in the replace.
####### CHECK_FOR_BACK_REFS_IN_REPLACE = 1
# Benchmark: timing 100000 iterations of replace_josh, replace_josh_new,
replace_josh_new2, replace_norm, replace_substr, vmethod_replace...
# replace_josh: 1 wallclock secs ( 0.60 usr + 0.00 sys = 0.60 CPU) @
166666.67/s (n=100000)
# replace_josh_new: 1 wallclock secs ( 0.59 usr + 0.00 sys = 0.59 CPU) @
169491.53/s (n=100000)
# replace_josh_new2: 1 wallclock secs ( 0.71 usr + 0.00 sys = 0.71 CPU) @
140845.07/s (n=100000)
# replace_norm: 0 wallclock secs ( 0.52 usr + 0.00 sys = 0.52 CPU) @
192307.69/s (n=100000)
# replace_substr: -1 wallclock secs ( 0.66 usr + 0.00 sys = 0.66 CPU) @
151515.15/s (n=100000)
# vmethod_replace: 1 wallclock secs ( 0.63 usr + 0.00 sys = 0.63 CPU) @
158730.16/s (n=100000)
use strict;
use warnings;
use Benchmark qw[timethese];
my $current_test = 'test1'; # just toggle me to see tests.
use constant CHECK_FOR_BACK_REFS_IN_REPLACE => 1;
my %tests = (
test1 => {
text => "alskjdfkas dfklasdj fa sdkfjalsdf afjklasdfja
faksldfja sfdklasjf aslfkaj dfklajsdfa sdfjklsdfkjsdf ksjf sdlfjafkjsdflksjfd",
pattern => '(sfdklasjf)',
replace => '$1 josh $1', #replace_substr chokes on $10
if put in here.
global => 1,
},
test2 => {
text => "alskjdfkas dfklasdj fa sdkfjalsdf afjklasdfja
faksldfja sfdklasjf aslfkaj dfklajsdfa sdfjklsdfkjsdf ksjf sdlfjafkjsdflksjfd",
pattern => '(sfdklasjf)',
replace => 'a josh b',
global => 1,
},
);
my $current_hash = $tests{$current_test};
my $text1 = $current_hash->{text};
my $pattern1 = $current_hash->{pattern};
my $replace1 = $current_hash->{replace};
my $global1 = $current_hash->{global};
my $actual = $text1;
eval "\$actual =~ s/$pattern1/$replace1/;";
# used by replace_josh_new below. (It's slow, but an idea.)
my %is_digit = map {$_ => 1} (0..9);
timethese(100_000, {
vmethod_replace => sub{
my ($str, $search, $replace, $global) = ($text1,
$pattern1, $replace1, $global1);
$str = '' if ! defined $str;
$search = '' if ! defined $search;
$replace = '' if ! defined $replace;
$global = 1 if ! defined $global;
my $replace_has_backref;
if(CHECK_FOR_BACK_REFS_IN_REPLACE){
$replace_has_backref = ($replace =~ m: (?<!\\) \$
(\d+):x);
}else{
$replace_has_backref = 1;
}
if ($global) {
if (!$replace_has_backref){
$str =~ s/$search/$replace/g;
if ($str ne $actual) {
die "Unequal1 results!\n|$str|\n|$actual|";
}
return $str;
}
$str =~ s{$search}{
my @start = @-;
my @end = @+;
my $copy = $replace;
$copy =~ s{ (?<!\\) \$ (\d+) }{
($1 > $#start || $1 == 0) ? '' : substr($str,
$start[$1],
$end[$1] -
$start[$1]);
}exg;
$copy;
}eg;
} else {
if (!$replace_has_backref){
$str =~ s/$search/$replace/;
if ($str ne $actual) {
die "Unequal1 results!\n|$str|\n|$actual|";
}
return $str;
}
$str =~ s{$search}{
my @start = @-;
my @end = @+;
my $copy = $replace;
$copy =~ s{ (?<!\\) \$ (\d+) }{
($1 > $#start || $1 == 0) ? '' : substr($str,
$start[$1],
$end[$1] -
$start[$1]);
}exg;
$copy;
}e;
}
if ($str ne $actual) {
die "Unequal1 results!\n|$str|\n|$actual|";
}
return $str;
},
'replace_norm' => sub{
my ($str, $search, $replace, $global) = ($text1,
$pattern1, $replace1, $global1);
if ($current_test eq 'test1'){ # these if statements will
slow this down a bit
if ($global){
$str =~ s/$search/$1 josh $1/g;
}else{
$str =~ s/$search/$1 josh $1/;
}
}elsif($current_test eq 'test2'){
if ($global){
$str =~ s/$search/a josh b/g;
}else{
$str =~ s/$search/a josh b/;
}
}else{
die "Entry needed in replace_norm";
}
if ($str ne $actual) {
die "Unequal results!\n|$str|\n|$actual|";
}
return $str;
},
# this one seems to work pretty good, but requires a no
strict 'refs' section.
replace_josh_new2 => sub{
my ($str, $search, $replace, $global) = ($text1,
$pattern1, $replace1, $global1);
if (!defined($global)) {
$global = 1; #match globally by default.
}
$replace = '' unless defined $replace;
return $str unless defined $str and defined $search;
my %ids_to_store = map {$_ => 1} ($replace =~ /\$(\d+)/);
if (scalar(keys %ids_to_store) == 0){
if ($global) {
$str =~ s/$search/$replace/g;
}else{
$str =~ s/$search/$replace/;
}
}
no strict 'refs';
if ($global) {
$str =~ s!$search!
foreach (keys %ids_to_store) {
$ids_to_store{$_} = ${$_}; # use symbolic ref
}
my $template = $replace;
$template =~ s/\$(\d+)/$ids_to_store{$1} || ''/eg;
$template;
!ge;
} else {
$str =~ s!$search!
foreach (keys %ids_to_store) {
$ids_to_store{$_} = ${$_}; # use symbolic ref
}
my $template = $replace;
$template =~ s/\$(\d+)/$ids_to_store{$1} || ''/eg;
$template;
!e;
}
use strict 'refs';
if ($str ne $actual) {
die "Unequal results!\n|$str|\n|$actual|";
}
return $str;
},
# This is a pretty crazy one that tries to iterate through
the replace character
# by character and do replaces. It also requires no strict
'refs' to be on.
# This one kind of sucks compared to the rest, but I left
it in case anyone got
# got any ideas from it.
'replace_josh_new' => sub {
my ($str, $search, $replace, $global) = ($text1,
$pattern1, $replace1, $global1);
if (!defined($global)) {
$global = 1; #match globally by default.
}
$replace = '' unless defined $replace;
return $str unless defined $str and defined $search;
my $replace_has_backref;
if(CHECK_FOR_BACK_REFS_IN_REPLACE){
$replace_has_backref = ($replace =~ m: (?<!\\) \$
(\d+):x);
}else{
$replace_has_backref = 1;
}
no strict 'refs';
if ($global) {
if (!$replace_has_backref){
$str =~ s/$search/$replace/g;
if ($str ne $actual) {
die "Unequal1 results!\n|$str|\n|$actual|";
}
return $str;
}
$str =~ s{$search}{
my $template = $replace;
my $len = length($template);
my $i = 0;
my $start;
my $end;
#print "length is: $len\n";
while($i < $len){ # do decrement before, cause we
can't do anything if there is only one character.
$start = $i++;
my $char = substr($template, $start, 1);
#print "char is: $char. Start is: $start. i is:
$i\n";
if ($char ne "\$"){
next;
}
#print "Found \$ char at $start.\n";
my $digit_string = '';
while($i <= $len){
$char = substr($template, $i, 1);
if (!$is_digit{$char}){
#print "HERE. char is: '$char'. i is $i\n";
last;
}
$i++;
$end = $i;
$digit_string .= $char;
}
if (!$digit_string){
next;
}
my $saved_string = ${$digit_string} || ''; #using
symbolic ref here.
substr($template, $start, $end - $start) =
$saved_string;
$i = $i + length($saved_string) - ($end-$start);
$len = length $template;
}
$template;
}ge;
} else {
if (!$replace_has_backref){
$str =~ s/$search/$replace/;
if ($str ne $actual) {
die "Unequal1 results!\n|$str|\n|$actual|";
}
return $str;
}
$str =~ s{$search}{
my $template = $replace;
my $len = length($template);
my $i = 0;
my $start;
my $end;
#print "length is: $len\n";
while($i < $len){ # do decrement before, cause we
can't do anything if there is only one character.
$start = $i++;
my $char = substr($template, $start, 1);
#print "char is: $char. Start is: $start. i is:
$i\n";
if ($char ne "\$"){
next;
}
#print "Found \$ char at $start.\n";
my $digit_string = '';
while($i <= $len){
$char = substr($template, $i, 1);
if (!$is_digit{$char}){
#print "HERE. char is: '$char'. i is $i\n";
last;
}
$i++;
$end = $i;
$digit_string .= $char;
}
if (!$digit_string){
next;
}
my $saved_string = ${$digit_string} || ''; #using
symbolic ref here.
substr($template, $start, $end - $start) =
$saved_string;
$i = $i + length($saved_string) - ($end-$start);
$len = length $template;
}
$template;
}e;
}
use strict 'refs';
if ($str ne $actual) {
die "Unequal results!\n|$str|\n|$actual|";
}
return $str;
},
# limitations -- only 9 backreferences
'replace_josh' => sub {
my ($str, $search, $replace, $global) = ($text1,
$pattern1, $replace1, $global1);
if (!defined($global)) {
$global = 1; #match globally by default.
}
$replace = '' unless defined $replace;
return $str unless defined $str and defined $search;
my $replace_has_backref;
if(CHECK_FOR_BACK_REFS_IN_REPLACE){
$replace_has_backref = ($replace =~ m: (?<!\\) \$
(\d+):x);
}else{
$replace_has_backref = 1;
}
if ($global) {
if (!$replace_has_backref){
$str =~ s/$search/$replace/g;
if ($str ne $actual) {
die "Unequal1 results!\n|$str|\n|$actual|";
}
return $str;
}
$str =~ s!$search!
my @vals = ($1, $2, $3, $4, $5, $6, $7, $8, $9);
my $template = $replace;
$template =~ s/\$(\d+)/$vals[$1 - 1] || ''/eg;
$template;
!ge;
} else {
if (!$replace_has_backref){
$str =~ s/$search/$replace/;
if ($str ne $actual) {
die "Unequal1 results!\n|$str|\n|$actual|";
}
return $str;
}
$str =~ s!$search!
my @vals = ($1, $2, $3, $4, $5, $6, $7, $8, $9);
my $template = $replace;
$template =~ s/\$(\d+)/$vals[$1 - 1] || ''/eg;
$template;
!e;
}
if ($str ne $actual) {
die "Unequal results!\n|$str|\n|$actual|";
}
return $str;
},
#limitations: when there are 10 backrefs and you use $10,
we leave that in, and it
# gets used as $1 later on. It should be whacked.
'replace_substr' => sub {
my ($str, $search, $replace, $global) = ($text1,
$pattern1, $replace1, $global1);
my ($matched, $after, $backref, @start, @end);
my $result = '';
$global = 1 unless defined $global;
my $replace_has_backref;
if(CHECK_FOR_BACK_REFS_IN_REPLACE){
$replace_has_backref = ($replace =~ m: (?<!\\) \$
(\d+):x);
}else{
$replace_has_backref = 1;
}
if (!$replace_has_backref){
if ($global){
$str =~ s/$search/$replace/g;
}else{
$str =~ s/$search/$replace/;
}
if ($str ne $actual) {
die "Unequal1 results!\n|$str|\n|$actual|";
}
return $str;
}
while ($str =~ m/$search/) {
if ($#- == 0) {
# no captured groups so do a simple search and replace
if ($global) {
$str =~ s/$search/$replace/g;
} else {
$str =~ s/$search/$replace/;
}
last;
}
# extract the bit before the match, the match itself,
the
# bit after and the positions of all subgroups
$result .= substr($str, 0, $-[0]) if $-[0];
$matched = substr($str, $-[0], $+[0] - $-[0]);
$after = substr($str, $+[0]);
@start = @-;
@end = @+;
# do the s/// leaving the placeholders (literally '$1'
etc) in place
$matched =~ s/$search/$replace/;
# then replace the $1, $2, etc., placeholders in
reverse order
# to ensure we do $10 before $1
for (my $i = $#start; $i; $i--) {
$backref = substr( $str, $start[$i], $end[$i] -
$start[$i] );
$matched =~ s/\$$i/$backref/g;
}
# add the modified $matched output to the result and
loop if global
$result .= $matched;
$str = $after;
last unless $global && length $str;
}
my $ret = $result . $str;
if ($ret ne $actual) {
die "(replace_substr) Unequal
results!\n|$ret|\n|$actual|";
}
return $ret;
#return $result . $str;
},
}
);