Re: [Templates] Vmethods to add to 2.15

Josh Rosenbaum Wed, 01 Feb 2006 20:34:04 -0800

[EMAIL PROTECTED] wrote:

Here is another code snippet that you might consider. It has the unfortunateitem of needing to be copied twice for global. But it does handle $10. Italso handles $10 more similarly to Perl. The problem is that if I have $10in my string - but I have less than 10 patterns, then I will end up with"mymatch0" rather than "" should $1 contain mymatch. Also, it should bedecided about what to do with $1 variables in the replacement portion shouldthere not be any matches. The version you had given leaves them there -though I don't think that is desired behavior.
Any way - here is a quick code snippet - that I think I may use in some of myapplications.
sub vmethod_replace {
}

Paul

I like Paul's code as it's cleaner, quicker, and seems to work nicer.Also, as he mentioned there are some problems in the previous codelike $10 not getting removed when there isn't a 10th back reference.(Instead $1 is replaced looking like: "{$1 replace}0")

I just spent a decent amount of time benchmarking/testing and creatingcode for this replace function. I've attached the file that I've beenusing to test some of my own replace functions I've come up with inaddition to these other ones. (I've also included benchmarks up top.)Anyone who spots any bugs in the file, feel free to throw them out, asI'm pretty tired now and imagine there might be a couple here and there.

All of these functions are much slower than the normal search/replace.Our replace function should almost definitely check for a backref inthe replace string FIRST, and if there isn't one, then use standardsearch/replace. If the user does not need need a backref, we shouldn'tuse the new replace code. (You can play with this in the test script,and see how I did the check in the various functions.) There is afairly minor performance hit for the backref check. The benchmarkswill show you this. (There could be more test cases for sure, but I'vealready spent more time on this than I should have.)


Let me know if there are any questions or comments.

-- Josh

# The only thing you need to do with this file is change $current_test below, 
and
# CHECK_FOR_BACK_REFS_IN_REPLACE as well.
# You can also add more tests, but will need to make sure to also add a section
# in the replace_norm area. (Take heed of the note in that section, though. 
Adding
# tests will slow it down quite a bit the lower down they are in the if 
conditional.)


####### (test1) - This has backrefs in the replace.
#######  CHECK_FOR_BACK_REFS_IN_REPLACE = 0
# Benchmark: timing 100000 iterations of replace_josh, replace_josh_new, 
replace_josh_new2, replace_norm, replace_substr, vmethod_replace...
#  replace_josh:  3 wallclock secs ( 2.26 usr +  0.00 sys =  2.26 CPU) @ 
44247.79/s (n=100000)
#  replace_josh_new:  3 wallclock secs ( 3.50 usr +  0.00 sys =  3.50 CPU) @ 
28571.43/s (n=100000)
#  replace_josh_new2:  3 wallclock secs ( 2.42 usr +  0.00 sys =  2.42 CPU) @ 
41322.31/s (n=100000)
#  replace_norm:  1 wallclock secs ( 0.82 usr +  0.00 sys =  0.82 CPU) @ 
121951.22/s (n=100000)
#  replace_substr:  3 wallclock secs ( 3.17 usr +  0.00 sys =  3.17 CPU) @ 
31545.74/s (n=100000)
#  vmethod_replace:  3 wallclock secs ( 3.09 usr +  0.00 sys =  3.09 CPU) @ 
32362.46/s (n=100000)


####### (test1) - This has backrefs in the replace.
#######  CHECK_FOR_BACK_REFS_IN_REPLACE = 1
#######                This checking will make replaces that don't require 
backrefs much faster.
# Benchmark: timing 100000 iterations of replace_josh, replace_josh_new, 
replace_josh_new2, replace_norm, replace_substr, vmethod_replace...
#  replace_josh:  3 wallclock secs ( 2.45 usr +  0.00 sys =  2.45 CPU) @ 
40816.33/s (n=100000)
#  replace_josh_new:  3 wallclock secs ( 3.56 usr +  0.00 sys =  3.56 CPU) @ 
28089.89/s (n=100000)
#  replace_josh_new2:  3 wallclock secs ( 2.55 usr +  0.00 sys =  2.55 CPU) @ 
39215.69/s (n=100000)
#  replace_norm:  1 wallclock secs ( 0.80 usr +  0.00 sys =  0.80 CPU) @ 
125000.00/s (n=100000)
#  replace_substr:  3 wallclock secs ( 3.32 usr +  0.00 sys =  3.32 CPU) @ 
30120.48/s (n=100000)
#  vmethod_replace:  4 wallclock secs ( 3.13 usr +  0.00 sys =  3.13 CPU) @ 
31948.88/s (n=100000)



####### (test2) - This test has no backrefs in the replace.
#######  CHECK_FOR_BACK_REFS_IN_REPLACE = 0
# Benchmark: timing 100000 iterations of replace_josh, replace_josh_new, 
replace_josh_new2, replace_norm, replace_substr, vmethod_replace...
#  replace_josh:  2 wallclock secs ( 1.43 usr +  0.00 sys =  1.43 CPU) @ 
69930.07/s (n=100000)
#  replace_josh_new:  2 wallclock secs ( 1.93 usr +  0.00 sys =  1.93 CPU) @ 
51813.47/s (n=100000)
#  replace_josh_new2:  1 wallclock secs ( 0.71 usr +  0.00 sys =  0.71 CPU) @ 
140845.07/s (n=100000)
#  replace_norm:  1 wallclock secs ( 0.51 usr +  0.00 sys =  0.51 CPU) @ 
196078.43/s (n=100000)
#  replace_substr:  3 wallclock secs ( 2.80 usr +  0.01 sys =  2.81 CPU) @ 
35587.19/s (n=100000)
#  vmethod_replace:  1 wallclock secs ( 1.53 usr +  0.01 sys =  1.54 CPU) @ 
64935.06/s (n=100000)


####### (test2) - This test has no backrefs in the replace.
#######  CHECK_FOR_BACK_REFS_IN_REPLACE = 1
# Benchmark: timing 100000 iterations of replace_josh, replace_josh_new, 
replace_josh_new2, replace_norm, replace_substr, vmethod_replace...
#  replace_josh:  1 wallclock secs ( 0.60 usr +  0.00 sys =  0.60 CPU) @ 
166666.67/s (n=100000)
#  replace_josh_new:  1 wallclock secs ( 0.59 usr +  0.00 sys =  0.59 CPU) @ 
169491.53/s (n=100000)
#  replace_josh_new2:  1 wallclock secs ( 0.71 usr +  0.00 sys =  0.71 CPU) @ 
140845.07/s (n=100000)
#  replace_norm:  0 wallclock secs ( 0.52 usr +  0.00 sys =  0.52 CPU) @ 
192307.69/s (n=100000)
#  replace_substr: -1 wallclock secs ( 0.66 usr +  0.00 sys =  0.66 CPU) @ 
151515.15/s (n=100000)
#  vmethod_replace:  1 wallclock secs ( 0.63 usr +  0.00 sys =  0.63 CPU) @ 
158730.16/s (n=100000)



use strict;
use warnings;
use Benchmark qw[timethese];

my $current_test = 'test1'; # just toggle me to see tests.
use constant CHECK_FOR_BACK_REFS_IN_REPLACE => 1;

my %tests = (
             test1 => {
                       text => "alskjdfkas dfklasdj fa sdkfjalsdf afjklasdfja 
faksldfja sfdklasjf aslfkaj dfklajsdfa sdfjklsdfkjsdf ksjf sdlfjafkjsdflksjfd",
                       pattern => '(sfdklasjf)',
                       replace => '$1 josh $1', #replace_substr chokes on $10 
if put in here.
                       global => 1,
                      },

             test2 => {
                       text => "alskjdfkas dfklasdj fa sdkfjalsdf afjklasdfja 
faksldfja sfdklasjf aslfkaj dfklajsdfa sdfjklsdfkjsdf ksjf sdlfjafkjsdflksjfd",
                       pattern => '(sfdklasjf)',
                       replace => 'a josh b',
                       global => 1,
                      },

            );


my $current_hash = $tests{$current_test};
my $text1 = $current_hash->{text};
my $pattern1 = $current_hash->{pattern};
my $replace1 = $current_hash->{replace};
my $global1 = $current_hash->{global};

my $actual = $text1;
eval "\$actual =~ s/$pattern1/$replace1/;";

# used by replace_josh_new below. (It's slow, but an idea.)
my %is_digit = map {$_ => 1} (0..9);

timethese(100_000, {

                    vmethod_replace => sub{
                      my ($str, $search, $replace, $global) = ($text1, 
$pattern1, $replace1, $global1);
                      $str     = '' if ! defined $str;
                      $search     = '' if ! defined $search;
                      $replace = '' if ! defined $replace;
                      $global  = 1  if ! defined $global;

                      my $replace_has_backref;
                      if(CHECK_FOR_BACK_REFS_IN_REPLACE){
                        $replace_has_backref = ($replace =~ m: (?<!\\) \$ 
(\d+):x);
                      }else{
                        $replace_has_backref = 1;
                      }

                      if ($global) {
                        if (!$replace_has_backref){
                          $str =~ s/$search/$replace/g;

                          if ($str ne $actual) {
                            die "Unequal1 results!\n|$str|\n|$actual|";
                          }

                          return $str;
                        }
                        $str =~ s{$search}{
                          my @start = @-;
                          my @end   = @+;
                          my $copy  = $replace;
                          $copy =~ s{ (?<!\\) \$ (\d+) }{
                            ($1 > $#start || $1 == 0) ? '' : substr($str, 
$start[$1],
                                                                    $end[$1] - 
$start[$1]);
                          }exg;
                          $copy;
                        }eg;
                      } else {
                        if (!$replace_has_backref){
                          $str =~ s/$search/$replace/;

                          if ($str ne $actual) {
                            die "Unequal1 results!\n|$str|\n|$actual|";
                          }
                          return $str;
                        }

                        $str =~ s{$search}{
                          my @start = @-;
                          my @end   = @+;
                          my $copy  = $replace;
                          $copy =~ s{ (?<!\\) \$ (\d+) }{
                            ($1 > $#start || $1 == 0) ? '' : substr($str, 
$start[$1],
                                                                    $end[$1] - 
$start[$1]);
                          }exg;
                          $copy;
                        }e;
                      }


                      if ($str ne $actual) {
                        die "Unequal1 results!\n|$str|\n|$actual|";
                      }


                      return $str;
                    },








                    'replace_norm' => sub{
                      my ($str, $search, $replace, $global) = ($text1, 
$pattern1, $replace1, $global1);
                      if ($current_test eq 'test1'){ # these if statements will 
slow this down a bit
                        if ($global){
                          $str =~ s/$search/$1 josh $1/g;
                        }else{
                          $str =~ s/$search/$1 josh $1/;
                        }
                      }elsif($current_test eq 'test2'){
                        if ($global){
                          $str =~ s/$search/a josh b/g;
                        }else{
                          $str =~ s/$search/a josh b/;
                        }
                      }else{
                        die "Entry needed in replace_norm";
                      }

                      if ($str ne $actual) {
                        die "Unequal results!\n|$str|\n|$actual|";
                      }

                      return $str;
                    },








                    # this one seems to work pretty good, but requires a no 
strict 'refs' section.
                    replace_josh_new2 => sub{
                      my ($str, $search, $replace, $global) = ($text1, 
$pattern1, $replace1, $global1);

                      if (!defined($global)) {
                        $global = 1; #match globally by default.
                      }

                      $replace = '' unless defined $replace;
                      return $str unless defined $str and defined $search;

                      my %ids_to_store = map {$_ => 1} ($replace =~ /\$(\d+)/);
                      if (scalar(keys %ids_to_store) == 0){
                        if ($global) {
                          $str =~ s/$search/$replace/g;
                        }else{
                          $str =~ s/$search/$replace/;
                        }
                      }

                      no strict 'refs';
                      if ($global) {
                        $str =~ s!$search!
                          foreach (keys %ids_to_store) {
                            $ids_to_store{$_} = ${$_}; # use symbolic ref
                          }

                          my $template = $replace;
                          $template =~ s/\$(\d+)/$ids_to_store{$1} || ''/eg;
                          $template;
                        !ge;
                      } else {
                        $str =~ s!$search!
                          foreach (keys %ids_to_store) {
                            $ids_to_store{$_} = ${$_}; # use symbolic ref
                          }

                          my $template = $replace;
                          $template =~ s/\$(\d+)/$ids_to_store{$1} || ''/eg;
                          $template;
                        !e;
                      }
                      use strict 'refs';


                      if ($str ne $actual) {
                        die "Unequal results!\n|$str|\n|$actual|";
                      }

                      return $str;
                    },












                    # This is a pretty crazy one that tries to iterate through 
the replace character
                    # by character and do replaces. It also requires no strict 
'refs' to be on.
                    # This one kind of sucks compared to the rest, but I left 
it in case anyone got
                    # got any ideas from it.
                    'replace_josh_new' => sub {
                      my ($str, $search, $replace, $global) = ($text1, 
$pattern1, $replace1, $global1);
                      if (!defined($global)) {
                        $global = 1; #match globally by default.
                      }

                      $replace = '' unless defined $replace;
                      return $str unless defined $str and defined $search;

                      my $replace_has_backref;
                      if(CHECK_FOR_BACK_REFS_IN_REPLACE){
                        $replace_has_backref = ($replace =~ m: (?<!\\) \$ 
(\d+):x);
                      }else{
                        $replace_has_backref = 1;
                      }


                      no strict 'refs';
                      if ($global) {
                        if (!$replace_has_backref){
                          $str =~ s/$search/$replace/g;

                          if ($str ne $actual) {
                            die "Unequal1 results!\n|$str|\n|$actual|";
                          }

                          return $str;
                        }

                        $str =~ s{$search}{
                          my $template = $replace;
                          my $len = length($template);

                          my $i = 0;
                          my $start;
                          my $end;

                          #print "length is: $len\n";

                          while($i < $len){ # do decrement before, cause we 
can't do anything if there is only one character.
                            $start = $i++;
                            my $char = substr($template, $start, 1);
                            #print "char is: $char. Start is: $start. i is: 
$i\n";
                            if ($char ne "\$"){
                              next;
                            }

                            #print "Found \$ char at $start.\n";

                            my $digit_string = '';
                            while($i <= $len){
                              $char = substr($template, $i, 1);
                              if (!$is_digit{$char}){
                                #print "HERE. char is: '$char'. i is $i\n";
                                last;
                              }

                              $i++;
                              $end = $i;
                              $digit_string .= $char;
                            }

                            if (!$digit_string){
                              next;
                            }

                            my $saved_string = ${$digit_string} || ''; #using 
symbolic ref here.
                            substr($template, $start, $end - $start) = 
$saved_string;
                            $i = $i + length($saved_string) - ($end-$start);
                            $len = length $template;
                          }

                          $template;
                        }ge;


                      } else {
                        if (!$replace_has_backref){
                          $str =~ s/$search/$replace/;

                          if ($str ne $actual) {
                            die "Unequal1 results!\n|$str|\n|$actual|";
                          }

                          return $str;
                        }

                        $str =~ s{$search}{
                          my $template = $replace;
                          my $len = length($template);

                          my $i = 0;
                          my $start;
                          my $end;

                          #print "length is: $len\n";

                          while($i < $len){ # do decrement before, cause we 
can't do anything if there is only one character.
                            $start = $i++;
                            my $char = substr($template, $start, 1);
                            #print "char is: $char. Start is: $start. i is: 
$i\n";
                            if ($char ne "\$"){
                              next;
                            }

                            #print "Found \$ char at $start.\n";

                            my $digit_string = '';
                            while($i <= $len){
                              $char = substr($template, $i, 1);
                              if (!$is_digit{$char}){
                                #print "HERE. char is: '$char'. i is $i\n";
                                last;
                              }

                              $i++;
                              $end = $i;
                              $digit_string .= $char;
                            }

                            if (!$digit_string){
                              next;
                            }

                            my $saved_string = ${$digit_string} || ''; #using 
symbolic ref here.
                            substr($template, $start, $end - $start) = 
$saved_string;
                            $i = $i + length($saved_string) - ($end-$start);
                            $len = length $template;
                          }

                          $template;
                        }e;

                      }
                      use strict 'refs';

                      if ($str ne $actual) {
                        die "Unequal results!\n|$str|\n|$actual|";
                      }

                      return $str;
                    },











                    # limitations -- only 9 backreferences
                    'replace_josh' => sub {
                      my ($str, $search, $replace, $global) = ($text1, 
$pattern1, $replace1, $global1);
                      if (!defined($global)) {
                        $global = 1; #match globally by default.
                      }

                      $replace = '' unless defined $replace;
                      return $str unless defined $str and defined $search;

                      my $replace_has_backref;
                      if(CHECK_FOR_BACK_REFS_IN_REPLACE){
                        $replace_has_backref = ($replace =~ m: (?<!\\) \$ 
(\d+):x);
                      }else{
                        $replace_has_backref = 1;
                      }


                      if ($global) {
                        if (!$replace_has_backref){
                          $str =~ s/$search/$replace/g;

                          if ($str ne $actual) {
                            die "Unequal1 results!\n|$str|\n|$actual|";
                          }

                          return $str;
                        }


                        $str =~ s!$search!
                          my @vals = ($1, $2, $3, $4, $5, $6, $7, $8, $9);
                        my $template = $replace;
                        $template =~ s/\$(\d+)/$vals[$1 - 1] || ''/eg;
                        $template;
                        !ge;
                      } else {
                        if (!$replace_has_backref){
                          $str =~ s/$search/$replace/;

                          if ($str ne $actual) {
                            die "Unequal1 results!\n|$str|\n|$actual|";
                          }

                          return $str;
                        }


                        $str =~ s!$search!
                          my @vals = ($1, $2, $3, $4, $5, $6, $7, $8, $9);
                        my $template = $replace;
                        $template =~ s/\$(\d+)/$vals[$1 - 1] || ''/eg;
                        $template;
                        !e;
                      }

                      if ($str ne $actual) {
                        die "Unequal results!\n|$str|\n|$actual|";
                      }

                      return $str;
                    },












                    #limitations: when there are 10 backrefs and you use $10, 
we leave that in, and it
                    # gets used as $1 later on. It should be whacked.
                    'replace_substr' => sub {
                      my ($str, $search, $replace, $global) = ($text1, 
$pattern1, $replace1, $global1);
                      my ($matched, $after, $backref, @start, @end);
                      my $result = '';

                      $global = 1 unless defined $global;

                      my $replace_has_backref;
                      if(CHECK_FOR_BACK_REFS_IN_REPLACE){
                        $replace_has_backref = ($replace =~ m: (?<!\\) \$ 
(\d+):x);
                      }else{
                        $replace_has_backref = 1;
                      }
                      if (!$replace_has_backref){
                        if ($global){
                          $str =~ s/$search/$replace/g;
                        }else{
                          $str =~ s/$search/$replace/;
                        }

                        if ($str ne $actual) {
                          die "Unequal1 results!\n|$str|\n|$actual|";
                        }

                        return $str;
                      }


                      while ($str =~ m/$search/) {
                        if ($#- == 0) {
                          # no captured groups so do a simple search and replace
                          if ($global) {
                            $str =~ s/$search/$replace/g;
                          } else {
                            $str =~ s/$search/$replace/;
                          }
                          last;
                        }

                        # extract the bit before the match, the match itself, 
the
                        # bit after and the positions of all subgroups
                        $result .= substr($str, 0, $-[0]) if $-[0];
                        $matched = substr($str, $-[0], $+[0] - $-[0]);
                        $after   = substr($str, $+[0]);
                        @start   = @-;
                        @end     = @+;

                        # do the s/// leaving the placeholders (literally '$1' 
etc) in place
                        $matched =~ s/$search/$replace/;

                        # then replace the $1, $2, etc., placeholders in 
reverse order
                        # to ensure we do $10 before $1

                        for (my $i = $#start; $i; $i--) {
                          $backref = substr( $str, $start[$i], $end[$i] - 
$start[$i] );
                          $matched =~ s/\$$i/$backref/g;
                        }

                        # add the modified $matched output to the result and 
loop if global
                        $result .= $matched;
                        $str    = $after;
                        last unless $global && length $str;
                      }
                      my $ret = $result . $str;
                      if ($ret ne $actual) {
                        die "(replace_substr) Unequal 
results!\n|$ret|\n|$actual|";
                      }
                      return $ret;
                      #return $result . $str;
                    },

                   }
         );

Re: [Templates] Vmethods to add to 2.15

Reply via email to