resend with 78 columns: Re: Regexp compilation in mod_perl?

Len Walter Sat, 02 Jun 2001 03:12:28 -0700
Whoops, Netscape automatically reformatted my previous mail.
Here's a more readable version.


Curtis,

thanks for your response. I didn't realise that the recompilation problem
only applies when I use /o.
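For reference, here is a small stand-alone sketch of the /o behaviour Curtis describes (plain Perl, no mod_perl needed; the sub names are made up for illustration). The /o version keeps matching against whichever pattern it compiled first. Under mod_perl the same effect simply lasts longer, because the process, and therefore the compiled pattern, survives from one request to the next:

```perl
use strict;
use warnings;

# With /o, $pattern is interpolated and compiled the first time this
# match runs; later calls reuse that first pattern even if $pattern changes.
sub matches_with_o {
    my ($text, $pattern) = @_;
    return $text =~ /$pattern/o ? 1 : 0;
}

# Without /o, $pattern is re-interpolated on every call (Perl only skips
# the recompile when the interpolated string is unchanged).
sub matches_without_o {
    my ($text, $pattern) = @_;
    return $text =~ /$pattern/ ? 1 : 0;
}

print matches_with_o('Nitro', 'Nitro'), "\n";       # prints 1
print matches_with_o('Spirit', 'Spirit'), "\n";     # prints 0: still looking for 'Nitro'
print matches_without_o('Nitro', 'Nitro'), "\n";    # prints 1
print matches_without_o('Spirit', 'Spirit'), "\n";  # prints 1
```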

I am actually trying to split some multiline data into single lines, so
the caret is intentional. The data is typed into a TEXTAREA, one URL or
string per line (it's for a DJ friend of mine). When called, the script's
URL looks like this:

http://fw/cgi-bin/searchweb.pl?PAGES=http%3A%2F%2Fwww.colossalrecords.com.au%2Fnewrelease-page.htm%0D%0Ahttp%3A%2F%2Fwww.bonzairecords.com%2Fcatalogue.htm&STRINGS=Rush%0D%0AHold+It&Search=Search

Since the data is "string\r\nstring" (%0D%0A is a carriage return followed
by a line feed), I figured I could use split /^/ to separate out the
individual strings. There's probably an easier way to do it, though. The
split does work correctly, and both @strings and %content appear to get
filled OK.
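The split working as intended is consistent with a documented special case in Perl: split treats a PATTERN of /^/ as if it were /^/m, so it splits at the start of every line rather than only at the start of the string. A minimal sketch, assuming the TEXTAREA value arrives with CRLF separators as the %0D%0A in the URL suggests (the input string is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical STRINGS value after CGI decoding:
# %0D%0A in the URL becomes "\r\n" (carriage return, line feed).
my $textstr = "Rush\r\nHold It";

# split treats /^/ as /^/m, so this splits at the start of every line.
my @pieces = split /^/, $textstr;

# Each piece keeps its line terminator; show "\r" and "\n" visibly.
for my $piece (@pieces) {
    (my $shown = $piece) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    print "[$shown]\n";
}
# Prints:
# [Rush\r\n]
# [Hold It]
```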

The strange thing is that it returns some results but not others that it
should have. For example, if I run it with
PAGES=http://216.167.127.43/sall.htm
(appropriately escaped) and STRINGS=Spirit%0D%0ANitro, it returns a hit on
Nitro (as it should) but not on Spirit, which is also on the page (warning:
that URL is a 6MB HTML file). In fact there are about 20 strings in
STRINGS, and although half a dozen of them are on that page, only the last
string in the list returns a match. You can see why I thought the regex
might be getting cached in some way.
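One likely explanation for the "only the last string matches" symptom, offered as an assumption rather than something confirmed in this thread: each piece returned by split /^/ still ends in "\r\n", and chomp (with the default $/ of "\n") removes only the "\n". Every string except the last therefore keeps a trailing carriage return and never matches the page. A sketch with hypothetical inputs, comparing that approach with splitting on the line ending itself:

```perl
use strict;
use warnings;

# Hypothetical inputs: STRINGS decoded with CRLF separators, and a page
# that contains both search terms.
my $textstr = "Spirit\r\nNitro";
my $page    = "<p>Spirit</p><p>Nitro</p>";

# Split at line starts, then chomp. chomp removes only the trailing "\n",
# so the first search string becomes "Spirit\r" and no longer matches.
my @found_chomp;
for my $line (split /^/, $textstr) {
    chomp $line;
    push @found_chomp, $line if $page =~ /\Q$line\E/;
}

# Alternative: split on the line ending itself (CRLF or bare LF),
# so no terminator is left on any search string.
my @found_split;
for my $line (split /\r?\n/, $textstr) {
    push @found_split, $line if $page =~ /\Q$line\E/;
}

print "chomp: @found_chomp\n";   # prints "chomp: Nitro"
print "split: @found_split\n";   # prints "split: Spirit Nitro"
```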

I've replaced the old script with the new one (minus the escapes for the
carets), but it's still behaving the same way.

I also tried changing /$pattern/ to /$pattern/g, but it had no effect...

BTW, you mentioned that if I don't use strict, %content will leak. Does
that mean that use strict makes "my" variables go out of scope after each
execution so they can be garbage collected?
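On the strict question: it is the my declaration that gives a variable a per-call lifetime; use strict 'vars' only forces you to declare variables, so an undeclared %content can't silently become a package global. The sketch below uses a simplified model of Apache::Registry (an assumption, not its actual code: the script body runs as a subroutine called once per request in a long-lived process) to show a global hash accumulating keys across calls while a my hash starts empty each time. The URLs are placeholders:

```perl
use strict;
use warnings;

# Package global: behaves like an undeclared %content in a script without strict.
our %content;

# Simplified stand-in for one request handled by a persistent process.
sub handle_request {
    my ($url) = @_;
    my %fresh;            # lexical: a new, empty hash on every call
    $content{$url} = 1;   # global: keeps entries from earlier calls
    $fresh{$url}   = 1;
    return (scalar(keys %content), scalar(keys %fresh));
}

handle_request('http://example.com/first');
my ($global_keys, $lexical_keys) = handle_request('http://example.com/second');
print "global: $global_keys, lexical: $lexical_keys\n";   # prints "global: 2, lexical: 1"
```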

Thanks for your help,
Len

Curtis Poe wrote:
> In the "mod_perl_traps" page, when it refers to regexes only being compiled
> once, it is specifically referring to regexes that use the /o modifier:
>
>     $string =~ /$somevar/o;
>
> In regular Perl (and mod_perl), that causes the pattern to be compiled only
> once. If the value of $somevar changes, the regular expression will still
> try to match against the old pattern. This is a common problem. Under
> mod_perl it is worse: the script stays compiled in a long-lived process, so
> a regex with /o is compiled only once per process, and every later request
> to your script uses the first pattern encountered, regardless of what you
> specify. The mod_perl_traps page offers strategies to avoid this.
> Since you are not using the /o modifier, this shouldn't apply to you.
>
> I ran your script from the command line and it works fine. However, I did
> notice that you weren't using strict, and this may be a source of some
> problems. I am guessing that, since you didn't use it, your %content hash
> is a global that keeps old data around in subsequent invocations of the
> script. While this would be a memory leak, it shouldn't cause the problem
> you're seeing.
>
> My suspicion is that your splits may be an issue:
>
>     foreach my $line (split /^/, $textstr) {
>
> The caret "^" in a regex is an anchor, not a literal character (split
> even treats /^/ as /^/m, i.e. the start of each line), so this does not
> split on carets at all. If you must use the caret as a delimiter, try
> escaping it in the regex:
>
>     foreach my $line (split /\^/, $textstr) {
>
> Here's an example of the problem (sorry, I'm on a Win32 system so my
> command line perl looks funky):
>
> C:\>perl -e "$x=q/a^b^c/;@x=split/^/,$x;print $x[0];"
> a^b^c
>
> Notice that it wasn't split.  By escaping the caret, the split works fine:
>
> C:\>perl -e "$x=q/a^b^c/;@x=split/\^/,$x;print $x[0];"
> a
>
> If you have a caret in your params, this will cause your script to fail.
>
> Hope this helps!
>
> Cheers,
> Curtis Poe
>
> PS:  Here's a corrected version of your script with "strict" added.
>
> #!/usr/bin/perl -w
> # search for each of a number of strings in a number of web pages
> use strict;
> use CGI;
> use LWP::UserAgent;
> use HTTP::Request;
>
> my $q = CGI->new;
>
> my $textstr = $q->param('STRINGS');
> my $pages = $q->param('PAGES');
> my @strings;
> my $ua = LWP::UserAgent->new;
> my %content;
>
> print $q->header(-expires=>'-1d');
> print <<EOH;
> <html>
> <title>Search results</title>
> <body bgcolor=ffffff>
> <h1>Search results</h1>
> EOH
>
> # split on literal carets (escaped, so "^" is not treated as an anchor)
> foreach my $line (split /\^/, $textstr) {
>     chomp $line;
>     push @strings, $line;
> }
>
> # fetch each page and keep its content, keyed by URL
> foreach my $line (split /\^/, $pages) {
>     chomp $line;
>     my $request = HTTP::Request->new(GET => $line);
>     print "Loading $line<br>";
>     my $response = $ua->request($request);
>     if ($response->is_success) {
>         $content{$line} = $response->content;
>     } else {
>         print "<b>Error: $line ".$response->status_line."</b><br>";
>     }
> }
>
> print "<br>Searching<br>";
>
> foreach my $page (keys %content) {
>     print $page."<br>";
>     foreach my $string (@strings) {
>         # \Q quotes regex metacharacters such as ( and ) in the search string
>         if ($content{$page} =~ /\Q$string\E/) {
>             print '<blockquote>'.$string.' found</blockquote>';
>         }
>     }
> }
>
> print <<EOF;
> </body>
> </html>
> EOF
>
> =====
> Senior Programmer
> Onsite! Technology (http://www.onsitetech.com/)
> "Ovid" on http://www.perlmonks.org/
