[Note to moderator: I'm not able to use my normal e-mail address ([EMAIL 
PROTECTED]) right now, so please pass me through just once -- Thanks :)]

On Tue, 5 Aug 2003 11:22:24 -0400 (EDT)
Norman Tuttle <[EMAIL PROTECTED]> wrote:

> Dear fellow Flood developers:
[...]
> 2) The following questions, regarding regex and responsetemplate, are
> addressed to the Flood developers on the list, and those familiar with
> regex (PCRE) library and regular expressions:
> 
> We're trying to use the responses coming back from the websites to fill
> variables which can be used in future expressions sent back to the
> website. While I understand that regular expressions should be able to
> represent an expression to match including parts which represent a
> variable or optional piece of information, I am not so familiar with how
> Flood is using this information to feed the responsename value. For one, I
> am not sure why this information is picked up by match[1],

Most regexp packages work that way. The first item in the match table (that is: 
match[0]) is the portion of the input string that matched the whole pattern; it 
is an exact replica of the input only when the pattern happens to cover all of 
it. The text captured by the first parenthesized group goes to match[1]. 
Personally I have no idea what match[0] is good for here.

> why there is an nmatch value of 10 passed to the regexec function when 
> picking up the
> single matched value (when we are only matching for one responsename),

I have absolutely no idea. With nmatch set to 10, regexec can report the whole 
match plus up to nine parenthesized groups, so if the regexp has more groups 
they end up in match[2], match[3] and so on. But as you can tell from the code 
that follows the call, only match[1] is used. Maybe Justin or Aaron can shed 
some light on this...

> why a value of 2 for the initial variable name-pattern pattern match,

It just makes sure that regexec fills in an array with 2 elements:

match[0] -- entire matched text
match[1] -- first parenthesized group

Even when your regexp could match twice or more in the response, regexec 
reports only the first (leftmost) match. Because of that, users are required to 
fine-tune their regexps so that they match at exactly one place in the whole 
response.
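
To make the nmatch = 2 case concrete, here is a minimal sketch, assuming the POSIX-style regcomp/regexec interface discussed in this thread. The helper name first_capture and its buffer-based signature are made up for illustration; this is not Flood's actual code.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not Flood's code): copy the text of the first
 * parenthesized group of `pattern`, as found in `input`, into `out`.
 * Returns 0 on success, -1 if nothing matched or `out` is too small. */
int first_capture(const char *pattern, const char *input,
                  char *out, size_t outlen)
{
    regex_t re;
    regmatch_t match[2];   /* [0] = whole match, [1] = first group */
    int rc = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* nmatch = 2: ask only for the whole match and the first group. */
    if (regexec(&re, input, 2, match, 0) == 0 && match[1].rm_so != -1) {
        size_t len = (size_t)(match[1].rm_eo - match[1].rm_so);
        if (len < outlen) {
            memcpy(out, input + match[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }

    regfree(&re);
    return rc;
}
```

Called with a pattern such as <a href="([^"]*)"> on a page holding several links, this fills the buffer with the first link only, which is why the regexp has to be specific enough.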

> and what the other match[] members might represent.

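Simply, further parenthesized groups, when the regexp has them. Later matches 
of the same regexp are a separate matter: if your regexp is quite generic it 
might match more than once, but each regexec call reports only one match. Look 
at the regexp from round-robin-dynamic.xml (XML-specific encoding stripped):

/<a href="([^"]*)">/

Given some HTML, the first call gives you:

match[0] = <a href="http://www.apache.org/">   // whole matched text
match[1] = http://www.apache.org/

Links further down the page, say http://cvs.apache.org/ and 
http://perl.apache.org/, are only found by running regexec again on the rest of 
the response, and so on, you get the idea...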
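
A minimal sketch of one way to pick up every link rather than just the first, again assuming the POSIX-style regcomp/regexec interface. With that interface a single call reports only the leftmost match, so the sketch calls regexec repeatedly, restarting just past the previous match. The function name print_all_captures is hypothetical and nothing like it exists in Flood today.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not Flood's code): print group 1 of every match
 * of `pattern` in `input`. Returns the number of matches found, or -1
 * if the pattern does not compile. */
int print_all_captures(const char *pattern, const char *input)
{
    regex_t re;
    regmatch_t match[2];
    const char *p = input;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    for (;;) {
        /* After the first call, p no longer points at the start of the
         * input, so tell regexec that '^' must not match there. */
        int flags = (p == input) ? 0 : REG_NOTBOL;

        if (regexec(&re, p, 2, match, flags) != 0)
            break;

        count++;
        if (match[1].rm_so != -1)
            printf("match %d: %.*s\n", count,
                   (int)(match[1].rm_eo - match[1].rm_so),
                   p + match[1].rm_so);

        /* An empty match at offset 0 would never advance; stop there. */
        if (match[0].rm_eo == 0)
            break;
        p += match[0].rm_eo;   /* continue after the whole match */
    }

    regfree(&re);
    return count;
}
```

Each pass would be a natural place to store the captured text in its own variable instead of printing it.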

Personally I think that the regexp matching code should be rewritten at some 
point, so that you can run any number of regexps against a response and have 
every match turned into a flood variable. However, this is more work on the 
config file than on the code. Because of XML restrictions (attribute names must 
be unique within an element) we would have to change the url element to 
something like this:

<url>
   <address>http://www.example.com/</address>
   <postprocess>
      <regexp>
         <pattern>&lt;a href=&quot;([^&quot;]*)&quot;&gt;</pattern>
         <matches>
            <var>first_match</var>
            <var>second_match</var>
            <!-- any number of variables -->
         </matches>
      </regexp>
      <regexp>
         <!-- another regexp -->
      </regexp>
   </postprocess>
</url>

This is however a serious change (it breaks existing configs), so it should be 
scheduled for a major flood rewrite (like flood 2.0 with apr-serf on board :)

regards,
Jacek Prucia
