Hello Nick,

On Monday, March 10, 2003 at 18:37 GMT +0000, legend has it that Nick
Dutton [ND] recited the incantation:

ND> Success!  I'd managed to convince myself that print_recipient2 wasn't
ND> getting called - so I was pleasantly surprised to find that your new one
ND> works a treat.

Well, I'm confused, but glad it works.

ND>>> I'm no regex expert, but I'm clear on what's going on up until
ND>>> line 3 of print_recipient - please, no laughing!

Ok, here is a short summary.  I'm only going to analyze one regexp,
but all three are basically the same; just the subpatterning is
*slightly* different.

,----- [ Begin ] -----
| %IF:'%-
| %SETPATTREGEXP=$(?i)^\d*\n\s*((\"?)(.*?)\2\s*(\<.*?\>)?\s*[;,]\s*)$%-
| %REGEXPMATCH=$%COMMENT$'<>'':'%-
`----- [  End  ] -----

This segment just looks to see if there is anything of interest in the
comment field.

,----- [ Begin ] -----
| %SETPATTREGEXP="\s{%-
| %-%-%SETPATTREGEXP=#^\d*#%-
| %-%-%REGEXPMATCH=#%COMMENT#}"%-
| %REGEXPMATCH="                                        "%-
`----- [  End  ] -----

This is a nested set of regexps.  The outer one is trying to match a
certain number of spaces. It is the inner regexp which actually
finds (from the comment field) how many spaces it should look for.
This is how the variable indenting is done.

The outer pattern being constructed is \s{##}, where ## is some
user-specified number.  This just looks for ## whitespace characters.

The inner regexp is ^\d* which just looks for as many digits at the
beginning of the string as possible.
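
For anyone more at home outside the macro language, here is a rough
Python sketch of the same nesting.  It is only an approximation:
Python's re module stands in for the regexp macros, the function name
is made up, and the 40-space pad mirrors the literal string of spaces
in the template.

```python
import re

def indent_from_comment(comment):
    """Return as many spaces as the leading number in the comment asks for."""
    # Inner regexp: grab the digits at the very start of the comment.
    digits = re.match(r'^\d*', comment).group(0)
    if not digits:
        return ''
    # Outer regexp: built on the fly, e.g. \s{8} when the comment starts with 8.
    outer = r'\s{' + digits + '}'
    # Match it against a fixed run of spaces, as the template does.
    pad = ' ' * 40
    found = re.match(outer, pad)
    return found.group(0) if found else ''

print(repr(indent_from_comment('8\nAlice <a@example.com>;')))
```

Note that a number larger than the pad cannot match, so the sketch
returns an empty string in that case.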

,----- [ Begin ] -----
| %SETPATTREGEXP=$(?i)^\d*\n\s*(((\"?)(.*?)\3\s*(\<.*?\>)?)\s*[;,]\s*)?$%-
| %REGEXPBLINDMATCH=$%COMMENT$%-
| %SUBPATT=$2$
`----- [  End  ] -----

This section finds and outputs the first entry from the list.

,----- [ Begin ] -----
| %COMMENT=_%-
| %-%-%SETPATTREGEXP=$(?i)^(\d*\n)\s*((\"?)(.*?)\3\s*(\<.*?\>)?\s*[;,]\s*)?(.*)$%-
| %-%-%REGEXPBLINDMATCH=$%COMMENT$%-
| %-%-%SUBPATT=$1$%-
| %-%-%SUBPATT=$6$_%-
`----- [  End  ] -----

This section captures the number of spaces and everything after the
first entry in the list, then sets the comment field to this new
value.  So in effect we've just removed the first list item.

,----- [ Begin ] -----
| %QINCLUDE="print_recipient2"'%-
`----- [  End  ] -----

Lather. Rinse. Repeat.
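
The whole print-one-entry-then-recurse cycle can be imitated in Python
as a loop.  This is a sketch under some assumptions of mine: the list
sits on one line after the number, every entry (including the last)
ends with a delimiter, and the function name and sample addresses are
invented for illustration.

```python
import re

# Pattern used to output the first entry (subpattern 2 excludes the delimiter).
PRINT_RE = re.compile(
    r'(?i)^\d*\n\s*(((\"?)(.*?)\3\s*(\<.*?\>)?)\s*[;,]\s*)?')
# Pattern used to rebuild the comment without the first entry.
STRIP_RE = re.compile(
    r'(?i)^(\d*\n)\s*((\"?)(.*?)\3\s*(\<.*?\>)?\s*[;,]\s*)?(.*)')

def list_recipients(comment):
    """Peel entries off the front of the list one at a time."""
    lines = []
    while True:
        shown = PRINT_RE.match(comment)
        # Stop when there is no number line or no further entry.
        if shown is None or shown.group(2) is None:
            break
        width = re.match(r'\d*', comment).group(0)
        lines.append(' ' * int(width or 0) + shown.group(2))
        # Keep the number line (1) and the rest of the list (6).
        rest = STRIP_RE.match(comment)
        comment = rest.group(1) + rest.group(6)
    return lines

sample = '4\nAlice <a@example.com>, Bob <b@example.com>; carol@example.com;'
for line in list_recipients(sample):
    print(line)
```

The loop plays the role of the recursive %QINCLUDE: each pass shortens
the comment by one entry until nothing with a delimiter is left.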

The regexp that makes this whole thing tick:

,----- [ Begin ] -----
| (?i)^(\d*\n)\s*((\"?)(.*?)\3\s*(\<.*?\>)?\s*[;,]\s*)?(.*)
`----- [  End  ] -----

(?i)     - Make search case INsensitive.  It's not really needed in
           this expression.

^(\d*\n) - Look for all digits on the first line (that's where we set
           the number of spaces to be used for indenting). In this
           case we're storing that number in subpattern 1.

\s*      - Ignore excess space at the beginning of the first entry.

Now we come to a slightly more complex section.  It consists of the
following fragment, which is where we extract the first entry from
the list.  This entry (if found) is stored in subpattern 2.

,----- [ Begin ] -----
| ((\"?)(.*?)\3\s*(\<.*?\>)?\s*[;,]\s*)?
`----- [  End  ] -----

More specifically:

(        - Start subpattern 2.

(\"?)    - See if there is an opening quote mark and store the result
           in subpattern 3.  If there is no quote mark, subpattern 3
           will be empty.  Capturing this seemingly insignificant
           character in its own subpattern is useful later.

(.*?)    - Find the minimum number of any characters that will allow
           the rest of the regexp to be matched.  Store these
           characters in subpattern 4.  Note this is the name portion
           if one exists.

\3       - Find the same thing as subpattern 3.  So this requires us
           to have a closing quote mark if there was an opening quote
           mark.  If there wasn't one, then this command does nothing.
           An example where this is important (I don't know if this is
           RFC approved, but...):
           "Someone <[EMAIL PROTECTED]>" <[EMAIL PROTECTED]>
           This could be matched two ways if we didn't require the
           closing quote.  Remember, the alternative was \"? which
           looked for the existence of a quote mark, but it wasn't
           required, even if there was an opening quote mark.

\s*      - Find (and ignore) any whitespace between the name and
           address.  Note that these spaces will appear in subpattern
           2. 

(\<      - In subpattern 5 find a less-than symbol (i.e. the opening of
           an address).

.*?      - Find the minimum number of any characters.

 \>      - Find a greater-than symbol (i.e. the closing of an address).
 
)?       - Find 1 or 0 instances of the address.  We want this because
           sometimes we'll only have the bare e-mail address.  In that
           case, subpattern 4 (the name subpattern) already captured
           the interesting bit.

\s*      - Ignore excess whitespace.

[;,]\s*  - Find the list delimiter (a semicolon or a comma) and ignore
           any trailing whitespace.

)?       - Find 1 or 0 instances of subpattern 2.  In other words, it
           can find or ignore the first item in the list.  In all
           honesty, I don't remember why this was important, but with
           a little thought one could probably work it out.
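
To see what the \3 backreference buys us, here is a small Python
comparison.  It is a sketch only: Python's re module stands in for the
regexp macros, the addresses are invented, and the LOOSE pattern is my
own variant with \3 swapped for \"? purely for contrast.

```python
import re

# Hypothetical entry whose quoted name itself contains angle brackets.
SAMPLE = '4\n"Someone <s@example.com>" <real@example.com>, '

# The pattern from above: \3 demands a closing quote if one was opened.
STRICT = r'(?i)^(\d*\n)\s*((\"?)(.*?)\3\s*(\<.*?\>)?\s*[;,]\s*)?(.*)'
# Comparison variant: the closing quote is merely optional.
LOOSE = r'(?i)^(\d*\n)\s*((\"?)(.*?)\"?\s*(\<.*?\>)?\s*[;,]\s*)?(.*)'

strict = re.match(STRICT, SAMPLE)
loose = re.match(LOOSE, SAMPLE)

# Subpattern 4 is the name, subpattern 5 the address.
print('strict:', strict.group(4), '|', strict.group(5))
print('loose: ', loose.group(4), '|', loose.group(5))
```

With the backreference the name comes out as the full quoted text and
the address as <real@example.com>.  Without it the name stops at
"Someone" and the address subpattern swallows everything from the
first < to the last >.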

Back to the last part of the main expression:
           
(.*)     - Find the maximum number of any characters that will allow
           the rest of the pattern to work and store them in
           subpattern 6.  This is to capture all the remaining items
           on the list so they can be passed on to the next round.
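
Putting it all together, this Python sketch runs the full expression
against one made-up comment and lists what lands in each subpattern.
The sample text and the variable names are mine, and Python's re
module is only assumed to behave like the regexp engine for this
pattern.

```python
import re

PATTERN = r'(?i)^(\d*\n)\s*((\"?)(.*?)\3\s*(\<.*?\>)?\s*[;,]\s*)?(.*)'

# Hypothetical comment: indent width on line one, then the recipient list.
comment = '4\n"Doe, John" <jd@example.com>; Bob <b@example.com>;'

found = re.match(PATTERN, comment)
labels = {
    1: 'indent line',
    2: 'first entry with delimiter',
    3: 'opening quote',
    4: 'name',
    5: 'address',
    6: 'rest of the list',
}
for number, label in labels.items():
    print(number, label, repr(found.group(number)))
```

The comma inside the quoted name is not taken as a delimiter, because
subpattern 4 has to keep growing until \3 finds the closing quote.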

Ok, so maybe that wasn't so much a short summary as a detailed
analysis, but I hope it helps clear things up a bit.  As I said, this
pattern is used 3 times, though the division of the subpatterns is a
bit different depending on what specifically needs to be extracted.

-- 
Thanks for writing,
 Januk Aggarwal




________________________________________________________
 Current version is 1.61 | "Using TBTECH" information:
http://www.silverstones.com/thebat/TBUDLInfo.html
