Hello Nick,

On Monday, March 10, 2003 at 18:37 GMT +0000, legend has it that Nick Dutton [ND] recited the incantation:
ND> Success! I'd managed to convince myself that print_recipient2 wasn't
ND> getting called - so I was pleasantly surprised to find that your new one
ND> works a treat.

Well, I'm confused, but glad it works.

ND>>> I'm no regex expert, but I'm clear on what's going on up until
ND>>> line 3 of print_recipient - please, no laughing!

Ok, here is a short summary. I'm only going to analyze one regexp, but all
three are basically the same; just the subpatterning is *slightly*
different.

,----- [ Begin ] -----
| %IF:'%-
| %SETPATTREGEXP=$(?i)^\d*\n\s*((\"?)(.*?)\2\s*(\<.*?\>)?\s*[;,]\s*)$%-
| %REGEXPMATCH=$%COMMENT$'<>'':'%-
`----- [ End ] -----

This segment just checks whether there is anything of interest left in
the comment field, i.e. at least one complete entry (ending in a
delimiter) after the first line.

,----- [ Begin ] -----
| %SETPATTREGEXP="\s{%-
| %-%-%SETPATTREGEXP=#^\d*#%-
| %-%-%REGEXPMATCH=#%COMMENT#}"%-
| %REGEXPMATCH=" "%-
`----- [ End ] -----

This is a nested set of regexps. The outer one tries to match a certain
number of spaces; it is the inner regexp that actually finds (from the
comment field) how many spaces it should look for. This is how the
variable indenting is done.

The outer pattern being constructed is \s{##}, where ## is some
user-specified number. This just looks for ## whitespace characters. The
inner regexp is ^\d*, which just looks for as many digits at the
beginning of the string as possible.

,----- [ Begin ] -----
| %SETPATTREGEXP=$(?i)^\d*\n\s*(((\"?)(.*?)\3\s*(\<.*?\>)?)\s*[;,]\s*)?$%-
| %REGEXPBLINDMATCH=$%COMMENT$%-
| %SUBPATT=$2$
`----- [ End ] -----

This section finds and outputs the first entry from the list. Here
subpattern 2 holds the name and address without the trailing delimiter.

,----- [ Begin ] -----
| %COMMENT=_%-
| %-%-%SETPATTREGEXP=$(?i)^(\d*\n)\s*((\"?)(.*?)\3\s*(\<.*?\>)?\s*[;,]\s*)?(.*)$%-
| %-%-%REGEXPBLINDMATCH=$%COMMENT$%-
| %-%-%SUBPATT=$1$%-
| %-%-%SUBPATT=$6$_%-
`----- [ End ] -----

This section finds the number of spaces (subpattern 1) and everything
after the first entry in the list (subpattern 6), and sets the comment
field to this new value.
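For anyone who'd like to experiment with these regexps outside The Bat!,
here is a rough Python sketch of the same logic. Treat it as an
approximation under a few assumptions: Python's re module stands in for
The Bat!'s regexp engine, the $...$ around each template pattern is
treated as macro quoting and left out, the \" \< \> escapes are dropped
(they match the same literal characters either way), the template's
recursion becomes a while loop, and the indent is taken to be that many
plain spaces. The names HAS_ENTRY, FIRST_ENTRY, PEEL, INDENT and
print_recipients, like the sample addresses, are made up for illustration.

```python
import re

# Hypothetical Python ports of the template regexps (names are made up).
# Mirrors the %IF check: at least one delimited entry after line 1?
HAS_ENTRY = re.compile(r'(?i)^\d*\n\s*(("?)(.*?)\2\s*(<.*?>)?\s*[;,]\s*)')
# Mirrors the output step: group 2 is the first entry minus its delimiter.
FIRST_ENTRY = re.compile(r'(?i)^\d*\n\s*((("?)(.*?)\3\s*(<.*?>)?)\s*[;,]\s*)?')
# Mirrors the rebuild step: group 1 = indent line, group 6 = the rest.
PEEL = re.compile(r'(?i)^(\d*\n)\s*(("?)(.*?)\3\s*(<.*?>)?\s*[;,]\s*)?(.*)')
# Mirrors the inner ^\d* regexp used for indenting.
INDENT = re.compile(r'^\d*')


def print_recipients(comment: str) -> list[str]:
    """Iterative stand-in for the recursive print_recipient2 template."""
    lines = []
    while HAS_ENTRY.match(comment):
        # Assumption: indent is N plain spaces (0 if there are no digits).
        width = int(INDENT.match(comment).group() or 0)
        entry = FIRST_ENTRY.match(comment).group(2)  # like %SUBPATT=$2$
        lines.append(" " * width + entry)
        peeled = PEEL.match(comment)
        # Like %SUBPATT=$1$ then %SUBPATT=$6$: drop the first entry.
        comment = peeled.group(1) + peeled.group(6)
    return lines


if __name__ == "__main__":
    sample = ('4\n"John Smith" <john@example.com>; jane@example.com, '
              '"Bob" <bob@example.com>;')
    for line in print_recipients(sample):
        print(line)
```

Run on the sample, it prints the three entries on separate lines, each
indented by four spaces. Each pass consumes at least one delimiter, so
the loop always ends once no complete entry is left.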

So, in effect, we've just removed the first list item.

,----- [ Begin ] -----
| %QINCLUDE="print_recipient2"'%-
`----- [ End ] -----

Lather. Rinse. Repeat.

The regexp that makes this whole thing tick:

,----- [ Begin ] -----
| (?i)^(\d*\n)\s*((\"?)(.*?)\3\s*(\<.*?\>)?\s*[;,]\s*)?(.*)
`----- [ End ] -----

(?i)     - Make the search case INsensitive. It's not really needed in
           this expression.
^(\d*\n) - Look for all the digits on the first line (that's where we
           set the number of spaces to be used for indenting), along with
           the newline that ends it. In this case we're storing that in
           subpattern 1.
\s*      - Ignore excess space at the beginning of the first entry.

Now we have a slightly more complex section. It consists of the following
fragment, which is where we extract the first entry from the list. This
entry (if found) is stored in subpattern 2.

,----- [ Begin ] -----
| ((\"?)(.*?)\3\s*(\<.*?\>)?\s*[;,]\s*)?
`----- [ End ] -----

More specifically:

(        - Start subpattern 2.
(\"?)    - See if there is an opening quote mark and store the result in
           subpattern 3. If there is no quote mark, subpattern 3 will be
           empty. Capturing this seemingly insignificant character in its
           own subpattern is useful later.
(.*?)    - Find the minimum number of any characters that will allow the
           rest of the regexp to be matched. Store these characters in
           subpattern 4. Note that this is the name portion, if one exists.
\3       - Find the same thing as subpattern 3. So this requires a closing
           quote mark if there was an opening quote mark. If there wasn't
           one, then this does nothing.

An example where this is important (I don't know if this is RFC-approved,
but...):

    "Someone <[EMAIL PROTECTED]>" <[EMAIL PROTECTED]>

This could be matched two ways if we didn't require the closing quote.
Remember, the alternative to \3 would have been \"?, which looks for a
quote mark but doesn't require one, even if there was an opening quote
mark.

\s*      - Find (and ignore) any whitespace between the name and address.
           Note that these spaces will appear in subpattern 2.
(\<      - In subpattern 5, find a less-than symbol (i.e. the opening of
           an address).
.*?      - Find the minimum number of any characters.
\>       - Find a greater-than symbol (i.e. the closing of an address).
)?       - Find 1 or 0 instances of the address. We want this because
           sometimes we'll only have the bare e-mail address. In that
           case, subpattern 4 (the name subpattern) has already captured
           the interesting bit.
\s*      - Ignore excess whitespace.
[;,]\s*  - Find the list delimiter (a semicolon or a comma) and ignore any
           trailing whitespace.
)?       - Find 1 or 0 instances of subpattern 2. In other words, it can
           find or ignore the first item in the list. In all honesty, I
           don't remember why this was important, but with a little
           thought one could probably work it out.

Back to the last part of the main expression:

(.*)     - Find the maximum number of any characters (i.e. the rest of the
           line) and store them in subpattern 6. This captures all the
           remaining items on the list so they can be passed on to the
           next round.

Ok, so maybe that wasn't so much a short summary as a detailed analysis,
but I hope it helps clear things up a bit. As I said, this pattern is
used 3 times, though the division of the subpatterns is a bit different
depending on what specifically needs to be extracted.

-- 
Thanks for writing,
 Januk Aggarwal

________________________________________________________
 Current version is 1.61 | "Using TBTECH" information:
http://www.silverstones.com/thebat/TBUDLInfo.html