Hello Peter,

On Friday, October 31, 2003 at 15:23 GMT +0100, audiences applauded as
Peter Fjelsten [PF] announced:

PF> I have figured this out: In the template for saving the file I have made
PF> it so the date is saved in YYYYMMDD. It can be extracted with
PF> [The order date] = (^Date:\s{4}).{8}

Ok, so you've got this part.

P>> [Multi-line comment field, including blank lines, that may or may not be
P>> there]
PF>
PF> This poses a problem. It could look like this:

It is not really a problem if you remember that you can use the other
anchors. I'll use your notation and assume that you know how to figure out
subpatterning and TB's macros to get it working in a template.

[Comment] = (?ism)^(Date Ordered|Ordre modtaget):[^\n]*\n\s*(.*?)\s*\n(Products|Produkter):

P>> Products#3
P>> ------------------------------------------------------
P>> [Number ordered] x Single tank adapter ([Item model name]) = 280dkk
PF>
PF> I have:
PF> [Number ordered] = ^\d+(\sx\s)
PF> [Item model name] = (\d{1,2}\sx\s.+)(\s\().+(\)\s)

Yes, except that your subpatterning is not uniquely capturing the part of
the string that you want. Also, have you put some sort of anchors to make
sure your regexps aren't confused by similar text elsewhere in the message?
Also, how are you dealing with the fact that there could be an arbitrary
number of models ordered? If you haven't thought about it, you'll be best
off using a recursive engine for this part.

P>> Sub-Total#4: 3.640dkk
P>> [Shipping method] (Shipping (5-7 days) to NO : 9.72 kg): [Shipping
P>> Price]
PF>
PF> [Shipping method] = (Sub(-Total|total).+\n).+(\s\(\(D+)
PF> [Shipping Price] = (Sub(-Total|total).+\n)(.*:\s).+(dkk$)

Does this work for you?

P>> Total: 3.865dkk
P>> {3}
PF>
PF> How can I test for the text strings "Moms" or "DK moms/VAT" at the
PF> position {3} and use that later?

Well, your best bet is to capture that string and put it into a variable
(ah the pride and joy of v2.0x). Then when you want to use the condition,
use a %IF statement.

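The [Comment] expression given earlier in this message can be tried outside
TB before it goes into a template. What follows is only a minimal Python
sketch, not a TB template: the sample order texts and the extract_comment
helper name are made up for illustration, and Python's re module stands in
for TB's regexp engine, so corner cases may behave differently there.

```python
import re

# The [Comment] pattern from this message, unchanged.
# (?ism): case-insensitive, dot also matches newlines, ^ anchors at line starts.
COMMENT_RE = re.compile(
    r"(?ism)^(Date Ordered|Ordre modtaget):[^\n]*\n\s*(.*?)\s*\n(Products|Produkter):"
)


def extract_comment(message: str) -> str:
    """Return the text between the date line and the Products line ('' if none)."""
    match = COMMENT_RE.search(message)
    # Subpattern 2 is the comment; 1 and 3 are the surrounding anchors.
    return match.group(2) if match else ""


# Hypothetical order mails (assumed layout; the real form may differ).
WITH_COMMENT = (
    "Date Ordered: 20031031\n"
    "\n"
    "Please deliver after 4 pm.\n"
    "\n"
    "Ring the bell twice.\n"
    "\n"
    "Products:\n"
    "------\n"
)
WITHOUT_COMMENT = "Ordre modtaget: 20031031\n\nProdukter:\n------\n"

print(repr(extract_comment(WITH_COMMENT)))
print(repr(extract_comment(WITHOUT_COMMENT)))
```

With these samples the first call should return both comment lines with the
blank line between them, and the second an empty string. The lazy (.*?)
together with the two anchors is what lets the comment be optional.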
PF> = (Delivery Address|Leveringsadresse)(.*\n)(-+\n)(^.+\n)(^.+\n)^.+
PF>
P>> [Delivery Address3, if applicable]
PF>
PF> I don't know how to end the extraction of these addresses - how many of
PF> them are there are and stop at the right time. I suspect that I can use
PF> the Delivery Post Code as a stop clause, but how to do this, i.e. the
PF> recursive element escapes me.

Well, you can do them all at once by looking at the subpatterns. If you
hard code it, then you must choose the maximum number of lines, ie how many
times you repeat the .+\n part. For this, I don't think you'll be best
served by a recursive engine. Your entire form is probably generated by a
bot/webpage. So presumably the address lines aren't completely arbitrary.
I'd use something like:

(?i-s)(Delivery Address|Leveringsadresse)(.*\n)(-+\n)(.+\n)?(.+\n)?(.+\n)?(.+\n)?((\D{1,2}.\d{3,6})|\d{3,6})\s(.*?)\s*\n(.*?)\s*\n

This looks ugly, it is horribly long, but it should* get all the address
info at once. So:

Name      -> Subpatt 4
Add1      -> Subpatt 5
Add2      -> Subpatt 6
Add3      -> Subpatt 7
Post Code -> Subpatt 8
City      -> Subpatt 10
Country   -> Subpatt 11

* untested... The subpatterning should be right, but I recommend creating a
test template with the regexp above, then a list of subpatterns below. Ie,
something like:

=====[Begin regexp test template]=====
%IF:'%_Text'='':'%_Text="%ClipBoard"'%-
%setpattregexp='...'%-
%RegExpBlindMatch='%_Text'%-
%_Text%-

SubPatt
 0 = <%SUBPATT="0">
 1 = <%SUBPATT="1">
 2 = <%SUBPATT="2">
 3 = <%SUBPATT="3">
 4 = <%SUBPATT="4">
%REM='
 5 = <%SUBPATT="5">
 6 = <%SUBPATT="6">
 7 = <%SUBPATT="7">
 8 = <%SUBPATT="8">
 9 = <%SUBPATT="9">
10 = <%SUBPATT="10">
11 = <%SUBPATT="11">
12 = <%SUBPATT="12">
13 = <%SUBPATT="13">
14 = <%SUBPATT="14">
15 = <%SUBPATT="15">
16 = <%SUBPATT="16">
17 = <%SUBPATT="17">
18 = <%SUBPATT="18">
19 = <%SUBPATT="19">
20 = <%SUBPATT="20">
'
=====[ End regexp test template]=====

Obviously you need to move the %Rem around to expose as many subpatterns as
you need.
I recommend exposing a couple more than you expect, just in case you
counted wrong.

P>> [Delivery Post code]{4}
PF>
PF> = (\n)\D{1,2}.\d{3,6}(\s)|(\n)\d{3,6}(\s) - except I have a problem
PF> here. This will not find "SE-123 23". Can anybody help?

Your expression is almost there, just some minor changes as I've included
in the expression above.

P>> [Delivery City]
<snip>
P>> [Delivery country]

See above.

P>> Billing Address#6

Copy the expression above, just change the anchor names.

PF> I don't know how to make a QT that takes the RegExp above and saves it
PF> in variables so it can be formatted as below.

Well, your best bet is to do a whole bunch of smaller regexps when you need
them. Otherwise your expression will get unwieldy and buggy.

PF> In other words - I need the "shell" for all my little extractions - the
PF> main QT that handles all of this.

Most of them should use:

FieldID=%-
%SetPattRegexp="..."%-
%RegexpBlindMatch="%Text"%-
%Subpatt="..."

There are a couple where you need more sophisticated templates, so just put
the formatting/recursive templates in QTs and use:

FieldID=%QInclude="..."

<snip>

PF> This is a problem. How do I make a statement:
PF>
PF> If "Moms" or "DK moms/VAT" is present in the order
PF> Then make a FRAGTMOMSPLIGTIG=X, and let X be 0.8*[Shipping
PF> Price] 1)
PF>
PF> If "Moms" or "DK moms/VAT" is NOT present in the order
PF> Then make a FRAGTMOMSFRI=[Shipping Price] 1)

Use %IF and %CALC. You can extract just the number part for use with %Calc,
and use another subpattern to capture any other text. In the test condition
for %IF, do the search for the "Moms" or "DK..." strings.

P>> <VARE>
P>> ANTAL=[Number ordered]
PF>
P>> VARENUMMER=[Item model name]
PF>
PF> How do I generate a <VARE> with ANTAL=^\d+(\sx\s) and
PF> VARENUMMER=(\d{1,2}\sx\s.+)(\s\().+(\)\s) for each instance of ANTAL?

You'll have to generate this section with a recursive template.
The "easiest" thing to do is to grab all the items with one regexp, then
feed that string to the recursive template. The recursive part can then
analyze each line, pulling out the number and model name while adding the
repeating tags.

PF> Also I need to be able to distinguish the different orders in the mail.
PF> I need to format each <ORDRE> within the scope of each mail saved to
PF> disc: these can be limited by ===-===-===... to start each mail and
PF> ---=---=--- to end each mail - and consequently <ORDRE>. How do I do
PF> that?

I'm not sure I follow. Are you using a filter to extract all this info from
a message and append it to a file which would then contain all orders? If
so, then just add those delimiters to the top and bottom of your template
as necessary.

PF> I am very proud of myself for having reached so far - the visual RegExp
PF> 3.0 really helps, but now I am probably at my wits' end - please
PF> somebody help me!

You're doing well. I recommend using my template above to test your regexps
with TB. Give it a simple name, like regexp, then insert it into a new
message by typing:

regexp<ctrl><space>

Depending on which macros you want to test for %RegexpBlindMatch, you might
want to reply to the message you want to process. Just remember not to send
out these test messages. ;-)

Good luck!

--
Thanks for writing,
Januk Aggarwal

________________________________________________________
http://www.silverstones.com/thebat/TBUDLInfo.html