If your input file is XHTML, or can be transformed into XHTML (sorry, I haven't been following the thread), then you can use XMLTask and specify an XPath to the element you want to change or insert at.
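XMLTask's own syntax is not reproduced here. As a hedged illustration of the underlying idea (selecting the element by XPath instead of by regex), the sketch below uses the JDK's standard `javax.xml.xpath` API rather than XMLTask. The class name `XPathSummary` and the method `extractSummaryTable` are hypothetical, and the sketch assumes well-formed XHTML with no default namespace declaration and no DOCTYPE to resolve.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathSummary {

    // Hypothetical helper: returns the first <table class="summary"> element,
    // serialized back to markup, or null if the document has no such table.
    // Assumes well-formed XHTML with no default namespace and no DOCTYPE.
    public static String extractSummaryTable(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // Select by element name and attribute value on the parsed tree,
            // so line breaks inside the table are irrelevant.
            Node table = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate("//table[@class='summary']", doc, XPathConstants.NODE);
            if (table == null) {
                return null;
            }

            // Serialize only that element and its children, without an XML declaration.
            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            out.transform(new DOMSource(table), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not extract summary table", e);
        }
    }
}
```

Note that `@class='summary'` is an exact comparison, so a multi-valued `class` attribute would need a different predicate. Input that is not well-formed, such as the unclosed `<link ...>` in Gilbert's sample, makes the parse fail, which is why the XHTML precondition matters.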
http://www.oopsconsultancy.com/software/xmltask

Brian

On Wed, November 29, 2006 05:07, George Bills wrote:
> Thanks Gilbert - except that doesn't work when tables span more than one
> line. "byline true" splits the text into multiple tokens, and the regex
> is applied independently to each token. So if the start of the
> expression (<table>) is on line one, the middle of the expression
> (<tr>blah</tr>...etc) is on line two, and the end of the expression
> (</table>) is on line three, then no one individual line / token
> matches, so nothing comes out (correct me if I'm wrong, but that's what
> seemed to happen in my testing). If byline is false, the entire text is
> one big token - so if I match the token, I get the entire token (the
> original input) back. Also, I wanted the entire table, not just the
> contents. I tried using "replace="\0"", but that just means that within
> the token I'm replacing the matching text with the matching text - not
> very useful.
>
> What I really wanted was a way of saying "give me the matching text and
> only the matching text, not the token that matches". I sort of solved it
> by writing a regular expression to match the entirety of input:
> "(.*?)(<table[^/<]*class="summary"[^/>]>(.*?)</table>)" . With the Ant
> encoding that ends up as:
> "(.*?)(<table[^</]*class="summary"[^</]*>(.*?)</table>)(.*)".
> I don't know that each table will only take one line of input (in fact,
> they won't), but since I know that there's only one table in each input
> file, I can match the entire file and use "replace="\2"" to replace the
> entire match (all input) with the second matching group (the table).
>
> So, that works for one file. The problem I have now is getting it to
> work for multiple files - each file that I concatenate has exactly one
> summary table that I want to extract and place in a single HTML summary
> file.
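The per-file order George is after (extract first, then concatenate) can be sketched outside Ant. This is a hypothetical Java illustration, not an Ant task: it reads each file as one string (roughly what `byline="false"` gives), keeps only the matched table instead of the whole token, and appends the tables to a single summary file. `SummaryBuilder`, its methods, the wrapper `<html>`/`<body>` markup, and the regex (a variant of the thread's patterns that uses `[^>]*` to stay inside the opening tag) are all assumptions for illustration, as is the one-table-per-file premise.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SummaryBuilder {

    // DOTALL lets "." cross newlines; CASE_INSENSITIVE mirrors the "i" flag;
    // the lazy ".*?" stops at the first </table> after the opening tag.
    private static final Pattern SUMMARY_TABLE = Pattern.compile(
            "<table[^>]*class=\"summary\"[^>]*>.*?</table>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns only the matched text (group 0), not the whole file, or null if absent.
    public static String extractTable(String html) {
        Matcher m = SUMMARY_TABLE.matcher(html);
        return m.find() ? m.group() : null;
    }

    // Extract from each file first, then concatenate the results into one output file.
    public static void buildSummary(List<Path> inputs, Path output) throws IOException {
        StringBuilder summary = new StringBuilder("<html>\n<body>\n");
        for (Path input : inputs) {
            String html = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
            String table = extractTable(html);
            if (table != null) { // files without a summary table are skipped
                summary.append(table).append('\n');
            }
        }
        summary.append("</body>\n</html>\n");
        Files.write(output, summary.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```

Given the report files in the desired order and a target path, `buildSummary` writes one HTML file containing each file's table; a file with no summary table is silently skipped.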
> I tried:
> (A) Concatenating (<concat>) all of the files and applying a filterchain
> - but the filterchain filters all the input once, not once per file. So
> I concatenate the files first, then apply the regex - which means I only
> get the one matching table from the entire concatenation, not one
> matching table from each file that I concatenate.
> (B) Copying (<copy>) all of the files to a single file - in this case,
> the filterchain extracts the individual tables from each file - but I
> only end up with one file, because I can't make it concatenate them all
> to one destination (even with a mergemapper). "enablemultiplemappings"
> doesn't seem to help.
>
> If there was some way of saying "for each file, apply the transform
> *before* concatenating, not after", then that would work, but as far as
> I can see, there isn't. Any ideas?
>
> Rebhan, Gilbert wrote:
>> Hi,
>>
>> <target name="depends">
>> <echo file="Y:/test.html">
>> <![CDATA[
>> <html>
>> <head>
>> <title>summary</title>
>> <link rel="stylesheet" href="summary.css" type="text/css">
>> </head>
>> <body>
>> <a name="overview"></a>
>> <center>
>> <table class="summary"> was wrong </table>
>> </center>
>> </html>
>> ]]>
>> </echo>
>> </target>
>>
>> <target name="main" depends="depends">
>>
>> <loadfile srcfile="Y:/test.html" property="summary">
>> <filterchain>
>> <containsregex
>> pattern='&lt;table[^&lt;/]*>(.*?)&lt;/table>'
>> replace="\1"
>> byline="true"
>> />
>> <tokenfilter>
>> <!-- to get rid of whitespace in ${summary} -->
>> <trim/>
>> </tokenfilter>
>> </filterchain>
>> </loadfile>
>>
>> <echo>Summary == ${summary}</echo>
>>
>> </target>
>>
>> gives only the text =
>>
>> depends:
>> main:
>> [echo] Summary == was wrong
>> BUILD SUCCESSFUL
>> Total time: 407 milliseconds
>>
>>
>> you have to use \1 and byline=true
>>
>> Regards, Gilbert
>>
>> -----Original Message-----
>> From: George Bills [mailto:[EMAIL PROTECTED]
>> Sent: Tuesday, November 28, 2006 6:14 AM
>> To: Ant Users List
>> Subject: Re: containsregex and concat
>>
>> Thanks: the regular expression works now, which is progress.
>> Unfortunately I'm getting all of the concatenated text, not just the
>> matching text. If I use replace:
>> <filterchain>
>> <!--<tokenfilter><filetokenizer />-->
>> <!-- byline="false" implies filetokenizer -->
>> <containsregex flags="isg"
>> pattern="${summary.regex}"
>> replace="SUMMARYTABLE"
>> byline="false"
>> />
>> <!-- </tokenfilter>-->
>> </filterchain>
>>
>> I end up getting something like:
>> [concat] <html>
>> [concat] <head>
>> [concat] <title>summary</title>
>> [concat] <link rel="stylesheet" href="summary.css" type="text/css">
>> [concat] </head>
>> [concat] <body>
>> [concat] <a name="overview"></a>
>> [concat] <center>
>> [concat] SUMMARYTABLE
>> [concat] </center>
>> [concat] ...more HTML here...
>> [concat] </html>
>>
>> I'm assuming it's because the file is just one big token - but if I use
>> a line tokenizer, will I be able to match regular expressions over
>> multiple lines?
>>
>> Thanks for the help.
>>
>> Rebhan, Gilbert wrote:
>>
>>> Hi,
>>>
>>> <table[^>/]*>(.*?)</table>
>>>
>>> should match :
>>>
>>> <table class="summary">foobar</table>
>>>
>>> also with more than one attribute
>>>
>>> <table class="summary" foo="bar">foobar</table>
>>>
>>>
>>> foobar is \1 (group 1)
>>>
>>>
>>> Regards, Gilbert
>>>
>>>
>>> -----Original Message-----
>>> From: George Bills [mailto:[EMAIL PROTECTED]
>>> Sent: Monday, November 27, 2006 6:41 AM
>>> To: Ant Users List
>>> Subject: Re: containsregex and concat
>>>
>>> Hrm, it probably isn't, since advanced regexes are still black magic to
>>> me. The "." was supposed to match any character, including a newline
>>> (with the s flag), the * to say match 0-n of them and the ? to say be
>>> lazy, match as little as possible (so that I don't pull in
>>> <table>...</table><table>...</table> in one match).
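George's description of `.*?` can be checked directly. The sketch below, a hypothetical `LazyVsGreedy` class, uses `java.util.regex` with `Pattern.DOTALL` standing in for the `s` flag, assuming Ant is delegating to the JDK's regex engine. It contrasts lazy and greedy matching on two adjacent tables and extracts group 1 of Gilbert's `<table[^>/]*>(.*?)</table>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {

    // Collects every non-overlapping match; DOTALL plays the role of the "s" flag.
    public static List<String> allMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Group 1 of Gilbert's pattern: the table's contents without the surrounding tags.
    public static String contents(String input) {
        Matcher m = Pattern.compile("<table[^>/]*>(.*?)</table>", Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }
}
```

On `<table>one</table><table>two</table>`, the lazy pattern `<table>.*?</table>` yields two separate matches, while the greedy `<table>.*</table>` swallows both tables as a single match, which is the over-matching George wanted to avoid.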
>>>
>>> I just tried [^<], but it doesn't seem to work - I think because of such
>>> things as "<table><tr>...</tr></table>" - the opening bracket of <tr>
>>> conflicts. I tried [.<>]*? to make sure that the "regex.body" part
>>> was matching the brackets, but that didn't work either.
>>>
>>> Also, <table class="summary"> was wrong - <table class="summary"(.*?)>
>>> is a little better since the tables can have more than the class
>>> attribute (in fact, all of them do). But after changing that I'm
>>> matching the entire document - <html> through to </html>. That might
>>> just be because I'm using filetokenizer - if I make one match within
>>> filetokenizer, do I end up getting the entire document? If so, how do I
>>> get only the matching text?
>>>
>>> Regex is now: <table class="summary".*?>.*?</table>
>>>
>>> Thanks for the help, I appreciate it.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]

-- 
Brian Agnew
http://www.oopsconsultancy.com
OOPS Consultancy Ltd
Tel: +44 (0)7720 397526
Fax: +44 (0)20 8682 0012
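The questions quoted above (whether a line tokenizer can see a table spread over several lines, how to get only the matching text out of a whole-file token, and why `[^<]` failed) can be checked with plain `java.util.regex`. This is a hedged sketch, not Ant's filter implementation: `matchPerLine` only approximates `byline="true"`, and `TokenDemo`, its methods, and the `TABLE` pattern are illustrative names. Java writes the back-reference as `$2` where Ant's `replace` attribute uses `\2`. Separately, inside a character class `.` is a literal dot, so `[.<>]*?` matches only dots and angle brackets, never ordinary text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenDemo {

    // Illustrative variant of the thread's summary-table pattern.
    static final String TABLE = "<table[^>]*class=\"summary\"[^>]*>.*?</table>";

    // Roughly what byline="true" does: each line is a separate token,
    // so a table spread over several lines is never seen whole.
    public static List<String> matchPerLine(String text) {
        Pattern p = Pattern.compile(TABLE, Pattern.DOTALL);
        List<String> hits = new ArrayList<>();
        for (String line : text.split("\n")) {
            Matcher m = p.matcher(line);
            if (m.find()) {
                hits.add(m.group());
            }
        }
        return hits;
    }

    // George's workaround on a whole-file token: match everything and keep group 2.
    public static String keepGroupTwo(String text) {
        return Pattern.compile("(.*?)(" + TABLE + ")(.*)", Pattern.DOTALL)
                .matcher(text)
                .replaceFirst("$2");
    }

    // Why [^<]*? cannot replace .*? here: it refuses to step over the "<" of <tr>.
    public static boolean matchesWithoutAngleBrackets(String text) {
        return Pattern.compile("<table[^>]*>[^<]*?</table>").matcher(text).find();
    }
}
```

If no table matches, `replaceFirst` returns the input unchanged, whereas `containsregex` drops a token that does not match, so the two behave differently on files without a summary table.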
