Chris,

After you extract the lines you are interested in, what do you want to do
with the data? Are you delivering it to another system, or performing
more processing on it?

Just want to make sure we fully understand the scenario so we can offer the
best possible solution.

Thanks,

Bryan

On Tue, Sep 8, 2015 at 2:43 PM, Christopher Wilson <wilson...@gmail.com>
wrote:

> I've moved the ball a bit closer to the goal - I enabled DOTALL Mode and
> increased the Capture Group Length to 4096.  That grabs everything from the
> first line beginning with "R" to some of the "S"'s.
>
> Having a bit of trouble terminating the regex though.
>
> Once I get that sorted I'll post the result, but I have to say that the
> capture group length could be problematic "in the wild".  In a perfect
> world you would know the length up front - but I can see plenty of cases
> where it won't be known.
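One way to terminate the DOTALL match without relying on a fixed capture-group length is a lazy quantifier bounded by a lookahead. A minimal sketch in Python, whose re module supports the same DOTALL and MULTILINE flags as the java.util.regex engine behind ExtractText; the sample records here are abbreviated stand-ins for the file below:

```python
import re

# Abbreviated stand-ins for the "H"/"R"/"S" records in the sample file.
sample = '\n'.join([
    '"H","USA","BP","20140502"',
    '"R","1","TB","CLM","x"',
    '"R","2","TB","CLM","y"',
    '"S","1","000008813341TB","Coolusive"',
])

# A lazy quantifier plus a lookahead for the first "S" record (or end of
# input) bounds the match without a fixed capture-group length. DOTALL
# lets . cross newlines; MULTILINE lets ^ anchor at each line start.
pattern = re.compile(r'("R.*?)(?=^"S"|\Z)', re.DOTALL | re.MULTILINE)

m = pattern.search(sample)
print(m.group(1))  # the block of "R" lines, stopping before the "S" line
```

The same pattern should behave identically in ExtractText with DOTALL and Multiline enabled, since both engines treat the flags the same way for this construct.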
>
> -Chris
>
> On Tue, Sep 8, 2015 at 2:05 PM, Mark Payne <marka...@hotmail.com> wrote:
>
>> Agreed. Bryan's suggestion will give you the ability to match each line
>> against the regex,
>> rather than trying to match the entire file. It would result in a new
>> FlowFile for each line of
>> text, though, as he said. But if you need to rebuild a single file, those
>> could potentially be
>> merged together using a MergeContent processor, as well.
>>
>> ________________________________
>> > Date: Tue, 8 Sep 2015 13:03:08 -0400
>> > Subject: Re: ExtractText usage
>> > From: bbe...@gmail.com
>> > To: users@nifi.apache.org
>> >
>> > Chris,
>> >
>> > I think the issue is that ExtractText is not reading the file line by
>> > line, and then applying your pattern to each line. It is applying the
>> > pattern to the whole content of the file so you would need a regex that
>> > repeated the pattern you were looking for so that it captured multiple
>> > times.
>> >
>> > When I tested your example, it was actually extracting the first match
>> > 3 times which I think is because of the following...
>> > - It always puts the first match in the property base name, in this
>> > case "regex",
>> > - then it puts the entire match in index 0, in this case regex.0, and
>> > in this case it is only matching the first occurrence
>> > - and then all of the matches would be in order after that, starting
>> > with index 1; in this case there is only 1 match so it is just regex.1
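The single-application behavior described above can be sketched with Python's re module (same flag semantics as the java.util.regex engine NiFi uses for this pattern); the sample lines are abbreviated:

```python
import re

content = '\n'.join([
    '"H","USA","BP","20140502"',
    '"R","1","TB","CLM"',
    '"R","2","TB","CLM"',
    '"S","1","000008813341TB"',
])

pattern = re.compile(r'^("R.*)$', re.MULTILINE)

# A single application of the pattern stops at the first match, which is
# what lands in the regex, regex.0, and regex.1 attributes:
first = pattern.search(content)
print(first.group(0))  # whole match -> regex.0
print(first.group(1))  # capture group 1 -> regex.1

# Matching repeatedly is what it would take to see every "R" line:
print(pattern.findall(content))
```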
>> >
>> > Another solution that might be simpler is to put a SplitText processor
>> > between GetFile and ExtractText, and set the Line Split Count to 1.
>> > This will send 1 line at a time to your ExtractText processor, which
>> > would then match only the lines starting with 'R'.
>> > The downside is that all of the lines with 'R' would be in different
>> > FlowFiles, but this may or may not matter depending on what you want to
>> > do with them afterward.
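The SplitText-then-match flow described above is roughly equivalent to this per-line filter (a sketch, not NiFi code; in NiFi each matching line would be its own FlowFile):

```python
import re

content = '"H","USA"\n"R","1"\n"S","1"\n"R","2"\n'
line_pattern = re.compile(r'^"R.*$')

# Apply the pattern to one line at a time, keeping only "R" records.
r_lines = [line for line in content.splitlines()
           if line_pattern.match(line)]
print(r_lines)
```

Rebuilding a single file from the kept lines, which is the MergeContent step Mark mentions, would amount to joining them back with newlines.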
>> >
>> > -Bryan
>> >
>> >
>> > On Tue, Sep 8, 2015 at 12:12 PM, Christopher Wilson
>> > <wilson...@gmail.com<mailto:wilson...@gmail.com>> wrote:
>> > I'm trying to read a directory of .csv files which have 3 different
>> > schemas/list types (not my idea). The descriptor is in the first
>> > column of the csv file. I'm reading the files in using GetFile and
>> > passing them into ExtractText, but I'm only getting the first 3 (of 8)
>> > lines matching my first regex. What I want to do is grab all the lines
>> > beginning with "R" and dump them off to a file (for now). My end goal
>> > would be to loop through these, grabbing lines, or blocks of lines, by
>> > regex, and route them downstream based on that regex.
>> >
>> > Details and first 11 lines of a sample file below.
>> >
>> > Thanks in advance.
>> >
>> > -Chris
>> >
>> > NiFi version: 0.2.1
>> > OS: Ubuntu 14.01
>> > JVM: java-1.7.0-openjdk-amd64
>> >
>> > ExtractText:
>> >
>> > Enable Multiline = True
>> > Enable Unix Lines Mode = True
>> > regex = ^("R.*)$
>> >
>> >
>> > "H","USA","BP","20140502","9","D","BP"
>> > "R","1","TB","CLM"," "," ","3U"," ","47000","0","47000","0"," ","0","
>> > ","0"," ","0"," ","0"," ","0"," ","0"," ","0","25000","25000","
>> > ","650","F","D","D","6"," "," "," ","1:20PM ","1:51PM ","0122"," ","Clm
>> > 25000","Fast","","16","87","
>> > ","","","64","117.39","2266","4648","11129","0","0","
>> > ","","112089","Good","Cloudy","","","Y"
>> > "R","2","TB","CLM"," ","B","3U"," ","34000","0","34000","0"," ","0","
>> > ","0"," ","0"," ","0"," ","0"," ","0"," ","0","25000","25000","
>> > ","600","F","D","D","7"," "," "," ","1:51PM ","2:22PM ","0151"," ","Clm
>> > 25000N2L","Fast","","16","79","
>> > ","","","64","112.36","2444","4803","10003","0","0","
>> > ","","261868","Poor","Cloudy","","","Y"
>> > "R","3","TB","STK","S"," ","3U","
>> > ","100000","0","100000","0","A","100000"," ","0"," ","0"," ","0","
>> > ","0"," ","0"," ","0","0","0"," ","600","F","D","D","6"," ","Affirmed
>> > Success S.","AfrmdScsB","2:22PM ","2:53PM ","0222","
>> > ","AfrmdScsB100k","Fast","","16","88","
>> > ","","","64","110.54","2323","4618","5810","0","0","
>> > ","","259015","5","Clear","","","Y"
>> > "R","4","TB","MCL"," "," ","3U"," ","49200","0","49200","0"," ","0","
>> > ","0"," ","0"," ","0"," ","0"," ","0"," ","0","40000","40000","
>> > ","850","F","D","D","8"," "," "," ","2:53PM ","3:24PM ","0253"," ","Md
>> > 40000","Fast","Y","30","72","
>> > ","","","64","145.58","2425","4829","11358","13909","0","
>> > ","","260343","9","Clear","0","","Y"
>> > "R","5","TB","ALW"," "," ","3U"," ","77000","0","77000","0"," ","0","
>> > ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","0","
>> > ","900","F","D","D","7"," "," "," ","3:24PM ","3:55PM ","0325"," ","Alw
>> > 77000N1X","Fast","Y","30","74","
>> > ","","","64","151.69","2330","4643","11156","13832","0","
>> > ","","302065","Good","Clear","","","Y"
>> > "R","6","TB","MSW","S","B","3U"," ","60000","1200","60000","0","
>> > ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","0","
>> > ","800","F","D","D","5"," "," "," ","3:55PM ","4:26PM ","0355"," ","Md
>> > Sp Wt 58k","Fast","","30","61","
>> > ","","","64","140.64","2481","4931","11477","0","0","
>> > ","","161404","Good","Clear","","","Y"
>> > "R","7","TB","CLM"," ","B","3U"," ","40000","0","40000","0"," ","0","
>> > ","0"," ","0"," ","0"," ","0"," ","0"," ","0","20000","20000","
>> > ","800","F","D","D","6"," "," "," ","4:26PM ","4:57PM ","0427"," ","Clm
>> > 20000","Fast","","30","68","
>> > ","","","64","139.31","2337","4770","11402","0","0","
>> > ","","344306","Good","Clear","","","Y"
>> > "R","8","TB","ALW"," ","B","3U"," ","77000","0","77000","0"," ","0","
>> > ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","0","
>> > ","850","F","D","D","7"," "," "," ","4:57PM ","5:28PM ","0457"," ","Alw
>> > 77000N1X","Fast","","30","76","
>> > ","","","64","144.76","2416","4847","11365","13836","0","
>> > ","","213021","Good","Clear","","","Y"
>> > "R","9","TB","STR"," "," ","3U"," ","60000","0","60000","0"," ","0","
>> > ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","40000","
>> > ","700","F","D","D","8"," "," "," ","5:28PM "," ","0528"," ","Alw
>> > 40000s","Fast","Y","16","81","
>> > ","","","64","124.66","2339","4740","11211","0","0","
>> > ","","332649","6,8","Clear","0","","Y"
>> >
>> "S","1","000008813341TB","Coolusive","20100124","KY","TB","Colt","Bay","Ice
>> > Cool Kitty","2003","TB","Elusive Quality","1993","TB","Tomorrows
>> > Cat","1995","TB","Gone
>> > West","1984","TB","122","0","L","","28200","Velasquez","Cornelio","H.","
>> > ","Jacobson","David"," ","Drawing Away Stable and Jacobson, David","
>> > "," ","265","N","
>> >
>> ","0","N","5","5","3","3","4","0","0","1","1","1","10","200","0","0","100","75","510","320","0","0","0","0","N","25000","4w
>> > into lane, held","chase 2o turn, bid 4w turning for home,took over,
>> > held
>> >
>> sway","7.30","3.80","2.70","Y","000000002103TE","TE","Barbara","Robert","
>> > ","000001976480O6","O6","Averill","Bradley","E.","
>> > ","N","0","N","","0","","87","Lansdon B. Robbins & Kevin
>> > Callahan","000000257611TE","000000002695JE"
>> >
>> >
>>
>>
>
>
