Mark,

How can I rerun the processors after changing some of the attributes? For 
example, when I change the Regex pattern and start the processors, nothing 
happens. 

Srujan Kotikela
FireHost - SECURE CLOUD HOSTING
North America | Europe | Asia Pacific

ComputerWorld: 100 Best Places to Work in IT - See Current Opportunities

This email and any files transmitted with it are confidential and intended
solely for the use of the individual(s) to whom they are addressed. Do not
disseminate, distribute or copy this e-mail without explicit permission to do so. Thank you.



-----Original Message-----
From: Mark Payne [mailto:[email protected]] 
Sent: Thursday, June 18, 2015 1:22 PM
To: [email protected]
Subject: RE: Extracting text using RegEx

Srujan,

When you pull the file via GetHTTP, it assigns a filename to the file. You can 
easily change the filename by using an UpdateAttribute Processor. Just add a 
new property with the name "filename" and whatever value you would like. Then, 
you can write both to the same directory.
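Here is a minimal sketch of the renaming idea. It is written in Python purely as a toy model, since NiFi itself is configured in its UI and not with code like this. The helpers make_flowfile, update_attribute and put_file, and the file names feed.html and feed-unmatched.html, are all hypothetical. The one assumption carried over from the explanation above is that PutFile names the output file after the "filename" attribute. That means two FlowFiles with the same filename collide in one directory unless one of them is renamed.

```python
import os
import tempfile

# Hypothetical toy model: a FlowFile as a dict of string attributes plus content bytes.
def make_flowfile(filename, content):
    return {"attributes": {"filename": filename}, "content": content}

def update_attribute(flowfile, name, value):
    """Mimics one UpdateAttribute property: returns a copy with `name` set to `value`."""
    attrs = dict(flowfile["attributes"])
    attrs[name] = value
    return {"attributes": attrs, "content": flowfile["content"]}

def put_file(flowfile, directory):
    """Mimics PutFile: the target path is `directory` joined with the "filename" attribute."""
    path = os.path.join(directory, flowfile["attributes"]["filename"])
    if os.path.exists(path):
        # Name clash; the real processor resolves this via its conflict-resolution setting.
        raise FileExistsError(path)
    with open(path, "wb") as f:
        f.write(flowfile["content"])
    return path

# Both branches start with the same filename assigned by GetHTTP.
matched = make_flowfile("feed.html", b"extracted text")
unmatched = make_flowfile("feed.html", b"original page")
# Rename the unmatched branch so the two can share one output directory.
unmatched = update_attribute(unmatched, "filename", "feed-unmatched.html")

with tempfile.TemporaryDirectory() as out_dir:
    put_file(matched, out_dir)
    put_file(unmatched, out_dir)
    written = sorted(os.listdir(out_dir))

print(written)  # ['feed-unmatched.html', 'feed.html']
```

Without the update_attribute call, the second put_file would hit the name clash instead of producing a second file.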

ExtractText routes the FlowFile to 'matched' or 'unmatched' depending on
whether or not any regex that you provided matches. However, if the regex has
a capturing group, the extracted text will be only what is captured by that
group. For example, if your regex is ".*good-(bye).*", it will route any
FlowFile containing "good-bye" to 'matched' but will extract only the text
"bye", because that is what is in the capturing group.
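The following small Python sketch illustrates the capturing-group behavior. NiFi evaluates regexes with Java's engine, but Python's re module behaves the same way for a simple pattern like this one. extract_text is a hypothetical stand-in for ExtractText. It only models the routing decision and the extracted group, not the processor's other settings.

```python
import re

# Pattern from the explanation above; the parentheses form capturing group 1.
PATTERN = re.compile(r".*good-(bye).*")

def extract_text(content):
    """Hypothetical stand-in for ExtractText: returns (relationship, extracted text)."""
    m = PATTERN.search(content)
    if m is None:
        return "unmatched", None
    # Only the text inside the capturing group is kept, not the whole match.
    return "matched", m.group(1)

print(extract_text("say good-bye now"))  # ('matched', 'bye')
print(extract_text("hello there"))       # ('unmatched', None)
```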

Once you have extracted the text, though, it is added to a FlowFile attribute, 
not the content. So you will want to use a ReplaceText to replace the content 
of the FlowFile before you use PutFile.

Does this make sense? If not, please let me know where I can help clarify, and 
I'll be happy to do so!

Thanks
-Mark

----------------------------------------
> From: [email protected]
> To: [email protected]
> Subject: RE: Extracting text using RegEx
> Date: Thu, 18 Jun 2015 18:08:58 +0000
>
> Hi Mark,
>
> I am trying to extract some text from a remote file/feed, downloaded via 
> HTTP. The flow I am contemplating is like this:
>
> GetHTTP ====> ExtractText == (matched) ==> PutFile
>                   ||
>               (unmatched)
>                   ||
>                   V
>                PutFile
>
> I am able to create this flow just fine. However, I have the following issues:
>
> 1. I noticed that the 'file' configured for the GetHTTP processor goes into
> the 'directory' configured in the 'PutFile' processor. This is leading me to
> save the matched file and unmatched file in separate directories. Is there a
> way to have those 2 files in the same directory?
>
> 2. I don't seem to get the RegEx working. The ExtractText processor either 
> matches all input or no input. Are there any particular guidelines on how to 
> write regex for NiFi?
>
> Thanks,
> Srujan Kotikela
>
> -----Original Message-----
> From: Mark Payne [mailto:[email protected]]
> Sent: Tuesday, June 16, 2015 7:11 PM
> To: [email protected]
> Subject: RE: Extracting text using RegEx
>
> Srujan,
>
> I'm not sure how familiar you are with NiFi, so here is a very quick note about
> terminology to make sure you understand what I'm describing. A FlowFile is
> the basic data record in NiFi. It consists of two parts:
> - FlowFile Attributes (Key/Value Pairs that are strings)
> - FlowFile Content (arbitrary stream of bytes)
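The two-part structure can be modeled as a small Python data class. This is a toy model for discussion only. The class name FlowFile and its fields are assumptions here, not NiFi's actual (Java) API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlowFile:
    """Toy model of a NiFi FlowFile (illustrative only, not NiFi's Java API)."""

    # FlowFile Attributes: key/value pairs, both strings.
    attributes: Dict[str, str] = field(default_factory=dict)
    # FlowFile Content: an arbitrary stream of bytes, held in memory here for simplicity.
    content: bytes = b""


page = FlowFile(
    attributes={"filename": "page.html"},
    content=b"<html><body>Hello</body></html>",
)
print(page.attributes["filename"])  # page.html
```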
>
> I think the flow that you would want would look like this:
>
> GetHTTP -> ExtractText -> ReplaceText -> PutFile
>
> ExtractText will then evaluate the regex against the content pulled from the 
> HTTP service and put the result in a FlowFile Attribute. So let's say you add 
> a property named "desired.text" with a value "<body>(.*)</body>". This will 
> create an Attribute named "desired.text" and the value of that attribute will 
> be whatever is found between the <body> and </body> tags.
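One detail is worth checking when a regex like "<body>(.*)</body>" matches nothing. By default, '.' does not match line breaks, so a body that spans several lines will not be captured. The Python sketch below shows the difference. The sample HTML and the attributes dict are hypothetical. The inline flag (?s) turns on DOTALL mode in both Python and Java regexes, so "(?s)<body>(.*)</body>" is one way to try this in ExtractText. The processor may also offer a DOTALL option in its configuration.

```python
import re

# Hypothetical sample content: the body spans several lines.
html = "<html>\n<body>\nHello, NiFi\n</body>\n</html>"

# Default mode: "." does not match "\n", so nothing is found.
plain = re.search(r"<body>(.*)</body>", html)

# Inline (?s) flag enables DOTALL, so "." also matches newlines.
dotall = re.search(r"(?s)<body>(.*)</body>", html)

# Mimic ExtractText: store capturing group 1 under the property name.
attributes = {}
if dotall is not None:
    attributes["desired.text"] = dotall.group(1)

print(plain)                             # None
print(repr(attributes["desired.text"]))  # '\nHello, NiFi\n'
```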
>
> We will then use ReplaceText with the following configuration:
> Regular Expression: .+
> Replacement Value: ${desired.text}
> All other properties: defaults.
>
> So what this is doing is replacing the content of the FlowFile with the 
> "desired.text" attribute.
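Here is a rough Python model of that ReplaceText step. replace_text is hypothetical: it substitutes the attribute's value for every regex match, standing in for the ${desired.text} expression. Suppose '.+' is evaluated against the whole content the way Python evaluates it. Then '.' stops at line breaks, so content with several lines would get one copy of the value per line. Prefixing the pattern with (?s), which Java regexes also accept, makes a single match cover all of the content.

```python
import re

def replace_text(content, search, attributes, attr_name):
    """Hypothetical ReplaceText stand-in: every match of `search` is replaced
    by the value of the named attribute (a simplification of ${...})."""
    value = attributes[attr_name]
    # A function replacement inserts the value literally (no backslash escapes).
    return re.sub(search, lambda m: value, content)

attrs = {"desired.text": "Hello, NiFi"}
single_line = "<html><body>Hello, NiFi</body></html>"
multi_line = "<html>\n<body>Hello, NiFi</body>\n</html>"

print(replace_text(single_line, r".+", attrs, "desired.text"))    # Hello, NiFi
# Without (?s), each of the three lines is replaced separately.
print(replace_text(multi_line, r".+", attrs, "desired.text"))
# With (?s), one match spans the whole content.
print(replace_text(multi_line, r"(?s).+", attrs, "desired.text"))  # Hello, NiFi
```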
>
> PutFile then writes the file to disk.
>
> Hope this helps! If this doesn't work out for you for some reason, or if 
> you've got more questions (or if I misunderstood what you're wanting to do), 
> please don't hesitate to shoot back and let me know!
>
> Thanks
> -Mark
>
> ________________________________
>> From: [email protected]
>> To: [email protected]
>> CC: [email protected]
>> Subject: Extracting text using RegEx
>> Date: Tue, 16 Jun 2015 17:56:38 +0000
>>
>> Hi,
>>
>> I am trying to download a file (using GetHTTP) from a website and
>> extract text from it matching a RegEx pattern (using ExtractText).
>>
>> I am able to download the file using GetHTTP and save it via PutFile.
>> I understand that the ExtractText processor works only with a FlowFile.
>> So I tried generating a flow file from GetHTTP and PutFile
>> (separately), but it doesn't seem to work.
>>
>> Can anyone give me pointers (examples?) on which processors to use
>> to extract text from a file pulled down by GetHTTP and write the
>> matched text to a separate file?
>>
>> Thanks,
>> Srujan Kotikela