Srujan,

Take a look at the template provided here:
https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates

The third template mentioned shows how to pull data using GetHTTP, extract
text using ExtractText, route using RouteOnAttribute, and write a file using
PutFile.

Thanks
Joe

On Tue, Jun 16, 2015 at 8:10 PM, Mark Payne <[email protected]> wrote:
> Srujan,
>
> I'm not sure how familiar you are with NiFi, so just a very quick note about
> terminology to make sure you understand what I'm describing. A FlowFile is
> the basic data record in NiFi. It consists of two parts:
> - FlowFile Attributes (key/value pairs that are strings)
> - FlowFile Content (arbitrary stream of bytes)
>
> I think the flow that you want would look like this:
>
> GetHTTP -> ExtractText -> ReplaceText -> PutFile
>
> ExtractText will evaluate the regex against the content pulled from the
> HTTP service and put the result in a FlowFile Attribute. So let's say you
> add a property named "desired.text" with a value "<body>(.*)</body>". This
> will create an Attribute named "desired.text", and the value of that
> attribute will be whatever is found between the <body> and </body> tags.
>
> We will then use ReplaceText with the following configuration:
> Regular Expression: .+
> Replacement Value: ${desired.text}
> All other properties: defaults.
>
> So what this is doing is replacing the content of the FlowFile with the
> value of the "desired.text" attribute.
>
> PutFile then writes the file to disk.
>
> Hope this helps! If this doesn't work out for you for some reason, or if
> you've got more questions (or if I misunderstood what you're wanting to
> do), please don't hesitate to shoot back and let me know!
>
> Thanks
> -Mark
>
> ________________________________
>> From: [email protected]
>> To: [email protected]
>> CC: [email protected]
>> Subject: Extracting text using RegEx
>> Date: Tue, 16 Jun 2015 17:56:38 +0000
>>
>> Hi,
>>
>> I am trying to download a file (using GetHTTP) from a website and
>> extract text from it matching a RegEx pattern (using ExtractText).
>>
>> I am able to download the file using GetHTTP and save it via PutFile. I
>> understand that the ExtractText processor works only with a FlowFile. So I
>> tried generating a FlowFile from GetHTTP and PutFile (separately), but
>> it doesn't seem to work.
>>
>> Can anyone give me pointers (examples?) on which processors to use
>> to extract text from a file pulled down by GetHTTP and write the
>> matched text to a separate file?
>>
>> Thanks,
>>
>> Srujan Kotikela
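
For anyone following along, here is a minimal Python sketch of what the
ExtractText and ReplaceText steps described in the thread above do to a
FlowFile. It is not NiFi code: the dict-based FlowFile and the helper names
extract_text and replace_with_attribute are hypothetical, invented purely for
illustration. It also assumes "." should match across newlines (re.DOTALL);
whether that holds in a real flow depends on how the processor is configured.

```python
import re

# Hypothetical stand-in for a FlowFile: string attributes plus a byte payload.
flowfile = {
    "attributes": {},
    "content": b"<html><body>Hello NiFi</body></html>",
}


def extract_text(ff, name, pattern):
    """Illustrates the ExtractText idea: run a regex over the content and
    store the first capture group in an attribute called `name`."""
    text = ff["content"].decode("utf-8")
    # re.DOTALL is an assumption so that multi-line bodies also match.
    match = re.search(pattern, text, re.DOTALL)
    if match:
        ff["attributes"][name] = match.group(1)
    return ff


def replace_with_attribute(ff, name):
    """Illustrates the ReplaceText step: overwrite the whole content with
    the value of the attribute `name`."""
    ff["content"] = ff["attributes"][name].encode("utf-8")
    return ff


flowfile = extract_text(flowfile, "desired.text", r"<body>(.*)</body>")
flowfile = replace_with_attribute(flowfile, "desired.text")

# At this point a PutFile-like step would write flowfile["content"] to disk.
print(flowfile["content"].decode("utf-8"))  # prints "Hello NiFi"
```

The point of the sketch is the split Mark describes: the regex result lands in
an attribute first, and only a second step turns that attribute into the
content that gets written out.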
