Srujan,

Take a look at the templates provided here:
https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates

The third template mentioned shows how to pull data using GetHTTP, extract
text using ExtractText, route using RouteOnAttribute, and write a
file using PutFile.

Thanks
Joe

On Tue, Jun 16, 2015 at 8:10 PM, Mark Payne <[email protected]> wrote:
> Srujan,
>
> I'm not sure how familiar you are with NiFi, so here is a very quick note
> about terminology to make sure you understand what I'm describing. A FlowFile
> is the basic data record in NiFi. It consists of two parts:
> - FlowFile Attributes (Key/Value Pairs that are strings)
> - FlowFile Content (arbitrary stream of bytes)
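
As a rough illustration of that two-part structure, here is a minimal
Python sketch. The class and field names are hypothetical and this is not
NiFi's API, only a toy model; "filename" is just an example attribute key
and the content bytes are made up.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlowFile:
    """Toy model of a FlowFile (hypothetical, not NiFi's real class)."""
    # Attributes: key/value pairs, both strings.
    attributes: Dict[str, str] = field(default_factory=dict)
    # Content: an arbitrary stream of bytes.
    content: bytes = b""


# Example record: made-up attribute and content values.
ff = FlowFile(
    attributes={"filename": "index.html"},
    content=b"<html><body>Hello NiFi</body></html>",
)
print(ff.attributes, len(ff.content))
```
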
>
> I think the flow that you would want would look like this:
>
> GetHTTP -> ExtractText -> ReplaceText -> PutFile
>
> ExtractText will then evaluate the regex against the content pulled from the
> HTTP service and put the result in a FlowFile Attribute. So let's say you add
> a property named "desired.text" with a value "<body>(.*)</body>". This will
> create an Attribute named "desired.text", and the value of that attribute will
> be whatever is found between the <body> and </body> tags.
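
To see what that pattern captures, here is a small Python sketch using the
same regex. It only imitates the idea with Python's re module (NiFi itself
uses Java regular expressions), and the sample content and the plain dict
standing in for FlowFile attributes are assumptions made for illustration.

```python
import re

# Made-up content standing in for what GetHTTP would fetch.
content = "<html><body>Hello NiFi</body></html>"

# Same pattern as the "desired.text" property value in the thread.
pattern = re.compile(r"<body>(.*)</body>")

# A plain dict standing in for the FlowFile attributes.
attributes = {}
match = pattern.search(content)
if match:
    # The first capture group becomes the attribute value.
    attributes["desired.text"] = match.group(1)

print(attributes)

# Caveat: "." does not match newlines by default (in Python or in Java),
# so a body spanning several lines needs DOTALL-style matching.
multiline = "<body>line one\nline two</body>"
no_match = pattern.search(multiline)
dotall_match = re.search(pattern.pattern, multiline, re.DOTALL)
```

If the page body spans multiple lines, the plain pattern finds nothing in
this sketch, while the DOTALL variant captures both lines.
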
>
> We will then use ReplaceText with the following configuration:
> Regular Expression: .+
> Replacement Value: ${desired.text}
> All other properties: defaults.
>
> So what this is doing is replacing the content of the FlowFile with the 
> "desired.text" attribute.
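
The replacement step can be imitated the same way. This is a sketch under
assumptions: replace_text is a hypothetical helper, the attribute lookup is
a plain dict access rather than NiFi's Expression Language, and the sample
values are made up.

```python
import re

# Made-up values standing in for the FlowFile after ExtractText has run.
attributes = {"desired.text": "Hello NiFi"}
content = "<html><body>Hello NiFi</body></html>"


def replace_text(content, regex, attribute_name, attributes):
    """Replace every match of regex with the named attribute's value."""
    value = attributes[attribute_name]
    # A function replacement keeps backslashes in the value literal.
    return re.sub(regex, lambda _match: value, content)


new_content = replace_text(content, r".+", "desired.text", attributes)
print(new_content)  # prints: Hello NiFi
```

In this Python sketch, ".+" stops at newlines, so multi-line content would
have each line replaced separately.
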
>
> PutFile then writes the file to disk.
>
> Hope this helps! If this doesn't work out for you for some reason, or if
> you've got more questions (or if I misunderstood what you're wanting to do),
> please don't hesitate to shoot back and let me know!
>
> Thanks
> -Mark
>
> ________________________________
>> From: [email protected]
>> To: [email protected]
>> CC: [email protected]
>> Subject: Extracting text using RegEx
>> Date: Tue, 16 Jun 2015 17:56:38 +0000
>>
>> Hi,
>>
>> I am trying to download a file (using GetHTTP) from a website and
>> extract text from it matching a RegEx pattern (using ExtractText).
>>
>> I am able to download the file using GetHTTP and save it via PutFile. I
>> understand that the ExtractText processor works only with a FlowFile. So I
>> tried generating a FlowFile from GetHTTP and PutFile (separately), but
>> it doesn’t seem to work.
>>
>> Can anyone give me pointers (examples?) on which processors to use
>> to extract text from a file pulled down by GetHTTP and write the
>> matched text to a separate file?
>>
>> Thanks,
>> Srujan Kotikela
>
