Srujan,
I'm not sure how familiar you are with NiFi, so here is a very quick note about
terminology to make sure you understand what I'm describing. A FlowFile is the
basic data record in NiFi. It consists of two parts:
- FlowFile Attributes (Key/Value Pairs that are strings)
- FlowFile Content (arbitrary stream of bytes)
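To make that split concrete, here's a toy Python model of a FlowFile. It's only an illustration, not NiFi's actual (Java) API, and the attribute values below are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlowFile:
    """Toy model of a NiFi FlowFile: string attributes plus raw content bytes.

    Illustration only; this is not NiFi's real API.
    """
    attributes: Dict[str, str] = field(default_factory=dict)
    content: bytes = b""


# Roughly what a FlowFile might look like right after fetching a page
# (hypothetical attribute values).
ff = FlowFile(
    attributes={"filename": "index.html", "mime.type": "text/html"},
    content=b"<html><body>Hello, NiFi</body></html>",
)
print(ff.attributes["filename"])  # index.html
print(len(ff.content))            # size of the content in bytes
```

The point is that attributes are small string metadata, while the content can be any bytes at all.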
I think the flow you want would look like this:
GetHTTP -> ExtractText -> ReplaceText -> PutFile
ExtractText will then evaluate the regex you configure against the content
pulled from the HTTP service and put the result in a FlowFile Attribute. So
let's say you add a property named "desired.text" with the value
"<body>(.*)</body>". This will create an Attribute named "desired.text", and
the value of that attribute will be whatever is found between the <body> and
</body> tags.
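Here's a rough Python sketch of that ExtractText step: search the content with each configured regex and store capture group 1 under the property's name. It's a simplification that assumes UTF-8 content, uses only the first match, and ignores ExtractText's other options. The sketch passes re.DOTALL so that "." can also match newlines, since a real <body> usually spans many lines; as far as I know ExtractText has a corresponding "Enable DOTALL Mode" property that is off by default, so you may need to turn it on.

```python
import re
from typing import Dict


def extract_text(content: bytes, properties: Dict[str, str]) -> Dict[str, str]:
    """Rough imitation of ExtractText: for each user-defined property, search
    the content with that property's regex and store capture group 1 under
    the property's name. Only the first match is used."""
    text = content.decode("utf-8")
    attributes = {}
    for name, pattern in properties.items():
        # DOTALL lets '.' match newlines, so a multi-line <body> still matches.
        match = re.search(pattern, text, re.DOTALL)
        if match and match.groups():
            attributes[name] = match.group(1)
    return attributes


page = b"<html><head><title>t</title></head><body>Hello, NiFi</body></html>"
attrs = extract_text(page, {"desired.text": "<body>(.*)</body>"})
print(attrs)  # {'desired.text': 'Hello, NiFi'}
```

If nothing matches, no attribute is added, which is the case to watch for if the page has no <body> element.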
We will then use ReplaceText with the following configuration:
Regular Expression: .+
Replacement Value: ${desired.text}
All other properties: defaults.
This replaces the content of the FlowFile with the value of the
"desired.text" attribute.
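A similar toy sketch of the ReplaceText step, assuming the configuration above. The ${...} handling here only does plain attribute lookups, a tiny stand-in for NiFi's Expression Language, and the sketch uses DOTALL so that ".+" matches the whole content in a single match, which is the effect this configuration is going for.

```python
import re
from typing import Dict


def replace_text(content: bytes, attributes: Dict[str, str],
                 regex: str, replacement: str) -> bytes:
    """Toy version of ReplaceText: resolve ${attr} references in the
    replacement value, then replace every regex match in the content."""
    # Minimal stand-in for Expression Language: ${name} -> attribute value
    # (missing attributes become an empty string in this sketch).
    value = re.sub(r"\$\{([^}]+)\}",
                   lambda m: attributes.get(m.group(1), ""),
                   replacement)
    text = content.decode("utf-8")
    # A function replacement stops re from treating backslashes in `value`
    # as escape sequences.
    new_text = re.sub(regex, lambda _m: value, text, flags=re.DOTALL)
    return new_text.encode("utf-8")


attrs = {"desired.text": "Hello, NiFi"}
page = b"<html>\n<body>Hello, NiFi</body>\n</html>"
out = replace_text(page, attrs, r".+", "${desired.text}")
print(out)  # b'Hello, NiFi'
```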
PutFile then writes the file to disk.
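And a minimal stand-in for PutFile, which names the output file after the FlowFile's "filename" attribute. This sketch simply overwrites an existing file, whereas the real processor has a conflict-resolution setting for that case; the directory and file names used here are hypothetical.

```python
import os
import tempfile
from typing import Dict


def put_file(content: bytes, attributes: Dict[str, str], directory: str) -> str:
    """Toy version of PutFile: write the content into `directory`, naming the
    file after the 'filename' attribute. Returns the path that was written."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, attributes["filename"])
    with open(path, "wb") as fh:  # overwrites any existing file
        fh.write(content)
    return path


with tempfile.TemporaryDirectory() as out_dir:
    written = put_file(b"Hello, NiFi", {"filename": "index.html"}, out_dir)
    with open(written, "rb") as fh:
        print(fh.read())  # b'Hello, NiFi'
```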
Hope this helps! If this doesn't work out for you for some reason, or if you've
got more questions (or if I misunderstood what you're wanting to do), please
don't hesitate to shoot back and let me know!
Thanks
-Mark
________________________________
> From: [email protected]
> To: [email protected]
> CC: [email protected]
> Subject: Extracting text using RegEx
> Date: Tue, 16 Jun 2015 17:56:38 +0000
>
> Hi,
>
> I am trying to download a file (using GetHTTP) from a website and
> extract text from it matching a RegEx pattern (using ExtractText).
>
> I am able to download the file using GetHTTP and save it via PutFile. I
> understand that the ExtractText processor works only with a FlowFile. So I
> tried generating a flow file from GetHTTP and PutFile (separately), but
> it doesn’t seem to work.
>
> Can anyone give me pointers (examples?) on which processors to use
> to extract text from a file pulled down by GetHTTP and write the
> matched text to a separate file?
>
> Thanks,
>
> Srujan Kotikela