The 0.2.0 release is in the active voting stage. Here is what is in the release being voted on:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12332286

If all goes well with the vote, the release should be up in 4-5 days.

Thanks
Joe

On Tue, Jun 23, 2015 at 12:44 PM, Chase Cunningham <[email protected]> wrote:
> Sure. Can you tell me more about the next release? Any other new stuff that will be upgraded?
>
> On 6/22/15 3:41 PM, Mark Payne wrote:
>> Thanks for the clarification.
>>
>> The ExecuteStreamCommand processor that I was suggesting expects that the data can be streamed directly to the script that it is running. The next version of NiFi (0.2.0-incubating) provides the ability to avoid streaming data to standard in. This change is available today if you are building from the codebase. If you are just downloading the newest build, it is likely a couple of weeks away from being delivered.
>>
>> With that change, you can use PutFile -> ExecuteStreamCommand so that you write the file to disk and then use ExecuteStreamCommand to call the script that parses the data. You can then use ${filename} as one of the parameters to the script in order to tell it which file to run against. From there, you can use GetFile to pick up the result if you want to bring it back into your NiFi flow, or you can process it however makes sense outside of NiFi.
>>
>> Until that change is available, it may be a little more difficult, as the processor wants to stream the content of the FlowFile directly to the script.
>>
>> A possible workaround in the meantime would be to use PutFile -> ReplaceText -> ExecuteStreamCommand and configure ReplaceText to replace the regex ".*" with an empty value. In that case, it won't stream any data to the script, and you can just invoke the script using the filename as a parameter.
>>
>> Does this help at all?
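Mark's PutFile -> ExecuteStreamCommand suggestion assumes a script that takes the file to parse as a command-line argument. Below is a minimal Python sketch of such a script for Chase's use case (pulling IPs, EXEs, and domains out of a .txt feed file). The script name parse_feed.py, its regexes, and its JSON output format are all hypothetical, and the patterns are deliberately loose illustrations rather than production-grade indicator parsing. It also assumes ExecuteStreamCommand is configured to pass the full path of the file PutFile wrote (for example, the PutFile directory followed by ${filename}).

```python
# parse_feed.py: hypothetical script name, not part of NiFi.
"""Extract IPv4 addresses, domain names and .exe file names from text files.

Usage (assumed): python parse_feed.py FILE [FILE ...]
"""
import json
import re
import sys

# Deliberately loose illustrative patterns; real indicator parsing needs stricter rules.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EXE_RE = re.compile(r"\b[\w-]+\.exe\b", re.IGNORECASE)
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def parse_indicators(text):
    """Return sorted, de-duplicated IPs, domains and executable names found in text."""
    ips = set(IPV4_RE.findall(text))
    exes = set(EXE_RE.findall(text))
    # "payload.exe" also looks like a domain to DOMAIN_RE, so drop executable names.
    domains = {d for d in DOMAIN_RE.findall(text) if d not in exes}
    return {"ips": sorted(ips), "domains": sorted(domains), "exes": sorted(exes)}


def main(paths):
    """Parse each file named on the command line and print one JSON object per file."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            result = parse_indicators(f.read())
        # One JSON line per file on standard out.
        print(json.dumps({"file": path, **result}))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Printing to standard out keeps the script usable whether its output is routed back into the flow or simply logged; writing result files for GetFile to pick up, as Mark describes, would work just as well.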
>>
>> Thanks
>> -Mark
>>
>> ----------------------------------------
>>> Date: Mon, 22 Jun 2015 15:28:17 -0500
>>> From: [email protected]
>>> To: [email protected]
>>> Subject: Re: Extracting text using RegEx
>>>
>>> 1. NiFi does HTTP stuff to get text files
>>> 2. files are put in a directory in .txt format
>>> 3. a script runs to parse through the files; each data point of value is parsed
>>> 4. parsed data is written to files associated with the data points inside
>>> 5. data is sent to a data repo for future indexing and use
>>>
>>> On 6/22/15 3:22 PM, Mark Payne wrote:
>>>> Chase,
>>>>
>>>> I want to understand the use case better before I try to offer any advice.
>>>>
>>>> So you want to write the FlowFiles to a directory, and then run an external script to process those files, correct? Then, once the script has run, what does it do with the result? Does it write it to a file, write to standard out, interact directly with the database, etc.?
>>>>
>>>> Thanks
>>>> -Mark
>>>>
>>>> ----------------------------------------
>>>>> Date: Mon, 22 Jun 2015 15:06:47 -0500
>>>>> From: [email protected]
>>>>> To: [email protected]
>>>>> Subject: Re: Extracting text using RegEx
>>>>>
>>>>> So I have NiFi pulling in data in .txt format from about 30 different sites. That data gets dumped to a directory called feedfiles. Then I have a script that will parse out the IPs, EXEs, domains, etc., so that the parsed stuff can be allocated to a database for indexing.
>>>>>
>>>>> Having trouble automating this activity from the NiFi standpoint. Help is appreciated.
>>>>>
>>>>> On 6/22/15 2:55 PM, Mark Payne wrote:
>>>>>> Chase,
>>>>>>
>>>>>> You could certainly use the ExecuteStreamCommand processor to accomplish that.
>>>>>>
>>>>>> You can see the usage guide/documentation for that processor at [1]. Give that a look and let me know if it meets your needs or not.
>>>>>>
>>>>>> Thanks
>>>>>> -Mark
>>>>>>
>>>>>> [1] http://nifi.incubator.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExecuteStreamCommand/index.html
>>>>>>
>>>>>> ----------------------------------------
>>>>>>> Date: Mon, 22 Jun 2015 14:21:00 -0500
>>>>>>> From: [email protected]
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: Extracting text using RegEx
>>>>>>>
>>>>>>> How can one run a script within NiFi to accomplish parsing?
>>>>>>>
>>>>>>> On 6/22/15 12:41 PM, Mark Payne wrote:
>>>>>>>> Srujan,
>>>>>>>>
>>>>>>>> My guess is that the issue you are seeing is due to GetHTTP caching the ETag/Last-Modified value. When the processor receives the response for an HTTP GET request, it writes the ETag to conf/.httpCache-<processor id>.
>>>>>>>>
>>>>>>>> It does this so that even after a restart of NiFi, we don't keep pulling the same content. If the content changes at any point, though, it will pull the new version.
>>>>>>>>
>>>>>>>> You could trigger it to pull the data again either by copying and pasting the GetHTTP processor and letting the new processor pull the data, or by deleting that file from the conf/ directory and restarting.
>>>>>>>>
>>>>>>>> If this doesn't give you what you need, please feel free to let me know!
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> -Mark
>>>>>>>>
>>>>>>>> ----------------------------------------
>>>>>>>>> From: [email protected]
>>>>>>>>> To: [email protected]
>>>>>>>>> Subject: RE: Extracting text using RegEx
>>>>>>>>> Date: Mon, 22 Jun 2015 15:11:18 +0000
>>>>>>>>>
>>>>>>>>> Mark,
>>>>>>>>>
>>>>>>>>> How can I rerun the processors after changing some of the attributes? For example, when I change the regex pattern and start the processors, nothing happens.
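The caching Mark describes follows the standard HTTP conditional-GET pattern: remember the ETag and Last-Modified values of the last response, send them back as If-None-Match / If-Modified-Since, and treat a 304 Not Modified reply as "nothing new". The Python sketch below illustrates that general pattern under the assumption that GetHTTP works along these lines; it is not GetHTTP's code, and the JSON cache file and the function names are hypothetical. Deleting the cache file plays the same role as deleting conf/.httpCache-<processor id>: the next request goes out unconditionally and the content is pulled again.

```python
import json
import urllib.error
import urllib.request
from pathlib import Path


def conditional_headers(cache):
    """Turn a cached ETag / Last-Modified pair into conditional request headers."""
    headers = {}
    if cache.get("etag"):
        headers["If-None-Match"] = cache["etag"]
    if cache.get("last_modified"):
        headers["If-Modified-Since"] = cache["last_modified"]
    return headers


def fetch_if_changed(url, cache_file):
    """Return the response body if the resource changed since the last fetch, else None.

    cache_file is a hypothetical JSON file holding the validators from the last 200 reply.
    """
    path = Path(cache_file)
    cache = json.loads(path.read_text()) if path.exists() else {}
    request = urllib.request.Request(url, headers=conditional_headers(cache))
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
            validators = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: nothing new to emit.
            return None
        raise
    # Remember the validators so the next call (even after a restart) is conditional.
    path.write_text(json.dumps(validators))
    return body
```

Note that changing a regex further down the flow changes nothing the server sees, so under this model the server keeps answering "not modified" and no new FlowFile is produced, which matches the "nothing happens" symptom.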
>>>>>>>>>
>>>>>>>>> Srujan Kotikela
>>>>>>>>> FireHost - SECURE CLOUD HOSTING
>>>>>>>>> North America | Europe | Asia Pacific
>>>>>>>>>
>>>>>>>>> ComputerWorld: 100 Best Places to Work in IT See Current Opportunities
>>>>>>>>>
>>>>>>>>> This email and any files transmitted with it are confidential and intended solely for the use of the individual(s) to whom they are addressed. Do not disseminate, distribute or copy this e-mail without explicit permission to do so. Thank you.
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Mark Payne [mailto:[email protected]]
>>>>>>>>> Sent: Thursday, June 18, 2015 1:22 PM
>>>>>>>>> To: [email protected]
>>>>>>>>> Subject: RE: Extracting text using RegEx
>>>>>>>>>
>>>>>>>>> Srujan,
>>>>>>>>>
>>>>>>>>> When you pull the file via GetHTTP, it assigns a filename to the file. You can easily change the filename by using an UpdateAttribute processor. Just add a new property with the name "filename" and whatever value you would like. Then you can write both files to the same directory.
>>>>>>>>>
>>>>>>>>> With ExtractText, the FlowFile is routed to 'matched' or 'unmatched' depending on whether or not any regex that you provided matches. However, if the regex has a capturing group, the extracted text will be just what is captured by that group. For example, if your regex is ".*good-(bye).*", any FlowFile containing "good-bye" is routed to 'matched', but only the text "bye" is extracted, because that is what is in the capturing group.
>>>>>>>>>
>>>>>>>>> Once you have extracted the text, though, it is added to a FlowFile attribute, not to the content. So you will want to use ReplaceText to replace the content of the FlowFile before you use PutFile.
>>>>>>>>>
>>>>>>>>> Does this make sense?
>>>>>>>>> If not, please let me know where I can help clarify, and I'll be happy to do so!
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> -Mark
>>>>>>>>>
>>>>>>>>> ----------------------------------------
>>>>>>>>>> From: [email protected]
>>>>>>>>>> To: [email protected]
>>>>>>>>>> Subject: RE: Extracting text using RegEx
>>>>>>>>>> Date: Thu, 18 Jun 2015 18:08:58 +0000
>>>>>>>>>>
>>>>>>>>>> Hi Mark,
>>>>>>>>>>
>>>>>>>>>> I am trying to extract some text from a remote file/feed downloaded via HTTP. The flow I am contemplating looks like this:
>>>>>>>>>>
>>>>>>>>>> GetHTTP ====> ExtractText == (matched) ==> PutFile
>>>>>>>>>>                   ||
>>>>>>>>>>              (unmatched)
>>>>>>>>>>                   ||
>>>>>>>>>>                   V
>>>>>>>>>>                PutFile
>>>>>>>>>>
>>>>>>>>>> I am able to create this flow just fine. However, I have the following issues:
>>>>>>>>>>
>>>>>>>>>> 1. I noticed that the 'file' configured for the GetHTTP processor goes into the 'directory' configured in the PutFile processor. This leads me to save the matched file and the unmatched file in separate directories. Is there a way to have those two files in the same directory?
>>>>>>>>>>
>>>>>>>>>> 2. I can't seem to get the regex working. The ExtractText processor either matches all input or no input. Are there any particular guidelines on how to write regexes for NiFi?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Srujan Kotikela
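To make the capturing-group behavior Mark describes concrete, here is a rough Python approximation of ExtractText's routing and extraction. NiFi itself uses Java regexes; Python's re module behaves the same way for simple patterns like these. The function name and the dictionary-of-properties shape are illustrative assumptions, and the sketch assumes every regex has at least one capturing group, as in Mark's example.

```python
import re


def route_extract_text(content, properties):
    """Approximate ExtractText as described in the thread (not NiFi's code).

    properties maps an attribute name to a regex that has at least one capturing
    group (an assumption of this sketch). Returns (relationship, new_attributes).
    """
    attributes = {}
    for name, pattern in properties.items():
        match = re.search(pattern, content)
        if match:
            # Only the text captured by group 1 becomes the attribute value.
            attributes[name] = match.group(1)
    # 'matched' if any regex matched, otherwise 'unmatched'.
    relationship = "matched" if attributes else "unmatched"
    return relationship, attributes


relationship, attrs = route_extract_text("say good-bye now", {"farewell": r".*good-(bye).*"})
print(relationship, attrs)  # matched {'farewell': 'bye'}
```

One possible cause of "matches all input or no input": a pattern such as (.*) matches any content at all, so every FlowFile lands in 'matched', while "." does not cross line breaks by default, so a pattern that needs to span several lines can fail on every FlowFile.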
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Mark Payne [mailto:[email protected]]
>>>>>>>>>> Sent: Tuesday, June 16, 2015 7:11 PM
>>>>>>>>>> To: [email protected]
>>>>>>>>>> Subject: RE: Extracting text using RegEx
>>>>>>>>>>
>>>>>>>>>> Srujan,
>>>>>>>>>>
>>>>>>>>>> I'm not sure how familiar you are with NiFi, so just a very quick note about terminology to make sure you understand what I'm describing. A FlowFile is the basic data record in NiFi. It consists of two parts:
>>>>>>>>>> - FlowFile Attributes (key/value pairs that are strings)
>>>>>>>>>> - FlowFile Content (an arbitrary stream of bytes)
>>>>>>>>>>
>>>>>>>>>> I think the flow that you would want would look like this:
>>>>>>>>>>
>>>>>>>>>> GetHTTP -> ExtractText -> ReplaceText -> PutFile
>>>>>>>>>>
>>>>>>>>>> ExtractText will then evaluate the regex against the content pulled from the HTTP service and put the result in a FlowFile attribute. So let's say you add a property named "desired.text" with the value "<body>(.*)</body>". This will create an attribute named "desired.text", and the value of that attribute will be whatever is found between the <body> and </body> tags.
>>>>>>>>>>
>>>>>>>>>> We will then use ReplaceText with the following configuration:
>>>>>>>>>> Regular Expression: .+
>>>>>>>>>> Replacement Value: ${desired.text}
>>>>>>>>>> All other properties: defaults.
>>>>>>>>>>
>>>>>>>>>> What this does is replace the content of the FlowFile with the value of the "desired.text" attribute.
>>>>>>>>>>
>>>>>>>>>> PutFile then writes the file to disk.
>>>>>>>>>>
>>>>>>>>>> Hope this helps! If this doesn't work out for you for some reason, or if you've got more questions (or if I misunderstood what you're wanting to do), please don't hesitate to shoot back and let me know!
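One catch with the <body>(.*)</body> example: by default "." does not match line breaks, so if the body spans several lines the regex will not match at all. ExtractText appears to offer a DOTALL option among its properties; check the usage docs for your version. The Python sketch below walks through the ExtractText -> ReplaceText steps on a small multi-line HTML string. The function names are hypothetical, evaluate is a toy stand-in for NiFi Expression Language that only handles plain ${name} references, and treating the whole content as a single ".+" match in ReplaceText is an assumption of the sketch, not a documented fact about the processor.

```python
import re


def extract_text(content, pattern, dotall=False):
    """ExtractText-like step (approximation): capture group 1 of the first match, or None."""
    match = re.search(pattern, content, re.DOTALL if dotall else 0)
    return match.group(1) if match else None


def evaluate(template, attributes):
    """Toy stand-in for NiFi Expression Language: handles only plain ${name} references."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: attributes.get(m.group(1), ""), template)


def replace_text(content, attributes, regex=r".+", replacement="${desired.text}"):
    """ReplaceText-like step (approximation). Assumes the regex is applied to the whole
    content with DOTALL, so '.+' consumes all of it in a single match."""
    value = evaluate(replacement, attributes)
    # A function replacement inserts the value literally (no backslash escapes).
    return re.sub(regex, lambda _m: value, content, flags=re.DOTALL)


html = "<html>\n<body>\nHello, NiFi\n</body>\n</html>"

# Without DOTALL, '.' stops at line breaks, so this multi-line body never matches.
print(extract_text(html, r"<body>(.*)</body>"))  # None

desired = extract_text(html, r"<body>(.*)</body>", dotall=True)
print(repr(desired))  # '\nHello, NiFi\n'

# The new FlowFile content is just the extracted text.
print(repr(replace_text(html, {"desired.text": desired})))  # '\nHello, NiFi\n'
```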
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> -Mark
>>>>>>>>>>
>>>>>>>>>> ________________________________
>>>>>>>>>>> From: [email protected]
>>>>>>>>>>> To: [email protected]
>>>>>>>>>>> CC: [email protected]
>>>>>>>>>>> Subject: Extracting text using RegEx
>>>>>>>>>>> Date: Tue, 16 Jun 2015 17:56:38 +0000
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am trying to download a file (using GetHTTP) from a website and extract text from it matching a regex pattern (using ExtractText).
>>>>>>>>>>>
>>>>>>>>>>> I am able to download the file using GetHTTP and save it via PutFile. I understand that the ExtractText processor works only with a FlowFile. So I tried generating a FlowFile from GetHTTP and PutFile (separately), but it doesn't seem to work.
>>>>>>>>>>>
>>>>>>>>>>> Can anyone give me pointers (examples?) on which processors to use to extract text from a file pulled down by GetHTTP and write the matched text to a separate file?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Srujan Kotikela
>>>>>>>
>>>>>>> --
>>>>>>> Dr. Chase C Cunningham
>>>>>>> CTRC (SW) USN Ret.
>>>>>>> The Cynja LLC Proprietary Business and Technical Information
>>>>>>> CONFIDENTIAL TREATMENT REQUIRED
