Re: Apache Nifi Vs Spring XD, which one is better

2016-05-06 Thread Simon Ball
ExecuteSQL can certainly deal with millions of rows. Sqoop currently makes more sense if you want to distribute the query processing across a large number of nodes (if you have 100s of millions of rows, 10-100 GB+ or TBs of data) and write directly into Hadoop. If you’re looking for functionality lik

Re: Monitoring duplicate file entries with ExecuteSQL

2016-04-22 Thread Simon Ball
ListFile will maintain state on files processed. However, if any of the metadata for those files changes (a modification time change, for example), they will be output from ListFile again. Simon > On Apr 22, 2016, at 6:49 PM, mfzeidan wrote: > > Thank you Simon. I am using the ListSFTP -> Fet
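The re-listing behaviour described above can be pictured with a small Python sketch. This is a simplified model under stated assumptions, not NiFi's actual state handling: the function name `list_new_or_changed` and the per-file timestamp dictionary are hypothetical, chosen only to show why a changed modification time causes a file to be emitted again.

```python
def list_new_or_changed(listing: dict, state: dict) -> list:
    """Return names that are new or whose modification time differs from
    the remembered one, and update the remembered state in place.

    `listing` maps file name -> modification time (hypothetical model).
    """
    emitted = []
    for name, mtime in sorted(listing.items()):
        if state.get(name) != mtime:
            emitted.append(name)
            state[name] = mtime
    return emitted


state = {}
first = list_new_or_changed({"a.log": 100, "b.log": 100}, state)   # both files are new
second = list_new_or_changed({"a.log": 100, "b.log": 100}, state)  # nothing changed
third = list_new_or_changed({"a.log": 250, "b.log": 100}, state)   # a.log was touched
print(first, second, third)
```

In this model only the touched file comes back on the third pass, which is the duplicate case to watch for downstream.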

Re: Monitoring duplicate file entries with ExecuteSQL

2016-04-22 Thread Simon Ball
You need to quote the attribute in your SQL: '${filename}' to produce valid SQL. One alternative to this would be to use ListFile -> FetchFile together. With that pattern, ListFile maintains the state of which files have been processed for you within NiFi, meaning you wouldn't need the separa
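A minimal Python sketch of why the quotes matter, using an in-memory SQLite database. The `render` helper is a naive stand-in for attribute substitution, and the `processed_files` table is hypothetical; neither comes from the thread.

```python
import sqlite3


def render(template: str, attributes: dict) -> str:
    """Naive stand-in for attribute substitution: replace ${name} with its value."""
    out = template
    for name, value in attributes.items():
        out = out.replace("${" + name + "}", value)
    return out


def try_query(conn, sql):
    """Run the query and return the count, or an error string if the SQL is invalid."""
    try:
        return conn.execute(sql).fetchone()[0]
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_files (filename TEXT)")  # hypothetical table
conn.execute("INSERT INTO processed_files VALUES ('report.csv')")

attrs = {"filename": "report.csv"}
unquoted_sql = render(
    "SELECT COUNT(*) FROM processed_files WHERE filename = ${filename}", attrs)
quoted_sql = render(
    "SELECT COUNT(*) FROM processed_files WHERE filename = '${filename}'", attrs)

unquoted_result = try_query(conn, unquoted_sql)  # report.csv is parsed as an identifier
quoted_result = try_query(conn, quoted_sql)      # 'report.csv' is a string literal
print(unquoted_result)
print(quoted_result)
```

Quoting only produces valid SQL for well-behaved values; a filename that itself contains a single quote would still break the statement.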

Re: Reg: Tailing a remote file

2016-04-21 Thread Simon Ball
> I need an example of "Rolling Filename Pattern". > > Also, as far as I know, the TailFile processor is used for tailing a local file, > so what does the location attribute signify if I set it to remote? > > Regards, > Sourav Gulati > > -Original Messag

Re: Reg: Tailing a remote file

2016-04-20 Thread Simon Ball
Site-to-site is in pretty much every version of NiFi released under Apache. > On 20 Apr 2016, at 10:48, Sourav Gulati wrote: > > Thanks Simon. Which version of Nifi support site to site? > > Regards, > Sourav Gulati > > -Original Message- >

Re: Reg: Tailing a remote file

2016-04-20 Thread Simon Ball
The best way to do this would be to run a NiFi agent on the remote system, using the TailFile processor, and then use site-to-site to get that into your core NiFi. Simon > On 20 Apr 2016, at 09:49, Sourav Gulati wrote: > > Hi Team, > > We want to process logs from a file which is getting u

Re: Export a process group

2016-04-11 Thread Simon Ball
Hi Harish, There’s an issue of terms here. You wouldn’t be exporting a FlowFile so to speak (that is a unit of data flowing through NiFi). It sounds like you’re looking to export sections of the flow configuration. In that case, the way to do that is to make the process groups you want to expor

Re: Text and metadata extraction processor

2016-03-31 Thread Simon Ball
integrated we'd probably > want to take care of that in the ExtractMediaAttributes configuration. > > Additionally, I've proposed the idea of a ProcessPDF processor which would > ascertain whether a PDF is 'text' or 'scanned'. If scanned, we would break > i

Re: Text and metadata extraction processor

2016-03-31 Thread Simon Ball
Just a thought… To keep consistent with other NiFi Parse patterns, would it make sense to base the extraction of content on the presence of a relation? So your Tika processor would have an original relation which would have metadata attached as attributes, and an extracted relation which wou

Re: Import Kafka messages into Titan

2016-03-31 Thread Simon Ball
You don’t necessarily need a custom processor for this. To convert the JSON to key values for a graph, for example, you can use EvaluateJsonPath on your incoming messages from Kafka to pull out the pieces you need, then use AttributesToJSON to select these attributes back into JSON to pus
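The extract-then-reassemble pattern can be sketched in plain Python. This only illustrates the data transformation, not NiFi code: the helpers `evaluate_paths` and `attributes_to_json`, the dot-separated paths (a simplification of JsonPath), and the sample message are all assumptions made for the example.

```python
import json


def evaluate_paths(message: str, paths: dict) -> dict:
    """Pull selected fields out of a JSON message into flat string attributes.

    Each path is a dot-separated list of keys, a simplification of JsonPath.
    """
    doc = json.loads(message)
    attributes = {}
    for attr_name, path in paths.items():
        node = doc
        for key in path.split("."):
            node = node[key]
        attributes[attr_name] = str(node)
    return attributes


def attributes_to_json(attributes: dict, selected: list) -> str:
    """Serialise only the selected attributes back into a JSON object."""
    return json.dumps({name: attributes[name] for name in selected}, sort_keys=True)


# Hypothetical Kafka message describing one edge of a graph.
kafka_message = '{"user": {"id": 42, "name": "alice"}, "follows": {"id": 7}, "ts": 1459400000}'

attrs = evaluate_paths(
    kafka_message,
    {"source": "user.id", "target": "follows.id", "label": "user.name"},
)
edge_json = attributes_to_json(attrs, ["source", "target"])
print(edge_json)
```

Attributes are kept as strings here, so the reassembled JSON carries string values rather than the original numbers.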

Re: How to only take new rows using ExecuteSQL processor?

2016-03-31 Thread Simon Ball
Hi Paul, In the scenario where you need complex joins and incremental loads, the best bet is probably to create a view in your database with the query required. The QueryDatabaseTable can operate against this view as long as the view has a suitable ‘id’ column in it. That would provide a work
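A minimal Python sketch of the view-plus-id idea, using an in-memory SQLite database. The table names, the `order_report` view, and the `fetch_new_rows` helper are hypothetical; the sketch only shows the general technique of remembering the largest id seen and querying the view for anything above it, which is an assumption about how the incremental fetch would be used here rather than a description of the processor's internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    -- The join lives in the view, which exposes a single increasing 'id' column.
    CREATE VIEW order_report AS
        SELECT o.id AS id, c.name AS customer, o.total AS total
        FROM orders o JOIN customers c ON c.id = o.customer_id;
    INSERT INTO customers VALUES (1, 'acme'), (2, 'globex');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 2, 25.5);
""")


def fetch_new_rows(conn, last_seen_id):
    """Return rows from the view with id above last_seen_id, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, customer, total FROM order_report WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_max = rows[-1][0] if rows else last_seen_id
    return rows, new_max


first_batch, last_id = fetch_new_rows(conn, 0)         # initial load: both orders
conn.execute("INSERT INTO orders VALUES (3, 1, 7.25)")
second_batch, last_id = fetch_new_rows(conn, last_id)  # incremental load: only order 3
print(first_batch)
print(second_batch)
```

The approach depends on the view's id increasing with every new row; updates to existing rows would not be picked up.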

Missing Download artefacts

2016-03-27 Thread Simon Ball
It seems like the apache mirrors have lost all the artefacts for the 0.5.1 release, but the website is still linked to this. Did I just catch a really inconvenient moment of the 0.6.0 release, or is this unintentional? Simon

Re: SNMP Processors

2016-02-18 Thread Simon Ball
Have you considered a trap listener? That would make a nice addition to the bundle. Simon - Simon Elliston Ball Product Solutions Architect Hortonworks - Powering the future of data On 18 Feb 2016, at 20:29, Pierre Villard <pierre.villard...@gmail.com> wrote: I created a JIRA (https:/

Re: SNMP Processors

2016-02-18 Thread Simon Ball
I’d also add that I’ve seen a fair number of other people interested in this. Very willing to collaborate on any of this if you need, and looking forward to seeing the contribution. Simon > On 18 Feb 2016, at 18:56, Joe Witt wrote: > > Pierre, > > In my view the best and first test of interest

Re: Configuration Service

2016-01-27 Thread Simon Ball
>>>> by the encapsulated components. Additionally, this would help with the >>>> portability of templates when needing to define different values for >>>> different environments. >>>> >>>> Matt >>>> >>>> [1] https://cw

Re: Configuration Service

2016-01-27 Thread Simon Ball
. Simon On 27 Jan 2016, at 09:26, Simon Ball <sb...@hortonworks.com> wrote: One of the problems with complex flows is repetition of common configuration. Many people also want to be able to configure things like connection strings into an environment specific location outside of th

Configuration Service

2016-01-27 Thread Simon Ball
One of the problems with complex flows is repetition of common configuration. Many people also want to be able to configure things like connection strings into an environment specific location outside of the Flow, and parameterise the flow. Things like the Kerberos|SSL|etc Context service help i

Re: Redesign User Interface (UI)

2016-01-06 Thread Simon Ball
The new UI looks fantastic, and seems to be heading in a very good direction. One thought I have is around the upgrade experience. Given that we will likely end up with different sized elements to the existing UI, people may find existing flow layouts end up somewhat jumbled with the repositioni