Hi, I have exactly the same use case: periodically pulling rows from some security appliances with read-only access only.
Currently (without NiFi), we use an SQL query to track the maximum value. Depending on the DB/appliance/vendor, it could be a simple "SELECT getdate()", a "SELECT max(SW_TIME) FROM log_table", or a more complex query with an INNER JOIN. So it would be great to have an option to customize the tracking query.

Regards

2016-08-03 16:02 GMT+02:00 Conrad Crampton <conrad.cramp...@secdata.com>:

> Hi,
> Thanks for this.
> I did think about a MV, but unfortunately I don’t have access to create views –
> just read access. That would have been my simplest option ;-) Life’s never
> that easy though, is it?
> The only part of the SQL I need to be dynamic is the date parameter (I could
> even use the id column). Instead of using the MapCache (if that isn’t a good
> idea), could I use GetFile to just pull a single txt file with the parameter
> (i.e. the last run date or max id value from the last run), which creates a
> flowfile; read that value and pass it into ExecuteSQL (using the
> aforementioned value as the parameter in the SQL), since the select query can
> be dynamically constructed from attributes/flowfile content (as per the docs);
> and then finally write the text file back to the file system to be picked up
> next time?
> Thanks
> Conrad
>
> On 03/08/2016, 14:02, "Matt Burgess" <mattyb...@gmail.com> wrote:
>
> Conrad,
>
> Is it possible to add a view (materialized or not) to the RDBMS? That
> view could take care of the denormalization, and then
> QueryDatabaseTable could point at the view. The DB would take care of
> the push-down filters, which functionally is as if you had a
> QueryDatabaseTable for each table and then did the joins.
>
> In NiFi 1.0 there is a GenerateTableFetch processor, which is like
> QueryDatabaseTable except that it generates SQL instead of executing it.
> That might be used in your a) option above, but you'd have to reconcile
> the SQL statements into a JOIN.
> A possible improvement to either/both
> processors would be to add attributes for the maximum-value columns,
> whose values are the maximum observed values. Then you wouldn't have
> to parse or manipulate the SQL if you really just want the max values.
>
> I have been thinking about how QueryDatabaseTable and
> GenerateTableFetch would work if they accepted incoming flow files (to
> allow dynamic table names, for example). It's a bit more tricky because
> those processors run without input to get max values, so their
> behavior would change when a flow file is present but would return to
> the original behavior if no flow file is present. Since the
> ListDatabaseTables processor is also in 1.0, it would be nice to use
> that as input to the other two processors.
>
> I'm definitely interested in any thoughts or discussion around these
> things :)
>
> Regards,
> Matt
>
> On Wed, Aug 3, 2016 at 8:37 AM, Conrad Crampton
> <conrad.cramp...@secdata.com> wrote:
> > Hi,
> >
> > My use case is that I want to ship a load of rows from an RDBMS
> > periodically and put them in HDFS as Avro.
> >
> > The QueryDatabaseTable processor has functionality that would be great,
> > i.e. maximum-value columns (there are a couple of columns in the data I
> > could use for this), and it is this functionality I am looking for, BUT
> > the data is not from one single table. The nature of the RDBMS is that
> > the business view on the data requires a bunch of joins across other
> > tables/schemas to get the correct Avro file, so the options I appear to
> > have are:
> >
> > a) Use QueryDatabaseTable for each table that makes up the business view
> > and do the joins etc. in HDFS (Spark or something) – or potentially do
> > the reconciliation within NiFi???
> >
> > b) Use ExecuteSQL to run the complete SQL to get the rows, which can
> > easily be put into HDFS as Avro, given that each row will be the business
> > (denormalised) data that is required.
> >
> > The problem with a) is the reconciliation (denormalisation) of the data,
> > and the problem with b) is how to maintain the maximum-value state so I
> > only get the data since the last run.
> >
> > In order to address b), can I use the DistributedMapCacheServer & Client
> > to hold a key/value pair of the last run date and extract from that date
> > as a parameter?
> >
> > Thanks for any suggestions.
> >
> > Conrad

--
Yohann Lepage
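For anyone following the thread, the file-based state tracking Conrad describes (read the last maximum value from a file, use it as a query parameter, write the new maximum back for the next run) can be sketched outside NiFi roughly as follows. This is only a minimal illustration of the logic: sqlite3 stands in for the real read-only RDBMS, and the table name (log_table), column names (id, payload), and state file name (last_max_id.txt) are hypothetical.

```python
import os
import sqlite3

STATE_FILE = "last_max_id.txt"  # hypothetical; plays the role of the GetFile text file


def read_last_max(default=0):
    # Read the maximum value persisted by the previous run, if any.
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return int(f.read().strip())
    return default


def write_last_max(value):
    # Persist the new maximum for the next run (the "write text file back" step).
    with open(STATE_FILE, "w") as f:
        f.write(str(value))


def fetch_new_rows(conn):
    # Parameterized query that returns only rows newer than the stored maximum,
    # mimicking QueryDatabaseTable's maximum-value behavior with external state.
    last_max = read_last_max()
    cur = conn.execute(
        "SELECT id, payload FROM log_table WHERE id > ? ORDER BY id",
        (last_max,),
    )
    rows = cur.fetchall()
    if rows:
        write_last_max(rows[-1][0])  # highest id seen becomes the new watermark
    return rows
```

Each call fetches only the delta since the previous call, which is the behavior Conrad wants from the GetFile → ExecuteSQL → write-back loop; a DistributedMapCache entry could hold the watermark instead of the file with the same logic.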