Hi, I have exactly the same use case: periodically pulling rows from some security appliances with read-only access only.
Currently (without NiFi), we use an SQL query to track the maximum value. Depending on the DB/appliance/vendor, it could be a simple "SELECT getdate()", a "SELECT max(SW_TIME) FROM log_table", or a more complex query with an INNER JOIN. So it would be great to have an option to customize the tracking query.

Regards

2016-08-03 16:02 GMT+02:00 Conrad Crampton <conrad.cramp...@secdata.com>:

> Hi,
> Thanks for this.
> I did think about a MV, but unfortunately I don’t have access to create views –
> just read access. That would have been my simplest option ;-) Life’s never
> that easy though, is it?
> The only part of the SQL I need to be dynamic is the date parameter (I could
> even use the id column). Instead of using the MapCache (if that isn’t a good
> idea), could I use GetFile to just pull a single txt file with the parameter
> (i.e. the last run date or max id value from the last run), which creates a
> flowfile; read that value and pass it into ExecuteSQL (using the
> aforementioned value as the parameter in the SQL), since the select query can
> be dynamically constructed from attributes/flowfile content (as per the docs);
> and then finally write the text file back to the file system to be picked up
> next time?
> Thanks
> Conrad
>
> On 03/08/2016, 14:02, "Matt Burgess" <mattyb...@gmail.com> wrote:
>
> Conrad,
>
> Is it possible to add a view (materialized or not) to the RDBMS? That
> view could take care of the denormalization, and then
> QueryDatabaseTable could point at the view. The DB would take care of
> the push-down filters, which functionally is as if you had a
> QueryDatabaseTable for each table and then did the joins.
>
> In NiFi 1.0 there is a GenerateTableFetch processor, which is like
> QueryDatabaseTable except that it generates SQL instead of executing it.
> That might be used in your a) option above, but you'd have to reconcile
> the SQL statements into a JOIN.
> A possible improvement to either/both
> processors would be to add attributes for the maximum-value columns,
> whose values are the maximum observed values. Then you wouldn't have
> to parse or manipulate the SQL if you really just want the max values.
>
> I have been thinking about how QueryDatabaseTable and
> GenerateTableFetch would work if they accepted incoming flow files (to
> allow dynamic table names, for example). It's a bit more tricky because
> those processors run without input to get max values, so their
> behavior would change when a flow file is present but would return to
> the original behavior if no flow file is present. Since the
> ListDatabaseTables processor is also in 1.0, it would be nice to use
> that as input to the other two processors.
>
> I'm definitely interested in any thoughts or discussion around these
> things :)
>
> Regards,
> Matt
>
> On Wed, Aug 3, 2016 at 8:37 AM, Conrad Crampton
> <conrad.cramp...@secdata.com> wrote:
> > Hi,
> >
> > My use case is that I want to ship a load of rows from an RDBMS
> > periodically and put them in HDFS as Avro.
> >
> > The QueryDatabaseTable processor has functionality that would be great,
> > i.e. maximum-value columns (there are a couple of columns in the data I
> > could use for this), and it is this functionality I am looking for, BUT
> > the data is not from one single table. The nature of the RDBMS is that
> > the business view on the data requires a bunch of joins across other
> > tables/schemas to get the correct Avro file, so the options I appear to
> > have are:
> >
> > a) Use QueryDatabaseTable for each table that makes up the business view
> > and do the joins etc. in HDFS (Spark or something) – or potentially do
> > the reconciliation within NiFi???
> >
> > b) Use ExecuteSQL to run the complete SQL to get the rows, which can
> > easily be put into HDFS as Avro, given that each row will be the business
> > (denormalised) data that is required.
> >
> > The problem with a) is the reconciliation (denormalisation) of the data,
> > and the problem with b) is how to maintain the maximum-value state so I
> > only get the data since the last run.
> >
> > In order to address b), can I use the DistributedMapCacheServer & Client
> > to hold a key/value pair of the last run date and extract from that date
> > as a parameter?
> >
> > Thanks for any suggestions.
> >
> > Conrad

--
Yohann Lepage
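For anyone following the thread, the file-based state tracking Conrad describes (read the last maximum value from a file, use it as a query parameter, write the new maximum back for the next run) can be sketched outside NiFi roughly as follows. This is only a minimal illustration of the logic: sqlite3 stands in for the real read-only RDBMS, and the table name (log_table), column names (id, payload), and state file name (last_max_id.txt) are hypothetical.

```python
import os
import sqlite3

STATE_FILE = "last_max_id.txt"  # hypothetical; plays the role of the GetFile text file


def read_last_max(default=0):
    # Read the maximum value persisted by the previous run, if any.
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return int(f.read().strip())
    return default


def write_last_max(value):
    # Persist the new maximum for the next run (the "write text file back" step).
    with open(STATE_FILE, "w") as f:
        f.write(str(value))


def fetch_new_rows(conn):
    # Parameterized query that returns only rows newer than the stored maximum,
    # mimicking QueryDatabaseTable's maximum-value behavior with external state.
    last_max = read_last_max()
    cur = conn.execute(
        "SELECT id, payload FROM log_table WHERE id > ? ORDER BY id",
        (last_max,),
    )
    rows = cur.fetchall()
    if rows:
        write_last_max(rows[-1][0])  # highest id seen becomes the new watermark
    return rows
```

Each call fetches only the delta since the previous call, which is the behavior Conrad wants from the GetFile → ExecuteSQL → write-back loop; a DistributedMapCache entry could hold the watermark instead of the file with the same logic.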