Hello Matt.

I've read about the new SQL processor that you are developing
(https://issues.apache.org/jira/browse/NIFI-1575) and it is really interesting.
I have started to work on the same idea for a new processor.
We developed a very similar Flume Source
(https://github.com/keedio/flume-ng-sql-source) with the same behavior,
importing only the new rows added to a table (plus some additional
features), and we want to use the same logic for the NiFi processor.

Maybe, if you find the Flume Source interesting, we could start working
together on https://issues.apache.org/jira/browse/NIFI-1575.

Regards!
Marcelo

On Saturday, March 5, 2016, Marcelo Valle Ávila <mva...@keedio.com>
wrote:

> Sorry, I didn't see that you are using MS SQL Server.
> I deployed a host with MS SQL and the issue reproduces too.
>
> My environment:
>
> NiFi 0.5.1
> Java 7
> MS SQL Server 2008
>
> It doesn't work with Oracle either, but with DB2 it works perfectly.
>
>
> 2016-03-05 22:45 GMT+01:00 Marcelo Valle Ávila <mva...@keedio.com>:
>
>> Hello Ralf,
>>
>> I'm seeing the same behaviour, pulling data from an Oracle DB:
>>
>> failed to process due to org.apache.avro.SchemaParseException: Empty name
>>
>> With NiFi 0.4.1 the ExecuteSQL processor works fine; it seems that in 0.5.0
>> and 0.5.1 there is some bug with Oracle databases.
>>
>> I tested the NiFi 0.5.1 processor connecting to a DB2 database and it works fine.
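>>
>> For what it's worth, the "Empty name" message comes from Avro's schema name
>> validation. Here is a minimal sketch of where it originates, assuming (just a
>> guess on my part, not a confirmed root cause) that the processor builds the
>> Avro record schema from the JDBC column labels and one of them comes back empty:
>>
>>     import org.apache.avro.Schema;
>>     import org.apache.avro.SchemaParseException;
>>
>>     public class EmptyNameRepro {
>>         public static void main(String[] args) {
>>             // Hypothetical repro: a record schema with an empty field name,
>>             // as could happen if a column label is blank in the result set.
>>             String json = "{\"type\":\"record\",\"name\":\"result\","
>>                     + "\"fields\":[{\"name\":\"\",\"type\":\"string\"}]}";
>>             try {
>>                 new Schema.Parser().parse(json);
>>             } catch (SchemaParseException e) {
>>                 System.out.println(e.getMessage()); // prints "Empty name"
>>             }
>>         }
>>     }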
>>
>> What database engine are you using?
>>
>> Regards!
>>
>>
>> 2016-03-05 10:36 GMT+01:00 Ralf Meier <n...@cht3.com>:
>>
>>> Hi,
>>>
>>> Thanks Matt for clarifying things. I got it, and the processor is working
>>> just fine with MySQL.
>>> Now I tried to use it with MS SQL, but here I get some issues and could
>>> not figure out why it is not working.
>>>
>>> My Configuration is:
>>>
>>> NiFi: 0.5.0
>>> Java 8
>>> MS SQL 2014
>>>
>>> DBCPConnectionPool:
>>> Database Connection URL: jdbc:sqlserver://192.168.79.252:1433;databaseName=testdb
>>> Class Name: com.microsoft.sqlserver.jdbc.SQLServerDriver
>>> Jar Url: file:///Users/rmeier/Downloads/tmp/sqljdbc42.jar
>>> Database user: sa
>>> Password: *********
>>>
>>> In the ExecuteSQL processor I have the following configuration:
>>> Database Connection Pooling Service: my connection pool (above)
>>> SQL select query: select * from tuser;
>>>
>>> Max Wait Time: 0 seconds
>>>
>>> But when I run the processor I get the following error:
>>>
>>> 10:30:02 CET ERROR
>>> ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273]
>>> ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273] failed to process
>>> due to org.apache.avro.SchemaParseException: Empty name; rolling back
>>> session: org.apache.avro.SchemaParseException: Empty name
>>>
>>> 10:30:02 CET ERROR
>>> ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273] Processor
>>> Administratively Yielded for 1 sec due to processing failure
>>>
>>>
>>> Does anybody have an idea how to solve this issue and what the root
>>> cause is?
>>>
>>> Thanks again for your help.
>>> Ralf
>>>
>>>
>>>
>>> On 04.03.2016 at 21:17, Matt Burgess <mattyb...@gmail.com> wrote:
>>>
>>> Currently ExecuteSql will put all available rows into a single flow
>>> file. There is a Jira case (
>>> https://issues.apache.org/jira/browse/NIFI-1251) to allow the user to
>>> break up the result set into flow files containing a specified number of
>>> records.
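>>>
>>> Just to make that idea concrete, here is a rough sketch (placeholder names,
>>> not the eventual NIFI-1251 implementation) of emitting a result set as
>>> fixed-size batches instead of one big flow file:
>>>
>>>     import java.sql.*;
>>>     import java.util.ArrayList;
>>>     import java.util.List;
>>>
>>>     public class BatchedFetchSketch {
>>>         // Sketch only: emit rows in batches of 'batchSize', roughly what
>>>         // "one flow file per N records" would look like.
>>>         public static void fetchInBatches(Connection conn, String query,
>>>                                           int batchSize) throws SQLException {
>>>             try (Statement st = conn.createStatement();
>>>                  ResultSet rs = st.executeQuery(query)) {
>>>                 List<String> batch = new ArrayList<>();
>>>                 while (rs.next()) {
>>>                     batch.add(rs.getString(1)); // placeholder row serialization
>>>                     if (batch.size() == batchSize) {
>>>                         emit(batch);            // placeholder for "one flow file"
>>>                         batch.clear();
>>>                     }
>>>                 }
>>>                 if (!batch.isEmpty()) {
>>>                     emit(batch);
>>>                 }
>>>             }
>>>         }
>>>
>>>         private static void emit(List<String> batch) {
>>>             System.out.println("flow file with " + batch.size() + " records");
>>>         }
>>>     }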
>>>
>>> I'm not sure why you get 26 flow files, although if you let the flow run
>>> for 26 seconds you should see 26 flow files, each with the contents of the
>>> "users" table. This is because it will run every second (per your config)
>>> and execute the same query ("SELECT * FROM users") every time.  There is a
>>> new processor in the works (
>>> https://issues.apache.org/jira/browse/NIFI-1575) that will allow the
>>> user to specify "maximum value columns", where the max values for each
>>> specified column will be kept track of, so that each subsequent execution
>>> of the processor will only retrieve rows whose values for those columns are
>>> greater than the currently-held maximum value. An example would be a users
>>> table with a primary key user_id, which is strictly increasing. The
>>> processor would run once, fetching all available records, then unless a new
>>> row is added (with a higher user_id value), no flow files will be output.
>>> If rows are added in the meantime, then upon the next execution of the
>>> processor, only those "new" rows will be output.
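>>>
>>> A rough sketch of that logic, using the users/user_id example (the state
>>> handling is simplified to a plain field here; the real processor would
>>> persist the maximum between runs):
>>>
>>>     import java.sql.*;
>>>
>>>     public class MaxValueColumnSketch {
>>>         // Largest user_id seen so far; Long.MIN_VALUE means "no value yet".
>>>         private long lastMaxUserId = Long.MIN_VALUE;
>>>
>>>         public void poll(Connection conn) throws SQLException {
>>>             boolean firstRun = (lastMaxUserId == Long.MIN_VALUE);
>>>             String sql = firstRun
>>>                     ? "SELECT * FROM users"
>>>                     : "SELECT * FROM users WHERE user_id > ?";
>>>             try (PreparedStatement ps = conn.prepareStatement(sql)) {
>>>                 if (!firstRun) {
>>>                     ps.setLong(1, lastMaxUserId);
>>>                 }
>>>                 try (ResultSet rs = ps.executeQuery()) {
>>>                     while (rs.next()) {
>>>                         lastMaxUserId = Math.max(lastMaxUserId, rs.getLong("user_id"));
>>>                         // ... emit the row here; runs where no rows exceed the
>>>                         // stored maximum produce no output ...
>>>                     }
>>>                 }
>>>             }
>>>         }
>>>     }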
>>>
>>> I'm happy to help you work through this if you'd like to provide more
>>> details about your table setup (columns, rows) and flow.
>>>
>>> Regards,
>>> Matt
>>>
>>> On Fri, Mar 4, 2016 at 3:04 PM, Ralf Meier <n...@cht3.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I tried to understand the ExecuteSQL processor.
>>>> I created a database with a table "users". This table has two entries.
>>>>
>>>> The problem with the processor is that it selected the entries from the
>>>> table multiple times and created altogether 26 flow files, even though only
>>>> two entries were available. In addition, each flow file contains both
>>>> entries.
>>>>
>>>> I configured the ExecuteSQL processor the following way:
>>>> Settings: didn't change anything here except auto-terminate on failure.
>>>> Scheduling:
>>>>         CRON driven: * * * * * ? (Run every minute)
>>>>         Concurrent tasks: 1
>>>> Properties:
>>>>         Database Connection Pooling Service: DBmysql
>>>>         SQL select query: Select * from user
>>>>         Max Wait Time: 0 seconds
>>>>
>>>> Then I used a ConvertAvroToJSON processor and a PutFile processor.
>>>>
>>>> If I run the flow it creates 26 flow files, and each of them has all
>>>> entries of the table included as JSON.
>>>>
>>>> My goal is to extract the table once, so that the entries are only
>>>> written once as JSON rows, not 26 times.
>>>> My understanding was that each row of the table would be one flow file,
>>>> and therefore each line of the table would become one JSON file on disk
>>>> (using PutFile).
>>>>
>>>> But it seems that this is not right. What happens if I have millions of
>>>> entries in such a table? Will all of that be done in one flow file?
>>>>
>>>> How would I configure NiFi to extract the table only once?
>>>>
>>>> It would be great if somebody could help me with this.
>>>>
>>>>
>>>>  BR
>>>> Ralf
>>>
>>>
>>>
>>>
>>
>
