Re: ExecuteSQL Extract database tables multiple times.

Marcelo Valle Ávila Sat, 05 Mar 2016 13:46:16 -0800

Hello Ralf,

I'm seeing the same behaviour when reading data from an Oracle DB:


failed to process due to org.apache.avro.SchemaParseException: Empty name

With NiFi 0.4.1 the ExecuteSQL processor works fine; it seems that 0.5.0 and
0.5.1 have a bug affecting Oracle databases.

I tested the NiFi 0.5.1 processor against a DB2 database and it works fine.

What database engine are you using?

Regards!


...................................................................

Marcelo Valle Ávila

mva...@keedio.com |+34 630 371 156

www.keedio.com

...................................................................


2016-03-05 10:36 GMT+01:00 Ralf Meier <n...@cht3.com>:

> Hi,
>
> Thanks, Matt, for clarifying things. I got it: the processor is working
> just fine with MySQL.
> Now I tried to use it with MS SQL, but I'm running into issues and could
> not figure out why it is not working.
>
> My configuration is:
>
> NiFi: 0.5.0
> Java 8
> MS SQL 2014
>
> DBCPConnectionPool:
> Database Connection URL: jdbc:sqlserver://192.168.79.252:1433;databaseName=testdb
> Class Name: com.microsoft.sqlserver.jdbc.SQLServerDriver
> Jar Url: file:///Users/rmeier/Downloads/tmp/sqljdbc42.jar
> Database user: sa
> Password: *********
>
> In ExecuteSQL I have the following configuration:
> Database Connection Pooling Service: the DBCPConnectionPool above
> SQL select query: select * from tuser;
>
> Max Wait Time: 0 seconds
>
> But when I run the processor I get the following error:
>
> 10:30:02 CET ERROR
> ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273]
> ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273] failed to process due
> to org.apache.avro.SchemaParseException: Empty name; rolling back session:
> org.apache.avro.SchemaParseException: Empty name
>
> 10:30:02 CET ERROR
> ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273] Processor
> Administratively Yielded for 1 sec due to processing failure
>
>
> Does anyone have an idea how to solve this issue, and what the root
> cause is?
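
One possible cause, offered as an assumption rather than a confirmed diagnosis: Avro record names must be non-empty (the Avro spec requires a name that starts with a letter or underscore, followed by letters, digits or underscores), and "Empty name" appears to be what Avro reports for a blank one. If the processor takes the schema's record name from the table name reported by the JDBC driver (`ResultSetMetaData.getTableName(int)`, which the JDBC docs allow to return "" when not applicable), a driver that returns an empty string would make the schema impossible to build. That would also fit the report that DB2 works while Oracle and MS SQL fail. The sketch below only illustrates the general fallback idea; the class, helper and default name are hypothetical and not NiFi's code.

```java
// Hypothetical sketch, not NiFi's implementation: pick an Avro record name
// from a JDBC-reported table name, falling back to a default when the driver
// returns null, an empty string, or something Avro would reject.
class AvroRecordName {

    // Hypothetical default; any name that satisfies the Avro naming rules works.
    static final String DEFAULT_RECORD_NAME = "ExecuteSQL_Record";

    // Avro spec: first char [A-Za-z_], remaining chars [A-Za-z0-9_].
    static boolean isValidAvroName(String name) {
        return name != null && name.matches("[A-Za-z_][A-Za-z0-9_]*");
    }

    // Uses the table name when it is a valid Avro name, otherwise the default.
    static String recordName(String tableNameFromMetadata) {
        return isValidAvroName(tableNameFromMetadata)
                ? tableNameFromMetadata
                : DEFAULT_RECORD_NAME;
    }

    public static void main(String[] args) {
        System.out.println(recordName("tuser")); // tuser
        System.out.println(recordName(""));      // ExecuteSQL_Record
        System.out.println(recordName(null));    // ExecuteSQL_Record
    }
}
```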
>
> Thanks again for your help.
> Ralf
>
>
>
> On 04.03.2016 at 21:17, Matt Burgess <mattyb...@gmail.com> wrote:
>
> Currently ExecuteSQL will put all available rows into a single flow file.
> There is a Jira case (https://issues.apache.org/jira/browse/NIFI-1251) to
> allow the user to break up the result set into flow files containing a
> specified number of records.
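
NIFI-1251 is only a proposal at this point, so the following is a hypothetical illustration of the idea rather than NiFi code: split the rows of one result set into consecutive batches of at most N records, each of which would become one flow file. The class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the NIFI-1251 idea; not NiFi's API.
class ResultSetBatcher {

    // Splits rows into consecutive batches of at most maxRecords each.
    static <T> List<List<T>> batch(List<T> rows, int maxRecords) {
        if (maxRecords <= 0) {
            throw new IllegalArgumentException("maxRecords must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += maxRecords) {
            int end = Math.min(start + maxRecords, rows.size());
            // Copy each slice so a batch does not depend on the source list
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row1", "row2", "row3", "row4", "row5");
        // Current behaviour: everything ends up in one "flow file"
        System.out.println(batch(rows, rows.size())); // [[row1, row2, row3, row4, row5]]
        // Batches of two records
        System.out.println(batch(rows, 2)); // [[row1, row2], [row3, row4], [row5]]
        // One record per batch
        System.out.println(batch(rows, 1).size()); // 5
    }
}
```

With a batch size of 1, each row would land in its own flow file, which is what Ralf originally expected; with a batch size equal to the row count you get today's single-flow-file behaviour.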
>
> I'm not sure why you get 26 flow files, although if you let the flow run
> for 26 seconds you should see 26 flow files, each with the contents of the
> "users" table. This is because it will run every second (per your config)
> and execute the same query ("SELECT * FROM users") every time.  There is a
> new processor in the works (
> https://issues.apache.org/jira/browse/NIFI-1575) that will allow the user
> to specify "maximum value columns", where the max values for each specified
> column will be kept track of, so that each subsequent execution of the
> processor will only retrieve rows whose values for those columns are
> greater than the currently-held maximum value. An example would be a users
> table with a primary key user_id, which is strictly increasing. The
> processor would run once, fetching all available records, then unless a new
> row is added (with a higher user_id value), no flow files will be output.
> If rows are added in the meantime, then upon the next execution of the
> processor, only those "new" rows will be output.
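
The "maximum value column" idea can be sketched as follows, assuming a users table whose user_id strictly increases, as in Matt's example. This is a hypothetical in-memory illustration, not the NIFI-1575 implementation; against a real database the filter would be a query such as SELECT * FROM users WHERE user_id > ? with the stored maximum bound to the parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of max-value-column tracking; names are illustrative.
class MaxValueTracker {

    // Minimal stand-in for a table row; only user_id is tracked.
    static final class Row {
        final long userId;
        final String name;
        Row(long userId, String name) { this.userId = userId; this.name = name; }
        @Override public String toString() { return userId + ":" + name; }
    }

    // State kept between executions; null means "never run yet".
    private Long maxUserId = null;

    // Returns rows with user_id above the stored maximum, then raises the maximum.
    List<Row> fetchNew(List<Row> table) {
        List<Row> fresh = new ArrayList<>();
        long newMax = (maxUserId == null) ? Long.MIN_VALUE : maxUserId;
        for (Row r : table) {
            if (maxUserId == null || r.userId > maxUserId) {
                fresh.add(r);
                newMax = Math.max(newMax, r.userId);
            }
        }
        if (!fresh.isEmpty()) {
            maxUserId = newMax;
        }
        return fresh;
    }

    public static void main(String[] args) {
        MaxValueTracker tracker = new MaxValueTracker();
        List<Row> table = new ArrayList<>(Arrays.asList(new Row(1, "alice"), new Row(2, "bob")));
        System.out.println(tracker.fetchNew(table)); // [1:alice, 2:bob]
        System.out.println(tracker.fetchNew(table)); // []
        table.add(new Row(3, "carol"));
        System.out.println(tracker.fetchNew(table)); // [3:carol]
    }
}
```

Here the state is just an instance field; a scheduled processor would need to keep it across runs for the same effect.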
>
> I'm happy to help you work through this if you'd like to provide more
> details about your table setup (columns, rows) and flow.
>
> Regards,
> Matt
>
> On Fri, Mar 4, 2016 at 3:04 PM, Ralf Meier <n...@cht3.com> wrote:
>
>> Hi,
>>
>> I tried to understand the ExecuteSQL processor.
>> I created a database with a table "users". This table has two entries.
>>
>> The problem is that the processor selected the entries from the table
>> multiple times and created 26 flow files altogether, even though only
>> two entries were available. In addition, each flow file contains both
>> entries.
>>
>> I configured the ExecuteSQL processor the following way:
>> Settings: I didn't change anything here except auto-terminate on
>> failure.
>> Scheduling:
>>         Cron based: * * * * * ? (intended to run every minute)
>>         Concurrent tasks: 1
>> Properties:
>>         Database Connection Pooling Service: DBmysql
>>         SQL select query: Select * from user
>>         Max Wait Time: 0 seconds
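
NiFi's CRON-driven scheduling uses Quartz-style cron expressions, in which the first of the six fields is seconds. So "* * * * * ?" fires every second rather than every minute, which matches Matt's explanation of the 26 flow files; "0 * * * * ?" would fire once per minute, at second 0. The small helper below, written only for illustration (the class and method are hypothetical), labels each field of an expression:

```java
// Hypothetical helper that labels the fields of a Quartz cron expression.
class QuartzCronFields {

    // Quartz field order; the seventh field (year) is optional.
    static final String[] FIELD_NAMES = {
        "seconds", "minutes", "hours", "day-of-month", "month", "day-of-week", "year"
    };

    static String describe(String expression) {
        String[] parts = expression.trim().split("\\s+");
        if (parts.length < 6 || parts.length > 7) {
            throw new IllegalArgumentException("Quartz expects 6 or 7 fields, got " + parts.length);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            sb.append(FIELD_NAMES[i]).append('=').append(parts[i]);
            if (i < parts.length - 1) {
                sb.append(", ");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // '*' in the seconds field: fires every second
        System.out.println(describe("* * * * * ?"));
        // '0' in the seconds field: fires once per minute
        System.out.println(describe("0 * * * * ?"));
    }
}
```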
>>
>> Then I used a ConvertAvroToJSON processor and a PutFile processor.
>>
>> If I run the flow, it creates 26 flow files, and each of them contains
>> all entries of the table as JSON.
>>
>> My goal is to extract the table once, so that each row is written as
>> JSON only once, not 26 times.
>> My understanding was that each row of the table would become one flow
>> file, and therefore each row of the table would end up as one JSON file
>> on disk (using PutFile).
>>
>> But it seems that this is not right. What happens if I have millions of
>> entries in such a table? Will they all end up in one flow file?
>>
>> How would I configure NiFi to extract the table only once?
>>
>> It would be great if somebody could help me with this.
>>
>>
>>  BR
>> Ralf
>
>
>
>
