Hi,

thanks, Matt, for clarifying things. I got it, and the processor is working just 
fine with MySQL.
Now I tried to use it with MS SQL, but here I ran into some issues and could not 
figure out why it is not working.

My Configuration is:

NiFi: 0.5.0
Java 8
MS SQL 2014

DBCPConnectionPool:
Database Connection URL: 
jdbc:sqlserver://192.168.79.252:1433;databaseName=testdb
Class Name: com.microsoft.sqlserver.jdbc.SQLServerDriver
Jar Url: file:///Users/rmeier/Downloads/tmp/sqljdbc42.jar
Database user: sa
Password: *********

In the ExecuteSQL processor I have the following configuration:
Database Connection Pooling Service: My Connection Pooling (the DBCPConnectionPool above)
SQL select query: select * from tuser;

Max Wait Time: 0 seconds

But when I run the processor I get the following error:

10:30:02 CET ERROR
ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273]
ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273] failed to process due to 
org.apache.avro.SchemaParseException: Empty name; rolling back session: 
org.apache.avro.SchemaParseException: Empty name

10:30:02 CET ERROR
ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273] Processor Administratively 
Yielded for 1 sec due to processing failure
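
From what I have read, Avro field names must not be empty, so my guess (not 
confirmed) is that the SQL Server driver reports an empty label for one of the 
columns in "tuser". A small JDBC check, reusing the connection settings from 
above, would be:

import java.sql.*;

// Prints the column labels the MS SQL JDBC driver reports for the query.
// An empty label would explain Avro's "Empty name" error, since Avro field
// names must not be empty. The password is passed as the first program argument.
public class ColumnLabelCheck {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://192.168.79.252:1433;databaseName=testdb";
        try (Connection conn = DriverManager.getConnection(url, "sa", args[0]);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("select * from tuser")) {
            ResultSetMetaData md = rs.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                System.out.println(i + ": '" + md.getColumnLabel(i) + "'");
            }
        }
    }
}

If any label prints empty, aliasing that column in the SELECT (for example 
select somecol as somecol from tuser, where somecol is just a placeholder name) 
should give Avro a valid field name.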


Does anybody have an idea how to solve this issue, and what the root cause 
might be?

Thanks again for your help.
Ralf



> On 04.03.2016, at 21:17, Matt Burgess <mattyb...@gmail.com> wrote:
> 
> Currently ExecuteSql will put all available rows into a single flow file. 
> There is a Jira case (https://issues.apache.org/jira/browse/NIFI-1251) to allow 
> the user to break up the result set into flow files containing a specified 
> number of records.
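> 
> In the meantime, here is a rough sketch of the idea behind NIFI-1251 (not the 
> actual patch; the file naming and chunk size are made up) using the Avro Java 
> API to split one Avro data file into chunks of at most N records:
> 
> import org.apache.avro.Schema;
> import org.apache.avro.file.DataFileStream;
> import org.apache.avro.file.DataFileWriter;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericDatumWriter;
> import org.apache.avro.generic.GenericRecord;
> import java.io.*;
> 
> public class AvroSplitter {
>     // Split one Avro data file into output files holding at most
>     // recordsPerFile records each, reusing the input file's schema.
>     public static void split(File input, int recordsPerFile) throws IOException {
>         try (DataFileStream<GenericRecord> in = new DataFileStream<>(
>                 new FileInputStream(input), new GenericDatumReader<>())) {
>             Schema schema = in.getSchema();
>             int chunk = 0;
>             while (in.hasNext()) {
>                 File out = new File(input.getParent(), "chunk-" + (chunk++) + ".avro");
>                 try (DataFileWriter<GenericRecord> w =
>                         new DataFileWriter<>(new GenericDatumWriter<>(schema))) {
>                     w.create(schema, out);
>                     for (int i = 0; i < recordsPerFile && in.hasNext(); i++) {
>                         w.append(in.next());
>                     }
>                 }
>             }
>         }
>     }
> }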
> 
> I'm not sure why you get 26 flow files, although if you let the flow run for 
> 26 seconds you should see 26 flow files, each with the contents of the 
> "users" table. This is because it will run every second (per your config) and 
> execute the same query ("SELECT * FROM users") every time.  There is a new 
> processor in the works (https://issues.apache.org/jira/browse/NIFI-1575) that 
> will allow the user 
> to specify "maximum value columns", where the max values for each specified 
> column will be kept track of, so that each subsequent execution of the 
> processor will only retrieve rows whose values for those columns are greater 
> than the currently-held maximum value. An example would be a users table with 
> a primary key user_id, which is strictly increasing. The processor would run 
> once, fetching all available records, then unless a new row is added (with a 
> higher user_id value), no flow files will be output. If rows are added in the 
> meantime, then upon the next execution of the processor, only those "new" 
> rows will be output.
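> 
> Conceptually (again just a sketch of the idea, not the processor's actual 
> code), each run would do something like the following, with the maximum 
> observed user_id persisted between runs:
> 
> import java.sql.*;
> 
> // Fetch only rows whose user_id exceeds the last maximum we saw, then
> // remember the new maximum for the next run.
> public class IncrementalFetch {
>     private long lastMaxUserId = 0; // NiFi would keep this as processor state
> 
>     public void fetchNewRows(Connection conn) throws SQLException {
>         try (PreparedStatement ps = conn.prepareStatement(
>                 "SELECT * FROM users WHERE user_id > ? ORDER BY user_id")) {
>             ps.setLong(1, lastMaxUserId);
>             try (ResultSet rs = ps.executeQuery()) {
>                 while (rs.next()) {
>                     lastMaxUserId = rs.getLong("user_id");
>                     // ...convert the row into an output record here
>                 }
>             }
>         }
>     }
> }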
> 
> I'm happy to help you work through this if you'd like to provide more details 
> about your table setup (columns, rows) and flow.
> 
> Regards,
> Matt
> 
> On Fri, Mar 4, 2016 at 3:04 PM, Ralf Meier <n...@cht3.com> wrote:
> Hi,
> 
> I tried to understand the ExecuteSQL processor.
> I created a database with a table "users". This table has two entries.
> 
> The problem with the processor is that it selected the entries from the table 
> multiple times and created 26 flow files altogether, even though only two 
> entries were available. In addition, each flow file contains both entries.
> 
> I configured the executeSQL Processor the following way:
> Settings: didn't change anything here except auto terminate on failure.
> Scheduling:
>         Cron based: * * * * * ? (Run every minute)
>         Concurrent tasks: 1
> Properties:
>         Database Connection Pooling Service: DBmysql
>         SQL select query: Select * from user
>         Max Wait Time: 0 seconds
> 
> Then I used a processor: convertAvroToJson and a PutFile Processor.
> 
> If I run the flow, it creates 26 flow files, and each of them contains all 
> entries of the table as JSON.
> 
> My goal is to extract the table once, so that the entries are only written 
> once as JSON, not 26 times.
> My understanding was that each row of the table would become one flow file, 
> and therefore each line of the table would end up as one JSON file on disk 
> (using PutFile).
> 
> But it seems that this is not right. What happens if I have millions of 
> entries in such a table? Will they all end up in one flow file?
> 
> How would I configure NiFi to extract the table only once?
> 
> It would be great if somebody could help me with this.
> 
> 
> BR
> Ralf
> 
