Re: Issue with GenerateTableFetch bulk load when using MS SQL Server database type

Matt Burgess Thu, 27 Jul 2017 10:47:47 -0700

This can be considered a bug, at least in the sense that it is not
consistent with how the Generic DB Adapter handles it (i.e. no
requirement for an order by clause). Not to say that it's a good idea
to not have an ORDER BY clause when you are getting pages of rows, but
in this case, the incremental functionality requires a max-value
column, and if not using incremental, it could be presumed that the
user is getting all rows (certainly GenerateTableFetch would generate
the SQL to get all rows, no matter the order). So if we say that the
requirement for "ordering when using limits/offsets" is on the
processor and not the DB adapter, then we can remove the requirement
from the DB adapter(s), and let it be enforced at a processor level if
it makes sense to do so.


How does that sound to everyone?  A Jira case (bug or improvement,
whatever) to remove the requirement on ordering being necessary for
pagination, in the MSSQL 2012+ adapter (and any others where it is
imposed)?

Thanks,
Matt


On Mon, Jul 24, 2017 at 9:55 AM, Bryan Quinn <bryan.qu...@asavie.com> wrote:
> Hi Matt.
>
> I’ll have a look at option a. What’s the timeframe for a fix and prod
> release for this assuming it’s a bug? Ballpark.
>
>
>
> Thanks,
>
> Bryan
>
>
>
> From: Matt Burgess [mailto:mattyb...@gmail.com]
> Sent: Monday 24 July 2017 14:40
> To: users@nifi.apache.org
> Subject: Re: Issue with GenerateTableFetch bulk load when using MS SQL
> Server database type
>
>
>
> Bryan,
>
>
>
> This is likely a bug, I will investigate and write it up if so. In the
> meantime, are you planning on doing the bulk fetch in parallel (with a
> Remote Process Group into ExecuteSQL across a NiFi cluster)? If not, you may
> find that QueryDatabaseTable is a good alternative, it can be configured to
> fetch N rows at a time. The trade off is that it is not parallel or
> concurrent (each N rows will be fetched one batch at a time), but then each
> batch can be processed concurrently or in parallel.
>
>
>
> Another alternative, if you know the approximate size of the database, is to
> use GenerateFlowFile to write the SQL statements yourself, then SplitText to
> get one statement per flow file, then ExecuteSQL (concurrently or in
> parallel). This solution does not scale with your database size but with
> some copy and paste (and updating as the DB grows), can emulate
> GenerateTableFetch in the short term.
>
>
>
> Regards,
>
> Matt
>
>
> On Jul 24, 2017, at 8:55 AM, Bryan Quinn <bryan.qu...@asavie.com> wrote:
>
> Hi,
>
> I’m trying to use the GenerateTableFetch component for ‘MS SQL 2012+’
> database – for a full table export each time its run.
>
> However I’m forced to set a column name in the ‘Maximum-value Columns’ field
> when the component is run (error only shows during execution if I omit this
> setting).
>
> I’m seeing the following error:
>
> IllegalArgumentException: Order by clause cannot be null or empty when using
> row paging
>
>
>
> So it looks like it only works with incremental mode – which I don’t want.
>
>
>
> Can someone see why this is the case or is it a bug?
>
> I’m not getting this error if I set the database type to ‘Generic’ – but
> unfortunately the sql it generates isn’t valid for MS SQL server.
>
> I’m using Nifi v1.3.0
>
>
>
> Thanks,
>
> Bryan
>
>
>
> ---
>
> Bryan Quinn | Asavie
>
> Principal Analytics Architect
>
> E: bryan.qu...@asavie.com
>
> W: www.asavie.com | T: @asavie
>
>

Re: Issue with GenerateTableFetch bulk load when using MS SQL Server database type

Reply via email to