Thanks, Matt.

On Wed, Jun 22, 2022 at 10:46 AM Matt Burgess <mattyb...@apache.org> wrote:
>
> Mike,
>
> I recommend QueryDatabaseTableRecord with judicious choices for the
> following properties (a sample configuration is sketched after the
> list):
>
> Fetch Size: This should be tuned to return the largest number of rows
> possible without causing network issues such as timeouts. It can be
> set to the same value as Max Rows Per Flow File, ensuring one fetch
> per outgoing FlowFile.
> Max Rows Per Flow File: This should be set to a reasonable number of
> rows per FlowFile, maybe 100K or even 1M if that doesn't cause issues
> (see above).
> Output Batch Size: This is the key to doing full selects on huge
> tables, as it allows FlowFiles to be committed to the session and
> passed downstream while the rest of the fetch is still being
> processed. In your case, if you set Max Rows Per Flow File to 100K,
> this could be 10; if you set it to 1M, it could be 1. Note that with
> this property set, the maxvalue.* and fragment.count attributes will
> not be set on these FlowFiles, so you can't merge them. I believe the
> maxvalue state will still be updated even when this property is used,
> so it should turn into an incremental fetch after the first full
> fetch is complete.
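>
> As a concrete sketch, using the 100K example values from above (tune
> them for your own tables and network; the table and column names are
> hypothetical), the processor configuration might look like:
>
>     Database Connection Pooling Service: <your DBCPConnectionPool>
>     Record Writer: <your RecordSetWriter service>
>     Table Name: my_big_table            (hypothetical)
>     Maximum-value Columns: updated_at   (hypothetical; enables the
>                                          incremental fetch on later runs)
>     Fetch Size: 100000
>     Max Rows Per Flow File: 100000
>     Output Batch Size: 10
>
> With these settings, each fetch fills exactly one 100K-row FlowFile,
> and every batch of 10 FlowFiles (1M rows) is committed and sent
> downstream while the rest of the result set is still being read.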
>
> Regards,
> Matt
>
> On Wed, Jun 22, 2022 at 10:00 AM Mike Thomsen <mikerthom...@gmail.com> wrote:
> >
> > We have a table with 68M records that will blow up to over 250M soon,
> > and need to do a full table fetch on it. What's the best practice for
> > efficiently doing a partial or full select on it?
> >
> > Thanks,
> >
> > Mike
