Thanks, Matt.
On Wed, Jun 22, 2022 at 10:46 AM Matt Burgess <mattyb...@apache.org> wrote:
>
> Mike,
>
> I recommend QueryDatabaseTableRecord with judicious choices for the
> following properties:
>
> Fetch Size: This should be tuned to return the largest number of rows
> possible without causing network issues such as timeouts. It can be
> set to the same value as Max Rows Per Flow File, ensuring one fetch
> per outgoing FlowFile.
>
> Max Rows Per Flow File: This should be set to a reasonable number of
> rows per FlowFile, maybe 100K or even 1M if that doesn't cause issues
> (see above).
>
> Output Batch Size: This is the key to doing full selects on huge
> tables, as it allows FlowFiles to be committed to the session and
> passed downstream while the rest of the fetch is still being
> processed. In your case, if you set Max Rows Per Flow File to 100K
> then this could be 10, or if you set it to 1M it could be 1. Note
> that with this property set, the maxvalue.* and fragment.count
> attributes will not be set on these FlowFiles, so you can't merge
> them. I believe the maxvalue state will still be updated even when
> this property is used, so it should turn into an incremental fetch
> after the first full fetch is complete.
>
> Regards,
> Matt
>
> On Wed, Jun 22, 2022 at 10:00 AM Mike Thomsen <mikerthom...@gmail.com> wrote:
> >
> > We have a table with 68M records that will blow up to over 250M
> > soon, and need to do a full table fetch on it. What's the best
> > practice for efficiently doing a partial or full select on it?
> >
> > Thanks,
> >
> > Mike
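
For the archive, here is a minimal sketch of how those properties might
look on QueryDatabaseTableRecord, using the 100K-rows / batch-of-10
split Matt suggests. The table name, maximum-value column, and
controller service names below are hypothetical placeholders, not
values from this thread:

    QueryDatabaseTableRecord (illustrative settings)

      Database Connection Pooling Service : MyDBCPConnectionPool   (assumed name)
      Table Name                          : my_big_table           (hypothetical)
      Record Writer                       : MyAvroRecordSetWriter  (assumed name)
      Maximum-value Columns               : id                     (hypothetical; stored in
                                                                    processor state to drive the
                                                                    incremental fetch)
      Fetch Size                          : 100000   <- one DB fetch per outgoing FlowFile
      Max Rows Per Flow File              : 100000   <- 100K rows per FlowFile
      Output Batch Size                   : 10       <- commit and release every 10 FlowFiles
                                                        (~1M rows) while the fetch continues

As Matt notes, with Output Batch Size set the outgoing FlowFiles won't
carry fragment.count or maxvalue.* attributes, so the downstream flow
should process them independently rather than trying to merge them
back together.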