Hi,

It's hard for me to give advice based on this description alone.

But how do you pull the data from your database?
I mean, did you take a look at the Output Batch Size parameter?

Do you need a single FlowFile for the executed query, or can you split
the results, e.g. 1000 rows per FlowFile?
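For example, here is just a sketch of what I mean (assuming you use
ExecuteSQL or QueryDatabaseTable for the extract; the values are only
illustrative, adjust them to your data volumes):

    Fetch Size             = 10000   <- rows fetched per round trip from Oracle
    Max Rows Per Flow File = 1000    <- split the result set into smaller FlowFiles
    Output Batch Size      = 10      <- release batches of FlowFiles downstream
                                        before the whole query has finished

With Output Batch Size set, the transformation processors can start working
while the extract is still running, instead of waiting on one large FlowFile.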



On Tue, Apr 8, 2025 at 4:30 PM, Eric An <[email protected]> wrote:

> Hi,
>
> We are on version 1.23.2 and have some questions surrounding the ETL data
> pipeline.
>
> It connects to the Oracle DB to extract/pull data incrementally
> (minutes/hours), does transformations and loads to S3.
>
> However, we are seeing a bottleneck at Oracle/pulling the data, so when we
> assign more threads to that it creates a bottleneck at the transformation
> stage since it's monopolizing the threads.
>
> Is there a way to dynamically assign the threads?  The data in Oracle is
> not uniformly distributed so some days/hours have much more data.  For
> those days/hours, having access to more threads helps tremendously
> extracting the data.
>
> Any advice/recommendation on how to approach performance tuning in this
> scenario?  Should we just divvy up the available threads between the extract and
> transformation processors evenly?  Not sure what would be the best way to
> optimally assign the number of threads for each processor to maximize
> throughput of the pipeline.
>
>
> Best,
> Eric
>
