Hi,

It's hard for me to give advice from this description alone.

How do you pull the data from your database? Did you take a look at the Output Batch Size parameter? Do you need a single FlowFile for the whole query result, or can you split the results, e.g. 1000 rows per FlowFile?
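Assuming you pull from Oracle with QueryDatabaseTable (or ExecuteSQL), here is a rough sketch of the properties I have in mind; the column name and the numbers are only placeholders you would tune for your own data volumes:

    QueryDatabaseTable
      Maximum-value Columns  : LAST_UPDATED   (hypothetical column driving the incremental pull)
      Fetch Size             : 10000          (rows fetched per round trip from Oracle)
      Max Rows Per Flow File : 1000           (split the result set into smaller FlowFiles)
      Output Batch Size      : 10             (commit FlowFiles downstream in batches instead of
                                                waiting for the whole result set)

Splitting the output like this lets the transformation processors start working while the extract is still running, instead of one thread holding a single huge FlowFile until the query finishes.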
On Tue, Apr 8, 2025 at 16:30, Eric An <[email protected]> wrote:

> Hi,
>
> We are on version 1.23.2 and have some questions surrounding the ETL data pipeline.
>
> It connects to the Oracle DB to extract/pull data incrementally (minutes/hours), does transformations, and loads to S3.
>
> However, we are seeing a bottleneck at Oracle/pulling the data, so when we assign more threads to that stage it creates a bottleneck at the transformation stage, since the extract monopolizes the threads.
>
> Is there a way to dynamically assign the threads? The data in Oracle is not uniformly distributed, so some days/hours have much more data. For those days/hours, having access to more threads helps tremendously in extracting the data.
>
> Any advice/recommendation on how to approach performance tuning in this scenario? Should we just divvy up the available threads between the extract and transformation processors evenly? Not sure what would be the best way to assign the number of threads to each processor to maximize throughput of the pipeline.
>
> Best,
> Eric
