Hi,

Do I understand correctly that:
1. The workload varies across jobs but stays the same for a given job
2. With a small number of slots per TM, you are concerned about uneven resource utilization when running low- and high-intensity jobs on the same cluster simultaneously?
If so, wouldn't reducing the parallelism of the low-intensity jobs help?

Other options to consider are putting subtasks of the high-intensity job into different slot-sharing groups, or breaking operator chains explicitly [1].

There are also a number of improvements coming in the 1.13 release: [2][3][4].

I'm pulling in Till and Robert, who know this area better.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/#task-chaining-and-resource-groups
[2] https://issues.apache.org/jira/browse/FLINK-21267
[3] https://issues.apache.org/jira/browse/FLINK-10404
[4] https://issues.apache.org/jira/browse/FLINK-14187

Regards,
Roman

On Fri, Mar 12, 2021 at 5:03 AM Sush Bankapura <sushrutha.bankap...@man-es.com> wrote:
> Hi,
>
> We have multiple jobs that need to be deployed to a Flink cluster.
> Parallelism varies across jobs and depends on the type of work being done,
> as do the memory requirements. All jobs currently use the same state
> backend. Since the workload handled by each job is different, the scaling
> pattern also varies. We run all our jobs in a single Flink cluster (7 VMs
> with the same instance configuration).
>
> Most of what I have read in the Flink documentation indicates one of the
> following for setting the task slots:
>
> 1. As a rule of thumb, a good default number of task slots will be the number
> of CPU cores. With hyper-threading, each slot then takes 2 or more hardware
> thread contexts. If you are doing any blocking IO operations in a Flink job,
> it is suggested to have more slots than cores.
>
> 2. A Flink cluster needs exactly as many task slots as the highest
> parallelism used in the job. No need to calculate how many tasks (with
> varying parallelism) a program contains in total.
>
> I did not find documentation for the task slot setting for the scenario I
> have described.
> Setting a lower value for the task slots seems to work better for jobs that
> process high volumes of traffic than for jobs that process lower volumes,
> but it will be inefficient if the slots are assigned to jobs that work on
> lower volumes of traffic.
>
> Depending on the workload handled by each Flink job, it seems that we would
> need to set up as many clusters.
>
> 1. Is this the only option available?
> 2. Are there any guidelines on deciding on the number of task slots in such
> an environment?
>
> Thanks,
> Sushruth
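
A note on rule 2 in the quoted message: it is stated per job, and slots are not shared between jobs. So for several jobs on one cluster, a rough lower bound on the total slot count is the sum of each job's highest operator parallelism. The sketch below works through that arithmetic in Python. It assumes every job uses the default slot sharing group; the job names and parallelism values are made up for illustration, and only the 7 TaskManagers come from the original message.

```python
import math


def slots_required(job_parallelism):
    """Sum, over all jobs, of each job's highest operator parallelism.

    Assumes default slot sharing within a job and no slot sharing
    across jobs.
    """
    return sum(max(ops) for ops in job_parallelism.values())


def slots_per_task_manager(total_slots, num_task_managers):
    """Smallest per-TaskManager slot count that covers total_slots."""
    return math.ceil(total_slots / num_task_managers)


# Hypothetical jobs: each list holds the parallelism of the job's operators.
jobs = {
    "high_traffic": [8, 16, 16],
    "medium_traffic": [4, 8],
    "low_traffic": [1, 2],
}

total = slots_required(jobs)                # 16 + 8 + 2 = 26
per_tm = slots_per_task_manager(total, 7)   # ceil(26 / 7) = 4
print(total, per_tm)
```

This only bounds the number of slots. It says nothing about how much memory or CPU each slot gets, which is where slot-sharing groups or separate clusters may still be needed for the heavier jobs.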