Hello, 

We are working on a project where we want to gather information about job 
performance across different task-level parallelism settings.
Essentially, we want to see how the throughput of a single task varies across 
different parallelism settings, e.g. for a job of 5 tasks: 1-1-1-1-1 vs. 
1-2-1-1-1 vs. 2-2-2-2-2. 

We are running Flink on Kubernetes, a job with 5 tasks. Slot sharing is 
enabled, operator chaining is disabled, and each task manager has one slot.

So, the number of task managers always equals the highest parallelism in the 
job, and we can fit the entire job into one task manager slot. 

We then run the job against multiple parallelism configs (such as those 
above), collect the relevant metrics, and try to get some useful information 
out of them. 

We are now wondering how independent our results are from one another. More 
specifically, if we look at the parallelism of the second task, is its 
performance independent of the parallelism of the other tasks? So, will the 
second task perform the same in (1-2-1-1-1) as in (2-2-2-2-2)? 

Our take on it is the following: With our setup, (1-2-1-1-1) should result in 
one task manager holding the entire job and a second task manager that only 
runs a subtask of the second task. (2-2-2-2-2) will run two task managers, 
each holding the entire job. So, theoretically, the second task should have 
many more resources available in the first setup, as one of its subtasks has 
all the resources of that task manager at its disposal. Does that assumption 
hold, or will Flink assign a fixed amount of resources to a task in a task 
manager no matter how many other tasks are running in that same task manager 
slot? 
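
To make the layout we are assuming concrete, here is a small plain-Java sketch 
(no Flink API involved). It encodes only our assumption, not confirmed Flink 
scheduling behaviour: with a single slot sharing group, slot i holds subtask i 
of every task whose parallelism is greater than i. The class and method names 
(SlotLayoutSketch, layout) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SlotLayoutSketch {

    // ASSUMPTION (ours, not verified against Flink): with slot sharing and one
    // slot per task manager, the job needs max(parallelism) slots, and slot s
    // holds one subtask of every task whose parallelism is greater than s.
    // Returns, per slot, the 1-based numbers of the tasks that have a subtask there.
    static List<List<Integer>> layout(int[] parallelism) {
        int slots = 0;
        for (int p : parallelism) {
            slots = Math.max(slots, p);
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int s = 0; s < slots; s++) {
            List<Integer> tasks = new ArrayList<>();
            for (int t = 0; t < parallelism.length; t++) {
                if (parallelism[t] > s) {
                    tasks.add(t + 1);
                }
            }
            result.add(tasks);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [[1, 2, 3, 4, 5], [2]]
        System.out.println(layout(new int[] {1, 2, 1, 1, 1}));
        // prints [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
        System.out.println(layout(new int[] {2, 2, 2, 2, 2}));
    }
}
```

Under this assumption, in (1-2-1-1-1) the second slot runs only a subtask of 
task 2, while in (2-2-2-2-2) both slots run a subtask of all five tasks.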

We would highly appreciate any help. 

Best, 
Jan
