Evenly Spreading Out Source Tasks

2021-03-10 Thread Aeden Jameson
I have a cluster of 18 task managers with 4 task slots each, running a job whose sources and sinks are declared with Flink SQL using the Kafka connector. The topic being read has 36 partitions. The problem I'm observing is that the subtasks for the sources are not evenly distributed. For example, 1 tas

Re: Evenly Spreading Out Source Tasks

2021-03-11 Thread Arvid Heise
Hi Aeden, the option that you mentioned should have actually caused your desired behavior. Can you double-check that it's set for the job (you can look at the config in the Flink UI to be 100% sure). Another option is to simply give all task managers 2 slots. In that way, the scheduler can only e

Re: Evenly Spreading Out Source Tasks

2021-03-11 Thread Aeden Jameson
Hi Arvid, Thanks for responding. I did check the configuration tab of the job manager, and the setting cluster.evenly-spread-out-slots: true is there. However, I'm still observing unevenness in the distribution of source tasks. Perhaps this additional information could shed light. Version: 1.12.1

Re: Evenly Spreading Out Source Tasks

2021-03-12 Thread Matthias Pohl
Hi Aeden, just to be sure: all task managers have the same hardware/memory configuration, don't they? I'm not 100% sure whether this affects the slot selection in the end, but it looks like this parameter also has an influence on the slot matching strategy preferring slots with less utilization o

Re: Evenly Spreading Out Source Tasks

2021-03-12 Thread Aeden Jameson
Hi Matthias, Yes, all the task managers have the same hardware/memory configuration. Aeden

Re: Evenly Spreading Out Source Tasks

2021-03-14 Thread Chesnay Schepler
Is this a brand-new job, with the cluster having all 18 TMs at the time of submission? (Or did you add more TMs while the job was running?)

Re: Evenly Spreading Out Source Tasks

2021-03-14 Thread Xintong Song
Hi Aeden, IIUC, the topic having 36 partitions means that your source task has a parallelism of 36. What's the parallelism of the other tasks? Is the job making use of all 72 (18 TMs * 4 slots/TM) slots? I'm afraid currently there's no good way to guarantee subtasks of a task are spread o

Re: Evenly Spreading Out Source Tasks

2021-03-15 Thread Aeden Jameson
Hi Xintong, Thanks for replying. Yes, you understood my scenario. Every task has the same parallelism since we're using Flink SQL, unless there is a way to change the parallelism of the source task that I have missed. Your explanation of the setting makes sense and is what I ended up concluding

Re: Evenly Spreading Out Source Tasks

2021-03-15 Thread Xintong Song
If all the tasks have the same parallelism of 36, your job should only allocate 36 slots. The evenly-spread-out-slots option should help in your case. Is it possible for you to share the complete jobmanager logs? Thank you~ Xintong Song
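
To make the slot arithmetic in this reply concrete, here is a minimal sketch of what an ideal even spread would look like. It assumes all operators share one slot sharing group, so the job needs as many slots as its highest parallelism. The function name `ideal_even_spread` is hypothetical and is not part of Flink; the sketch only shows the target distribution, not how the scheduler actually picks slots.

```python
def ideal_even_spread(required_slots, num_tms, slots_per_tm):
    """Return the number of slots each task manager would hold
    if the required slots were spread as evenly as possible.

    Illustrative only: this is not Flink's slot selection logic.
    """
    capacity = num_tms * slots_per_tm
    if required_slots > capacity:
        raise ValueError("job needs more slots than the cluster offers")
    # Give every TM the same base share, then hand out the remainder one by one.
    base, extra = divmod(required_slots, num_tms)
    return [base + (1 if i < extra else 0) for i in range(num_tms)]


# Cluster from the thread: 18 TMs with 4 slots each.
spread_36 = ideal_even_spread(36, 18, 4)  # parallelism 36: 2 slots per TM
spread_72 = ideal_even_spread(72, 18, 4)  # parallelism 72: every slot is used
```

With parallelism 36 the ideal result is 2 occupied slots on each of the 18 task managers; with parallelism 72 every slot is occupied, so there is nothing left for a spreading strategy to choose between.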

Re: Evenly Spreading Out Source Tasks

2021-03-17 Thread Aeden Jameson
There may be a slight misunderstanding: all the Flink SQL tasks _were_ set at a parallelism of 72 (18 nodes * 4 slots). I was hoping that the setting cluster.evenly-spread-out-slots would spread out the active Kafka consumers evenly among the TMs given the topic has 36 partitions, but I now realize
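
The situation described here can be illustrated with a small simulation. It is a sketch under two loudly stated assumptions that are not taken from the thread or from Flink's code: partitions are mapped to source subtasks by a simple modulo, and subtasks land on slots in an arbitrary (here seeded-random) order. With parallelism 72 every slot is occupied, so slot spreading has no influence on which task managers host the 36 subtasks that actually receive a partition.

```python
import random
from collections import Counter

NUM_TMS = 18                           # task managers in the thread's cluster
SLOTS_PER_TM = 4                       # slots per task manager
PARALLELISM = NUM_TMS * SLOTS_PER_TM   # 72 source subtasks
NUM_PARTITIONS = 36                    # Kafka topic partitions


def active_consumers_per_tm(seed=0):
    """Count, per task manager, the source subtasks that own a partition.

    Assumptions (illustrative only): modulo partition assignment and
    an arbitrary placement of subtasks onto slots.
    """
    rng = random.Random(seed)
    # slot_owner[i] is the task manager hosting subtask i; all 72 slots are used.
    slot_owner = [tm for tm in range(NUM_TMS) for _ in range(SLOTS_PER_TM)]
    rng.shuffle(slot_owner)

    active = Counter()
    for partition in range(NUM_PARTITIONS):
        subtask = partition % PARALLELISM  # simplified assignment rule
        active[slot_owner[subtask]] += 1
    return [active.get(tm, 0) for tm in range(NUM_TMS)]


counts = active_consumers_per_tm()
print(counts)  # active Kafka consumers on each of the 18 task managers
```

Each run accounts for exactly 36 active consumers in total, but how many fall on a given task manager (anywhere from 0 to 4) depends on subtask placement rather than on how slots are spread.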