Hi, Sidney, What is the data generation rate of your Kafka topic? Is it a lot bigger than 6000?
Best, Yangze Guo Best, Yangze Guo On Tue, Nov 3, 2020 at 8:45 AM Sidney Feiner <sidney.fei...@startapp.com> wrote: > > Hey, > I'm writing a Flink app that does some transformation on an event consumed > from Kafka and then creates time windows keyed by some field, and apply an > aggregation on all those events. > When I run it with parallelism 1, I get a throughput of around 1.6K events > per second (so also 1.6K events per slot). With parallelism 5, that goes down > to 1.2K events per slot, and when I increase the parallelism to 10, it drops > to 600 events per slot. > Which means that parallelism 5 and parallelism 10, give me the same total > throughput (1.2x5 = 600x10). > > I noticed that although I have 3 Task Managers, all the all the tasks are run > on the same machine, causing it's CPU to spike and probably, this is the > reason that the throughput dramatically decreases. After increasing the > parallelism to 15 and now tasks run on 2/3 machines, the average throughput > per slot is still around 600. > > What could cause this dramatic decrease in performance? > > Extra info: > > Flink version 1.9.2 > Flink High Availability mode > 3 task managers, 66 slots total > > > Execution plan: > > > Any help would be much appreciated > > > Sidney Feiner / Data Platform Developer > M: +972.528197720 / Skype: sidney.feiner.startapp > >