Hi,

Yes, in your case half of the Kafka source tasks wouldn't read/process any records (you can check that in the web UI). This shouldn't harm you, as long as your records are redistributed after the source. For example:

    source.keyBy(..).process(new MyVeryHeavyOperator()).print()

should be fine, because `keyBy(…)` will redistribute the records. However,

    source.map(new MyVeryHeavyOperator()).print()

will mean that half of the `MyVeryHeavyOperator` instances will be idling as well. To solve that, you might want to consider using

    dataStream.rebalance();

Piotrek

> On 21 Sep 2018, at 13:25, Taher Koitawala <taher.koitaw...@gslab.com> wrote:
>
> Hi All,
> Let's say a topic in kafka has 5 partitions. If I spawn 10 Task
> Managers with 1 slot each and parallelism is 10 then how will records be read
> from the kafka topic if I use the FlinkKafkaConsumer to read.
>
> Will 5 TM's read and the rest be idle in that case? Is over subscribing the
> number of TM's than the number of partitions in the Kafka topic guarantee
> high throughput?
>
> Regards,
> Taher Koitawala
> GS Lab Pune
> +91 8407979163
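
To make the idle-subtask point from the reply concrete, here is a small plain-Java simulation of the 5-partition, parallelism-10 case. It is only a sketch and does not use Flink. The class `IdleSubtaskSketch` and its methods are hypothetical names. The `p % parallelism` partition assignment and the fixed round-robin starting offset are simplifying assumptions, not Flink's actual assignment or `rebalance()` implementation.

```java
import java.util.Arrays;

public class IdleSubtaskSketch {

    // Assumption: partition p is owned by source subtask (p % parallelism).
    // Flink's real assignment differs in detail, but the counts are the same:
    // with 5 partitions and parallelism 10, five subtasks own nothing.
    static int[] partitionsPerSubtask(int partitions, int parallelism) {
        int[] owned = new int[parallelism];
        for (int p = 0; p < partitions; p++) {
            owned[p % parallelism]++;
        }
        return owned;
    }

    // source.map(...): each source subtask forwards only to the downstream
    // subtask with the same index, so idle sources mean idle operators.
    static long[] forward(int[] owned, int recordsPerPartition) {
        long[] received = new long[owned.length];
        for (int i = 0; i < owned.length; i++) {
            received[i] = (long) owned[i] * recordsPerPartition;
        }
        return received;
    }

    // source.rebalance().map(...): each source subtask cycles round-robin
    // over all downstream subtasks (start offset chosen here for simplicity).
    static long[] rebalance(int[] owned, int recordsPerPartition) {
        int n = owned.length;
        long[] received = new long[n];
        for (int src = 0; src < n; src++) {
            long records = (long) owned[src] * recordsPerPartition;
            for (long r = 0; r < records; r++) {
                received[(int) ((src + r) % n)]++;
            }
        }
        return received;
    }

    // Number of downstream subtasks that received no records at all.
    static long countIdle(long[] received) {
        return Arrays.stream(received).filter(c -> c == 0).count();
    }

    public static void main(String[] args) {
        int[] owned = partitionsPerSubtask(5, 10);
        System.out.println("idle operators after map:       " + countIdle(forward(owned, 1000)));
        System.out.println("idle operators after rebalance: " + countIdle(rebalance(owned, 1000)));
    }
}
```

Under these assumptions, the forwarding case leaves 5 of the 10 downstream subtasks without records, while the round-robin case gives every downstream subtask an equal share. Note that this only spreads the work of the heavy operator; the read throughput from Kafka is still bounded by the 5 partitions.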