No problem :)

Piotrek

> On 21 Sep 2018, at 15:04, Taher Koitawala <taher.koitaw...@gslab.com> wrote:
>
> Thanks a lot for the explanation. That was exactly what I thought should
> happen. However, it is always good to have a clear confirmation.
>
> Regards,
> Taher Koitawala
> GS Lab Pune
> +91 8407979163
>
> On Fri, Sep 21, 2018 at 6:26 PM Piotr Nowojski <pi...@data-artisans.com> wrote:
> Hi,
>
> Yes, in your case half of the Kafka source tasks wouldn't read/process any
> records (you can check that in the web UI). This shouldn't harm you, as long
> as your records are redistributed after the source. For example:
>
>     source.keyBy(..).process(new MyVeryHeavyOperator()).print()
>
> should be fine, because `keyBy(…)` will redistribute the records. However,
>
>     source.map(new MyVeryHeavyOperator()).print()
>
> will mean that half of the `MyVeryHeavyOperator` instances will be idling as
> well. To solve that, you might want to consider using
>
>     dataStream.rebalance();
>
> Piotrek
>
>> On 21 Sep 2018, at 13:25, Taher Koitawala <taher.koitaw...@gslab.com> wrote:
>>
>> Hi All,
>> Let's say a topic in Kafka has 5 partitions. If I spawn 10 Task
>> Managers with 1 slot each and the parallelism is 10, how will records be
>> read from the Kafka topic if I use the FlinkKafkaConsumer?
>>
>> Will 5 TMs read and the rest be idle in that case? Does having more TMs
>> than there are partitions in the Kafka topic guarantee high throughput?
>>
>> Regards,
>> Taher Koitawala
>> GS Lab Pune
>> +91 8407979163
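
For readers of the archive, the scenario discussed above (5 partitions, source parallelism 10) can be sketched in plain Java. This is only an illustration, not Flink's actual assignment code: the class `PartitionAssignmentSketch`, its methods, and the simple round-robin rule are all assumptions made for the example. The only point it relies on is that each partition is read by exactly one source subtask, so subtasks beyond the partition count get nothing to read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT Flink's real assignment logic.
public class PartitionAssignmentSketch {

    // Assumption: each partition goes to exactly one subtask, chosen round-robin.
    static List<List<Integer>> assign(int numPartitions, int parallelism) {
        List<List<Integer>> bySubtask = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            bySubtask.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            bySubtask.get(p % parallelism).add(p);
        }
        return bySubtask;
    }

    // Counts the source subtasks that received no partition at all.
    static long idleSubtasks(int numPartitions, int parallelism) {
        return assign(numPartitions, parallelism).stream()
                .filter(List::isEmpty)
                .count();
    }

    public static void main(String[] args) {
        // The case from the thread: 5 partitions, parallelism 10.
        List<List<Integer>> assignment = assign(5, 10);
        for (int i = 0; i < assignment.size(); i++) {
            System.out.println("subtask " + i + " -> partitions " + assignment.get(i));
        }
        System.out.println("idle subtasks: " + idleSubtasks(5, 10));
    }
}
```

Under these assumptions, five of the ten subtasks end up with an empty partition list, which is why a chained `map(...)` directly after the source would leave half of its instances idle, while `keyBy(..)` or `rebalance()` spreads the records across all downstream instances again.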