Re: How does flink read data from kafka number of TM's are more than topic partitions

Piotr Nowojski Fri, 21 Sep 2018 05:56:46 -0700

Hi,

Yes, in your case half of the Kafka source tasks wouldn’t read or process any 
records (you can check that in the web UI). This shouldn’t harm you, as long 
as your records are redistributed after the source. For example:


source.keyBy(..).process(new MyVeryHeavyOperator()).print()

Should be fine, because `keyBy(…)` will redistribute records. However

source.map(new MyVeryHeavyOperator()).print()

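Will mean that half of the `MyVeryHeavyOperator` instances will be idle as 
well, because without redistribution each one only receives records from its 
own source task. To solve that, you might want to consider redistributing the 
records evenly before the heavy operator by using 

dataStream.rebalance();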
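
To make the difference concrete, here is a small plain-Java sketch of the 
numbers from the question (5 partitions, parallelism 10). It does not use any 
Flink classes. It assumes that partition p is read by source task 
(p % parallelism), which is a simplification and not necessarily what the 
connector does, and that rebalancing is a plain round-robin. The class and 
method names are made up for illustration.

```java
import java.util.Arrays;

// Plain-Java model of the situation in this thread; no Flink classes are used.
class SourceParallelismSketch {

    // ASSUMPTION: partition p is read by source task (p % parallelism).
    // This is a simplified stand-in for the connector's real assignment.
    static int[] partitionsPerSubtask(int numPartitions, int parallelism) {
        int[] counts = new int[parallelism];
        for (int p = 0; p < numPartitions; p++) {
            counts[p % parallelism]++;
        }
        return counts;
    }

    // Models source.map(...): records stay on the task that read them,
    // so a downstream task sees only what its own source task produced.
    static long[] forwardedLoad(int[] partitionsPerSubtask, long recordsPerPartition) {
        long[] load = new long[partitionsPerSubtask.length];
        for (int i = 0; i < load.length; i++) {
            load[i] = partitionsPerSubtask[i] * recordsPerPartition;
        }
        return load;
    }

    // Models source.rebalance().map(...): every source task hands its records
    // out round-robin to all downstream tasks (start offset is an assumption).
    static long[] rebalancedLoad(int[] partitionsPerSubtask, long recordsPerPartition) {
        int parallelism = partitionsPerSubtask.length;
        long[] load = new long[parallelism];
        for (int i = 0; i < parallelism; i++) {
            long records = partitionsPerSubtask[i] * recordsPerPartition;
            for (long r = 0; r < records; r++) {
                load[(int) ((i + r) % parallelism)]++;
            }
        }
        return load;
    }

    public static void main(String[] args) {
        int[] assignment = partitionsPerSubtask(5, 10);
        // Partitions per source task: half of the tasks get nothing.
        System.out.println(Arrays.toString(assignment));
        // Records per heavy operator without redistribution.
        System.out.println(Arrays.toString(forwardedLoad(assignment, 100)));
        // Records per heavy operator after a round-robin rebalance.
        System.out.println(Arrays.toString(rebalancedLoad(assignment, 100)));
    }
}
```

In this model only 5 of the 10 downstream operators get any records in the 
forwarded case, while after the round-robin step all 10 get an equal share. 
The 5 source tasks without a partition stay idle either way.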

Piotrek

> On 21 Sep 2018, at 13:25, Taher Koitawala <taher.koitaw...@gslab.com> wrote:
> 
> Hi All,
>          Let's say a topic in Kafka has 5 partitions. If I spawn 10 Task 
> Managers with 1 slot each and the parallelism is 10, how will records be 
> read from the Kafka topic if I use the FlinkKafkaConsumer?
> 
> Will 5 TMs read and the rest be idle in that case? Does having more TMs 
> than partitions in the Kafka topic guarantee high throughput?
>  
> Regards,
> Taher Koitawala
> GS Lab Pune
> +91 8407979163
