Re: How does flink read data from kafka number of TM's are more than topic partitions

Piotr Nowojski Fri, 21 Sep 2018 06:44:29 -0700

No problem :)

Piotrek


> On 21 Sep 2018, at 15:04, Taher Koitawala <taher.koitaw...@gslab.com> wrote:
> 
> Thanks a lot for the explanation. That was exactly what I thought should 
> happen. However, it is always good to a clear confirmation.
> 
> 
> Regards,
> Taher Koitawala
> GS Lab Pune
> +91 8407979163
> 
> 
> On Fri, Sep 21, 2018 at 6:26 PM Piotr Nowojski <pi...@data-artisans.com 
> <mailto:pi...@data-artisans.com>> wrote:
> Hi,
> 
> Yes, in your case half of the Kafka source tasks wouldn’t read/process any 
> records (you can check that in web UI). This shouldn’t harm you, unless your 
> records will be redistributed after the source. For example:
> 
> source.keyBy(..).process(new MyVeryHeavyOperator()).print()
> 
> Should be fine, because `keyBy(…)` will redistribute records. However
> 
> source.map(new MyVeryHeavyOperator()).print()
> 
> Will mean that half of `MyVeryHeavyOperator`s will be idling as well. To 
> solve that, you might want to consider using 
> 
> dataStream.rebalance();
> 
> Piotrek
> 
>> On 21 Sep 2018, at 13:25, Taher Koitawala <taher.koitaw...@gslab.com 
>> <mailto:taher.koitaw...@gslab.com>> wrote:
>> 
>> Hi All,
>>          Let's say a topic in kafka has 5 partitions. If I spawn 10 Task 
>> Managers with 1 slot each and parallelism is 10 then how will records be 
>> read from the kafka topic if I use the FlinkKafkaConsumer to read.
>> 
>> Will 5 TM's read and the rest be ideal in that case? Is over subscribing the 
>> number of TM's than the number of partitions in the Kafka topic guarantee 
>> high throughput?
>>  
>> Regards,
>> Taher Koitawala
>> GS Lab Pune
>> +91 8407979163
>

Re: How does flink read data from kafka number of TM's are more than topic partitions

Reply via email to