Re: Kafka Streams - partition assignment for the input topic

Matthias J. Sax Fri, 20 Mar 2020 10:21:15 -0700

Partition assignment, or move specific "task placement" for Kafka
Streams, is a hard-coded algorithm (cf.
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignor.java).
The algorithm actually tires to assign different tasks from the same
sub-topology to different instances and thus, your 6 input topic
partitions should ideally get balanced over your 3 instance (ie, 2 each,
one for each thread).


However, the algorithm needs to trade-off load balancing and stickiness
(to avoid unnecessary, expensive state migration) and thus, the
placement strategy is best effort only. Also, in older versions there
was some issue that got fixed in newer version (ie, 2.0.x and newer).
Not sure what version you are on (as you linked to 1.0 docs, maybe
upgrade resolves your issue?).

Compare:

 - https://issues.apache.org/jira/browse/KAFKA-6039
 - https://issues.apache.org/jira/browse/KAFKA-7144

If you still observe issues in never version, please comment on the
tickets ofr create a new ticket describing the problem. Or even better,
do a PR to help improving the "task placement" algorithm. :)


-Matthias


On 3/20/20 6:47 AM, Stephen Young wrote:
> Thanks Guozhang. That's really helpful!
> 
> Are you able to explain a bit more about how it would work for my use case? 
> As I understand it this 'repartition' method enables us to materialize a 
> stream to a new topic with a custom partitioning strategy.
> 
> But my problem is not how the topic is partitioned. My issue is that the 
> partitions of the source topic need to be spread equally amongst all the 
> available threads. How could 'repartition' help with this?
> 
> Stephen
> 
> On 2020/03/19 23:20:54, Guozhang Wang <wangg...@gmail.com> wrote: 
>> Hi Stephen,
>>
>> We've deprecated the partition-grouper API due to its drawbacks in
>> upgrading compatibility (consider if you want to change the num.partitions
>> while evolving your application), and instead we're working on KIP-221 for
>> the same purpose of your use case:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+DSL+with+Connecting+Topic+Creation+and+Repartition+Hint
>>
>>
>> Guozhang
>>
>> On Wed, Mar 18, 2020 at 7:48 AM Stephen Young
>> <wintersg...@googlemail.com.invalid> wrote:
>>
>>> I have a question about partition assignment for a kafka streams app. As I
>>> understand it the more complex your topology is the greater the number of
>>> internal topics kafka streams will create. In my case the app has 8 graphs
>>> in the topology. There are 6 partitions for each graph (this matches the
>>> number of partitions of the input topic). So there are 48 partitions that
>>> the app needs to handle. These get balanced equally across all 3 servers
>>> where the app is running (each server also has 2 threads so there are 6
>>> available instances of the app).
>>>
>>> The problem for me is that the partitions of the input topic have the
>>> heaviest workload. But these 6 partitions are not distributed evenly
>>> amongst the instances. They are just considered 6 partitions amongst the 48
>>> the app needs to balance. But this means if a server gets most or all of
>>> these 6 partitions, it ends up exhausting all of the resources on that
>>> server.
>>>
>>> Is there a way of equally balancing these 6 specific partitions amongst the
>>> available instances? I thought writing a custom partition grouper might
>>> help here:
>>>
>>>
>>> https://kafka.apache.org/10/documentation/streams/developer-guide/config-streams.html#partition-grouper
>>>
>>> But the advice seems to be to not do this otherwise you risk breaking the
>>> app.
>>>
>>> Thanks!
>>>
>>
>>
>> -- 
>> -- Guozhang
>>

signature.asc
Description: OpenPGP digital signature

Re: Kafka Streams - partition assignment for the input topic

Reply via email to