Re: uneven distribution of events across kafka topic partitions for small number of unique keys

Dave Klein Mon, 22 Nov 2021 04:33:14 -0800

I’m sorry.  I misread your message.  I thought you were asking about increasing 
the number of partitions on a topic after there were keyed events in it.


> On Nov 22, 2021, at 3:07 AM, Pushkar Deole <[email protected]> wrote:
> 
> Dave,
> 
> I am not sure I get your point... it is not about fewer partitions. The
> issue is the collision caused by the default partitioner for 2 different
> strings, which can land the 2 different keys in the same
> partition.
> 
>> On Sun, Nov 21, 2021 at 9:33 PM Dave Klein <[email protected]> wrote:
>> 
>> Another possibility, if you can pause processing, is to create a new topic
>> with the higher number of partitions, then consume from the beginning of
>> the old topic and produce to the new one. Then continue processing as
>> normal and all events will be in the correct partitions.
>> 
>> Regards,
>> Dave
>> 
>>>> On Nov 21, 2021, at 7:38 AM, Pushkar Deole <[email protected]> wrote:
>>> 
>>> Thanks Luke. I am sure this problem has been faced by many others
>>> before, so I would like to know if there are any existing custom algorithms
>>> that can be reused.
>>> 
>>> Note that we also have a requirement to maintain key-level ordering, so the
>>> custom partitioner should support that as well.
>>> 
>>>> On Sun, Nov 21, 2021, 18:29 Luke Chen <[email protected]> wrote:
>>>> 
>>>> Hello Pushkar,
>>>> The default distribution algorithm is "hash(key) % partition_count", so
>>>> an uneven distribution like the one you saw is possible.
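To make the "hash(key) % partition_count" behaviour concrete, here is a small standalone Java sketch. It re-implements murmur2 following Kafka's `Utils.murmur2` and applies it to the UTF-8 key bytes with the sign bit masked off, as the built-in partitioner does for keyed records. That the port matches your kafka-clients version is an assumption, so treat the exact partition numbers it prints as unverified. The class name `HashModDemo` and the keys `key-1` to `key-8` are hypothetical stand-ins for the 8 real keys. With 8 keys and 6 partitions at least one partition must receive two keys (pigeonhole), and nothing in the scheme stops several keys from piling onto the same few partitions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HashModDemo {

    // Port of murmur2 as used by Kafka's Utils.murmur2 (assumed; verify against your client version).
    public static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Mix in the trailing 1-3 bytes (fall-through is intentional).
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // hash(key) % partition_count, with the sign bit cleared so the result is non-negative.
    public static int partitionFor(String key, int numPartitions) {
        int hash = murmur2(key.getBytes(StandardCharsets.UTF_8));
        return (hash & 0x7fffffff) % numPartitions;
    }

    // Group keys by the partition they land on (only partitions that receive a key appear).
    public static Map<Integer, List<String>> assign(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition.computeIfAbsent(partitionFor(key, numPartitions), p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Hypothetical keys standing in for the 8 real ones.
        List<String> keys = List.of("key-1", "key-2", "key-3", "key-4",
                "key-5", "key-6", "key-7", "key-8");
        Map<Integer, List<String>> byPartition = assign(keys, 6);
        byPartition.forEach((p, ks) -> System.out.println("partition " + p + " <- " + ks));
        System.out.println(byPartition.size() + " of 6 partitions used");
    }
}
```

Running `main` lists which keys share a partition and how many of the 6 partitions receive any key at all; substituting the real keys shows whether they collide in the same way as observed.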
>>>> 
>>>> Yes, there's a way to solve your problem: a custom partitioner (see
>>>> https://kafka.apache.org/documentation/#producerconfigs_partitioner.class).
>>>> 
>>>> You can check the partitioner javadoc here
>>>> <https://kafka.apache.org/30/javadoc/org/apache/kafka/clients/producer/Partitioner.html>
>>>> for reference. You can see some examples in the built-in partitioners, e.g.
>>>> clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java.
>>>> Basically, you want to focus on the "partition" method to define your own
>>>> algorithm for distributing the keys of the events, e.g. key-1 ->
>>>> partition-1, key-2 -> partition-2, etc.
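One way to follow this suggestion while keeping key-level ordering is an explicit lookup table. Every known key is pinned to exactly one partition, so all events for that key stay in order, and the known keys are spread round-robin, so 8 keys over 6 partitions gives at most 2 keys per partition. The sketch keeps the logic in plain Java so it runs without kafka-clients. The class `ExplicitKeyPartitioning`, the key list, and the `String.hashCode` fallback for unknown keys are hypothetical choices for illustration. In a real producer this logic would live in a class implementing `org.apache.kafka.clients.producer.Partitioner`, be called from its `partition(...)` method, and be registered through the `partitioner.class` producer config.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExplicitKeyPartitioning {

    private final Map<String, Integer> keyToPartition = new HashMap<>();
    private final int numPartitions;

    // Pin each known key to a partition round-robin: the i-th key goes to partition i % numPartitions.
    public ExplicitKeyPartitioning(List<String> knownKeys, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
        for (int i = 0; i < knownKeys.size(); i++) {
            keyToPartition.put(knownKeys.get(i), i % numPartitions);
        }
    }

    // The same key always maps to the same partition, which preserves per-key ordering.
    // Unknown keys fall back to a hash (String.hashCode here, NOT Kafka's murmur2).
    public int partition(String key) {
        Integer pinned = keyToPartition.get(key);
        if (pinned != null) {
            return pinned;
        }
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Hypothetical keys; every producer must use the same list in the same order.
        List<String> keys = List.of("key-1", "key-2", "key-3", "key-4",
                "key-5", "key-6", "key-7", "key-8");
        ExplicitKeyPartitioning partitioner = new ExplicitKeyPartitioning(keys, 6);
        for (String k : keys) {
            System.out.println(k + " -> partition " + partitioner.partition(k));
        }
    }
}
```

With these keys, `key-1` to `key-6` land on partitions 0 to 5, and `key-7` and `key-8` share partitions 0 and 1. The trade-off is that the mapping is static: all producers must agree on it, and changing the key list or the partition count moves keys to different partitions, so per-key ordering only holds while the mapping stays fixed.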
>>>> 
>>>> Thank you.
>>>> Luke
>>>> 
>>>> 
>>>> On Sat, Nov 20, 2021 at 2:55 PM Pushkar Deole <[email protected]> wrote:
>>>> 
>>>>> Hi All,
>>>>> 
>>>>> We are experiencing an uneven distribution of events across topic
>>>>> partitions for a small set of unique keys. Here are the details:
>>>>> 
>>>>> 1. topic with 6 partitions
>>>>> 2. 8 unique keys used to produce events onto the topic
>>>>> 
>>>>> We used key-based partitioning while producing events onto the above topic.
>>>>> Observation: only 3 partitions were utilized for all the events pertaining
>>>>> to those 8 unique keys.
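The observation is less surprising than it looks if the hash is assumed to behave like a uniform random choice over partitions (an idealisation, not a property Kafka guarantees). With k = 8 keys and n = 6 partitions, each partition stays empty with probability (5/6)^8, which gives the expected number of partitions that receive at least one key. The chance of using at most 3 partitions counts the assignments that hit exactly j partitions, where S(8, j) is a Stirling number of the second kind (S(8,1) = 1, S(8,2) = 127, S(8,3) = 966):

```latex
\[
\mathbb{E}[\text{partitions used}]
  = n\left(1 - \left(1 - \tfrac{1}{n}\right)^{k}\right)
  = 6\left(1 - \left(\tfrac{5}{6}\right)^{8}\right)
  \approx 6\,(1 - 0.2326) \approx 4.60
\]
\[
P(\text{at most 3 partitions used})
  = \frac{\sum_{j=1}^{3} \binom{6}{j}\, j!\, S(8,j)}{6^{8}}
  = \frac{6 + 3810 + 115920}{1679616}
  \approx 0.071
\]
```

So under this assumption only about 4.6 of the 6 partitions receive any key on average, and seeing just 3 is roughly a 1-in-14 outcome: unlucky, but not anomalous.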
>>>>> 
>>>>> Any idea how the load can be spread evenly across partitions while using a
>>>>> key-based partitioning strategy? Any help would be greatly appreciated.
>>>>> 
>>>>> Note: we cannot use round robin since key-level ordering matters for us.
>>>>> 
>>>> 
>> 
>>
