Dave,

I am not sure I get your point. The issue is not about having too few
partitions; it is that the default partitioner can produce the same hash
result for 2 different strings, which might land 2 different keys in the
same partition.
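
To illustrate the concern, here is a minimal sketch, not actual Kafka code.
It assumes the "hash(key) % partition_count" formula quoted below and uses
String.hashCode() as a stand-in for the murmur2 hash that Kafka applies to
the serialized key, so the real partition numbers will differ. The class
name, method names and sample keys are hypothetical. The second method shows
one possible fixed key-to-partition assignment for a small, known key set;
each key always maps to the same partition, so key-level ordering is kept.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyPartitionSketch {

    // Stand-in for the default formula hash(key) % partition_count.
    // Assumption: String.hashCode() replaces Kafka's murmur2 over the
    // serialized key bytes, purely to keep this sketch self-contained.
    static int hashPartition(String key, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Hypothetical alternative for a small, known key set: assign keys to
    // partitions in the order they are listed, wrapping around when the
    // partitions run out. The mapping is fixed, so every event for a given
    // key goes to the same partition and per-key ordering is preserved.
    static Map<String, Integer> assignFixed(List<String> knownKeys, int numPartitions) {
        Map<String, Integer> assignment = new LinkedHashMap<>();
        for (String key : knownKeys) {
            // size() is read before the put, so the n-th distinct key
            // (counting from 0) gets partition n % numPartitions.
            assignment.putIfAbsent(key, assignment.size() % numPartitions);
        }
        return assignment;
    }

    public static void main(String[] args) {
        int numPartitions = 6;
        // Two different keys that collide under the stand-in hash.
        System.out.println("a -> " + hashPartition("a", numPartitions));
        System.out.println("g -> " + hashPartition("g", numPartitions));

        List<String> keys = List.of("k1", "k2", "k3", "k4", "k5", "k6", "k7", "k8");
        System.out.println(assignFixed(keys, numPartitions));
    }
}
```

With 8 keys and 6 partitions, the fixed assignment uses all 6 partitions and
puts at most 2 keys on any one of them. In a real producer this lookup would
sit inside the "partition" method of a custom partitioner, and the key list
would have to be known up front and kept stable.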

On Sun, Nov 21, 2021 at 9:33 PM Dave Klein <[email protected]> wrote:

> Another possibility, if you can pause processing, is to create a new topic
> with the higher number of partitions, then consume from the beginning of
> the old topic and produce to the new one. Then continue processing as
> normal and all events will be in the correct partitions.
>
> Regards,
> Dave
>
> > On Nov 21, 2021, at 7:38 AM, Pushkar Deole <[email protected]> wrote:
> >
> > Thanks Luke. I am sure many others have faced this problem before, so I
> > would like to know if there are any existing custom algorithms that can
> > be reused.
> >
> > Note that we also have a requirement to maintain key-level ordering, so
> > the custom partitioner should support that as well.
> >
> >> On Sun, Nov 21, 2021, 18:29 Luke Chen <[email protected]> wrote:
> >>
> >> Hello Pushkar,
> >> The default distribution algorithm is "hash(key) % partition_count", so
> >> it is possible to get the uneven distribution you saw.
> >>
> >> Yes, there is a way to solve your problem: a custom partitioner:
> >> https://kafka.apache.org/documentation/#producerconfigs_partitioner.class
> >>
> >> You can check the partitioner javadoc here
> >> <https://kafka.apache.org/30/javadoc/org/apache/kafka/clients/producer/Partitioner.html>
> >> for reference. You can see some examples in the built-in partitioners, e.g.
> >> clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java.
> >> Basically, you want to focus on the "partition" method to define your own
> >> algorithm for distributing the keys based on the events, e.g. key-1 ->
> >> partition-1, key-2 -> partition-2, etc.
> >>
> >> Thank you.
> >> Luke
> >>
> >>
> >> On Sat, Nov 20, 2021 at 2:55 PM Pushkar Deole <[email protected]>
> >> wrote:
> >>
> >>> Hi All,
> >>>
> >>> We are experiencing uneven distribution of events across topic
> >>> partitions for a small set of unique keys. The details are as follows:
> >>>
> >>> 1. topic with 6 partitions
> >>> 2. 8 unique keys used to produce events onto the topic
> >>>
> >>> We used key-based partitioning while producing events onto the above
> >>> topic. Observation: only 3 partitions were utilized for all the events
> >>> pertaining to those 8 unique keys.
> >>>
> >>> Any idea how the load can be spread evenly across partitions while using
> >>> a key-based partitioning strategy? Any help would be greatly appreciated.
> >>>
> >>> Note: we cannot use round robin since key-level ordering matters for us.
> >>>
> >>
>
>
