Dave, I am not sure I get your point. The issue is not about having fewer partitions; it is that the default partitioner's hash can map two different strings to the same partition, so two different keys may land in the same partition.
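To put rough numbers on the collision concern, here is a minimal sketch. It assumes the hash spreads keys uniformly at random, which is an idealization and not a model of Kafka's actual murmur2-based default partitioner; the function names are made up for illustration.

```python
from math import ceil


def expected_partitions_used(num_keys: int, num_partitions: int) -> float:
    """Expected number of non-empty partitions if every key lands on a
    uniformly random partition (idealized hash, an assumption)."""
    prob_partition_empty = (1 - 1 / num_partitions) ** num_keys
    return num_partitions * (1 - prob_partition_empty)


def min_keys_on_busiest_partition(num_keys: int, num_partitions: int) -> int:
    """Pigeonhole bound: at least this many keys share the busiest partition."""
    return ceil(num_keys / num_partitions)


# The case from this thread: 8 unique keys, 6 partitions.
print(round(expected_partitions_used(8, 6), 2))  # 4.6
print(min_keys_on_busiest_partition(8, 6))       # 2
```

With 8 keys and 6 partitions some sharing is unavoidable, and even an ideal hash is expected to leave one or two partitions empty, so seeing only 3 in use is on the unlucky side but not surprising.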
On Sun, Nov 21, 2021 at 9:33 PM Dave Klein <[email protected]> wrote:

> Another possibility, if you can pause processing, is to create a new topic
> with the higher number of partitions, then consume from the beginning of
> the old topic and produce to the new one. Then continue processing as
> normal and all events will be in the correct partitions.
>
> Regards,
> Dave
>
> > On Nov 21, 2021, at 7:38 AM, Pushkar Deole <[email protected]> wrote:
> >
> > Thanks Luke, I am sure this problem would have been faced by many others
> > before so would like to know if there are any existing custom algorithms
> > that can be reused,
> >
> > Note that we also have requirement to maintain key level ordering, so the
> > custom partitioner should support that as well
> >
> >> On Sun, Nov 21, 2021, 18:29 Luke Chen <[email protected]> wrote:
> >>
> >> Hello Pushkar,
> >> Default distribution algorithm is by "hash(key) % partition_count", so
> >> there's possibility to have the uneven distribution you saw.
> >>
> >> Yes, there's a way to solve your problem: custom partitioner:
> >> https://kafka.apache.org/documentation/#producerconfigs_partitioner.class
> >>
> >> You can check the partitioner javadoc here
> >> <https://kafka.apache.org/30/javadoc/org/apache/kafka/clients/producer/Partitioner.html>
> >> for reference. You can see some examples from built-in partitioners, ex:
> >> clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java.
> >> Basically, you want to focus on the "partition" method, to define your own
> >> algorithm to distribute the keys based on the events, ex: key-1 ->
> >> partition-1, key-2 -> partition-2... etc.
> >>
> >> Thank you.
> >> Luke
> >>
> >> On Sat, Nov 20, 2021 at 2:55 PM Pushkar Deole <[email protected]> wrote:
> >>
> >>> Hi All,
> >>>
> >>> We are experiencing some uneven distribution of events across topic
> >>> partitions for a small set of unique keys: following are the details:
> >>>
> >>> 1. topic with 6 partitions
> >>> 2. 8 unique keys used to produce events onto the topic
> >>>
> >>> Used 'key' based partitioning while producing events onto the above topic
> >>> Observation: only 3 partitions were utilized for all the events pertaining
> >>> to those 8 unique keys.
> >>>
> >>> Any idea how can the load be even across partitions while using key based
> >>> partitioning strategy? Any help would be greatly appreciated.
> >>>
> >>> Note: we cannot use round robin since key level ordering matters for us
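The explicit mapping Luke describes (key-1 -> partition-1, key-2 -> partition-2, ...) could look roughly like the sketch below. It is plain Python, not Kafka client code. It assumes the key set is small and known in advance, the helper names are hypothetical, and CRC32 is only a stand-in fallback hash, not the hash Kafka uses. In a real producer the same logic would sit inside the "partition" method of a custom partitioner class.

```python
import zlib


def build_assignment(known_keys, num_partitions):
    """Assign a fixed, known key set to partitions round-robin.
    Sorting makes the mapping reproducible across producer instances."""
    return {key: i % num_partitions for i, key in enumerate(sorted(known_keys))}


def choose_partition(key, assignment, num_partitions):
    """Return the partition for a key. The same key always gets the same
    partition, which is what preserves key-level ordering."""
    if key in assignment:
        return assignment[key]
    # Fallback for keys outside the known set: deterministic stand-in hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"key-{i}" for i in range(1, 9)]   # 8 unique keys, as in the thread
assignment = build_assignment(keys, 6)     # 6 partitions, as in the thread
print(sorted(set(assignment.values())))    # [0, 1, 2, 3, 4, 5]
```

All 6 partitions get used and ordering per key is kept, but only while the mapping stays fixed. Changing the key set or the partition count can move a key to another partition, so that needs the same care as repartitioning the topic.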
