Hi Karthick,

The choice has to be yours, depending on what you want to achieve. I understand you want an even distribution of messages across your partitions. This depends on the following factors:
- The frequency of keys
- The hashing logic itself

What you can control is the hashing logic. One option is to hardcode the keys and their corresponding partition numbers in your partitioner (this assumes you have a small, known pool of distinct keys). That ensures the key-to-partition assignment itself is not biased. For example:

key1 : partition 0
key2 : partition 1
key3 : partition 2
key4 : partition 3
key5 : partition 4
key6 : partition 0
...

However, if a few keys account for most of your messages, skew cannot be entirely avoided. For example, if key1 and key2 are produced most of the time, partitions 0 and 1 will be loaded more heavily than the other partitions.

You need to identify the reason for the skew: is it the hashing algorithm, or the frequency of the keys themselves? If it is the frequency of keys, there is not much that can be done with one topic alone. In that case you will have to get creative with your topic design; for example, you can have separate topics for certain high-frequency keys.

Also, you should first assess why you have 96 partitions. In my experience that is way too high.

Thanks

On Tue, Aug 20, 2024 at 4:36 PM Karthick <ibmkarthickma...@gmail.com> wrote:

> Hi Akash Jain
> Thanks for the reply seeking help for the same to choose hashing logics.
> Please refer/suggest any.
>
> On Sat, Aug 17, 2024 at 10:21 AM Akash Jain <akashjain0...@gmail.com> wrote:
>
> > Hi Karthick. You could implement your own custom partitioner.
> >
> > On Saturday, August 17, 2024, Karthick <ibmkarthickma...@gmail.com> wrote:
> >
> > > Hi Team,
> > >
> > > I'm using Kafka partitioning to maintain field-based ordering across
> > > partitions, but I'm experiencing data skewness among the partitions. I have
> > > 96 partitions, and I'm sending data with 500 distinct keys that are used
> > > for partitioning. While monitoring the Kafka cluster, I noticed that a few
> > > partitions are underutilized while others are overutilized.
> > >
> > > This seems to be a hashing problem. Can anyone suggest a better hashing
> > > technique or partitioning strategy to balance the load more effectively?
> > >
> > > Thanks in advance for your help.
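
For reference, the explicit key-to-partition mapping from the reply at the top of this thread, together with a quick check of whether skew comes from the assignment or from key frequency, could be sketched roughly as below. This is only an illustration in plain Python: the key names, the message counts, and the function names are made up, and in a real producer the lookup would live inside a custom partitioner (for the Java client, a class implementing org.apache.kafka.clients.producer.Partitioner).

```python
from collections import Counter


def build_static_assignment(keys, num_partitions):
    """Assign each known key to a partition round-robin (keys taken in sorted order)."""
    return {key: i % num_partitions for i, key in enumerate(sorted(keys))}


def partition_load(assignment, key_counts, num_partitions):
    """Sum the message counts that land on each partition."""
    load = [0] * num_partitions
    for key, count in key_counts.items():
        load[assignment[key]] += count
    return load


# Hypothetical setup matching the numbers in the thread: 500 keys, 96 partitions.
NUM_PARTITIONS = 96
keys = [f"key{i}" for i in range(500)]
assignment = build_static_assignment(keys, NUM_PARTITIONS)

# How many keys each partition owns: as even as 500 keys over 96 partitions allows.
keys_per_partition = Counter(assignment.values())

# Case 1 (assumed): every key produces the same number of messages.
uniform_counts = {key: 100 for key in keys}
uniform_load = partition_load(assignment, uniform_counts, NUM_PARTITIONS)

# Case 2 (assumed): two hot keys dominate the traffic.
skewed_counts = dict(uniform_counts)
skewed_counts["key0"] = 10_000
skewed_counts["key1"] = 10_000
skewed_load = partition_load(assignment, skewed_counts, NUM_PARTITIONS)

print("keys per partition (min, max):",
      min(keys_per_partition.values()), max(keys_per_partition.values()))
print("uniform traffic load (min, max):", min(uniform_load), max(uniform_load))
print("skewed traffic load (min, max):", min(skewed_load), max(skewed_load))
```

If the per-partition load stays uneven even though the keys are spread evenly, as in the second case, the skew comes from key frequency and not from the hashing, and no choice of hash function within a single topic will remove it.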