It is also worth noting that since release 2.4 of Apache Kafka, the DefaultPartitioner uses a sticky partitioning strategy instead of round-robin for records that have no key; records with a key are still assigned to a partition by hashing the key. This means that if you need fine control over which partition a record ends up in for a given key, you ought to write your own partitioner class.

More information about this is available here: <https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner>.
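
As a rough illustration of what "your own partitioner" might decide, here is a minimal sketch of the selection logic only, written in plain Python with no Kafka dependency. The `PINNED_PARTITIONS` table and the `choose_partition` function are hypothetical names invented for this example, and CRC32 is only a stand-in hash (Kafka's Java client uses murmur2 for keyed records). In the Java client, logic like this would live in a class implementing `org.apache.kafka.clients.producer.Partitioner`.

```python
import zlib

# Hypothetical table: serialized keys that must always land on a given partition.
PINNED_PARTITIONS = {b"user-42": 0}


def choose_partition(key_bytes, num_partitions, pinned=PINNED_PARTITIONS):
    """Return the partition index for a serialized record key."""
    if key_bytes in pinned:
        # Explicit placement for keys we want full control over.
        return pinned[key_bytes] % num_partitions
    # Fallback: a stable hash of the key, so equal keys share a partition.
    # CRC32 is an assumption made for this sketch, not Kafka's hash.
    return zlib.crc32(key_bytes) % num_partitions
```

Pinned keys always go where the table says; every other key keeps the usual "same key, same partition" behaviour.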

Thanks,

-- Ricardo

On 7/7/20 9:54 AM, Vinicius Scheidegger wrote:
Hi Victoria,

If processing order is not a requirement you could define a random key and
your load would be randomly distributed across partitions.
So far I was unable to find a solution to perfectly distribute the load
across partitions when records are created from multiple producers - random
distribution might be good enough though.
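
The random-key idea can be checked with a small simulation. This is a sketch under assumptions: `simulate_random_keys` is a hypothetical helper, CRC32 stands in for the producer's key hash, and no real broker is involved. It only counts how many records each partition would receive.

```python
import random
import zlib
from collections import Counter


def simulate_random_keys(num_records, num_partitions, seed=0):
    """Count records per partition when every record gets a random key."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    counts = Counter()
    for _ in range(num_records):
        # A throwaway 8-byte key; per-key ordering is lost by design.
        key = rng.getrandbits(64).to_bytes(8, "big")
        counts[zlib.crc32(key) % num_partitions] += 1
    return counts
```

With many records the counts come out roughly even rather than exactly equal, which matches the "good enough" caveat above.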

I hope it helps,

Vinicius Scheidegger


On Tue, Jul 7, 2020 at 7:52 AM Victoria Zuberman <
victoria.zuber...@imperva.com> wrote:

Hi,

I have userId as a key.
Many users have moderate amounts of data but some users have more and some
users have a huge amount of data.

I have been thinking about the following aspects of partitioning:

   1.  If two or more large users fall into the same partition I might end
up with large partition/s (unbalanced with other partitions)
   2.  If smaller users fall in the same partition as a huge user the small
users might get slower processing due to the amount of data the huge user
has
   3.  If the order of the messages is not critical, maybe I would want to
allow several consumers to work on the data of the same huge user,
therefore I would like to partition one userId into several partitions
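
One common way to approach point 3 is to "salt" the key so that a single userId maps to a small set of partitions instead of one. The sketch below is an assumption-laden illustration, not a tested production recipe: `partitions_for`, `salted_partition`, and the `fanout` parameter are hypothetical names, and CRC32 stands in for the real key hash.

```python
import random
import zlib


def partitions_for(user_id, num_partitions, fanout=1):
    """List the partitions a user may be written to."""
    fanout = max(1, min(fanout, num_partitions))  # keep fanout in a valid range
    base = zlib.crc32(user_id.encode("utf-8")) % num_partitions
    # `fanout` consecutive partitions, wrapping around at the end.
    return [(base + i) % num_partitions for i in range(fanout)]


def salted_partition(user_id, num_partitions, fanout=1, rng=random):
    """Pick one of the user's allowed partitions at random for this record."""
    return rng.choice(partitions_for(user_id, num_partitions, fanout))
```

Ordinary users keep `fanout=1` and stay on a single partition, while a huge user could be given a larger fanout so several consumers share the work. Ordering across that user's records is then no longer guaranteed, which is the trade-off point 3 already accepts.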

I have some ideas about how to partition to solve those issues, but if you
have something that worked well for you in production I would love to hear
about it. Also, any links to relevant blog posts etc. would be welcome.

Thanks,
Victoria
