Hi Victoria,

If processing order is not a requirement, you could use a random key, and your load would be distributed randomly across partitions. So far I have been unable to find a way to distribute the load perfectly evenly across partitions when records are created by multiple producers, but random distribution might be good enough.
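
To illustrate the difference, here is a minimal Python simulation, not Kafka client code. It assumes a hypothetical topic with 6 partitions and a made-up workload of one huge user plus 100 moderate users. The `partition_for` helper is a stand-in for a hash-based partitioner: it uses `zlib.crc32` only to stay within the standard library, whereas Kafka's default partitioner hashes the key bytes with murmur2. All names in the sketch are illustrative.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 6  # hypothetical partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in for a hash-based partitioner: hash(key bytes) mod partition count.
    # crc32 is an assumption made for this sketch, not Kafka's actual hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def simulate(records, key_fn):
    # Count how many records land on each partition for a given keying strategy.
    counts = Counter({p: 0 for p in range(NUM_PARTITIONS)})
    for user_id in records:
        counts[partition_for(key_fn(user_id))] += 1
    return counts


rng = random.Random(42)  # fixed seed so the run is repeatable

# Made-up skewed workload: one huge user and 100 moderate users.
records = ["user-0"] * 50_000
for i in range(1, 101):
    records += [f"user-{i}"] * 100

# Strategy 1: key by userId (keeps per-user ordering, but the huge user
# pins all of its records to a single partition).
by_user = simulate(records, lambda user_id: user_id)

# Strategy 2: random key per record (no per-user ordering, load spreads out).
by_random = simulate(records, lambda _user_id: f"{rng.getrandbits(64):016x}")

print("keyed by userId:", sorted(by_user.items()))
print("random key:     ", sorted(by_random.items()))
```

With the userId key, every record of the huge user lands on one partition, so that partition dominates. With the random key, the counts come out close to equal, at the cost of losing per-user ordering.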

I hope it helps,
Vinicius Scheidegger

On Tue, Jul 7, 2020 at 7:52 AM Victoria Zuberman <victoria.zuber...@imperva.com> wrote:

> Hi,
>
> I have userId as a key.
> Many users have moderate amounts of data, but some users have more and some
> users have huge amounts of data.
>
> I have been thinking about the following aspects of partitioning:
>
> 1. If two or more large users fall into the same partition, I might end
> up with large partition/s (unbalanced with other partitions)
> 2. If smaller users fall into the same partition as a huge user, the small
> users might get slower processing due to the amount of data the huge user
> has
> 3. If the order of the messages is not critical, maybe I would want to
> allow several consumers to work on the data of the same huge user,
> therefore I would like to partition one userId into several partitions
>
> I have some ideas on how to partition to solve those issues, but if you
> have something that worked well for you in production I would love to hear.
> Also, any links to relevant blog posts etc. will be welcome.
>
> Thanks,
> Victoria