Hi Victoria,

If processing order is not a requirement, you could use a random key, and
your load would be spread randomly across partitions.
So far I have been unable to find a way to distribute the load perfectly
across partitions when records are created by multiple producers. Random
distribution might be good enough, though.
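
To illustrate the idea, here is a minimal sketch in Python that only
simulates key-to-partition assignment; it does not talk to a broker. The
partition count, the CRC32 hash (a stand-in; Kafka's default partitioner
hashes keys with murmur2) and the helper names are all assumptions made for
the example. It compares using userId as the key with appending a random
bucket number to it, which spreads one huge user over several partitions at
the cost of per-user ordering:

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 6  # assumed partition count, for illustration only


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stand-in for key hashing (Kafka's default uses murmur2, not CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def salted_key(user_id: str, buckets: int, rng: random.Random) -> str:
    """Append a random bucket number so one user maps to up to `buckets` keys."""
    return f"{user_id}#{rng.randrange(buckets)}"


def simulate(records, buckets, seed=0):
    """Count records per partition; buckets <= 1 means plain userId keys."""
    rng = random.Random(seed)
    counts = Counter()
    for user_id in records:
        key = user_id if buckets <= 1 else salted_key(user_id, buckets, rng)
        counts[partition_for(key)] += 1
    return counts


# One huge user plus many small ones (made-up workload).
records = ["huge-user"] * 10_000 + [f"user-{i}" for i in range(1_000)]

plain = simulate(records, buckets=1)
salted = simulate(records, buckets=64)
print("userId as key:", sorted(plain.items()))
print("salted key:   ", sorted(salted.items()))
```

With plain userId keys, all records of the huge user land in one partition;
with the salted key they are spread over several. A purely random key is the
limiting case of a very large number of buckets.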

I hope this helps,

Vinicius Scheidegger


On Tue, Jul 7, 2020 at 7:52 AM Victoria Zuberman <
victoria.zuber...@imperva.com> wrote:

> Hi,
>
> I have userId as a key.
> Many users have moderate amounts of data, but some users have more and some
> users have huge amounts of data.
>
> I have been thinking about the following aspects of partitioning:
>
>   1.  If two or more large users fall into the same partition, I might end
> up with large partition(s) (unbalanced with other partitions)
>   2.  If smaller users fall into the same partition as a huge user, the small
> users might get slower processing due to the amount of data the huge user
> has
>   3.  If the order of the messages is not critical, maybe I would want to
> allow several consumers to work on the data of the same huge user;
> therefore I would like to spread one userId across several partitions
>
> I have some ideas about how to partition to solve those issues, but if you
> have something that worked well for you in production I would love to hear it.
> Also, any links to relevant blog posts etc. will be welcome.
>
> Thanks,
> Victoria
