Hard to give a generic answer.

1. We recommend over-partitioning your input topics from the start (so
you avoid having to add new partitions later on); problem avoidance is
the best strategy. There is obviously some overhead for this on the
broker side, but it's not too big.

2. Not sure why you would need a new cluster. You can just create a new
topic in the same cluster and let Kafka Streams read from there (see the
first sketch below).

3. Depending on your state requirements, you could also run two
applications in parallel -- the new one reads from the new input topic
with more partitions, and you configure your producer to write to the
new topic (or maybe even dual-write to both; see the second sketch
below). Once your new application has ramped up, you can stop the old
one.

4. If you really need to add new partitions, you need to fix up all
topics manually -- including all topics Kafka Streams created for you
(see the last sketch below). Adding partitions messes up all your state,
as key-based partitioning changes. This implies that your application
must be stopped! Thus, if you have zero-downtime requirements, you can't
do this at all.

5. If you have a stateless application, though, all those issues go away
and you can even add new partitions at runtime.
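
To make (1) and (2) a bit more concrete, here is a minimal sketch (in
Java) of creating an over-partitioned input topic with the AdminClient
and pointing a Streams topology at it. Topic name, partition count,
replication factor, and broker address are just placeholders:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Collections;
import java.util.Properties;

public class CreateOverPartitionedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Create the input topic with more partitions than you need today,
        // so you (hopefully) never have to add partitions later on.
        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("orders-input", 50, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }

        // The Streams application simply reads from the new topic.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders-input");
        // ... rest of the topology, then build and start KafkaStreams ...
    }
}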

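For the parallel-application approach in (3), the dual write can be as
simple as sending each record to both the old and the new topic from the
producer. Again, topic names and connection settings are made up for
illustration:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class DualWriteProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String key = "some-key";
            String value = "some-value";
            // During the migration, write each record to the old topic (read
            // by the old application) and to the new, over-partitioned topic
            // (read by the new application). Once the new application has
            // caught up, drop the write to the old topic and stop the old app.
            producer.send(new ProducerRecord<>("orders-input", key, value));
            producer.send(new ProducerRecord<>("orders-input-v2", key, value));
        }
    }
}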

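If you really do end up in (4), the AdminClient can grow a topic's
partition count (if I remember correctly, createPartitions() was added
with the 1.0 AdminClient). Topic name and target count below are
placeholders, and note that this only changes the partition count -- it
does not redistribute existing data or state:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

import java.util.Collections;
import java.util.Properties;

public class AddPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Grow the topic to 100 partitions in total. For a stateful Streams
            // application the same change must also be applied to every internal
            // (repartition/changelog) topic, with the application stopped,
            // because the key-to-partition mapping changes.
            admin.createPartitions(Collections.singletonMap(
                    "orders-input", NewPartitions.increaseTo(100))).all().get();
        }
    }
}
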
Hope this helps.


-Matthias



On 12/8/17 11:02 AM, Dmitry Minkovsky wrote:
> I am about to put a topology into production and I am concerned that I
> don't know how to repartition/rebalance the topics in the event that I need
> to add more partitions.
> 
> My inclination is that I should spin up a new cluster and run some kind of
> consumer/producer combination that takes data from the previous cluster and
> writes it to the new cluster. A new instance of the Kafka Streams
> application then works against this new cluster. But I'm not sure how to
> best execute this, or whether this approach is sound at all. I am imagining
> many things may go wrong. Without going into further speculation, what is
> the best way to do this?
> 
> Thank you,
> Dmitry
> 
