Hard to give a generic answer.

1. We recommend over-partitioning your input topics from the start, so that you never need to add partitions later on; problem avoidance is the best strategy. There is obviously some overhead for this on the broker side, but it's not too big.
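Over-partitioning matters because a record's target partition is computed from a hash of its key modulo the total partition count, so changing the count reroutes keys. A minimal sketch of that dependency (Kafka's DefaultPartitioner actually uses murmur2 over the serialized key; this standalone illustration substitutes `String.hashCode()` as a stand-in, so exact partition numbers differ from a real broker's):

```java
// Illustrative only: shows why a key's partition depends on the total
// partition count, which is what makes adding partitions disruptive.
public class PartitionSketch {

    // Mimics the shape of Kafka's partitioning:
    // toPositive(hash(serializedKey)) % numPartitions
    static int partitionFor(String key, int numPartitions) {
        int hash = key.hashCode() & 0x7fffffff; // force non-negative
        return hash % numPartitions;
    }

    public static void main(String[] args) {
        int changed = 0;
        for (int i = 0; i < 100; i++) {
            String key = "order-" + i;
            if (partitionFor(key, 8) != partitionFor(key, 12)) {
                changed++;
            }
        }
        // A large fraction of keys land on a different partition once the
        // count changes -- that is what breaks key-based state sharding.
        System.out.println(changed + " of 100 keys moved going from 8 to 12 partitions");
    }
}
```

With enough partitions up front, you simply never hit this rerouting problem.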
2. Not sure why you would need a new cluster? You can just create a new topic in the same cluster and let Kafka Streams read from there.

3. Depending on your state requirements, you could also run two applications in parallel: the new one reads from the new input topic with more partitions, and you reconfigure your producer to write to the new topic (or even to dual-write to both). Once the new application has ramped up, you can stop the old one.

4. If you really need to add partitions to an existing topic, you need to fix up all topics manually -- including all the internal topics Kafka Streams created for you. Adding partitions messes up your state sharding, because the key-based partitioning changes. This implies that your application must be stopped; thus, if you have zero-downtime requirements, you can't do this at all.

5. If you have a stateless application, all those issues go away, and you can even add new partitions at runtime.

Hope this helps.

-Matthias

On 12/8/17 11:02 AM, Dmitry Minkovsky wrote:
> I am about to put a topology into production and I am concerned that I
> don't know how to repartition/rebalance the topics in the event that I need
> to add more partitions.
>
> My inclination is that I should spin up a new cluster and run some kind of
> consumer/producer combination that takes data from the previous cluster and
> writes it to the new cluster. A new instance of the Kafka Streams
> application then works against this new cluster. But I'm not sure how to
> best execute this, or whether this approach is sound at all. I am imagining
> many things may go wrong. Without going into further speculation, what is
> the best way to do this?
>
> Thank you,
> Dmitry
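The manual fixup in step 4 boils down to altering the partition count of every topic involved, including the internal topics Kafka Streams created (named after the `application.id`, e.g. `<application.id>-<store>-changelog` and `...-repartition`), and then resetting the application because the old local state is invalid. A rough command sketch with hypothetical topic/application names (`orders`, `order-app`); note that flag names vary by Kafka version (older `kafka-topics.sh` takes `--zookeeper`, newer takes `--bootstrap-server`), and the application must be stopped first:

```shell
# Stop all application instances before changing partition counts.

# Grow the input topic (hypothetical name "orders"):
bin/kafka-topics.sh --zookeeper localhost:2181 \
  --alter --topic orders --partitions 50

# Internal Streams topics must be grown to match
# (hypothetical application.id "order-app", store "order-store"):
bin/kafka-topics.sh --zookeeper localhost:2181 \
  --alter --topic order-app-order-store-changelog --partitions 50

# Existing keys now route to different partitions, so the accumulated
# state is invalid; reset the application so it reprocesses its input:
bin/kafka-streams-application-reset.sh --bootstrap-servers localhost:9092 \
  --application-id order-app --input-topics orders
```

This is exactly why the parallel-application approach in step 3 is usually safer: it avoids mutating topics that already carry keyed state.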