Hello, sounds like you have this all figured out actually. A couple notes:

> For now, we just need to handle DR requirements, i.e., we would not
> need active-active
If your infrastructure is sufficiently advanced, active/active can be a
lot easier to manage than active/standby. If you are starting from
scratch, I'd arc in that direction. Instead of migrating A->B->C->D...,
active/active is more like having one big cluster.

> secondary.primary.topic1

I'd recommend using regex subscriptions where possible, so that apps
don't need to worry about these potentially complex topic names.

> An additional question. If the topic is compacted, i.e., the topic
> keeps messages forever, would switchover operations imply adding an
> additional path element to the topic name?

I think that's right. You could always clean things up manually, but
migrating between clusters a bunch of times would leave a trail of
replication hops.

Also, you might look into implementing a custom ReplicationPolicy. For
example, you could squash "secondary.primary.topic1" into something
shorter if you like.

Ryanne

On Mon, Feb 10, 2020 at 1:24 PM benitocm <benit...@gmail.com> wrote:

> Hi,
>
> After having a look at the talk
> https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
> and
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382
> I am trying to understand how I would use it in the setup that I have.
> For now, we just need to handle DR requirements, i.e., we would not
> need active-active.
>
> My requirements, more or less, are the following:
>
> 1) Currently, we have just one Kafka cluster, "primary", where all the
> producers produce to and all the consumers consume from.
> 2) In case "primary" crashes, we need another Kafka cluster,
> "secondary", to which we will move all the producers and consumers and
> keep working.
> 3) Once "primary" is recovered, we need to move back to it (as in #1).
>
> To fulfill #2, I have thought to set up a new Kafka cluster "secondary"
> and a replication procedure using MM2. However, it is not clear to me
> how to proceed.
> I will describe the high-level details so you guys can point out my
> misconceptions:
>
> A) Initial situation. As in the example in KIP-382, in the primary
> cluster we will have a local topic "topic1" that the producers produce
> to and the consumers consume from. MM2 will create in the secondary
> cluster the remote topic "primary.topic1", into which the local topic
> from the primary will be replicated. In addition, the consumer group
> information of the primary will also be replicated.
>
> B) The primary Kafka cluster is not available. Producers are moved to
> produce into the secondary's local topic "topic1" (created manually).
> In addition, consumers need to connect to the secondary to consume from
> the local topic "topic1", where the producers are now producing, and
> from the remote topic "primary.topic1", where the producers were
> producing before; i.e., consumers will need to aggregate the two. This
> is because some consumers could have lag, so they will need to consume
> from both. In this situation, the local topic "topic1" in the secondary
> will receive new messages and will be consumed (its consumption
> information will also change), while the remote topic "primary.topic1"
> will not receive new messages but will still be consumed (its
> consumption information will change).
>
> At this point, my conclusion is that consumers need to consume from
> both topics (the new messages produced in the local topic, plus the old
> messages for consumers that had lag).
>
> C) The primary cluster is recovered (here is where things get
> complicated for me). In the talk, the new primary is renamed primary-2
> and MM2 is configured for active-active replication. The result is the
> following: the secondary cluster will end up with a new remote topic
> (primary-2.topic1) that contains a replica of the new "topic1" created
> in the primary-2 cluster. The primary-2 cluster will have 3 topics.
"topic1" will be a new topic where in the near future producers > will produce, "secondary.topic1" contains the replica of the local topic > "topic1" in the secondary and "secondary.primary.topic1" that is "topic1" > of the old primary (got through the secondary). > > D) Once all the replicas are in sync, producers and consumers will be moved > to the primary-2. Producers will produce to local topic "topic1" of > primary-2 cluster. The consumers > will connect to primary-2 to consume from "topic1" (new messages that come > in), "secondary.topic1" (messages produced during the outage) and from > "secondary.primary.topic1" (old messages) > > If topics have a retention time, e.g. 7 days, we could remove > "secondary.primary.topic1" after a few days, leaving the situation as at > the beginning. However, if another problem happens in the middle, the > number of topics could be a little difficult to handle. > > An additional question. If the topic is compacted, i.e.., the topic keeps > forever, does switchover operations would imply add an additional path in > the topic name? > > I would appreciate some guidance with this. > > Regards >