Hello, sounds like you have this all figured out actually. A couple notes:

> For now, we just need to handle DR requirements, i.e., we would not need
> active-active

If your infrastructure is sufficiently advanced, active/active can be a lot
easier to manage than active/standby. If you are starting from scratch I'd
arc in that direction. Instead of migrating A->B->C->D..., active/active is
more like having one big cluster.

> secondary.primary.topic1

I'd recommend using regex subscriptions where possible, so that apps don't
need to worry about these potentially complex topic names.
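
As a rough sketch of what such a pattern could look like: the regex and class
name below are only an illustration, not anything MM2 ships. It matches
"topic1" with any number of cluster-alias prefixes in front. A Pattern like
this could then be handed to KafkaConsumer.subscribe(Pattern); that call is
left out so the snippet runs without a Kafka dependency.

```java
import java.util.regex.Pattern;

public class TopicPatternSketch {
    // Matches "topic1" preceded by zero or more "<alias>." prefixes.
    // Assumes cluster aliases use only letters, digits, '_' and '-'.
    static final Pattern TOPIC1_ANY_SOURCE =
        Pattern.compile("^([A-Za-z0-9_-]+\\.)*topic1$");

    static boolean matches(String topic) {
        return TOPIC1_ANY_SOURCE.matcher(topic).matches();
    }

    public static void main(String[] args) {
        String[] topics = {
            "topic1", "primary.topic1", "secondary.primary.topic1", "topic10"
        };
        for (String t : topics) {
            System.out.println(t + " -> " + matches(t));
        }
    }
}
```

One caveat: a pattern this loose would also pick up an unrelated local topic
whose name happens to end in ".topic1", so it may need tightening to your
actual cluster aliases.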

> An additional question. If the topic is compacted, i.e., the topic is kept
> forever, would switchover operations imply adding an additional path in
> the topic name?

I think that's right. You could always clean things up manually, but
migrating between clusters a bunch of times would leave a trail of
replication hops.
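
To make the "trail" concrete, here is a minimal sketch of how the name grows,
assuming the default naming convention of prefixing the topic with the source
cluster alias and a "." separator. The class and method names are made up for
illustration.

```java
public class ReplicationHopsSketch {
    // Default-style naming: prefix the topic with the source cluster alias.
    static String remoteTopic(String sourceAlias, String topic) {
        return sourceAlias + "." + topic;
    }

    // Name of the topic after it has been replicated out of each
    // listed cluster, in order.
    static String afterHops(String topic, String... sourceAliases) {
        String name = topic;
        for (String alias : sourceAliases) {
            name = remoteTopic(alias, name);
        }
        return name;
    }

    public static void main(String[] args) {
        // primary -> secondary -> (new) primary
        System.out.println(afterHops("topic1", "primary", "secondary"));
    }
}
```

Every additional switchover adds one more prefix to any data that is kept
around, which is why compacted topics accumulate these names.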

Also, you might look into implementing a custom ReplicationPolicy. For
example, you could squash "secondary.primary.topic1" into something shorter
if you like.
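
A minimal sketch of one way to do the squashing, assuming you are happy with
short codes per cluster. This is a standalone class so it runs without Kafka
on the classpath; the alias-to-code mapping and the class name are
hypothetical. A real policy would implement
org.apache.kafka.connect.mirror.ReplicationPolicy (which has methods of the
same names) and be selected in the MM2 config via replication.policy.class.

```java
import java.util.HashMap;
import java.util.Map;

public class ShortAliasPolicySketch {
    // Hypothetical alias <-> short code mapping; adjust to your clusters.
    private static final Map<String, String> SHORT = new HashMap<>();
    private static final Map<String, String> LONG = new HashMap<>();
    static {
        register("primary", "p");
        register("secondary", "s");
    }

    private static void register(String alias, String code) {
        SHORT.put(alias, code);
        LONG.put(code, alias);
    }

    // "secondary" + "p.topic1" -> "s.p.topic1"
    static String formatRemoteTopic(String sourceClusterAlias, String topic) {
        String code = SHORT.getOrDefault(sourceClusterAlias, sourceClusterAlias);
        return code + "." + topic;
    }

    // Source cluster alias of a remote topic, or null if there is no prefix.
    static String topicSource(String topic) {
        int dot = topic.indexOf('.');
        if (dot < 0) return null;
        String code = topic.substring(0, dot);
        return LONG.getOrDefault(code, code);
    }

    // Topic name on the source cluster, or null if there is no prefix.
    static String upstreamTopic(String topic) {
        int dot = topic.indexOf('.');
        return dot < 0 ? null : topic.substring(dot + 1);
    }

    public static void main(String[] args) {
        String hop1 = formatRemoteTopic("primary", "topic1");
        String hop2 = formatRemoteTopic("secondary", hop1);
        System.out.println(hop2 + " came from " + topicSource(hop2));
    }
}
```

Note that, as with the default naming, a local topic that already contains a
"." would be mistaken for a remote one by this sketch.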

Ryanne

On Mon, Feb 10, 2020 at 1:24 PM benitocm <benit...@gmail.com> wrote:

> Hi,
>
> After having a look at the talk
>
> https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
> and the
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382
> I am trying to understand how I would use it
> in the setup that I have. For now, we just need to handle DR requirements,
> i.e., we would not need active-active
>
> My requirements, more or less, are the following:
>
> 1) Currently, we have just one Kafka cluster "primary" where all the
> producers are producing to and where all the consumers are consuming from.
> 2) In case "primary" crashes, we would need to have another Kafka cluster
> "secondary" to which we will move all the producers and consumers and keep
> working.
> 3) Once "primary" is recovered, we would need to move to it again (as we
> were in #1)
>
> To fulfill #2, I have thought of having a new Kafka cluster "secondary" and
> setting up a replication procedure using MM2. However, it is not clear to me
> how to proceed.
>
> I will describe the high-level details so you can point out my
> misconceptions:
>
> A) Initial situation. As in the example in KIP-382, in the primary
> cluster we will have a local topic "topic1" that the producers will
> produce to and the consumers will consume from. MM2 will create in the
> secondary the remote topic "primary.topic1", to which the local topic in
> the primary will be replicated. In addition, the consumer group information
> of the primary will also be replicated.
>
> B) The primary Kafka cluster is not available. Producers are moved to
> produce into the "topic1" that was manually created in the secondary. In
> addition, consumers need to connect to the
> secondary to consume from the local topic "topic1", where the producers are
> now producing, and from the remote topic "primary.topic1", where the
> producers were producing before, i.e., consumers will need to aggregate.
> This is because some consumers could have lag, so they will need to consume
> from both. In this situation, the local topic "topic1" in the secondary will
> receive new messages and will be consumed (its consumption information will
> also change), while the remote topic "primary.topic1" will not receive new
> messages but will still be consumed (its consumption information will
> change).
>
> At this point, my conclusion is that consumers need to consume from both
> topics (the new messages produced in the local topic and the old messages
> for consumers that had a lag)
>
> C) The primary cluster is recovered (this is where things get complicated
> for me). In the talk, the new primary is renamed primary-2 and MM2 is
> configured for active-active replication.
> The result is the following. The secondary cluster will end up with a new
> remote topic (primary-2.topic1) that will contain a replica of the new
> topic1 created in the primary-2 cluster. The primary-2 cluster will have 3
> topics. "topic1" will be a new topic where in the near future producers
> will produce, "secondary.topic1" contains the replica of the local topic
> "topic1" in the secondary, and "secondary.primary.topic1", which is "topic1"
> of the old primary (received through the secondary).
>
> D) Once all the replicas are in sync, producers and consumers will be moved
> to the primary-2. Producers will produce to local topic "topic1" of
> primary-2 cluster. The consumers
> will connect to primary-2 to consume from "topic1" (new messages that come
> in), "secondary.topic1" (messages produced during the outage) and from
> "secondary.primary.topic1" (old messages)
>
> If topics have a retention time, e.g. 7 days, we could remove
> "secondary.primary.topic1" after a few days, leaving the situation as at
> the beginning. However, if another problem happens in the middle, the
> number of topics could be a little difficult to handle.
>
> An additional question. If the topic is compacted, i.e., the topic is kept
> forever, would switchover operations imply adding an additional path in
> the topic name?
>
> I would appreciate some guidance with this.
>
> Regards
>
