Sachin, assuming you are using something like MM2, I recommend the following approaches:
1) Have an external system monitor the clusters and trigger a failover by terminating the existing consumer group and launching a replacement. This can be done manually, or can be automated if your infrastructure is sufficiently advanced. MM2's checkpoints make it possible to do this without losing progress or skipping records.

2) Add failover logic around your KafkaConsumers to detect failure and reconfigure.

3) Run consumer groups in both clusters, i.e. "active/active", with each configured to process records originating in its local cluster only. Set up health checks and a load balancer such that producers send to the healthiest cluster. In this approach, no intervention is required to fail over or fail back. Under normal operation, your secondary consumer group doesn't process anything, but it will step in and process new records whenever the secondary cluster becomes active.

Ryanne

On Mon, Nov 11, 2019, 5:55 AM Sachin Kale <sachinpk...@gmail.com> wrote:

> Hi,
>
> We are working on a prototype where we write to two Kafka clusters
> (primary-secondary) and read from one of them (based on which one is
> primary) to increase availability. A flag determines which cluster is
> primary; the other becomes secondary. On detecting that the primary
> cluster is down, the secondary is promoted to primary.
>
> How do we detect cluster downtime failures in the Kafka consumer? I tried
> different things, but poll() masks all exceptions and returns 0 records.
>
> -Sachin-
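To make option 2 concrete, here is a minimal sketch of the failover decision logic, kept separate from the Kafka client so it can be tested on its own. The class name `ClusterFailover`, the `isHealthy` probe, and the `maxFailures` threshold are all illustrative assumptions, not part of the Kafka API; in practice the health probe could be, say, an AdminClient metadata request with a short timeout, since poll() alone won't surface cluster-down errors.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper for option 2: track consecutive health-check failures
// and decide when the consumer should be rebuilt against the other cluster.
public class ClusterFailover {
    private final List<String> bootstrapServers; // primary first, then secondary
    private final Predicate<String> isHealthy;   // external probe, e.g. an AdminClient check
    private final int maxFailures;               // failures tolerated before switching
    private int current = 0;
    private int consecutiveFailures = 0;

    public ClusterFailover(List<String> bootstrapServers,
                           Predicate<String> isHealthy,
                           int maxFailures) {
        this.bootstrapServers = bootstrapServers;
        this.isHealthy = isHealthy;
        this.maxFailures = maxFailures;
    }

    /** Bootstrap servers of the cluster the consumer should currently use. */
    public String currentServers() {
        return bootstrapServers.get(current);
    }

    /**
     * Call periodically, e.g. whenever poll() keeps returning 0 records.
     * Returns true if a failover just occurred, meaning the caller should
     * close its KafkaConsumer and recreate it against currentServers().
     */
    public boolean checkAndMaybeFailover() {
        if (isHealthy.test(currentServers())) {
            consecutiveFailures = 0;
            return false;
        }
        if (++consecutiveFailures >= maxFailures) {
            current = (current + 1) % bootstrapServers.size();
            consecutiveFailures = 0;
            return true;
        }
        return false;
    }
}
```

The consumer loop would then recreate its KafkaConsumer with the new bootstrap.servers whenever checkAndMaybeFailover() returns true; with MM2 checkpoints, the replacement consumer can translate its committed offsets to the other cluster rather than starting over.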