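Well, why not set up the zk ensemble across 3 AZs? In that case, for a 5-node ensemble, we can place 2, 2, and 1 zk servers in the three AZs. Then if any single AZ goes down, at least 3 of the 5 servers remain, which is still a majority, so the ensemble stays up. The same works for a 7-node ensemble with 3, 2, and 2 zk servers per AZ: losing any one AZ leaves at least 4 of 7.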
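The arithmetic can be checked with a small sketch. It assumes every server is a voting participant (no observers) and that the ensemble uses the default majority quorum; the function name `survives_any_single_az_loss` is made up for illustration and is not part of any ZooKeeper tooling.

```python
def survives_any_single_az_loss(servers_per_az):
    """Return True if a strict majority of voters remains after losing any one AZ.

    servers_per_az: number of voting zk servers in each AZ, e.g. (2, 2, 1).
    Assumes default majority quorum and no observers.
    """
    total = sum(servers_per_az)
    majority = total // 2 + 1  # strict majority of all voting members
    # Losing an AZ removes all of its servers; the rest must still reach a majority.
    return all(total - lost >= majority for lost in servers_per_az)


if __name__ == "__main__":
    for layout in [(2, 2, 1), (3, 2, 2), (3, 2), (2, 2)]:
        print(layout, survives_any_single_az_loss(layout))
```

The two 3-AZ layouts pass, while the 2-AZ layouts (3, 2) and (2, 2) do not: with only two AZs there is always at least one AZ whose loss leaves the ensemble without a majority.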
HTH,
Zhewei

On Thu, Aug 5, 2021 at 07:38 Ivan Kelly <[email protected]> wrote:

> *typo, the exception is when there's an _equal_ number of participants
>
> On Thu, Aug 5, 2021 at 3:37 PM Ivan Kelly <[email protected]> wrote:
> >
> > > Promoting the 2 observers to participant will be a manual step (as
> > > part of disaster recovery) to get the cluster up. During this manual
> > > step, if needed, we can shutdown/terminate the old AZ instances.
> > > We also have puppet managing configuration. Puppet module will be
> > > updated to reflect new cluster instances. So when (or if) the AZ
> > > comes up, puppet will see that these instances are no longer part of
> > > the zookeeper cluster and the module will stop the zookeeper service.
> >
> > What guarantee do you have that all clients will have switched over to
> > the new cluster? Even if puppet will shut down the old cluster, it will
> > take time to see that it needs to be shut down, which creates an
> > opportunity for clients to connect and do stuff.
> >
> > > A side question: Will observers always be in sync with the entire
> > > cluster? In other words, when will observers be in sync with the
> > > quorum participants?
> >
> > By in sync, I take it that you mean that any write that was
> > acknowledged by the initial cluster exists in the failover cluster.
> > No, they may not be in-sync. The observer will always have a prefix of
> > the log of the participants. This prefix may be the entire log, or it
> > may be missing the latest writes. This is true even if you have a
> > participant in the failover AZ. For a write to be acknowledged, it has
> > to hit a majority of the quorum.
> >
> > With 2 AZs, 1 AZ will always have a majority, so if it goes down,
> > writes will be missing from the other AZ. The exception to this is
> > where there's an even number of participants in each AZ. In this case,
> > if one AZ goes down, you can no longer form a majority, but all
> > writes will exist on both AZs. Maybe this could be a path forward,
> > since you accept that you will have manual failover. I'm not sure how
> > well this scenario is supported in the tooling though.
> >
> > -Ivan
>
