Hi,

I think that one of the problems with the proposed method is that you may
end up with a majority of servers that don't have the latest state
(imagine a minority of the remaining servers failing while your
replacement node hasn't been brought up to date yet). For example, in a
3-server ensemble a write acknowledged by servers 1 and 2 is committed;
if you replace server 1 with an empty server reusing ID 1, and server 2
then fails before the new server has synced, servers 1 and 3 can form a
quorum that has lost that committed write.

Have you considered using dynamic reconfiguration? You could remove the
nodes from the ensemble logically first, then replace the machines and
add them back in. You can do multiple servers at a time this way.
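Roughly, with the zkCli reconfig command (a sketch rather than a tested
procedure: host names and IDs are placeholders, and on 3.5.3+ you need
reconfigEnabled=true in the config for reconfig to be allowed):

    # drop server 3 from the ensemble membership first
    reconfig -remove 3
    # replace the machine, start the new server, then add it back
    reconfig -add server.3=newhost3:2888:3888;2181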
Or, you can give the new servers higher IDs, add them using reconfig, and
later remove the old servers. Reconfiguration ensures that a quorum
always has the data.
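That variant might look like this (again a sketch with hypothetical
addresses; each new server should be running and caught up before you
remove the old ones):

    # add the replacement machines under new, higher IDs
    reconfig -add server.4=newhost1:2888:3888;2181
    reconfig -add server.5=newhost2:2888:3888;2181
    reconfig -add server.6=newhost3:2888:3888;2181
    # once they have synced, drop the old members in one step
    reconfig -remove 1,2,3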

Alex



On Mon, Apr 1, 2019 at 2:51 PM David Anderson <[email protected]> wrote:

> Hi,
>
> I have a running ZooKeeper (3.5) cluster where the machines need to be
> replaced. I was thinking of just setting the same ID on each new
> machine, and then doing a rolling replacement: take down old ID 1,
> start new ID 1, let it rejoin the cluster and replicate the state,
> then continue with the other replicas.
>
> I'm finding conflicting information on the internet about the safety
> of this. The Apache Kafka FAQ says to do exactly this when replacing a
> failed ZooKeeper replica, claiming the new machine will simply
> replicate the state before participating in the quorum. Other places
> say that reusing the ID without also copying over the state directory
> breaks assumptions that ZAB makes about replicas, with bad (but
> unspecified) consequences.
>
> So, is it safe to reuse IDs in the way I described? If not, what's the
> suggested procedure for a rolling replacement of all cluster replicas?
>
> Thanks,
> - Dave
>
