Let's say you have nodes A, B, C. Only B and C have the latest data. You're
trying to replace B.
You replace B with a new server, but before it's in sync, C fails. What
happens?

Option 1 (no reconfiguration): A and B are both registered as voting
members, so they form a majority (2 out of 3). B syncs from A and they
happily continue together. Since neither has the latest data, this is data
loss.
Option 2 (with reconfiguration): By logically removing B first, you bring A
up to date, since the reconfiguration must be committed by a quorum of the
new configuration (A and C). So A and C both have the latest data now. A is
going to be stalled while C is down and will not form a quorum with B, since
B isn't registered as a voter. If C never recovers, you can recover manually
by updating the config files.
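
To make the two outcomes concrete, here is a toy Python model of the
scenario. It is only a sketch, not ZooKeeper code: the helper names
has_quorum and outcome are made up for illustration, it assumes a plain
strict-majority quorum over the registered voters, and which servers hold
the latest data is set by hand (including the assumption that removing B
first leaves A up to date).

```python
def has_quorum(voters, alive):
    """A strict majority of the registered voters must be alive."""
    return len(voters & alive) > len(voters) // 2


def outcome(voters, alive, up_to_date):
    """Classify what the surviving ensemble can do (toy model)."""
    if not has_quorum(voters, alive):
        return "stalled"
    if voters & alive & up_to_date:
        return "safe"
    return "data loss"


# Both options: C has failed, A and the freshly replaced (empty) B are running.
alive = {"A", "B"}

# Option 1: B is still a voter; only C holds the latest data.
option1 = outcome(voters={"A", "B", "C"}, alive=alive, up_to_date={"C"})

# Option 2: B was removed first; assume that step brought A up to date.
option2 = outcome(voters={"A", "C"}, alive=alive, up_to_date={"A", "C"})

print(option1)  # data loss
print(option2)  # stalled
```

In the second case the ensemble is unavailable but nothing is lost, which
is the trade-off described above.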


On Mon, Apr 1, 2019 at 5:10 PM David Anderson <d...@tockhq.com> wrote:

> On Mon, Apr 1, 2019 at 4:48 PM Alexander Shraer <shra...@gmail.com> wrote:
>
> > Hi,
> >
> > I think that one of the problems with the proposed method is that you may
> > end up having a majority of servers that don't have the latest state
> > (imagine that there is a minority failure while your replaced
> > node hasn't been brought up to date yet).
>
>
> > Have you considered using dynamic reconfiguration? Removing the nodes
> > logically first, then replacing them and adding back in ? You can do
> > multiple servers at a time this way.
>
>
> Does dynamic reconfiguration as you suggest here buy me anything in a
> 3-node cluster? No matter what I'm going to be at N+0 during the
> transition, so doesn't it just add more steps for the same result?
>
>
> > Or, you can give new servers higher ids, add them using reconfig, and later
> > remove the old servers. Reconfiguration ensures that a quorum always has
> > the data.
> >
>
> My admittedly terrible motivation for avoiding that is that I want to
> preserve hostnames, to avoid reconfiguring clients. This is in a cloud
> environment where DNS is tied to instance name, so I can't play tricks at
> the network layer - at some point I have to delete the old instances and
> set up new ones with the same name. I suppose I could do a careful dance
> where I grow to 5 nodes, then do a rolling removal/readd of the first 3, so
> that I can stay at N+1 during the replacement, and just trust that clients
> can reach at least one of the first 3 replicas to discover the entire
> cluster.
>
> - Dave
>
>
> > Alex
> >
> >
> >
> > On Mon, Apr 1, 2019 at 2:51 PM David Anderson <d...@tockhq.com> wrote:
> >
> > > Hi,
> > >
> > > I have a running Zookeeper (3.5) cluster where the machines need to be
> > > replaced. I was thinking of just setting the same ID on each new
> > > machine, and then doing a rolling replacement: take down old ID 1,
> > > start new ID 1, let it rejoin the cluster and replicate the state,
> > > then continue with the other replicas.
> > >
> > > I'm finding conflicting information on the internet about the safety
> > > of this. The Apache Kafka FAQ says to do exactly this when replacing a
> > > failed Zookeeper replica, and the new machine will just replicate the
> > > state before participating in the quorum. Other places on the internet
> > > say that reusing the ID without also copying over the state directory
> > > will break assumptions that ZAB makes about replicas, with bad (but
> > > nondescript) consequences.
> > >
> > > So, is it safe to reuse IDs in the way I described? If not, what's the
> > > suggested procedure for a rolling replacement of all cluster replicas?
> > >
> > > Thanks,
> > > - Dave
> > >
> >
>
