Mahadev Konar wrote:
Hi Thomas,
 Here is what would happen in the scenario you mentioned.

Great - thanks Mahadev.

Not to drag this on more than necessary, please bear with me for one
more example of 'amnesia' that comes to mind. I have a set of ZooKeeper
servers A, B, C.
- C is currently not running, A is the leader, B is the follower.
- A proposes zxid1 to A and B, both acknowledge.
- A asks A to commit (which it persists), but before the same commit
request reaches B, all servers go down (say a power failure).
In this case, the zookeeper protocol says that zxid1 would be available only
if the client gets a success. So zxid1 may or may not get committed if A and
B come up later. ( this is a different scenario then what you mention
later).

The general scenario I was interested in was a minority of servers losing state, and trying to understand what other correlated events could cause issues. Just to be clear, since A has sent the commit to B (or is it when A has got its own commit), it *could have* sent a success back to the client before everything went down, correct?
- Later, B and C come up (A is slow to reboot), but B has lost all state
due to disk failure.
This is how zookeeper would work in this scenario ---

Now since we have B and C come up and B has the most recent state but loses
it, then zookeeper is clueless about this. So C would say I have the some
zxid say zxid-n and B would say that I have zxid = 0 (since its stateless)
and C would become a leader (since it has the highest zxid).

This would lead to loss of data and loss of state in zookeeper. That's what
I meant when I mentioned that zookeeper relies heavily on the state being
persisted on disk.

OK good, my understanding was correct then.
- C becomes the new leader and perhaps continues with some more new
transactions.

Now if A comes back again, C would say that its the leader and ask A to
truncate all the transactions that A had to come to sync with C.

I wasn't aware that C would ask A to truncate even committed transactions (the zookeeper internals doc/slides talks about proposals - I suspect I may have some terminology confusion here). Another possibility is C is now at zxid2 >= zxid1, in which case A could possibly *not* get rid of the committed transaction?
Again, you can see that how persistence loss can trigger state loss in
zookeeper. If its just minority of servers failing then this can be taken
care of by zookeeper but in this scenario is C failing and then being
brought up with an inconsisten state with another failure of A and data loss
of B -- which zookeeper cannot handle.

I hope this helps.
Yes thanks. Not sure if this makes sense, but is it worthwhile to have a 'safe' mode when a server comes up with no state (I think it should be simple to distinguish between having a clean disk 'no state'/corrupt state and 'empty state')? In this case, I think it could simply wait till it sees a successful propose/commit cycle to know that it is safe for it to take a snapshot and start participating in the ensemble.

In the scenario I previously described, when B and C comes up, B would not respond to C, but just watch - C would not be able to establish quorum until A came up; at which point B has witnessed a successful leader activation, and can join. If one is willing to sacrifice liveness for safety in situations where 1 or more nodes have amnesia, would this be a viable option?

Reply via email to