I think you're right: there is a bug here. As I mentioned, when a server starts up it locally commits all ops it has ever received (see ZKDataBase.loadDataBase). More importantly, the same happens in the Leader.lead() method (zk.loadData()). So when execution reaches the code you quoted, maxCommittedLog reflects all transactions this leader has seen before becoming leader, and everything works. In your scenario everyone sees the same set of transactions, so there is no problem.
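To make the start-up behaviour concrete, here is a minimal toy sketch. It is not ZooKeeper code: the class RestartSketch and all of its fields and methods are made up for illustration, and maxCommitted only stands in for what maxCommittedLog tracks. It assumes that reloading the database simply treats every logged transaction as committed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one peer; hypothetical names, not ZooKeeper's classes. */
final class RestartSketch {
    // zxids of every proposal this peer has written to its log
    final List<Long> log = new ArrayList<>();
    // highest zxid this peer treats as committed (stand-in for maxCommittedLog)
    long maxCommitted = 0L;

    void logProposal(long zxid) {
        log.add(zxid);
    }

    void commit(long zxid) {
        maxCommitted = Math.max(maxCommitted, zxid);
    }

    long lastLogged() {
        return log.isEmpty() ? 0L : log.get(log.size() - 1);
    }

    // Assumed effect of loading the database: everything in the log
    // is applied locally, so it all counts as committed afterwards.
    void reloadFromLog() {
        maxCommitted = lastLogged();
    }

    public static void main(String[] args) {
        RestartSketch peer = new RestartSketch();
        peer.logProposal(1L);
        peer.logProposal(2L);
        peer.commit(1L); // a COMMIT was received only for zxid 1
        System.out.println("before reload: " + peer.maxCommitted);
        peer.reloadFromLog();
        System.out.println("after reload:  " + peer.maxCommitted);
    }
}
```

In this model the commit pointer moves from 1 to 2 on reload, which is why a peer that has reloaded its database sees every transaction it has logged as committed.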
The problem is in leader election: if the server doesn't reboot before running leader election (the usual case), then only the transactions for which it received a commit are counted, so it might not be elected leader even if it has seen more transactions than the others. This may lead to transactions being dropped. I opened a JIRA for this.

Thanks,
Alex

> -----Original Message-----
> From: Yang [mailto:[email protected]]
> Sent: Thursday, July 21, 2011 11:12 AM
> To: Alexander Shraer
> Subject: Re: what would happen with this case ? (ZAB protocol question)
>
> "Any operation that was truly committed (acked by majority), will be
> known to one of the servers participating in the leader election"
> ------ this is where I'm having difficulty: in the example I gave, the
> commit on the dead leader is "Known/seen" by surviving nodes, but the
> code snippet I showed seems to suggest that only seen COMMITTED txns
> are replayed from new leader, not the seen transactions.
>
>
> thanks
> Yang
>
>
>
> On Thu, Jul 21, 2011 at 11:04 AM, Alexander Shraer
> <[email protected]> wrote:
> > Hi,
> >
> > If I understand it correctly, when a server starts-up it locally
> > commits all ops it has ever received (see ZKDataBase.loadDataBase) .
> > Leader election then chooses the node that has the most ops committed
> > to be the leader. It is possible that a minority of servers are down
> > during leader election, but a majority (or quorum) do participate in
> > leader election. Any operation that was truly committed (acked by
> > majority), will be known to one of the servers participating in the
> > leader election, so the elected leader will at least know all truly
> > committed ops. If a server wakes up later and connects to this leader,
> > his log is truncated to match the leader's. But this is safe to do,
> > because as explained above none of the truncated ops could have been
> > previously acked by a quorum.
> >
> > Alex
> >
> >
> >
> >> -----Original Message-----
> >> From: Yang [mailto:[email protected]]
> >> Sent: Wednesday, July 20, 2011 12:29 AM
> >> To: [email protected]
> >> Subject: Re: what would happen with this case ? (ZAB protocol question)
> >>
> >> I found that my question is basically the same as
> >>
> >> http://zookeeper-user.578899.n2.nabble.com/Q-about-ZK-internal-how-commit-is-being-remembered-td4464847.html
> >>
> >> but reading that thread still leaves me unclear as to my original
> >> question.
> >>
> >> the following snippet from LearnerHandler.run() seems to be what the
> >> newly-elected leader is doing, basically bringing up every follower to
> >> its max committed proposal, and discard the rest.
> >> ---- if this is a correct understanding, then the P1 commit in my
> >> original question seems to be lost. ??
> >>
> >> Thanks
> >> Yang
> >>
> >>
> >>
> >> final long maxCommittedLog =
> >>         leader.zk.getZKDatabase().getmaxCommittedLog();
> >> final long minCommittedLog =
> >>         leader.zk.getZKDatabase().getminCommittedLog();
> >> LinkedList<Proposal> proposals =
> >>         leader.zk.getZKDatabase().getCommittedLog();
> >> if (proposals.size() != 0) {
> >>     if ((maxCommittedLog >= peerLastZxid)
> >>             && (minCommittedLog <= peerLastZxid)) {
> >>         packetToSend = Leader.DIFF;
> >>         zxidToSend = maxCommittedLog;
> >>         for (Proposal propose: proposals) {
> >>             if (propose.packet.getZxid() > peerLastZxid) {
> >>                 queuePacket(propose.packet);
> >>                 QuorumPacket qcommit = new QuorumPacket(Leader.COMMIT,
> >>                         propose.packet.getZxid(), null, null);
> >>                 queuePacket(qcommit);
> >>             }
> >>         }
> >>     } else if (peerLastZxid > maxCommittedLog) {
> >>         packetToSend = Leader.TRUNC;
> >>         zxidToSend = maxCommittedLog;
> >>         updates = zxidToSend;
> >>     }
> >> } else {
> >>     // just let the state transfer happen
> >> }
> >>
> >> On Tue, Jul 19, 2011 at 2:44 PM, Yang <[email protected]> wrote:
> >> > like the first figure in the ZAB paper described,
> >> > say we have node A B C, A is leader now
> >> >
> >> > all 3 nodes see proposals P1, P2, an all acked both,
> >> > A sees acks for P1, and commits it, but right after this A dies.
> >> >
> >> > now B is elected, B does not see any commit, so (according to my
> >> > possibly wrong understanding from the code)
> >> > B throws away P1 P2, and starts a new epoch.
> >> > is this the current behavior of code?
> >> >
> >> > but then the commit of P1 on A is lost?
> >> >
> >> > Thanks
> >> > Yang
> >> >
> >
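
For reference, the election concern discussed in this thread can be sketched with a small toy model. This is not ZooKeeper's election code: ElectionSketch, Peer, and both elect methods are hypothetical, and the zxid values are invented. The sketch assumes the old leader A has died, B has logged three proposals but received no COMMIT for any of them, and C has logged two proposals and seen both committed. It compares ranking candidates by last committed zxid with ranking them by last logged zxid.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Toy model of the election concern; hypothetical names throughout. */
final class ElectionSketch {
    /** Per-peer state: last zxid written to the log vs. last zxid known committed. */
    static final class Peer {
        final String name;
        final long lastLogged;
        final long lastCommitted;

        Peer(String name, long lastLogged, long lastCommitted) {
            this.name = name;
            this.lastLogged = lastLogged;
            this.lastCommitted = lastCommitted;
        }
    }

    // Ranks candidates by what they have seen committed.
    static Peer electByCommitted(List<Peer> voters) {
        return Collections.max(voters,
                Comparator.comparingLong((Peer p) -> p.lastCommitted));
    }

    // Ranks candidates by everything they have logged.
    static Peer electByLogged(List<Peer> voters) {
        return Collections.max(voters,
                Comparator.comparingLong((Peer p) -> p.lastLogged));
    }

    public static void main(String[] args) {
        // Invented state after leader A has died.
        List<Peer> voters = Arrays.asList(
                new Peer("B", 3L, 0L),  // logged zxids 1..3, no COMMIT received
                new Peer("C", 2L, 2L)); // logged zxids 1..2, both committed

        Peer byCommitted = electByCommitted(voters);
        Peer byLogged = electByLogged(voters);
        System.out.println("by committed zxid: " + byCommitted.name);
        System.out.println("by logged zxid:    " + byLogged.name);
        // If C leads, a TRUNC to C's last zxid would cut zxid 3 from B's log.
        System.out.println("zxids B would lose under C: "
                + Math.max(0L, voters.get(0).lastLogged - byCommitted.lastLogged));
    }
}
```

Under these assumed values, ranking by committed zxid picks C while ranking by logged zxid picks B. If zxid 3 had been acked by A and B before A died, electing C and truncating B would drop a transaction that a quorum had acknowledged.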
