[ https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Henry Robinson updated ZOOKEEPER-368:
-------------------------------------

    Attachment: ZOOKEEPER-368.patch

I'm attaching a first cut at this JIRA. I'd like comments on the broad approach - I'm aware there are more than a few rough edges in the code that need smoothing out.

I've introduced a PeerType enum to QuorumPeers that denotes the peer as either a PARTICIPANT or an OBSERVER. I've also extended PeerState with an OBSERVING state. It is possible for PARTICIPANT nodes to be in the OBSERVING state if they have joined the ensemble but aren't part of the current view (there are a few references to views in this patch that reflect my work on the dynamic cluster membership stuff; however, they're typically placeholder code). As a result, I've updated the FollowerHandler code to send the current view to a new follower during the initial handshaking.

Observers hear about committed proposals through INFORM messages that the Leader sends to them. Apart from that, they operate much like Followers (and therefore share the same code) - when they connect, they sync. Eventually I envisage adding plugins to observers so that the proposals they see can be published according to whatever protocol is required.

Observers don't participate in leader elections, and therefore only use the LeaderElection class which (by my reading) only deals with finding out who the current leader is. It is the only election class in this patch that correctly updates the PeerState depending on the current PeerType once a leader has been found. I haven't yet completely convinced myself that Observers never actively participate in elections in this patch, so I'll be working to make sure of that.

A node can be configured as an observer by having peerType=observer in its config file; otherwise it defaults to participant.
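
To make the configuration rule above concrete, here is a minimal sketch of how the peerType setting might be read. It is not taken from the attached patch: the class name PeerTypeConfigSketch and the method parsePeerType are hypothetical, and the case-insensitive matching and the rejection of unknown values are assumptions. Only the peerType key, the observer value, and the PARTICIPANT default come from the comment.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper, not code from the patch: shows one way the
// peerType setting described in the comment could be interpreted.
final class PeerTypeConfigSketch {

    // The two roles named in the comment.
    enum PeerType { PARTICIPANT, OBSERVER }

    // Returns OBSERVER when peerType=observer is set. A missing key
    // falls back to PARTICIPANT, matching the described default.
    static PeerType parsePeerType(Properties config) {
        String value = config.getProperty("peerType");
        if (value == null) {
            return PeerType.PARTICIPANT;
        }
        // Assumption: values are matched case-insensitively.
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "observer":
                return PeerType.OBSERVER;
            case "participant":
                return PeerType.PARTICIPANT;
            default:
                // Assumption: unknown values are rejected, not ignored.
                throw new IllegalArgumentException("Unrecognised peerType: " + value);
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("peerType", "observer");
        System.out.println(parsePeerType(config));
    }
}
```

Failing loudly on an unrecognised value is one possible design choice; it avoids a typo such as peerType=obsever silently producing a voting participant.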

> Observers
> ---------
>
>                 Key: ZOOKEEPER-368
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
>             Project: Zookeeper
>          Issue Type: New Feature
>          Components: quorum
>            Reporter: Flavio Paiva Junqueira
>         Attachments: ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching
> agreement on the order of ZooKeeper transactions. That is, all followers
> receive proposals, acknowledge them, and receive commit messages from the
> leader. A leader issues commit messages once it receives acknowledgments from
> a quorum of followers. For cross-colo operation, it would be useful to have a
> third role: observer. Using Paxos terminology, observers are similar to
> learners. An observer does not participate actively in the agreement step of
> the atomic broadcast protocol. Instead, it only commits proposals that have
> been accepted by some quorum of followers.
>
> One simple solution to implement observers is to have the leader forwarding
> commit messages not only to followers but also to observers, and have
> observers applying transactions according to the order followers agreed upon.
> In the current implementation of the protocol, however, commit messages do
> not carry their corresponding transaction payload because all servers
> different from the leader are followers and followers receive such a payload
> first through a proposal message. Just forwarding commit messages as they
> currently are to an observer consequently is not sufficient. We have a couple
> of options:
>
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
>
> Number 2 is simpler to implement because it doesn't require changing the
> protocol implementation, but it increases traffic slightly. The performance
> impact due to such an increase might be insignificant, though.
>
> For scalability purposes, we may consider having followers also forwarding
> commit messages to observers. With this option, observers can connect to
> followers, and receive messages from followers. This choice is important to
> avoid increasing the load on the leader with the number of observers.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.