[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733951#action_12733951
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--------------------------------------------------

It seems to me that determining that a server is an observer based on its 
weight, and switching between observer and follower during ensemble changes are 
separate issues. Suppose that we go with the zero-weight option. Once we read a 
weight of zero, we know that the server is an observer and we have to execute the 
corresponding code. From Ben's suggestion, the difference between a follower 
and an observer can be as simple as an if statement that guards the block that 
sends acks. In Henry's patch, there are separate classes for observers and 
followers.
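
To make the zero-weight option concrete, here is a minimal sketch of the 
"if statement that guards the block that sends acks". It is not taken from 
Henry's patch or from the ZooKeeper code base: the class WeightedPeer, its 
methods, and the counter standing in for the network send are all hypothetical 
names chosen for illustration.

```java
// Illustrative only: hypothetical names, not ZooKeeper's actual classes.
final class WeightedPeer {
    private final long weight;           // weight read from the ensemble configuration
    private long lastProposedZxid = -1;  // last proposal this peer has recorded
    private int acksSent = 0;            // stand-in for acks sent over the network

    WeightedPeer(long weight) {
        if (weight < 0) {
            throw new IllegalArgumentException("weight must be non-negative");
        }
        this.weight = weight;
    }

    /** Assumed convention: a weight of zero marks the server as an observer. */
    boolean isObserver() {
        return weight == 0;
    }

    /** Both roles record the proposal; only a follower acknowledges it. */
    void onProposal(long zxid) {
        lastProposedZxid = zxid;
        if (!isObserver()) {
            sendAck(zxid);
        }
    }

    // Placeholder for the real network send; here it only counts acks.
    private void sendAck(long zxid) {
        acksSent++;
    }

    int acksSent() {
        return acksSent;
    }

    long lastProposedZxid() {
        return lastProposedZxid;
    }
}
```

With this shape a single class serves both roles, whereas the separate-classes 
approach would move the guarded block into the follower class only.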

It seems that we also have to make sure that during the transition from an old 
ensemble to a new ensemble we have a quorum available for the new ensemble, 
perhaps counting new members in, so it could be tricky to undergo this 
transition correctly if we have to switch from observer to follower. I think 
that what has to happen is that the update reflecting the view change has to be 
the first committed operation of the new ensemble. Consequently, an observer 
has to make sure that it has seen all updates before committing the ensemble 
update, and be ready to ack and commit the ensemble update once the leader of 
the new ensemble proposes it.
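
The two conditions above can be sketched as follows. This is only a sketch 
under assumptions of my own: quorums are taken to be a flat weighted majority 
(more than half of the total weight of the new ensemble), and zxids are 
assumed to be comparable as plain longs. ViewChange, hasQuorum and readyToAck 
are hypothetical names, not existing ZooKeeper APIs.

```java
import java.util.Map;
import java.util.Set;

// Illustrative only: hypothetical helper, not part of ZooKeeper.
final class ViewChange {
    private ViewChange() {}

    /**
     * True if the acking servers hold strictly more than half of the total
     * weight of the new ensemble. Zero-weight servers (observers) are in the
     * map but contribute nothing to either side.
     */
    static boolean hasQuorum(Map<Long, Long> newWeights, Set<Long> ackers) {
        long total = 0;
        long acked = 0;
        for (Map.Entry<Long, Long> e : newWeights.entrySet()) {
            total += e.getValue();
            if (ackers.contains(e.getKey())) {
                acked += e.getValue();
            }
        }
        return 2 * acked > total;
    }

    /**
     * A server being promoted from observer to follower may ack the
     * view-change proposal only once it has seen every update of the old
     * ensemble, i.e. everything up to and including lastOldZxid.
     */
    static boolean readyToAck(long lastSeenZxid, long lastOldZxid) {
        return lastSeenZxid >= lastOldZxid;
    }
}
```

For example, if servers 1 and 2 are followers and server 3 is promoted from 
weight 0 to weight 1, then acks from servers 1 and 3 form a quorum of the new 
ensemble only when server 3 is counted with its new weight.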

Henry, could you sketch out your thoughts on how your modifications will handle 
ensemble changes? I wonder if it is best to try to reach agreement on this 
issue first before submitting a patch. 

> Observers
> ---------
>
>                 Key: ZOOKEEPER-368
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
>             Project: Zookeeper
>          Issue Type: New Feature
>          Components: quorum
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Henry Robinson
>         Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forward 
> commit messages not only to followers but also to observers, and have 
> observers apply transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload because all servers 
> different from the leader are followers and followers receive such a payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer consequently is not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forward 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
