[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733789#action_12733789
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-----------------------------------------

i'm very sensitive to the work-already-done issue! i've totally been there.

the con argument about the increased chatter is actually quite weak, since the 
COMMIT message is just a few bytes that get merged into an existing TCP 
stream. the restriction that only weight-0 followers can subscribe to a portion 
of the tree is a bit hacky, but it eliminates the need for a bunch of new code.
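
to make the weight-0 idea concrete, here is a minimal sketch of a weighted 
quorum check. it is not the actual quorum verifier code: the function name 
has_quorum, the server ids and the weights are all made up for illustration. 
the only point is that a server with weight 0 cannot change the outcome of a 
vote.

```python
def has_quorum(acks, weights):
    """Return True if the servers in `acks` hold a strict majority of the total weight."""
    total = sum(weights.values())
    acked = sum(weights[sid] for sid in acks if sid in weights)
    return 2 * acked > total


# hypothetical ensemble: servers 1-3 vote, servers 4 and 5 are weight-0 followers
weights = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0}

# two voting followers hold 2 of the 3 units of weight
print(has_quorum({1, 2}, weights))

# weight-0 followers would not normally ACK at all; even if they did,
# one voter plus two weight-0 servers still holds only 1 of 3 units
print(has_quorum({1, 4, 5}, weights))
```

the first call prints True and the second prints False.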

to be honest, there are two things that really concern me:

1) the amount of new code we have to add if we don't use weight-0 followers and 
the new test cases that we have to write. since observers use a different 
code path we have to add a lot more tests.
2) one use of observers is to do a graceful changeover for ensemble changes. 
changing from a weight-0 follower to a follower that is a voting participant 
just means that the follower will start sending ACKs when it gets the proposal 
at which it starts voting. we can do that very fast, on the fly, with no 
interruption to the follower. if we try to convert an observer, the new 
follower must switch from observer to follower and sync up to the leader before 
it can commit the new ensemble message. this increases the interruption caused 
by the change and the likelihood of failure.
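
the changeover in point 2 can be sketched roughly as below. this is a toy 
model under assumptions, not the real follower code: the Follower class, the 
("reconfig", voters) transaction shape and the acks_sent list are all 
hypothetical. it assumes a weight-0 follower logs every proposal but sends no 
ACKs until the ensemble-change proposal names it as a voter.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    sid: int
    voting: bool = False  # False models a weight-0 follower
    log: list = field(default_factory=list)
    acks_sent: list = field(default_factory=list)

    def on_proposal(self, zxid, txn):
        # every follower, voting or not, logs the proposal
        self.log.append((zxid, txn))
        # hypothetical ensemble-change txn: ("reconfig", set of voting server ids)
        if isinstance(txn, tuple) and txn[0] == "reconfig":
            self.voting = self.sid in txn[1]
        # ACKs start with the proposal at which this server becomes a voter
        if self.voting:
            self.acks_sent.append(zxid)


f = Follower(sid=4)
f.on_proposal(1, "create /a")
f.on_proposal(2, ("reconfig", {1, 2, 3, 4}))
f.on_proposal(3, "create /b")
print(f.acks_sent)
print(len(f.log))
```

the follower ACKs zxids 2 and 3 and has all three proposals in its log, so 
there is no gap to fill and no resync step. an observer on a separate code 
path would have to change roles and sync with the leader first.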

btw, we could set up a phone conference if it would help. (everyone would be 
invited of course. we have global access numbers.)

> Observers
> ---------
>
>                 Key: ZOOKEEPER-368
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
>             Project: Zookeeper
>          Issue Type: New Feature
>          Components: quorum
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Henry Robinson
>         Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forwarding 
> commit messages not only to followers but also to observers, and have 
> observers applying transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload because all servers 
> different from the leader are followers and followers receive such a payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer consequently is not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forwarding 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
