[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776566#action_12776566
 ] 

Henry Robinson commented on ZOOKEEPER-368:
------------------------------------------

Flavio, Ben - thanks for the comments! Feels like we're getting close with this 
one.

To Flavio's specific points:

1. To make this work with FLE, the easiest thing is to have a ResponderThread 
running all the time. However, a ResponderThread currently only runs when 
electionAlg=0. Making it run for all electionAlg types is easy, but it 
introduces a UDP dependency that some installations do not want, so 
ResponderThread would need to support both UDP and TCP. That is easy enough (I 
have written this code), but it makes configuration yet more complicated 
because there is yet another port to specify. (There is some port re-use in the 
code currently that I think is a bit sketchy and doesn't work in all cases, so 
we need another dedicated port.) We will have to discuss whether we want to 
require strings of the form server.id:address:port:port:port:learnertype or 
whether it's time to break out the per-server configuration into a more 
structured format. At this point, I feel this is complicated enough, and 
orthogonal enough to Observers, to warrant its own JIRA; folding it in would 
make the Observers patch too complicated. Also, this feature requires getting 
the race condition bug fixed. 

I've created https://issues.apache.org/jira/browse/ZOOKEEPER-578 for this issue.

So we can either block the Observers patch on this feature, or get a reduced 
Observers patch in now (and avoid another cycle of refactoring when trunk gets 
updated and the patch no longer applies). Either is fine, but I'm probably in 
favour of getting the patch in now and updating it once the ResponderThread 
JIRA gets closed. The change to re-enable Observers for all election types is 
pretty trivial.
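
To make the configuration concern in point 1 concrete, here is a rough sketch 
of what parsing the flat address:port:port:port:learnertype form might involve. 
Everything here is an assumption for illustration: the ServerSpec class, the 
meaning assigned to each port, the responderPort name and the two learner type 
labels are hypothetical, not what the patch or ZooKeeper actually uses.

```java
import java.util.Locale;

/** Hypothetical holder for one parsed per-server entry; not ZooKeeper's own class. */
final class ServerSpec {
    /** Assumed learner type labels. */
    enum LearnerType { PARTICIPANT, OBSERVER }

    final String address;
    final int quorumPort;     // assumed meaning of the first port
    final int electionPort;   // assumed meaning of the second port
    final int responderPort;  // assumed name for the extra dedicated port
    final LearnerType type;

    private ServerSpec(String address, int quorumPort, int electionPort,
                       int responderPort, LearnerType type) {
        this.address = address;
        this.quorumPort = quorumPort;
        this.electionPort = electionPort;
        this.responderPort = responderPort;
        this.type = type;
    }

    /** Parses the value part of an entry: "address:port:port:port:learnertype". */
    static ServerSpec parse(String value) {
        // Splitting on ':' is what makes the flat form fragile: an IPv6
        // literal address, for instance, would break it.
        String[] parts = value.trim().split(":");
        if (parts.length != 5) {
            throw new IllegalArgumentException(
                "expected address:port:port:port:learnertype, got: " + value);
        }
        // valueOf throws IllegalArgumentException for an unknown label.
        LearnerType type =
            LearnerType.valueOf(parts[4].trim().toUpperCase(Locale.ROOT));
        return new ServerSpec(parts[0], parsePort(parts[1]),
                              parsePort(parts[2]), parsePort(parts[3]), type);
    }

    private static int parsePort(String s) {
        // NumberFormatException is itself an IllegalArgumentException.
        int port = Integer.parseInt(s.trim());
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + s);
        }
        return port;
    }
}
```

Each extra positional field makes the string harder to read and to validate, 
which is the argument for a more structured per-server format.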

2. I think this is a great idea - I'd point out that the hardcoded 
quorum.size() / 2 usages predate the Observers patch! For example, see 
termPredicate(..) in AuthFastLeaderElection.java and lookForLeader in 
LeaderElection.java. This should therefore be a separate JIRA (I'm trying to 
avoid having several issues fixed by this patch).

I've created https://issues.apache.org/jira/browse/ZOOKEEPER-577 for this 
issue. 
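
As a minimal sketch of the kind of change point 2 is about, the hardcoded 
quorum.size() / 2 comparison could be moved behind a single check that counts 
only voting members. The MajorityVerifier class and its method name are 
hypothetical and used only for illustration; this is not the code in either 
election class.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical majority check over voting members only. */
final class MajorityVerifier {
    private final Set<Long> voters;

    MajorityVerifier(Set<Long> voters) {
        // Defensive copy so later changes to the caller's set have no effect.
        this.voters = new HashSet<>(voters);
    }

    /** Returns true if the acks include a strict majority of the voters. */
    boolean containsQuorum(Set<Long> ackSet) {
        int votingAcks = 0;
        for (Long id : ackSet) {
            // Acks from non-voters (e.g. Observers) are ignored.
            if (voters.contains(id)) {
                votingAcks++;
            }
        }
        return votingAcks > voters.size() / 2;
    }
}
```

With voters {1, 2, 3}, acks from {1, 2} form a quorum, while acks from one 
voter plus any number of non-voters do not.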

3. Yes, will do.

4. Yep, will do.

Ben - I didn't take great notes at that meeting (jetlag!), but my recollection 
is: we were trying to reconcile having Observers change roles and join the 
ensemble as voting members with the complications of doing so. Zero-weight 
followers are a great way to do that. However, we decided that this might not 
actually be a feature we wanted. At that point, the optimisations you can make 
with Observers (such as batching and the single-message INFORM protocol), 
particularly for WANs, mean it makes sense to logically separate out Observers 
in the code. We could have special-cased the handling of 0-weight clients, but 
we felt that, since this would involve a step-change in the behaviour of peers 
as the weight went from 0 to any positive value, it would be a bit 
counterintuitive. 
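
To illustrate the difference the single-message INFORM protocol makes, here is 
a toy sketch. It assumes that INFORM carries the transaction payload together 
with the commit decision, so an Observer needs no pending-proposal state. The 
class names, method names and the String payload are all hypothetical; this is 
not the patch's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InformSketch {
    /** Follower-style handling: two messages per transaction. */
    static final class FollowerLog {
        private final Map<Long, String> pending = new HashMap<>();
        final List<String> applied = new ArrayList<>();

        // The PROPOSAL carries the payload; it is held until the COMMIT.
        void onProposal(long zxid, String txn) {
            pending.put(zxid, txn);
        }

        // The COMMIT carries only the zxid, so the payload is looked up.
        void onCommit(long zxid) {
            String txn = pending.remove(zxid);
            if (txn == null) {
                throw new IllegalStateException("commit without proposal: " + zxid);
            }
            applied.add(txn);
        }
    }

    /** Observer-style handling: one message per transaction. */
    static final class ObserverLog {
        final List<String> applied = new ArrayList<>();

        // Assumed: INFORM is only sent for committed transactions and
        // includes the payload, so it can be applied directly.
        void onInform(long zxid, String txn) {
            applied.add(txn);
        }
    }
}
```

The follower path needs per-transaction state between the two messages; the 
observer path does not.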



> Observers
> ---------
>
>                 Key: ZOOKEEPER-368
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
>             Project: Zookeeper
>          Issue Type: New Feature
>          Components: quorum
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Henry Robinson
>         Attachments: obs-refactor.patch, observer-refactor.patch, observers 
> sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forwarding 
> commit messages not only to followers but also to observers, and have 
> observers applying transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload because all servers 
> different from the leader are followers and followers receive such a payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer consequently is not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forwarding 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
