[ https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776566#action_12776566 ]
Henry Robinson commented on ZOOKEEPER-368:
------------------------------------------
Flavio, Ben - thanks for the comments! Feels like we're getting close with this
one.
To Flavio's specific points:
1. In order to make this work with FLE, the easiest thing is to have a
ResponderThread running all the time. However, a ResponderThread currently
only runs when electionAlg=0. Making the responder thread run for all
electionAlg types is easy, but it introduces a UDP dependency which some
installations do not want. So we need to make ResponderThread work over both
UDP and TCP. This is easy enough (I have written this code), but it also
makes configuration yet more complicated because there is yet another port that
needs specifying. (There is some port re-use in the code currently that's a bit
sketchy, I think, and that doesn't work in all cases; we need another dedicated
port.) We will have to discuss whether we want to require strings of the form
server.id:address:port:port:port:learnertype or whether it's time to break out
the per-server configuration into a more structured format. At this point, I
feel like this is complicated enough, and orthogonal enough to Observers, to
warrant its own JIRA - it would make the Observers patch too complicated. Also,
this feature requires getting the race condition bug fixed.
I've created https://issues.apache.org/jira/browse/ZOOKEEPER-578 for this issue.
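To make the configuration question in point 1 concrete, here is a minimal sketch of what parsing a string of the form server.id:address:port:port:port:learnertype might look like. Everything in it is an assumption for illustration: the class name ServerSpec, the field names, the role assigned to each of the three ports, and treating learnertype as optional. It is not the parser in the ZooKeeper code base, and it does not handle addresses that themselves contain colons.

```java
import java.util.Locale;

/** Hypothetical holder for one per-server entry; not ZooKeeper's actual class. */
final class ServerSpec {
    enum LearnerType { PARTICIPANT, OBSERVER }

    final long id;
    final String address;
    final int quorumPort;     // assumed role: leader <-> learner traffic
    final int electionPort;   // assumed role: leader election traffic
    final int responderPort;  // assumed role: the extra dedicated responder port
    final LearnerType learnerType;

    private ServerSpec(long id, String address, int quorumPort,
                       int electionPort, int responderPort, LearnerType learnerType) {
        this.id = id;
        this.address = address;
        this.quorumPort = quorumPort;
        this.electionPort = electionPort;
        this.responderPort = responderPort;
        this.learnerType = learnerType;
    }

    /** Parses "server.<id>:<address>:<port>:<port>:<port>[:<learnertype>]". */
    static ServerSpec parse(String entry) {
        String[] parts = entry.split(":");
        if (parts.length < 5 || parts.length > 6) {
            throw new IllegalArgumentException("expected 5 or 6 fields: " + entry);
        }
        if (!parts[0].startsWith("server.")) {
            throw new IllegalArgumentException("entry must start with server.<id>: " + entry);
        }
        long id = Long.parseLong(parts[0].substring("server.".length()));
        // Assumption: a missing learnertype means a voting participant.
        LearnerType type = LearnerType.PARTICIPANT;
        if (parts.length == 6) {
            type = LearnerType.valueOf(parts[5].toUpperCase(Locale.ROOT));
        }
        return new ServerSpec(id, parts[1], parsePort(parts[2]),
                parsePort(parts[3]), parsePort(parts[4]), type);
    }

    private static int parsePort(String s) {
        int port = Integer.parseInt(s);
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + s);
        }
        return port;
    }
}
```

Even in this small form the positional fields are easy to misorder, which is the argument for a more structured per-server format.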
So we can block the Observers patch on this feature, or we can get a reduced
Observers patch in (and prevent another cycle of refactoring when trunk gets
updated and the patch no longer applies). Either is fine, but I'm probably in
favour of getting the patch in now and updating it once the ResponderThread
JIRA gets closed. The change to re-enable Observers for all election types is
pretty trivial.
2. I think this is a great idea - I'd point out that the hardcoded
quorum.size() / 2 usages predate the Observers patch! For example, see
termPredicate(..) in AuthFastLeaderElection.java and lookForLeader in
LeaderElection.java. This should therefore be a separate JIRA (I'm trying to
avoid having several issues fixed by this patch).
I've created https://issues.apache.org/jira/browse/ZOOKEEPER-577 for this
issue.
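As a rough sketch of the idea in point 2, the majority check can live behind one small abstraction instead of being repeated as quorum.size() / 2 at each call site. The names VoteQuorum and MajorityQuorum below are hypothetical and chosen for illustration; this is not the code in AuthFastLeaderElection.java or LeaderElection.java. The sketch assumes that only voting members count towards the quorum, so acknowledgements from non-voting servers are ignored.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical abstraction over "does this set of servers form a quorum?". */
interface VoteQuorum {
    boolean containsQuorum(Set<Long> ackingServerIds);
}

/** Simple majority over the voting members only. */
final class MajorityQuorum implements VoteQuorum {
    private final Set<Long> voters;

    MajorityQuorum(Set<Long> voters) {
        // Defensive copy so later changes by the caller do not alter the quorum.
        this.voters = new HashSet<>(voters);
    }

    @Override
    public boolean containsQuorum(Set<Long> ackingServerIds) {
        int votes = 0;
        for (Long id : ackingServerIds) {
            // Ignore ids that are not voting members (for example Observers).
            if (voters.contains(id)) {
                votes++;
            }
        }
        // Strict majority of the voting members.
        return votes > voters.size() / 2;
    }
}
```

With the check in one place, callers ask containsQuorum(...) and never need to know how many servers vote.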
3. Yes, will do.
4. Yep, will do.
Ben - I didn't take great notes at that meeting (jetlag!), but my recollection
is: we were trying to reconcile having Observers change roles and join the
ensemble as voting members with the complications of doing so. Zero-weight
followers are a great way to do that. However, we decided that actually that
might not be a feature we wanted. At that point, the optimisations you can make
with Observers, particularly for WANs (such as batching and the single-message
INFORM protocol), mean it makes sense to logically separate out Observers in
the code. We could have special-cased handling of 0-weight clients, but we felt
that since this would involve a step-change in the behaviour of peers as the
weight went from 0 to any positive value, it would be a bit counterintuitive.
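To illustrate the single-message point, here is a minimal sketch contrasting the two delivery paths: a follower sees a proposal carrying the payload and later a commit carrying only the transaction id, whereas an Observer gets the payload and the commit decision in one INFORM message. The class LearnerSketch, its method names, and the use of a String as the payload are assumptions for illustration, not the actual ZooKeeper classes or wire format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical learner showing both delivery paths side by side. */
final class LearnerSketch {
    // Proposals received but not yet committed, keyed by transaction id.
    private final Map<Long, String> pending = new HashMap<>();
    // Payloads applied so far, in application order.
    private final List<String> applied = new ArrayList<>();

    /** Follower path, step 1: the proposal carries the payload. */
    void onProposal(long zxid, String payload) {
        pending.put(zxid, payload);
    }

    /** Follower path, step 2: the commit carries only the id. */
    void onCommit(long zxid) {
        String payload = pending.remove(zxid);
        if (payload == null) {
            throw new IllegalStateException("commit for unknown proposal " + zxid);
        }
        applied.add(payload);
    }

    /** Observer path: one message carries the payload of an already committed transaction. */
    void onInform(long zxid, String payload) {
        applied.add(payload);
    }

    List<String> applied() {
        return applied;
    }
}
```

Both paths end in the same applied state; the Observer simply never holds an uncommitted proposal.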
> Observers
> ---------
>
> Key: ZOOKEEPER-368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
> Project: Zookeeper
> Issue Type: New Feature
> Components: quorum
> Reporter: Flavio Paiva Junqueira
> Assignee: Henry Robinson
> Attachments: obs-refactor.patch, observer-refactor.patch, observers
> sync benchmark.png, observers.patch, ZOOKEEPER-368.patch,
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch,
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch,
> ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching
> agreement on the order of ZooKeeper transactions. That is, all followers
> receive proposals, acknowledge them, and receive commit messages from the
> leader. A leader issues commit messages once it receives acknowledgments from
> a quorum of followers. For cross-colo operation, it would be useful to have a
> third role: observer. Using Paxos terminology, observers are similar to
> learners. An observer does not participate actively in the agreement step of
> the atomic broadcast protocol. Instead, it only commits proposals that have
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forwarding
> commit messages not only to followers but also to observers, and have
> observers applying transactions according to the order followers agreed upon.
> In the current implementation of the protocol, however, commit messages do
> not carry their corresponding transaction payload because all servers
> different from the leader are followers and followers receive such a payload
> first through a proposal message. Just forwarding commit messages as they
> currently are to an observer consequently is not sufficient. We have a couple
> of options:
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the
> protocol implementation, but it increases traffic slightly. The performance
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forwarding
> commit messages to observers. With this option, observers can connect to
> followers, and receive messages from followers. This choice is important to
> avoid increasing the load on the leader with the number of observers.