I speak strictly from my own experience with Zookeeper and not in any official 
capacity for the project or for Exhibitor.

Exhibitor works well and makes it easy to automate clustering Zookeeper 
nodes into an ensemble and to discover the individual nodes in the ensemble via 
an HTTP call. We ran into a problem, though, after we rolled Exhibitor out 
across our infrastructure: every so often our Zookeeper ensembles lost the data 
they stored. While I cannot say this was caused by Exhibitor, we have Solr 
clouds where Exhibitor was not used and they never had this problem. My 
suspicion is that a Zookeeper node developed a problem, and Exhibitor removed 
that node from the ensemble and then did a rolling restart. When that node 
recovered, its data was for some reason corrupted or lost. Exhibitor pulled the 
node back into the ensemble and did another rolling restart. That node became 
leader, and the others synced from it when they rejoined, dumping their own 
stored data to match the leader. This is speculation on my part: I have had a 
very hard time replicating it and have not heard of anyone else having this 
problem. Again, I am not definitively saying Exhibitor is the cause, but the 
problem has not occurred since we removed Exhibitor.

The Zookeeper 3.5.x branch adds discovery functionality and automated 
clustering. It’s great, but from what I understand it is still in alpha. 

Prior to the 3.5.x branch, I know of no way to discover which nodes are actually 
in the ensemble. The four-letter commands will tell you whether a node is in an 
ensemble and whether it is a leader or follower, but they will not tell you 
which ensemble it is in or list any information about the other nodes. If 
someone has a way to do this, please post, because I have looked all over. 
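
To illustrate what the four-letter commands do give you, here is a minimal 
Python sketch that sends "srvr" to a single node and pulls out its mode. The 
function names and the host name in the usage comment are my own placeholders, 
and it assumes the default client port 2181. Note that it only reports on the 
node you ask; it does not list the rest of the ensemble.

```python
import socket


def four_letter_word(host, port=2181, cmd="srvr", timeout=5.0):
    """Send a four-letter command to one Zookeeper node and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # The server closes the connection once it has sent its reply.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


# Usage (hypothetical host name):
#   print(parse_mode(four_letter_word("zk1.example.internal")))
```

To get a picture of the whole ensemble this way, you still need your own list 
of hosts to loop over, which is exactly the gap described above.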

We make use of Scalr, which adds another layer to the automation. I run 
orchestration scripts in Scalr that discover the other running Zookeeper nodes 
in (what Scalr calls) the same Farm Role. The script configures each node with 
the information for the other nodes and restarts Zookeeper to bring them into 
an ensemble. It then collects this information and stores the IP addresses in 
a Global Variable in Scalr, which is then available to Solr. Changes to the 
ensemble are reflected in this variable, which is passed to the Solr cloud, 
where a restart of the service updates the Zookeeper information in Solr. We 
are working towards moving this functionality to Consul, where the Zookeeper 
ensemble information will be registered, allowing Solr to pull it from Consul 
as opposed to relying on Global Variables. What I am getting at is that 
outside the 3.5.x branch, automating this takes a bit of work.
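
As a rough sketch of the configuration step (not our actual Scalr script), 
the Python below turns a set of discovered nodes into the server lines for 
zoo.cfg and into the comma-separated connection string that Solr takes. The 
function names and IP addresses are made up for illustration. It assumes the 
usual 2888/3888 peer and election ports, the 2181 client port, and that each 
node already has a stable id matching its myid file.

```python
def build_server_lines(nodes, peer_port=2888, election_port=3888):
    """Build the 'server.N=host:peer:election' lines for zoo.cfg.

    nodes maps each Zookeeper server id (the value in its myid file) to its IP.
    """
    return [
        f"server.{server_id}={ip}:{peer_port}:{election_port}"
        for server_id, ip in sorted(nodes.items())
    ]


def connect_string(nodes, client_port=2181):
    """Build the comma-separated host:port list that clients such as Solr use."""
    return ",".join(f"{ip}:{client_port}" for _, ip in sorted(nodes.items()))


# Hypothetical discovery result: server id -> IP address.
discovered = {1: "10.0.0.5", 2: "10.0.0.7", 3: "10.0.0.12"}

print("\n".join(build_server_lines(discovered)))
print(connect_string(discovered))
```

The same server lines have to be written to every node before the restart, 
and the connection string is what would go into the Global Variable (or 
Consul) for Solr to pick up.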


-- 
Daniel S Washko
Solutions Architect



[email protected]  <http://www.gannett.com/>
        
On 7/11/17, 6:58 PM, "Luigi Tagliamonte" <[email protected]> wrote:

    Hello, Zookeeper Users!
    I'm currently configuring/exploring zookeeper.
    I'm reading a lot about ensembles and scaling and I got some question that
    I'd like to submit to an expert audience.
    I need zookeeper as Kafka dependency so my deployment goal is the ensemble
    reliability especially because last Kafka version uses zookeeper only to
    store the leader partition.
    
    Here are my questions:
    
    - To manage the ensemble I decided to use exhibitor - what do you think
    about? Should I look to something else?
    
    - Is there a way to discover all the servers of an ensemble apart from
    use 4LTR? I wonder if it is possible to do something like in Cassandra were
    you contact one node and you can get the whole cluster info from it. should
    I configure just a DNS per zookeeper server, this doesn't scale well in a
    dynamic env like servers in autoscaling.
    
    - is there any white paper that shows a real scalable and reliable
    Zookeeper installation? Any resources are welcome!
    
    Thank you all in advance!
    Regards
    
