Curator is just on the client (Solr) side, to make it easier to integrate with ZooKeeper, right?

If you study Elasticsearch, they had terrible cluster stability a few years ago because everything was too «dynamic» and «zero config». That led to the system outsmarting itself when facing real-life network partitions and other failures. Solr did not have those issues precisely because it relies on ZooKeeper, which is very static and hard to change (on purpose), and thus delivers a strong, stable quorum. So what did Elastic do a couple of years ago? They adopted the same best practice as ZK, recommending 3 or 5 statically defined master nodes that own the cluster state.
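For illustration, Elasticsearch 7+ expresses that static master list in elasticsearch.yml via the `cluster.initial_master_nodes` setting; the node names below are made up for the example:

```yaml
# elasticsearch.yml -- minimal sketch; node names are hypothetical.
# The voting configuration is bootstrapped once from this static list of
# master-eligible nodes, much like a ZooKeeper ensemble definition.
cluster.name: my-cluster
node.name: master-1
cluster.initial_master_nodes:
  - master-1
  - master-2
  - master-3
```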

Solr could get rid of ZK the same way Kafka is doing. But while Kafka already has a distributed log it can replace ZK with (hey, Kafka IS a log), Solr would need to add such a log, and it would need to be embedded in the Solr process to avoid that extra runtime. I believe it could be done with Apache Ratis (https://ratis.incubator.apache.org/), which is a Raft Java library. But I doubt the project has the bandwidth and dedication right now to embark on such an effort. It would probably be a multi-year project: first building abstractions on top of ZK, then moving one piece of ZK dependency over to Raft at a time, running both systems in parallel, before ZK could finally go away.
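To make the "abstractions on top of ZK first" step concrete, here is a rough sketch (names are mine, not anything in Solr) of the kind of interface Solr code could call instead of talking to ZooKeeper directly, so a ZK-backed implementation and a Raft-backed one (e.g. wrapping a Ratis client) could later be swapped behind it:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over the cluster-state store. Solr code would
// depend on this interface; the backend (ZooKeeper today, Raft later)
// becomes an implementation detail.
interface ClusterStateStore {
    void put(String path, byte[] data);     // create or update a node
    Optional<byte[]> get(String path);      // read a node, empty if absent
    void delete(String path);               // remove a node
}

// In-memory stand-in, useful for tests. Real backends would wrap
// org.apache.zookeeper.ZooKeeper or a Ratis raft client instead.
class InMemoryStateStore implements ClusterStateStore {
    private final Map<String, byte[]> nodes = new ConcurrentHashMap<>();
    public void put(String path, byte[] data) { nodes.put(path, data); }
    public Optional<byte[]> get(String path) { return Optional.ofNullable(nodes.get(path)); }
    public void delete(String path) { nodes.remove(path); }
}

public class StateStoreSketch {
    public static void main(String[] args) {
        ClusterStateStore store = new InMemoryStateStore();
        store.put("/collections/test/state.json", "{}".getBytes());
        System.out.println(store.get("/collections/test/state.json").isPresent()); // prints "true"
        store.delete("/collections/test/state.json");
        System.out.println(store.get("/collections/test/state.json").isPresent()); // prints "false"
    }
}
```

The migration Jan describes would then be: route each existing ZK call through such an interface, and only afterwards introduce the Raft-backed implementation alongside the ZK one.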

I’d like to see it happen. Especially for smaller deployments it would be fantastic.

Jan

> 10. jun. 2020 kl. 01:03 skrev Erick Erickson <erickerick...@gmail.com>:
> 
> The intermediate solution is to migrate to Curator. I don’t know all the ins and outs
> of that, or whether it would be easier to set up and maintain.
> 
> I do know that ZooKeeper is deeply embedded in Solr, and replacing it with
> most anything would be a major pain.
> 
> I’m also certain that rewriting Zookeeper is a rat-hole that would take a 
> major
> effort. If anyone would like to try it, all patches welcome.
> 
> FWIW,
> er...@curmudgeon.com
> 
>> On Jun 9, 2020, at 6:01 PM, Dave <hastings.recurs...@gmail.com> wrote:
>> 
>> Is it horrible that I’m already burnt out from just reading that?
>> 
>> I’m going to stick with the classic Solr master/slave setup for the
>> foreseeable future; at least that lets me focus more on search theory
>> rather than on the back-end system nonstop.
>> 
>>> On Jun 9, 2020, at 5:11 PM, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
>>> 
>>> My 2 cents: I have a few SolrCloud production installations, and I’d like to share
>>> some thoughts on what I’ve learned over the last 4-5 years (fwiw), just as they
>>> come to mind.
>>> 
>>> - to configure a SolrCloud *production* cluster you have to be a ZooKeeper
>>> expert, even if you only need Solr.
>>> - the ZooKeeper ensemble (3 or 5 nodes) is recommended to run on separate
>>> machines, but for many customers that is too expensive. For the rest, it is
>>> expensive just to have the instances (i.e. Docker containers). It is even
>>> expensive to have people who know ZooKeeper, or to train them.
>>> - given the high-availability function of a ZooKeeper cluster, you have to
>>> monitor it and be able to promptly back it up and restore it. But it is hard
>>> to monitor (and to configure the monitoring), and it is even harder to back
>>> up and restore while it is running.
>>> - you can’t add or remove ZooKeeper nodes while the ensemble is up. Only the
>>> latest version finally makes it possible to add/remove nodes at runtime, but
>>> afaik this is still not supported by SolrCloud out of the box.
>>> - many people fail when they try to run a SolrCloud cluster because it is
>>> hard to set up; for example, SolrCloud’s zkcli runs poorly on Windows.
>>> - it is hard to administer ZooKeeper remotely; there are basically no
>>> utilities that let you easily list/read/write/delete files on the ZooKeeper
>>> filesystem.
>>> - it was really hard to create a ZooKeeper ensemble in Kubernetes; only
>>> recently have a few solutions appeared. This has been counter-productive for
>>> the Solr project, because the world is moving to Kubernetes and there is
>>> basically no support.
>>> - after all these troubles, when a SolrCloud cluster is configured correctly
>>> it is rock solid. Even if a few Solr nodes/replicas go down, the entire
>>> cluster can restore itself almost automatically. But how much work it takes.
>>> 
>>> Believe me, I like Solr, but at the end of this long journey, sometimes I
>>> would rather just use PaaS/SaaS instead of having to deal with all these
>>> troubles.
> 
