[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785326#action_12785326 ]

Mark Harwood commented on SOLR-1277:
------------------------------------

I'm not intimately familiar with Solr, but I thought I'd add some comments 
based on my experience building a Zookeeper-managed Lucene cluster.

* Zookeeper can be used to hold the definitive config plan for a cluster (how 
many logical indexes, number of shards, replicas etc). "Dumb" search servers 
can watch and respond to centralised config changes in Zookeeper to assume 
roles (there's a rough sketch of this after the list).
* An admin console can be used to:
    a) Change and publish a desired config plan to Zookeeper
    b) Monitor the implementation of a config plan (which servers are active, 
which server is the current shard master, what data versions the servers hold, 
etc.)
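
To make the first bullet concrete, here's a rough sketch of the watcher 
pattern I have in mind. The znode layout (/solr/config-plan for the plan, 
/solr/live-servers for membership) and the plan format are made up for 
illustration - they aren't part of the attached patch. Each "dumb" server 
re-reads the plan whenever the node changes and advertises itself with an 
ephemeral node so the admin console can see who is active:

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ConfigPlanWatcher implements Watcher {

    // Hypothetical znode layout - not taken from the attached patch.
    private static final String PLAN_PATH = "/solr/config-plan";
    private static final String LIVE_PATH = "/solr/live-servers";

    private final ZooKeeper zk;

    public ConfigPlanWatcher(String zkHosts, String serverId) throws Exception {
        // This object also receives session-level events via process().
        this.zk = new ZooKeeper(zkHosts, 10000, this);
        // Ephemeral node advertises this server as active; it disappears
        // automatically if the session dies. Parent znodes are assumed to
        // already exist.
        zk.create(LIVE_PATH + "/" + serverId, new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        reloadPlan();
    }

    // Read the current plan and re-register the watch (watches are one-shot).
    private void reloadPlan() throws Exception {
        Stat stat = new Stat();
        byte[] plan = zk.getData(PLAN_PATH, this, stat);
        applyPlan(plan, stat.getVersion());
    }

    // A "dumb" search server would parse the plan here and assume its role
    // (e.g. become master or replica for a particular shard).
    private void applyPlan(byte[] plan, int version) {
        System.out.println("Applying config plan version " + version);
    }

    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged
                && PLAN_PATH.equals(event.getPath())) {
            try {
                reloadPlan();
            } catch (Exception e) {
                e.printStackTrace(); // real code would back off and retry
            }
        }
    }
}
{code}

The important property is that servers only ever react to what's published in 
Zookeeper; the admin console is the only thing that writes the plan, so 
rolling out a new config is just a single setData() on that znode.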

Overall this works well, but what I think was a step too far was trying to use 
Zookeeper to coordinate distributed transactions across the cluster (writers 
synching commits, all readers consistently synched at the same version). That 
kind of transaction management is a complex beast, and when you encounter 
issues like the rogue GC mentioned earlier, things start to fall apart quickly. 
As far as the CAP theorem goes (Consistency, Availability, Partition 
tolerance - pick 2), I'm definitely favouring Availability over Consistency 
when managing a Partitioned system. 

Cheers,
Mark



> Implement a Solr specific naming service (using Zookeeper)
> ----------------------------------------------------------
>
>                 Key: SOLR-1277
>                 URL: https://issues.apache.org/jira/browse/SOLR-1277
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
> SOLR-1277.patch, zookeeper-3.2.1.jar
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The goal is to give Solr server clusters self-healing attributes:
> if a server fails, indexing and searching don't stop, and all of
> the partitions remain searchable. For configuration, we want the
> ability to centrally deploy a new configuration without servers
> going offline.
> We can start with basic failover and go from there?
> Features:
> * Automatic failover (i.e. when a server fails, clients stop
> trying to index to or search it)
> * Centralized configuration management (i.e. new solrconfig.xml
> or schema.xml propagates to a live Solr cluster)
> * Optionally allow shards of a partition to be moved to another
> server (i.e. if a server gets hot, move the hot segments out to
> cooler servers). Ideally we'd have a way to detect hot segments
> and move them seamlessly. With NRT this becomes somewhat more
> difficult but not impossible?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
