[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785326#action_12785326 ]
Mark Harwood commented on SOLR-1277:
------------------------------------

I'm not intimately familiar with Solr, but I thought I'd add some comments based on my experience building a Zookeeper-managed Lucene cluster.

* Zookeeper can be used to hold the definitive config plan for a cluster (how many logical indexes, number of shards, replicas, etc.). "Dumb" search servers can watch for centralised config changes in Zookeeper and respond by assuming the roles assigned to them (a rough sketch of this watch-and-react pattern is appended at the end of this message).
* An admin console can be used to:
  a) change and publish a desired config plan to Zookeeper
  b) monitor the implementation of that plan (which servers are active, which server is the current shard master, what data version each server holds, etc.)

Overall this works well. What I think was a step too far was trying to use Zookeeper to coordinate distributed transactions across the cluster (writers syncing commits, all readers consistently synced at the same version). That kind of transaction management is a complex beast, and when you hit issues like the rogue GC mentioned earlier, things start to fall apart quickly.

As far as the CAP theorem goes (Consistency, Availability, Partition tolerance - pick two), I'm definitely favouring Availability over Consistency when managing a partitioned system.

Cheers, Mark

> Implement a Solr specific naming service (using Zookeeper)
> ----------------------------------------------------------
>
> Key: SOLR-1277
> URL: https://issues.apache.org/jira/browse/SOLR-1277
> Project: Solr
> Issue Type: New Feature
> Affects Versions: 1.4
> Reporter: Jason Rutherglen
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 1.5
>
> Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> The goal is to give Solr server clusters self-healing attributes: if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the goal is the ability to centrally deploy a new configuration without servers going offline.
> We can start with basic failover and go from there?
> Features:
> * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it)
> * Centralized configuration management (i.e. a new solrconfig.xml or schema.xml propagates to a live Solr cluster)
> * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult, but not impossible?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
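
To make the watch-and-react pattern above concrete, here is a minimal sketch using the plain ZooKeeper Java client - the kind of thing a "dumb" search server and the admin console could each run. This is an illustration only: the znode path /solr/cluster-config, the ClusterPlanWatcher class and the applyPlan/publishPlan methods are hypothetical and not part of the attached patch.

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

/** Hypothetical sketch: a search server that watches a central config plan held in Zookeeper. */
public class ClusterPlanWatcher implements Watcher {

    // Hypothetical znode holding the cluster's desired config plan.
    private static final String PLAN_PATH = "/solr/cluster-config";

    private final ZooKeeper zk;

    public ClusterPlanWatcher(String connectString) throws Exception {
        // This object also receives session/connection events via process();
        // a real server would wait for SyncConnected before reading anything.
        this.zk = new ZooKeeper(connectString, 10000, this);
    }

    /** Read the current plan and leave a watch so we hear about the next change. */
    public void readPlanAndWatch() throws KeeperException, InterruptedException {
        Stat stat = new Stat();
        byte[] plan = zk.getData(PLAN_PATH, this, stat);
        applyPlan(plan, stat.getVersion());
    }

    /** Admin-console side: publish a new desired plan (version -1 = unconditional overwrite). */
    public void publishPlan(byte[] plan) throws KeeperException, InterruptedException {
        zk.setData(PLAN_PATH, plan, -1);
    }

    @Override
    public void process(WatchedEvent event) {
        // Zookeeper watches are one-shot, so re-register by reading the data again.
        if (event.getType() == EventType.NodeDataChanged && PLAN_PATH.equals(event.getPath())) {
            try {
                readPlanAndWatch();
            } catch (Exception e) {
                // A real server would retry with backoff; dropping the watch means missing updates.
                e.printStackTrace();
            }
        }
    }

    private void applyPlan(byte[] plan, int version) {
        // Hypothetical: parse the plan and assume whatever shard/replica role it assigns this node.
        System.out.println("Applying cluster plan v" + version + ": " + new String(plan));
    }
}
{code}

Publishing and watching a single plan node like this only covers the "centralised config" part of the comment; the shard-master and data-version monitoring mentioned above would normally rely on per-server ephemeral znodes, which is a separate mechanism.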