[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791517#action_12791517
 ] 

Mark Miller commented on SOLR-1277:
-----------------------------------

bq. Seems like we need to handle these types of failures anyway

Right, but I was seeing them as two possible types of failure. One, the node 
is really gone - it's not coming back, or not for many minutes. Two, GC took 
8 seconds and the timeout is 5 seconds.

The way things work now, if someone searched during the GC, they'd get all the 
results back; the search would just take longer. They'd see the hourglass 
spinning and know the results were slow for this search, but still coming. I 
was/am not sure if we wanted to replicate that.

One option obviously is to treat a 15 second timeout the same as if the node 
went down. It seems to depend though - in a lot of cases, if I have large GCs 
often enough, I'd prefer slower search at those moments over users seeing daily 
failures / partial results. That's the behavior you currently get without 
ZooKeeper.

It just depends on whether we treat pauses slightly over the default timeout as 
outages. I can see that making sense in many cases, but not in others, 
depending on who is using the system and how much redundancy they have set up.
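To make the trade-off concrete, here is a minimal Java sketch (not actual Solr or ZooKeeper code; the class, method names, and thresholds are all hypothetical) of a two-threshold policy: a pause past the soft timeout marks a node as merely slow (keep waiting, slower search but full results), and only a pause past a longer hard timeout treats it as a real outage (fail over, risking partial results):

```java
// Hypothetical sketch, not part of SOLR-1277: classify a node by how long
// it has been silent, using two thresholds instead of one.
public class NodeHealthPolicy {
    public enum State { LIVE, SLOW, DOWN }

    private final long softTimeoutMs; // e.g. the ZooKeeper-style session timeout (5s)
    private final long hardTimeoutMs; // beyond this, treat the pause as an outage (15s)

    public NodeHealthPolicy(long softTimeoutMs, long hardTimeoutMs) {
        this.softTimeoutMs = softTimeoutMs;
        this.hardTimeoutMs = hardTimeoutMs;
    }

    /** Classify a node by milliseconds since it was last heard from. */
    public State classify(long millisSinceLastHeartbeat) {
        if (millisSinceLastHeartbeat <= softTimeoutMs) {
            return State.LIVE;
        }
        if (millisSinceLastHeartbeat <= hardTimeoutMs) {
            // An 8s GC with a 5s timeout lands here: wait it out rather
            // than serving partial results.
            return State.SLOW;
        }
        // The node is really gone (or gone for many minutes): fail over.
        return State.DOWN;
    }

    public static void main(String[] args) {
        NodeHealthPolicy p = new NodeHealthPolicy(5000, 15000);
        System.out.println(p.classify(3000));  // LIVE
        System.out.println(p.classify(8000));  // SLOW - the GC-pause case above
        System.out.println(p.classify(20000)); // DOWN
    }
}
```

With a single threshold, both failure types collapse into "down"; the second threshold is what lets a deployment choose slower searches over daily partial results.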

> Implement a Solr specific naming service (using Zookeeper)
> ----------------------------------------------------------
>
>                 Key: SOLR-1277
>                 URL: https://issues.apache.org/jira/browse/SOLR-1277
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
> SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The goal is to give Solr server clusters self-healing attributes
> where if a server fails, indexing and searching don't stop and
> all of the partitions remain searchable. For configuration, the
> ability to centrally deploy a new configuration without servers
> going offline.
> We can start with basic failover and go from there?
> Features:
> * Automatic failover (i.e. when a server fails, clients stop
> trying to index to or search it)
> * Centralized configuration management (i.e. new solrconfig.xml
> or schema.xml propagates to a live Solr cluster)
> * Optionally allow shards of a partition to be moved to another
> server (i.e. if a server gets hot, move the hot segments out to
> cooler servers). Ideally we'd have a way to detect hot segments
> and move them seamlessly. With NRT this becomes somewhat more
> difficult but not impossible?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
