about <1>. This shouldn't be happening, so I wouldn't concentrate there first. The most common cause is a short ZooKeeper client timeout combined with the replicas going into a stop-the-world garbage collection pause that exceeds that timeout. So the first thing to do is check whether that's happening. Here are a couple of good places to start:
http://lucidworks.com/blog/garbage-collection-bootcamp-1-0/
http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr

<2> The partial answer is that ZooKeeper does a keep-alive type check, and if the Solr nodes it knows about don't reply, it marks those nodes as down.

Best,
Erick

On Tue, May 5, 2015 at 5:42 AM, Sai Sreenivas K <sa...@myntra.com> wrote:
> Could you clarify the following questions:
>
> 1. Is there a way to avoid all the nodes simultaneously going into
> recovery when bulk indexing happens? Is there an API to disable
> replication on one node for a while?
>
> 2. We recently changed the host name on nodes in solr.xml, but the old
> host entries still exist in clusterstate.json, marked as active, even
> though live_nodes has the correct information. Who updates
> clusterstate.json if the node goes down in an ungraceful fashion,
> without notifying its down state?
>
> Thanks,
> Sai Sreenivas K
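If some long pauses turn out to be unavoidable during bulk indexing, the other knob besides GC tuning is the ZooKeeper client timeout itself, set in solr.xml. A minimal sketch of the relevant fragment, assuming a SolrCloud-style solr.xml; the 30000 ms value is illustrative, not a recommendation:

```xml
<solr>
  <solrcloud>
    <!-- How long the node can go unresponsive (e.g. in a GC pause)
         before ZooKeeper expires its session and marks it down.
         Value here is illustrative only. -->
    <int name="zkClientTimeout">30000</int>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
  </solrcloud>
</solr>
```

Note that the effective session timeout is also capped by the ZooKeeper server's own maxSessionTimeout setting, so raising zkClientTimeout alone may not be enough.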
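A quick way to confirm the GC theory is to enable GC logging on the Solr JVM and scan the log for pauses longer than your ZooKeeper client timeout. A minimal sketch: the log lines below are hypothetical samples (your JVM's GC log format may differ), and the 15-second threshold stands in for whatever zkClientTimeout you actually run with.

```shell
# Hypothetical GC log lines, standing in for a real gc.log produced
# with something like -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps.
cat > /tmp/sample_gc.log <<'EOF'
2015-05-05T10:00:01.000: [GC (Allocation Failure) ..., 0.0421 secs]
2015-05-05T10:03:15.000: [Full GC (Ergonomics) ..., 18.7312 secs]
EOF

# Flag any pause longer than 15 seconds (substitute your zkClientTimeout).
# A pause this long means ZooKeeper's keep-alive goes unanswered and the
# node gets marked down, which matches the recovery symptom described above.
awk -F', ' '/secs\]/ { split($NF, a, " "); if (a[1] + 0 > 15) print "LONG PAUSE:", a[1], "secs" }' /tmp/sample_gc.log
```

If this turns up pauses near or above the timeout, GC tuning (see the links above) is the first fix to try.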