Re: Leaders in Recovery Failed state

Anand Mahajan Mon, 09 Feb 2015 01:10:52 -0800

Hi Eric, sorry I did not reply earlier. I see this page cached on
gmane.org, but the original post I made on the Solr Users list
does not show your comment:
http://lucene.472066.n3.nabble.com/Leaders-in-Recovery-Failed-state-td4180610.html


I'm on Solr 4.10.1. The last time this happened, I removed the
replicas for the affected shards (the shards whose leaders were
shown as down), deleted the replica data directories, and then
added the replicas back using the Collections API. That did the
trick then (but I'm not sure if that was the right way to do it).

The problem also seemed to stem from the fact that the ZooKeeper
instances were on the same machines as the Solr servlet containers,
and perhaps the ZooKeeper instances were starved of resources (CPU
and disk). I have since moved the ZooKeeper instances out to
separate servers, and that makes the boot time fast, but not all
shards come online when all the SolrCloud instances are rebooted.
A few servers in the Solr cluster went down again, and I have the
same issue: for 3 shards the leaders are shown as down, and the
log files for these instances show entries like the one below:

INFO  - 2015-02-09 05:18:13.696;
org.apache.solr.handler.admin.CoreAdminHandler; In
WaitForState(recovering): collection=collection1, shard=shard10,
thisCore=collection1_shard10_replica2,
leaderDoesNotNeedRecovery=false, isLeader? true, live=true,
checkLive=true, currentState=recovering, localState=down,
nodeName=10.68.77.8:8983_solr, coreNodeName=core_node28,
onlyIfActiveCheckResult=true, nodeProps: core_node28:
{"state":"recovering","core":"collection1_shard10_replica1",
"node_name":"10.68.77.8:8983_solr","base_url":"http://10.68.77.8:8983/solr"}
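
To compare entries like the one above across the affected shards, the
key=value pairs can be pulled out with a short script. This is only a
sketch: the regular expressions are assumptions based on this single
log line, not on any documented Solr log format, and the script does
not interpret what each field means.

```python
import re

# The WaitForState message from the log entry above, rejoined into one string.
line = (
    "In WaitForState(recovering): collection=collection1, shard=shard10, "
    "thisCore=collection1_shard10_replica2, leaderDoesNotNeedRecovery=false, "
    "isLeader? true, live=true, checkLive=true, currentState=recovering, "
    "localState=down, nodeName=10.68.77.8:8983_solr, "
    "coreNodeName=core_node28, onlyIfActiveCheckResult=true"
)

# Collect every key=value pair; values run up to the next comma or space.
fields = dict(re.findall(r"(\w+)=([^,\s]+)", line))

# "isLeader? true" uses a different separator, so it is handled separately.
leader_match = re.search(r"isLeader\?\s*(\w+)", line)
if leader_match:
    fields["isLeader"] = leader_match.group(1)

for key in ("shard", "thisCore", "isLeader", "currentState", "localState"):
    print(f"{key}: {fields.get(key)}")
```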

I have tried deleting the replicas for these shards, but this time
the asynchronous Delete Replica requests have been showing as
"submitted" for a very long time (over 3 hours). The last time I
did this, the requests finished fairly quickly.
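
For reference, the delete / check status / add-back sequence described
above can be sketched as Collections API URLs. This only builds and
prints the URLs; it does not send them. The base URL, collection,
shard and core node name are taken from the log entry, while the
request id "1001" and the target node for the new replica are made up
for illustration. It also assumes DELETEREPLICA accepts the async
parameter on this Solr version, as the requests described above imply.

```python
from urllib.parse import urlencode

# Assumed base URL, copied from the base_url in the log entry.
SOLR_BASE = "http://10.68.77.8:8983/solr"


def collections_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, "wt": "json", **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


# Remove the replica asynchronously. "async" is a Python keyword, so it
# is passed through dict unpacking. The id "1001" is hypothetical.
delete_url = collections_url(
    "DELETEREPLICA",
    collection="collection1",
    shard="shard10",
    replica="core_node28",
    **{"async": "1001"},
)

# Check the status of that asynchronous request.
status_url = collections_url("REQUESTSTATUS", requestid="1001")

# Add a replica back to the shard; the target node is an assumption.
add_url = collections_url(
    "ADDREPLICA",
    collection="collection1",
    shard="shard10",
    node="10.68.77.8:8983_solr",
)

for url in (delete_url, status_url, add_url):
    print(url)
```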

Any pointers are greatly appreciated.

Thanks,
Anand
