The actual state is a mix of clusterstate.json and the ephemeral live nodes: a node may be listed as active in clusterstate.json, but if its live node is not up, that doesn't matter - it's considered down.
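To make that concrete, here is a minimal sketch in plain Python of the rule Mark describes: a replica only counts as up if its recorded state is "active" *and* its node still has an ephemeral entry under /live_nodes. The function name `effective_state`, the dict shapes, and the 10.0.0.x addresses are illustrative stand-ins, not Solr APIs or the masked IPs from the thread.

```python
# Sketch: effective replica state = recorded state AND node liveness.
# The dict mirrors the /clusterstate.json replica entries quoted below;
# live_nodes stands in for the ephemeral /live_nodes children in ZooKeeper.

def effective_state(replica, live_nodes):
    """A replica is only truly up if clusterstate.json says 'active'
    AND its node_name appears among the live ephemeral nodes."""
    if replica["node_name"] not in live_nodes:
        return "down"        # ephemeral node gone -> node is down,
                             # regardless of the recorded state
    return replica["state"]  # otherwise trust the recorded state

shard2_replicas = {
    "10.0.0.1:8983_solr_collection1": {
        "state": "active", "node_name": "10.0.0.1:8983_solr",
        "leader": "true"},
    "10.0.0.2:8983_solr_collection1": {
        "state": "active", "node_name": "10.0.0.2:8983_solr"},
}

# Only node 2 still has its ephemeral /live_nodes entry.
live_nodes = {"10.0.0.2:8983_solr"}

for name, replica in shard2_replicas.items():
    print(name, effective_state(replica, live_nodes))
```

This is why the clusterstate.json snippet below can still say "active" for a dead node: that field is only updated when a node publishes its state, while liveness is tracked separately by the ephemeral nodes.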
- Mark

On May 14, 2013, at 8:08 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:

> The node is shown as down at the admin page. It says there is one replica for that
> shard but the leader is dead (no new leader is selected!), yet when I check the
> zookeeper information from /clusterstate.json at the admin page I see this:
>
> "shard2":{
>   "range":"b3330000-e665ffff",
>   "state":"active",
>   "replicas":{
>     "10.***.**.*1:8983_solr_collection1":{
>       "shard":"shard2",
>       *"state":"active",*
>       "core":"collection1",
>       "collection":"collection1",
>       "node_name":"10.***.**.*1:8983_solr",
>       "base_url":"http://10.***.**.*1:8983/solr",
>       "leader":"true"},
>     "10.***.**.**2:8983_solr_collection1":{
>       "shard":"shard2",
>       *"state":"active",*
>       "core":"collection1",
>       "collection":"collection1",
>       "node_name":"10.***.***.**2:8983_solr",
>       "base_url":"http://10.***.***.**2:8983/solr"}}},
>
> I mean the dead node is still listed as active!
>
> I have exceptions and warnings in my solr log:
>
> ...
> INFO: Updating cluster state from ZooKeeper...
> May 14, 2013 2:31:12 PM org.apache.solr.cloud.ZkController publishAndWaitForDownStates
> WARNING: Timed out waiting to see all nodes published as DOWN in our cluster
> ...
> May 14, 2013 2:32:14 PM org.apache.solr.cloud.ZkController getLeader
> SEVERE: Error getting leader from zk
> org.apache.solr.common.SolrException: There is conflicting information about the leader of shard: shard2
> our state says: http://10.***.***.*1:8983/solr/collection1/
> but zookeeper says: http://10.***.***.**2:8983/solr/collection1/
>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:849)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:776)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:727)
>         at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
>         at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
>         at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
>         at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638)
>         at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:722)
>
> May 14, 2013 2:32:14 PM org.apache.solr.cloud.ZkController publish
> INFO: publishing core=collection1 state=down
> May 14, 2013 2:32:14 PM org.apache.solr.cloud.ZkController publish
> INFO: numShards not found on descriptor - reading it from system property
> May 14, 2013 2:32:14 PM org.apache.solr.common.SolrException log
> SEVERE: :org.apache.solr.common.SolrException: Error getting leader from zk for shard shard2
>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:864)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:776)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:727)
>         at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
>         at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
>         at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
>         at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638)
>         at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:722)
>
> and after that it closes the main searcher.
>
> How can I get rid of this error, and why is there a mismatch between the
> admin page's graph and the clusterstate?