Hi!

We are running Solr 4.4.0 on a 3-node Linux cluster and have about 2 collections storing product data with no problems. Yesterday I attempted to create another one of these collections using the Collections API, but I had forgotten to upload the config to ZooKeeper prior to making the call, and it failed spectacularly, as expected :). The API command I ran was to create a 3-shard collection with a replicationFactor of 2 and maxShardsPerNode set to 2, since the default understandably causes issues on 3-node clusters.

Since I ran that command, however, I see the following messages in the red 'SolrCore Initialization Failures' box when I load up the admin UI on 2 out of 3 of the nodes (the following is from one of the boxes):

  MyNewCollection_shard1_replica2:
    org.apache.solr.common.cloud.ZooKeeperException: Could not find configName
    for collection MyNewCollection found:[MyFirstCollection, MySecondCollection]

  MyNewCollection_shard3_replica1:
    org.apache.solr.common.cloud.ZooKeeperException: Could not find configName
    for collection MyNewCollection found:[MyFirstCollection, MySecondCollection]

My first question is: how do I get this to go away, since the cores never actually got created? I looked in the solr directory and I do not see folders with the core names (which I'm under the impression the implicit core discovery uses to determine which cores to attempt to load).

Second, and a bit stranger: also since I messed up that command, I now appear to be seeing errors in the admin log (every 2 seconds) when attempting to update documents in the other 2 collections, which were working fine prior to the command being run.
Specifically, I'm seeing these messages repeating over and over, near constantly:

  14:07:11 ERROR SolrCmdDistributor  shard update error StdNode:
    http://10.0.1.29:8983/solr/MyFirstCollection_shard1_replica2/:
    org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
    Server at http://10.0.1.29:8983/solr/MyFirstCollection_shard1_replica2
    returned non ok status:503, message:Service Unavailable

  14:07:11 ERROR SolrCore  Request says it is coming from leader, but we are the leader:
    distrib.from=http://10.0.1.30:8983/solr/MyFirstCollection_shard1_replica1/&update.distrib=FROMLEADER&wt=javabin&version=2

  14:07:11 ERROR SolrCore  org.apache.solr.common.SolrException:
    Request says it is coming from leader, but we are the leader

  14:07:11 WARN  RecoveryStrategy  Stopping recovery for zkNodeName=core_node1
    core=MyFirstCollection_shard1_replica2

  14:07:11 WARN  RecoveryStrategy  We have not yet recovered - but we are now the leader!
    core=MyFirstCollection_shard1_replica2

The first error worries me the most, as I think I'm losing data, but I can directly query that shard from that machine with no issues, and the cloud view from ALL of the machines shows totally green.

I'm not sure how the failed command got the system into this state, and I'm kicking myself for making that mistake to begin with, but I'm completely at a loss for how to attempt to recover, since these are live collections that I can't take down without incurring significant downtime.

Any ideas? Will reloading the cores that are throwing these messages help? Can ZooKeeper and Solr end up with different ideas as to who the leader is for that shard, and if so, how do I re-introduce consistency there?

Appreciate any help that can be offered.

Thanks,
--Dave
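For concreteness, here is roughly what I ran and what I'm considering trying next. The hostnames, ZooKeeper address, paths, and config name below are placeholders, and the UNLOAD/RELOAD steps at the end are only my guesses at a cleanup, not something I've run yet:

```shell
#!/bin/sh
# All hostnames/ports, paths, and the config name are placeholders.
SOLR="http://localhost:8983/solr"
ZK="localhost:2181"

# What I should have done first: upload the config set to ZooKeeper with the
# zkcli.sh that ships under example/cloud-scripts:
UPCONFIG_CMD="zkcli.sh -zkhost $ZK -cmd upconfig -confdir ./conf -confname MyNewCollectionConf"

# The CREATE call I issued (3 shards, replicationFactor=2, maxShardsPerNode=2),
# here naming the config explicitly via collection.configName:
CREATE_URL="$SOLR/admin/collections?action=CREATE&name=MyNewCollection&numShards=3&replicationFactor=2&maxShardsPerNode=2&collection.configName=MyNewCollectionConf"

# My guess at cleaning up a phantom core that never finished creating
# (CoreAdmin UNLOAD, issued on the node showing the init failure):
UNLOAD_URL="$SOLR/admin/cores?action=UNLOAD&core=MyNewCollection_shard1_replica2"

# My guess at nudging the healthy collection (Collections API RELOAD):
RELOAD_URL="$SOLR/admin/collections?action=RELOAD&name=MyFirstCollection"

# Printing the commands rather than running them against the live cluster:
echo "$UPCONFIG_CMD"
for url in "$CREATE_URL" "$UNLOAD_URL" "$RELOAD_URL"; do
  echo "curl '$url'"
done
```

I haven't actually issued the UNLOAD or RELOAD yet, since these are live collections, so please tell me if either of those would make things worse.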
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-4-0-Shard-Update-Errors-503-but-cloud-graph-shows-all-green-tp4094139.html Sent from the Solr - User mailing list archive at Nabble.com.