On 1/9/2015 4:54 PM, Lindsay Martin wrote:
> I am experiencing a problem where Solr nodes go into recovery following an
> update cycle.

<snip>

> For background, here are some details about our configuration:
> * Solr 4.10.2 (problem also observed with Solr 4.6.1)
> * 12 shards with 2 nodes per shard
> * a single updater running in a separate subnet is posting updates using the
> SolrJ CloudSolrServer client. Updates are triggered hourly.
> * system is under continuous query load
> * autoCommit is set to 821 seconds
> * autoSoftCommit is set to 303 seconds

I would suspect some kind of performance problem that likely results in
the zkClientTimeout expiring. I have a standard set of questions for
performance problems.

Questions about zookeeper:

How many ZK nodes? Is zookeeper on separate hardware? If it's on the
same hardware as Solr, is its database on the same disk spindles as the
Solr index, or separate spindles? Is zookeeper standalone or embedded
in Solr? If it's standalone, do you happen to know the java max heap
for the zookeeper processes?

Questions about Solr and the hardware:

How many total Solr servers? How much RAM is installed on each one?
What is the max size of the Java heap? Are you running more than one
Solr (JVM/container) instance per machine? If you add up all the
"index" directories on a server, how much disk space does it take? Is
the amount of disk space used similar on all of the servers?

Thanks,
Shawn
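
For the question about adding up the "index" directories, a rough sketch like the one below might save some manual work. It assumes each core keeps its index in a directory literally named "index" somewhere under one root directory; the function name and the root path are placeholders for illustration, not anything Solr provides.

```python
import os


def index_dir_sizes(root):
    """Return {path: size_in_bytes} for every directory named 'index' under root.

    Assumption: each core stores its index in a directory called 'index'.
    Directories with other names (for example 'tlog') are ignored.
    """
    sizes = {}
    for dirpath, dirnames, _filenames in os.walk(root):
        if os.path.basename(dirpath) != "index":
            continue
        total = 0
        for sub, _dirs, files in os.walk(dirpath):
            for name in files:
                try:
                    total += os.path.getsize(os.path.join(sub, name))
                except OSError:
                    # A segment file can disappear mid-scan when a merge finishes.
                    pass
        sizes[dirpath] = total
        dirnames[:] = []  # already counted everything below this directory
    return sizes
```

Running `sum(index_dir_sizes("/path/to/solr/home").values())` on each server (with the real path substituted) gives a total in bytes that can be compared against the installed RAM and the Java heap size.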