Re: Unexplained leader initiated recovery after updates

Shawn Heisey Fri, 09 Jan 2015 17:01:13 -0800

On 1/9/2015 4:54 PM, Lindsay Martin wrote:
> I am experiencing a problem where Solr nodes go into recovery following an 
> update cycle.


<snip>

> For background, here are some details about our configuration:
> * Solr 4.10.2 (problem also observed with Solr 4.6.1)
> * 12 shards with 2 nodes per shard
> * a single updater running in a separate subnet is posting updates using the 
> SolrJ CloudSolrServer client. Updates are triggered hourly.
> * system is under continuous query load
> * autoCommit is set to 821 seconds
> * autoSoftCommit is set to 303 seconds

I suspect a performance problem, one severe enough that the
zkClientTimeout expires.  I have a standard set of questions for
performance problems.

Questions about ZooKeeper:

How many ZK nodes?  Is ZooKeeper on separate hardware?  If it's on the
same hardware as Solr, is its database on the same disk spindles as the
Solr index, or separate spindles?  Is ZooKeeper standalone or embedded
in Solr?  If it's standalone, do you happen to know the Java max heap
for the ZooKeeper processes?

Questions about Solr and the hardware:

How many total Solr servers?  How much RAM is installed on each one?
What is the max size of the Java heap?  Are you running more than one
Solr (JVM/container) instance per machine?

If you add up all the "index" directories on a server, how much disk
space does it take?  Is the amount of disk space used similar on all of
the servers?
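
If it helps, one rough way to add up the "index" directories is a small
Java program along the lines below.  This is only a sketch, not a Solr
tool: the class name IndexDiskUsage and its method names are made up for
illustration, and it assumes each core keeps its index in a directory
named exactly "index" somewhere below the path you give it.  Directories
with other names (tlog, or index.<timestamp>) are not counted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Solr: totals the on-disk size of
// every directory named "index" below a given root.
final class IndexDiskUsage {

    // Sum the sizes of all regular files below one directory.
    static long directoryBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    // Find every directory named exactly "index" below root and add up
    // their sizes. Assumes "index" directories are not nested in each other.
    static long totalIndexBytes(Path root) throws IOException {
        List<Path> indexDirs;
        try (Stream<Path> paths = Files.walk(root)) {
            indexDirs = paths.filter(Files::isDirectory)
                             .filter(p -> p.getFileName() != null
                                     && p.getFileName().toString().equals("index"))
                             .collect(Collectors.toList());
        }
        long total = 0;
        for (Path dir : indexDirs) {
            total += directoryBytes(dir);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // The root defaults to the current directory; pass the Solr home
        // (or wherever the cores live) as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(totalIndexBytes(root)
                + " bytes in index directories under " + root);
    }
}
```

Running it on each server with that server's Solr home as the argument
gives one number per machine, which makes it easy to see whether the
servers are similar.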

Thanks,
Shawn
