I have uncovered some additional details in the shard leader log:

2015-01-11 09:38:00.693 [qtp268575911-3617101] INFO  org.apache.solr.update.processor.LogUpdateProcessor – [listings] webapp=/solr path=/update params={distrib.from=http://solr05.search.abebooks.com:8983/solr/listings/&update.distrib=TOLEADER&wt=javabin&version=2} {add=[14065572860 (1490024273004199936)]} 0 707
2015-01-11 09:38:00.913 [updateExecutor-1-thread-35734] ERROR org.apache.solr.update.StreamingSolrServers – error
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:196)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
        at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
        at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
        at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
        at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
        at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
        at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2015-01-11 09:38:00.917 [qtp268575911-3616964] WARN  org.apache.solr.update.processor.DistributedUpdateProcessor – Error sending update
java.net.SocketException: Connection reset
        ... (identical stack trace to the error above)



It appears that the connection to the replica is being reset. The open file
limit (ulimit -n) on all Solr boxes is set to 65536.
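To rule out descriptor exhaustion, it can also help to check the limit the running Solr process actually sees (settings in limits.conf only apply to new sessions), and how many descriptors it currently holds. A quick sketch; the pgrep pattern is a guess and would need adjusting to the actual Solr start command:

```shell
# Check the open-file limit the running Solr JVM actually sees
# (settings in /etc/security/limits.conf only apply to new login sessions).
# The pattern "start.jar" is an assumption; adjust to your Solr start command.
SOLR_PID=$(pgrep -f start.jar | head -n 1)
if [ -n "$SOLR_PID" ]; then
    grep 'open files' "/proc/$SOLR_PID/limits"
    # How many descriptors (files + sockets) the process holds right now:
    ls "/proc/$SOLR_PID/fd" | wc -l
fi
# Soft limit for the current shell, for comparison:
ulimit -Sn
```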

There don’t appear to be any retries here, and this failure triggers recovery
of the replica. Is this expected behaviour?
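In the meantime, one workaround we are considering is a client-side retry around the SolrJ add call in our updater. A minimal generic sketch; the helper class, retry policy, and backoff are our own invention, not SolrJ behaviour (the Callable would wrap the CloudSolrServer.add(docs) call):

```java
import java.util.concurrent.Callable;

public class Retry {
    // Hypothetical retry helper for transient failures such as connection
    // resets. Wraps any operation (e.g. a SolrJ server.add(docs) call) and
    // retries it with a simple linear backoff before giving up.
    public static <T> T withRetry(Callable<T> op, int maxAttempts,
                                  long baseDelayMs) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the original error
                }
                Thread.sleep(baseDelayMs * attempt); // linear backoff
            }
        }
    }
}
```

In the updater this would look like `Retry.withRetry(() -> { server.add(docs); return null; }, 3, 1000)`, at the cost of masking genuinely dead replicas for a few seconds.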

Thanks,

Lindsay

On 2015-01-12, 11:11 AM, "Lindsay Martin" <lmar...@abebooks.com> wrote:

>Here are more details about our setup:
>
>Zookeeper:
>* 3 separate hosts in same rack as Solr cluster
>* Zookeeper hosts do not run any other processes
>
>Solr:
>* total servers: 24 (plus 2 cold standbys in case of host failure)
>* physical memory: 65931872 kB (62 GB)
>* max JVM heap size: -Xmx10880m (~10.6 GB)
>* only one Solr per host
>
>On the 'index' directory size front, I am seeing some differences in the
>disk usage between leaders and replicas.
>
>In 1 / 12 shards, there is no difference in size between the leader and
>replica.
>
>In 6 / 12 shards, there is a 1 GB difference in size between the leader
>and replica. Both have one index directory.
>
>In 5 / 12 shards, the replica is a multiple of the leader size, due to
>multiple index directories on disk.
>
>For example, the shard 1 leader has a directory named
>'index.20140624071707699' at 30 GB. The replica has two directories:
>'index.20150108052156468' at 31 GB and 'index.20140624071556270' at 32 GB.
>
>Thanks,
>
>Lindsay
>
>On 2015-01-09, 5:01 PM, "Shawn Heisey" <apa...@elyograg.org> wrote:
>
>>On 1/9/2015 4:54 PM, Lindsay Martin wrote:
>>> I am experiencing a problem where Solr nodes go into recovery following
>>>an update cycle.
>>
>><snip>
>>
>>> For background, here are some details about our configuration:
>>> * Solr 4.10.2 (problem also observed with Solr 4.6.1)
>>> * 12 shards with 2 nodes per shard
>>> * a single updater running in a separate subnet is posting updates
>>>using the SolrJ CloudSolrServer client. Updates are triggered hourly.
>>> * system is under continuous query load
>>> * autoCommit is set to 821 seconds
>>> * autoSoftCommit is set to 303 seconds
>>
>>I would suspect some kind of performance problem that likely results in
>>the zkClientTimeout expiring.  I have a standard set of questions for
>>performance problems.
>>
>>Questions about zookeeper:
>>
>>How many ZK nodes?  Is zookeeper on separate hardware?  If it's on the
>>same hardware as Solr, is its database on the same disk spindles as the
>>Solr index, or separate spindles?  Is zookeeper standalone or embedded
>>in Solr?  If it's standalone, do you happen to know the java max heap
>>for the zookeeper processes?
>>
>>Questions about Solr and the hardware:
>>
>>How many total Solr servers?  How much RAM is installed on each one?
>>What is the max size of the Java heap?  Are you running more than one
>>Solr (JVM/container) instance per machine?
>>
>>If you add up all the "index" directories on a server, how much disk
>>space does it take?  Is the amount of disk space used similar on all of
>>the servers?
>>
>>Thanks,
>>Shawn
>>
>
