On 1/15/2014 2:43 PM, cwhi wrote:
I have a SolrCloud installation with about 2 million documents indexed in it.
It's been buzzing along without issue for the past 8 days, but today started
throwing errors on document adds that eventually resulted in out-of-memory
exceptions.  There is nothing funny going on.  There are a few infrequent
searches on the index every few minutes, and documents are being added in
batch (batches of 1000-5000) every few minutes as well.
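
For reference, the adds are roughly equivalent to the following SolrJ
sketch (the ZooKeeper hosts and collection name are placeholders, not my
real values):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    // Placeholder ZooKeeper ensemble and collection name.
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("collection1");

    // Build one batch and send it in a single update request.
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      batch.add(doc);
    }
    server.add(batch);  // one round trip for the whole batch
    server.commit();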

The exceptions I'm receiving don't seem very informative.  The first
exception looks like this:

org.apache.solr.common.SolrException
        at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
        at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
        at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
-- snip --

I've now experienced this with two SolrCloud instances in a row.  The
SolrCloud instance has 3 shards, each on a separate machine (each machine is
also running Zookeeper).  Each of the machines has 4 GB of RAM, with ~1.5
GB allocated to Solr.  Solr seems to max out the CPU during indexing, so I
don't know whether that's related.
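
For what it's worth, the heap cap is set with the standard JVM flags at
startup; the command below is just an illustration of that setting:

    java -Xms1536m -Xmx1536m -jar start.jar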

If anybody could help me in sorting out these issues, it would be greatly
appreciated.  I pulled the Solr log file and have uploaded it at
https://www.dropbox.com/s/co3r4esjnsas0tl/solr.log

Also, a short snippet of the first exception is available on pastebin at
http://pastebin.com/pWZrkGEr

I think the relevant part of your exception is this:

Caused by: org.eclipse.jetty.io.EofException
<snip>
Caused by: java.net.SocketException: Connection reset

When Jetty throws the EofException, it's almost always caused by the client disconnecting the TCP connection before the HTTP transaction is complete. The "Connection reset" message pretty much confirms it, IMHO.

What I think *might* be happening here is that you have a low SO_TIMEOUT configured on whatever is making your HTTP connections, and the update requests are not completing before that timeout expires, so the client closes the TCP connection before transfer is done. Most of the time, SO_TIMEOUT should either be left at infinity or configured with an insanely high value measured in minutes, not seconds.
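
If the client is SolrJ, for example, both timeouts can be set directly on the client object. A minimal sketch with illustrative values (the URL is a placeholder):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    // Placeholder URL; the important part is the generous socket timeout.
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    server.setConnectionTimeout(5000);  // connect timeout: fail fast if the server is unreachable
    server.setSoTimeout(600000);        // SO_TIMEOUT: allow ten minutes for slow updates to complete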

A potential underlying problem is that your index has gotten too big and the OS disk cache is no longer able to cache it effectively. When this happens, Solr performance will drop significantly. It's very common for Solr to be completely fine up to a certain threshold and then suffer horrible performance problems once that threshold is crossed.

Thanks,
Shawn
