Hi Martin; you can change your Java version from 1.6 to 1.7 u25 and test again to see whether the problem is related to the Java version.
Thanks;
Furkan KAMACI

2013/11/24 Lance Norskog <goks...@gmail.com>

> Yes, you should use a recent Java 7. Java 6 is end-of-life and no longer
> supported by Oracle. Also, read up on the various garbage collectors. It is
> a complex topic and there are many guides online.
>
> In particular there is a problem in some Java 6 releases that causes a
> massive memory leak in Solr. The symptom is that memory use oscillates
> (normally) from, say, 1GB to 2GB. After the bug triggers, the ceiling of 2GB
> becomes the floor, and memory use oscillates from 2GB to 3GB. I'm not
> saying this is the problem you have. I'm just saying that it is important to
> read up on garbage collection.
>
> Lance
>
> On 11/22/2013 05:27 AM, Martin de Vries wrote:
>
>> We did some more monitoring and have some new information:
>>
>> Before the issue happens the garbage collector's "collection count"
>> increases a lot. The increase seems to start about an hour before the
>> real problem occurs:
>>
>> http://www.analyticsforapplications.com/GC.png [1]
>>
>> We tried both the g1 garbage collector and the regular one; the problem
>> happens with both of them.
>>
>> We use Java 1.6 on some servers. Will Java 1.7 be better?
>>
>> Martin
>>
>> Martin de Vries wrote on 12.11.2013 10:45:
>>
>>> Hi,
>>>
>>> We have:
>>>
>>> Solr 4.5.1 - 5 servers
>>> 36 cores, 2 shards each, 2 servers per shard (every core is on 4 servers)
>>> about 4.5 GB total data on disk per server
>>> 4GB JVM-Memory per server, 3GB average in use
>>> Zookeeper 3.3.5 - 3 servers (one shared with Solr)
>>> haproxy load balancing
>>>
>>> Our Solrcloud is very unstable. About once a week some cores go into
>>> recovery state or down state. Many timeouts occur and we have to restart
>>> servers to get them back to work. The failover doesn't work in many
>>> cases, because one server has the core in down state, the other in
>>> recovering state. Other cores work fine. When the cloud is stable I
>>> sometimes see log messages like:
>>>
>>> - shard update error StdNode:
>>>   http://033.downnotifier.com:8983/solr/dntest_shard2_replica1/:org.apache.solr.client.solrj.SolrServerException:
>>>   IOException occured when talking to server at:
>>>   http://033.downnotifier.com:8983/solr/dntest_shard2_replica1
>>> - forwarding update to
>>>   http://033.downnotifier.com:8983/solr/dn_shard2_replica2/ failed -
>>>   retrying ...
>>> - null:ClientAbortException: java.io.IOException: Broken pipe
>>>
>>> Before the cloud problems start there are many large QTimes in the
>>> log (sometimes over 50 seconds), but there are no other errors until the
>>> recovery problems start.
>>>
>>> Any clue about what can be wrong?
>>>
>>> Kind regards,
>>>
>>> Martin
>>
>> Links:
>> ------
>> [1] http://www.analyticsforapplications.com/GC.png
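
For anyone who wants to watch the "collection count" mentioned in this thread without a graphing tool, here is a minimal sketch using the standard java.lang.management API. The class name GcCountSampler, the method names, and the default interval are made up for illustration and are not part of Solr. Note that it reads the counters of the JVM it runs in; watching a separate Solr JVM would need a remote JMX connection, which is not shown here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr: samples per-collector GC counts.
public class GcCountSampler {

    /** Cumulative collection count per collector in the current JVM. */
    public static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 if the count is undefined; treat that as 0.
            counts.put(gc.getName(), Math.max(0L, gc.getCollectionCount()));
        }
        return counts;
    }

    /** Number of collections between two snapshots, per collector. */
    public static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> diff = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long prev = before.get(e.getKey());
            diff.put(e.getKey(), e.getValue() - (prev == null ? 0L : prev));
        }
        return diff;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed defaults: 3 samples, 60 seconds apart.
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        long intervalMs = args.length > 1 ? Long.parseLong(args[1]) : 60000L;

        Map<String, Long> previous = snapshot();
        for (int i = 0; i < samples; i++) {
            Thread.sleep(intervalMs);
            Map<String, Long> current = snapshot();
            // A steadily growing delta per interval is the pattern described above.
            System.out.println("collections in last interval: " + delta(previous, current));
            previous = current;
        }
    }
}
```

A per-interval delta that keeps climbing well before the timeouts start would match what Martin saw in his graph; the sketch only reports the numbers and does not diagnose the cause.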